nkf (Network Kanji Filter) is one of my favorite command line utilities. There probably isn’t a week that goes by where I don’t need to convert a Japanese text file to a different encoding.
nkf gets the job done.
nkf is available on GitHub: https://github.com/nurse/nkf
Once you get nkf installed and up and running, the first step is to figure out the encoding of the original file:
nkf --guess your_filename.txt
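If you have several files to check, the same command can be wrapped in a small loop. This is only a minimal sketch: guess_encodings is a hypothetical helper name, not something nkf provides, and the exact text nkf prints for each guess may vary by version.

```shell
# guess_encodings is a hypothetical helper name, not part of nkf.
# It prints one line per file: the filename, then whatever nkf --guess reports.
guess_encodings() {
    for f in "$@"; do
        printf '%s: %s\n' "$f" "$(nkf --guess "$f")"
    done
}

# Usage (assuming the files exist in the current directory):
#   guess_encodings *.txt
```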
Here is an example using the Japan Post database file that is freely available online:
Let’s say we need to convert the file to UTF-8. Here are the arguments:
nkf -S -w your_filename.txt > your_filename_utf8.txt
It’s as easy as that.
The first argument is an upper-case letter that tells nkf the encoding of the source file. The second argument is a lower-case letter that tells nkf the output encoding. Don’t forget to include the > sign before the output filename. Otherwise, nkf will send the converted text to stdout (your screen/terminal in this case).
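The same redirect works in a loop when there is more than one file to convert. The sketch below assumes every input is a Shift-JIS file ending in .txt; sjis_to_utf8 is a hypothetical helper name, and the output naming simply follows the your_filename_utf8.txt pattern used above.

```shell
# sjis_to_utf8 is a hypothetical helper name, not part of nkf.
# Assumes each argument is a Shift-JIS file ending in .txt.
# The original file is left untouched; the UTF-8 copy is written next to it.
sjis_to_utf8() {
    for src in "$@"; do
        dst="${src%.txt}_utf8.txt"   # a.txt -> a_utf8.txt
        nkf -S -w "$src" > "$dst"    # without the redirect, output goes to stdout
    done
}

# Usage:
#   sjis_to_utf8 file1.txt file2.txt
```

Writing to a new file rather than over the original makes it easy to compare the two if the source encoding turns out to be something other than Shift-JIS.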
The arguments you can use include:
-S or -s for Shift-JIS
-J or -j for ISO-2022-JP
-E or -e for EUC-JP
-W or -w for UTF-8
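To make the upper-case/lower-case pairing concrete, here is a rough sketch that builds the two flags from encoding names. The helper names flag_for and convert_encoding and the short names sjis, jis, euc and utf8 are all made up for illustration; only the flag letters come from the list above.

```shell
# flag_for and convert_encoding are hypothetical helper names.
# flag_for NAME prints the lower-case nkf letter for that encoding.
flag_for() {
    case "$1" in
        sjis) echo s ;;
        jis)  echo j ;;
        euc)  echo e ;;
        utf8) echo w ;;
        *)    echo "unknown encoding: $1" >&2; return 1 ;;
    esac
}

# convert_encoding FROM TO SRC DST
# The input flag is the upper-case letter, the output flag the lower-case one.
convert_encoding() {
    in_flag=$(flag_for "$1") || return 1
    out_flag=$(flag_for "$2") || return 1
    in_flag=$(printf '%s' "$in_flag" | tr 'a-z' 'A-Z')
    nkf "-$in_flag" "-$out_flag" "$3" > "$4"
}

# Usage: EUC-JP in, Shift-JIS out (runs nkf -E -s)
#   convert_encoding euc sjis your_filename.txt your_filename_sjis.txt
```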
man nkf will show you detailed information about the utility.
A Useful Example
Depending on your OS configuration, when you try to show the file contents from the command line, you might end up with a bunch of garbled characters (“mojibake”):
This can be easily handled by piping the output of nkf into the ‘more’ command:
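A minimal sketch of that pipe, assuming your terminal expects UTF-8 (hence -w). The name view_utf8 is a hypothetical helper. No input flag is given here, so nkf is left to detect the source encoding itself; add one such as -S if you already know it.

```shell
# view_utf8 is a hypothetical helper name, not part of nkf.
# Assumes the terminal displays UTF-8; nkf is left to detect the input encoding.
view_utf8() {
    nkf -w "$1" | more
}

# Usage:
#   view_utf8 your_filename.txt
```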
This will display the text in a human-readable format: