nkf

nkf (Network Kanji Filter) is one of my favorite command-line utilities. There probably isn’t a week that goes by where I don’t need to convert a Japanese text file to a different encoding. nkf gets the job done. nkf is available on GitHub: https://github.com/nurse/nkf

Once you get nkf installed and up and running, the first step is to figure out the encoding of the original file:

nkf --guess your_filename.txt
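If you have a whole folder of files to check, the same --guess call can be wrapped in a small loop. This is only a minimal sketch: the function name guess_all is made up for illustration, and it assumes nkf is on your PATH.

```shell
#!/bin/sh
# guess_all is a hypothetical helper name, not part of nkf.
# Prints "<file>: <guessed encoding>" for every regular file passed in.
guess_all() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            # nkf --guess prints the detected encoding of the file
            printf '%s: %s\n' "$f" "$(nkf --guess "$f")"
        else
            printf '%s: not a regular file, skipped\n' "$f" >&2
        fi
    done
}

# Usage: guess_all *.txt *.csv
```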

Here is an example using the Japan Post database file that is freely available online:

nkf --guess

Let’s say we need to convert the file to UTF-8. Here are the arguments:

nkf -S -w your_filename.txt > your_filename_utf8.txt

It’s as easy as that.

The first argument is an upper-case letter that tells nkf the encoding of the source file. The second argument is a lower-case letter that tells nkf the output encoding. Don’t forget to include the > sign before the output filename. Otherwise, nkf will send the converted text to stdout (your screen/terminal in this case).
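One thing to watch with the > sign: the shell empties the output file before nkf starts reading, so redirecting onto the input file itself would wipe it. Here is a rough sketch of a wrapper that guards against that. The name sjis_to_utf8 is hypothetical, and it assumes the source really is Shift-JIS, as in the command above.

```shell
#!/bin/sh
# sjis_to_utf8 is a hypothetical helper name, not part of nkf.
# Converts a Shift-JIS file to UTF-8, always writing to a separate file.
sjis_to_utf8() {
    if [ "$#" -ne 2 ]; then
        echo "usage: sjis_to_utf8 <source> <destination>" >&2
        return 1
    fi
    src=$1
    dst=$2
    # "> file" truncates the file first, so never write onto the input.
    if [ "$src" = "$dst" ]; then
        echo "refusing to overwrite the input file: $src" >&2
        return 1
    fi
    nkf -S -w "$src" > "$dst"
}

# Usage: sjis_to_utf8 your_filename.txt your_filename_utf8.txt
```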

The arguments you can use include:

-S or -s for Shift-JIS

-J or -j for ISO-2022-JP

-E or -e for EUC

-W or -w for UTF-8
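To make the upper-case/lower-case pairing concrete, here is a small sketch that builds the two flags from encoding names. The function names nkf_letter and convert, and the encoding spellings they accept, are my own choices for illustration; only the flag letters come from the list above.

```shell
#!/bin/sh
# nkf_letter and convert are hypothetical helper names, not part of nkf.
# Map an encoding name to the flag letter from the list above.
nkf_letter() {
    case $1 in
        shift-jis|sjis)   echo s ;;
        iso-2022-jp|jis)  echo j ;;
        euc|euc-jp)       echo e ;;
        utf-8|utf8)       echo w ;;
        *) echo "unknown encoding: $1" >&2; return 1 ;;
    esac
}

# convert <from> <to> <file>: upper-case flag for input, lower-case for output.
convert() {
    in_letter=$(nkf_letter "$1") || return 1
    out_letter=$(nkf_letter "$2") || return 1
    in_flag=$(printf '%s' "$in_letter" | tr 'a-z' 'A-Z')
    nkf "-$in_flag" "-$out_letter" "$3"
}

# Usage: convert utf-8 sjis your_filename_utf8.txt > your_filename_sjis.txt
```

The usage line runs nkf -W -s, which is the reverse of the Shift-JIS to UTF-8 conversion shown earlier.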

Of course, man nkf will show you detailed information about the utility.

A Useful Example

Depending on your OS configuration, when you try to show the file contents from the command line, you might end up with a bunch of garbled characters (“mojibake”):

[Screenshot: example of garbled characters (“mojibake”)]

This can be easily handled by piping the output of nkf into the ‘more’ command:

[Screenshot: example of nkf with ‘more’]
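In text form, the pipe might look like the sketch below. It assumes a Shift-JIS file and a terminal that expects UTF-8, so swap the flags to match your own setup. The function name view_as_utf8 is made up for illustration, and it falls back to more when the PAGER variable is not set.

```shell
#!/bin/sh
# view_as_utf8 is a hypothetical helper name, not part of nkf.
# Converts a Shift-JIS file to UTF-8 on the fly and pages the result,
# without writing a converted copy to disk.
view_as_utf8() {
    nkf -S -w "$1" | ${PAGER:-more}
}

# Usage: view_as_utf8 your_filename.txt
# Equivalent one-liner: nkf -S -w your_filename.txt | more
```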

This will display the text in human-readable format: