The Jōyō Kanji Map ・ 地図で分かる常用漢字

The Jōyō Kanji Map is an interactive way of arranging the common Japanese ideographic characters called kanji by their similarity. The map is built from a chosen kanji distance function and is seen from the point of view of a focal kanji at the center. The surrounding kanji are placed according to their distances from the focal kanji, in such a way that the distances between peripheral kanji remain as accurate as possible. The thickness and highlighting of each line always indicate the exact distance, even between two peripheral kanji: thicker lines mean closer, and hence more similar, kanji.
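One way to realize a layout like this is sketched below under our own assumptions; it is not the map's actual algorithm. Each peripheral kanji is kept at a radius equal to its distance from the focal kanji, and only the angles are adjusted to reduce the stress, i.e. the sum of squared differences between map distances and kanji distances over all pairs of peripheral kanji. The function names (`positions`, `stress`, `focal_layout`) and the distance values are hypothetical.

```python
import math
import random


def positions(radii, angles):
    """Place peripheral kanji i at distance radii[i] from the focal kanji at (0, 0)."""
    return [(r * math.cos(t), r * math.sin(t)) for r, t in zip(radii, angles)]


def stress(pos, d):
    """Sum of squared errors between map distances and kanji distances,
    taken over all pairs of peripheral kanji."""
    total = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            total += (math.dist(pos[i], pos[j]) - d[i][j]) ** 2
    return total


def focal_layout(focal_dist, d, iters=500, seed=0):
    """Hypothetical focal layout: radii are fixed to the exact distances from
    the focal kanji; only the angles are tuned, by gradient descent with step
    halving, to reduce the stress among peripheral kanji."""
    rng = random.Random(seed)
    n = len(focal_dist)
    angles = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]
    pos = positions(focal_dist, angles)
    start = current = stress(pos, d)
    step = 0.1
    for _ in range(iters):
        grad = [0.0] * n
        for i in range(n):
            # Derivative of point i's coordinates with respect to its angle.
            dx = -focal_dist[i] * math.sin(angles[i])
            dy = focal_dist[i] * math.cos(angles[i])
            for j in range(n):
                e = math.dist(pos[i], pos[j])
                if i == j or e == 0.0:
                    continue
                de = ((pos[i][0] - pos[j][0]) * dx + (pos[i][1] - pos[j][1]) * dy) / e
                grad[i] += 2 * (e - d[i][j]) * de
        # Accept a step only if it lowers the stress; otherwise halve it.
        while step > 1e-9:
            trial = [a - step * g for a, g in zip(angles, grad)]
            trial_pos = positions(focal_dist, trial)
            trial_stress = stress(trial_pos, d)
            if trial_stress < current:
                angles, pos, current = trial, trial_pos, trial_stress
                step *= 1.5
                break
            step /= 2
        if step <= 1e-9:
            break  # no further improving step found
    return pos, start, current


# Made-up distances for three peripheral kanji (not values from the map).
focal_dist = [0.3, 0.4, 0.8]
d = [[0.0, 0.5, 0.9],
     [0.5, 0.0, 0.7],
     [0.9, 0.7, 0.0]]
pos, before, after = focal_layout(focal_dist, d)
```

Fixing the radii keeps every distance to the focal kanji exact, so all remaining error is pushed into the peripheral pairs; because a step is only accepted when it lowers the stress, the final stress is never higher than the starting one.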

Did you say kanji distance?

We intuitively understand that some kanji are more similar than others, for example a pair like 森 and 林, which share the component 木 (twice). A kanji distance function quantifies how dissimilar two kanji are. The default Component Transport Distance used here takes into account the nested component structure of the kanji, including the relative positions and exact shapes of the components, using a mathematical framework called optimal transport. See our technical report detailing the method. You can also select the Stroke Edit Distance by Yencken and Baldwin (2008), based on the number of stroke edit operations, or the Bag-of-Radicals Distance by Yeh and Li (2002), based on the fraction of shared radicals. Last but not least, we derived the Embeddings Distance from OpenAI's text embeddings. It focuses mainly on whether two kanji frequently appear in similar contexts.
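The simpler distances mentioned above can be illustrated with textbook stand-ins. This sketch is not the kanjistat implementation and does not cover the Component Transport Distance. It computes a Levenshtein edit distance over stroke sequences (the stroke-type codes such as "HSPN" are made up), an assumed "one minus shared fraction" formula over multisets of components for the Bag-of-Radicals Distance, and a cosine distance between embedding vectors (whether the Embeddings Distance uses cosine is an assumption, and the vectors are toy values).

```python
import math
from collections import Counter


def stroke_edit_distance(a, b):
    """Levenshtein distance between two stroke sequences: the minimum number
    of stroke insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, sa in enumerate(a, start=1):
        cur = [i]
        for j, sb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete sa
                           cur[j - 1] + 1,             # insert sb
                           prev[j - 1] + (sa != sb)))  # keep or substitute
        prev = cur
    return prev[-1]


def bag_of_radicals_distance(a, b):
    """Assumed formalization of "fraction of shared radicals": one minus the
    size of the multiset intersection over the size of the multiset union."""
    ca, cb = Counter(a), Counter(b)
    shared = sum((ca & cb).values())
    union = sum((ca | cb).values())
    if union == 0:
        return 0.0  # two empty bags are treated as identical
    return 1.0 - shared / union


def cosine_distance(u, v):
    """One minus the cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


# Hypothetical stroke codes and toy vectors, for illustration only.
print(stroke_edit_distance("HSPN", "HSPD"))                       # 1
print(bag_of_radicals_distance(["木", "木", "木"], ["木", "木"]))  # about 1/3
print(cosine_distance([1.0, 2.0], [2.0, 4.0]))                   # close to 0 (parallel vectors)
```

Under this assumed formula, 森 and 林 get a distance of 1/3: their component bags share two 木 out of the three 木 in the union.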

Where can I learn more?

You can explore many aspects of kanji distances, as well as other computational tools for kanji, with our package kanjistat, an open toolkit for the R programming language.