Chinese language database (中文数据库)

The Chinese language is a great hobby of mine. I study the language itself (I passed HSK 3 a couple of years ago and plan to pass HSK 5 soon), but I enjoy learning and compiling facts about it even more. Specifically, I love everything that has to do with the Chinese writing system, including learning the characters, studying their history, and practicing calligraphy. The “discrete” nature of the Chinese language appeals to my love of statistics: with no inflected word forms and a fixed set of characters in use, everything in Chinese can be counted and analyzed.

My biggest project related to the Chinese language is the Chinese language database (中文数据库). The database consists of two large parts: the first is dedicated to the language in general and can be of interest to anyone, and the second is dedicated to my own progress in learning the language and can help those who want to start learning it.

General information

The first part contains extensive lists of Chinese characters and words, with statistics for them. There are eight lists in total.

For every character in the lists, the database provides pronunciation, meaning, radical (dictionary key), and stroke count. For the words from the HSK levels, it provides pronunciations and meanings. An additional list compiles statistics about all 11,062 characters.

Learning progress

The second part of the database describes my own learning progress and can be of use to anyone who decides to learn the language. The main sheet lists all the characters I have learned, their distribution across the frequency and HSK levels, and the words and phrases I have learned. The database also tracks progress toward the goals I have set: for example, learning all the HSK characters, learning the 3,000 most frequent characters, and so on.
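For readers who want to track their own goals the same way, here is a minimal Python sketch of the idea. It is not how the database itself is built; the function name goal_progress and the tiny character lists are made up for illustration, and a real goal list (all HSK characters, the 3,000 most frequent characters) would be loaded from your own data.

```python
def goal_progress(learned, goals):
    """Return {goal name: (learned count, goal size, percent)} for each goal."""
    report = {}
    for name, targets in goals.items():
        targets = set(targets)          # unique characters in this goal
        done = len(targets & learned)   # goal characters already learned
        percent = 100.0 * done / len(targets) if targets else 0.0
        report[name] = (done, len(targets), round(percent, 1))
    return report


# Hypothetical sample data, not taken from the database.
learned = set("我你好他们学")
goals = {
    "sample HSK list": "我你他好爱",
    "sample frequency list": "的一是我不了",
}

report = goal_progress(learned, goals)
for name, (done, total, percent) in report.items():
    print(f"{name}: {done}/{total} ({percent}%)")
```

Using sets keeps the counting simple: each goal is just a set of characters, and progress is the size of its overlap with the set of learned characters.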

I hope that the database can help you or make you interested in the Chinese language!