Chinese language database (中文数据库)
The Chinese language is a great hobby of mine. I do study the language itself (although only reading and writing), but I enjoy learning and compiling facts about it even more. Specifically, I love everything that has to do with the Chinese writing system, including learning the characters, studying their history, and practicing calligraphy. The “discrete” nature of the Chinese language appeals to my love of statistics: with no inflected grammatical forms and a fixed set of characters in use, everything in Chinese can be counted and analyzed.
My biggest project related to the Chinese language is the Chinese language database (中文数据库). The database consists of two large parts: the first is dedicated to the language in general and may be of interest to anyone, and the second is dedicated to my own progress in learning the language and can help those who want to start learning it.
General information
The first part contains extensive lists of Chinese characters and words, with statistics for them. This includes lists of characters and words by frequency, by HSK 2.0 and HSK 3.0 level, etc. For every character in the lists, the database provides the pronunciation, meaning, dictionary keys, and stroke count. For the words from the HSK levels, it provides pronunciations and meanings. This general information is based on several studies and corpora (cited in the database itself) and can be used for various kinds of analysis. For example, some people have used it to rank the suggestions on a pinyin keyboard. It can also be used for fun random statistics.
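One statistic of this kind is cumulative coverage: the share of all character occurrences accounted for by the N most frequent characters. Here is a minimal Python sketch of that calculation. The counts are made-up placeholders rather than values from the database, and the function name `coverage_by_rank` is only illustrative.

```python
from itertools import accumulate

# Made-up placeholder counts, NOT taken from the database.
char_counts = {"的": 50, "一": 20, "是": 15, "不": 10, "了": 5}


def coverage_by_rank(counts):
    """Return the cumulative share of occurrences covered by the top-N characters.

    Element i of the result is the share covered by the i + 1 most frequent
    characters.
    """
    ordered = sorted(counts.values(), reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]


coverage = coverage_by_rank(char_counts)
for rank, share in enumerate(coverage, start=1):
    print(f"top {rank}: {share:.0%} of all occurrences")
```

With a real frequency list in place of the placeholder dictionary, the same function would show how quickly coverage grows as more characters are learned.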
Learning progress
The second part of the database describes my own learning progress and can be of use to anyone who decides to learn the language. The main sheet lists all the characters that I have learned and their distribution across the frequency ranks and the HSK levels, as well as the words I have learned. Additionally, the database tracks progress toward the goals I have set: for example, learning all the HSK characters, learning the 3,000 most frequent characters, etc.
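For anyone who wants to track goals like these in their own notes, the calculation is a simple set comparison. The sketch below is in Python and assumes the learned characters and the goal list are available as plain sets. The sample characters and the name `goal_progress` are hypothetical and do not reflect how the database itself is built.

```python
def goal_progress(learned, goal):
    """Return (characters done, goal size, share done) for one goal list."""
    done = len(learned & goal)
    return done, len(goal), done / len(goal)


# Hypothetical sample data for illustration only.
learned = {"我", "你", "好", "中", "文"}
goal = {"我", "你", "他", "好"}

done, total, share = goal_progress(learned, goal)
remaining = sorted(goal - learned)

print(f"{done}/{total} learned ({share:.0%}); still to learn: {remaining}")
```

The same function works for any goal list, whether it is an HSK level or the top 3,000 characters by frequency.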
I hope that the database can help you or make you interested in the Chinese language!