Abstract
Pre-Qin Chinese plays a key role in the history of the Chinese language. However, because annotated corpora are lacking, the overall picture of Pre-Qin Chinese vocabulary remains unclear. This paper introduces a corpus of 25 Pre-Qin classical texts that have undergone manual word segmentation and part-of-speech tagging. Character and word frequencies are then calculated from the corpus. Character entropy, the syllable counts of words, and words with multiple parts of speech are also analyzed statistically.
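As a rough illustration of the character-level statistics mentioned in the abstract, the sketch below counts character frequencies and computes a character entropy. It assumes the standard Shannon entropy over the unigram character distribution, H = -Σ p(c) log2 p(c); the abstract does not state the exact formula the authors used. The function names and the sample string are hypothetical and are not taken from the paper's corpus.

```python
import math
from collections import Counter


def char_frequencies(text):
    """Count every non-whitespace character in the text."""
    return Counter(ch for ch in text if not ch.isspace())


def char_entropy(text):
    """Shannon entropy in bits per character (assumed definition)."""
    counts = char_frequencies(text)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = -sum(p * log2(p)) over the observed characters
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Toy input for illustration only; a real run would read the corpus files.
    sample = "學而時習之不亦說乎"
    print(char_frequencies(sample).most_common(3))
    print(round(char_entropy(sample), 3))
```

Word frequencies could be computed the same way by counting tokens from the segmented text instead of single characters.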
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, B., Xi, N., Feng, M., Chen, X. (2013). Corpus-Based Statistics of Pre-Qin Chinese. In: Ji, D., Xiao, G. (eds) Chinese Lexical Semantics. CLSW 2012. Lecture Notes in Computer Science, vol 7717. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36337-5_16
DOI: https://doi.org/10.1007/978-3-642-36337-5_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36336-8
Online ISBN: 978-3-642-36337-5
eBook Packages: Computer Science (R0)