
Statistical Part-of-Speech Tagging for Classical Chinese

  • Conference paper
Text, Speech and Dialogue (TSD 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2448)

Abstract

Classical Chinese differs fundamentally from Modern Chinese in both syntax and morphology. There have recently been a number of works on part-of-speech (PoS) tagging for Modern Chinese, but PoS tagging for Classical Chinese has been largely neglected. To the best of our knowledge, this is the first work in the area. Fortunately, however, Classical Chinese is in one respect easier to tag than Modern Chinese: most Classical Chinese words consist of a single character, so no word segmentation is needed. In this paper, we propose and analyze a simple statistical approach to PoS tagging of Classical Chinese. We first design a tagset for Classical Chinese, which we later show to be accurate and efficient. We then apply the hidden Markov model (HMM) Viterbi algorithm and make several improvements designed specifically for Classical Chinese, such as sparse-data handling and unknown-word guessing. As the training set grows larger, the accuracies of the bigram and trigram models increase to 94.9% and 97.6%, respectively. Our work also contributes by identifying and solving several problems in processing Classical Chinese that had not previously been addressed.
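The approach summarized above, an HMM tagger decoded with the Viterbi algorithm over single-character tokens, can be illustrated with a minimal bigram sketch. It assumes a tiny hand-tagged toy corpus and a hypothetical five-tag set (N, V, C, D, R) that is not the authors' tagset. Add-one smoothing stands in for the paper's sparse-data handling and unknown-word guessing, whose details the abstract does not give.

```python
import math
from collections import Counter, defaultdict

# Toy hand-tagged corpus of (character, tag) pairs. The five tags are
# hypothetical (N noun, V verb, C conjunction, D adverb, R pronoun) and
# are NOT the tagset designed in the paper.
CORPUS = [
    [("子", "N"), ("曰", "V")],
    [("學", "V"), ("而", "C"), ("時", "D"), ("習", "V"), ("之", "R")],
    [("人", "N"), ("不", "D"), ("知", "V")],
]

START = "<s>"  # pseudo-tag marking the sentence start


def train(tagged_sentences):
    """Count bigram tag transitions and tag-to-character emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][char]
    for sent in tagged_sentences:
        prev = START
        for char, tag in sent:
            trans[prev][tag] += 1
            emit[tag][char] += 1
            prev = tag
    return trans, emit, sorted(emit)


def viterbi(chars, trans, emit, tags):
    """Return the most probable tag sequence for a list of characters."""
    if not chars:
        return []
    vocab = {c for t in tags for c in emit[t]}

    def log_trans(prev, tag):
        # Add-one smoothing over the tagset (a stand-in sparse-data scheme).
        total = sum(trans[prev].values())
        return math.log((trans[prev][tag] + 1) / (total + len(tags)))

    def log_emit(tag, char):
        # Add-one smoothing with one extra slot for unseen characters,
        # a crude substitute for real unknown-word guessing.
        total = sum(emit[tag].values())
        return math.log((emit[tag][char] + 1) / (total + len(vocab) + 1))

    # best[i][tag] = (best log score of a path ending in tag at i, previous tag)
    best = [{t: (log_trans(START, t) + log_emit(t, chars[0]), None) for t in tags}]
    for char in chars[1:]:
        column = {}
        for t in tags:
            score, prev = max((best[-1][p][0] + log_trans(p, t), p) for p in tags)
            column[t] = (score + log_emit(t, char), prev)
        best.append(column)

    # Trace back from the highest-scoring final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return path[::-1]


trans, emit, tags = train(CORPUS)
print(viterbi(list("人不知"), trans, emit, tags))  # ['N', 'D', 'V']
```

Because each character is treated as one token, `list(sentence)` serves as the whole tokenizer, which mirrors the point that no segmentation is needed. A trigram variant would condition each transition on the two previous tags, so the Viterbi state would become a pair of tags rather than a single tag.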

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Huang, L., Peng, Y., Wang, H., Wu, Z. (2002). Statistical Part-of-Speech Tagging for Classical Chinese. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2002. Lecture Notes in Computer Science, vol 2448. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46154-X_15

  • DOI: https://doi.org/10.1007/3-540-46154-X_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44129-8

  • Online ISBN: 978-3-540-46154-8

  • eBook Packages: Springer Book Archive
