
Comparing Entropies within the Chinese Language

  • Conference paper
Natural Language Processing – IJCNLP 2004 (IJCNLP 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3248)

Abstract

Using a large synchronous Chinese corpus, we show how word and character entropies vary across time and space for different Chinese speech communities. We find that word entropy values are affected by the quality of the segmentation process. We also note that word entropies can be affected by proper nouns, which are the most volatile segment of the stable lexicon of the language. Our word and character entropy results provide an interesting comparison with earlier results. The average joint character entropies (a.k.a. entropy rates) of Chinese that we provide up to order 20 indicate that the limits of the conditional character entropies of Chinese for the different speech communities should be about 1 (or less). This raises the question of whether early convergence of character entropies would also entail word entropy convergence.
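
To make the quantities named in the abstract concrete, the following is a minimal sketch of plug-in (maximum-likelihood) estimates for unigram entropy, average joint entropy of order n, and conditional entropy of order n. It is not the authors' implementation: the function names, the toy sample string, and the pre-segmented word list are all hypothetical, and estimates from such a small sample are heavily biased (the conditional estimate can even come out negative because of edge effects).

```python
import math
from collections import Counter


def ngram_entropy(tokens, n):
    """Shannon entropy in bits of the empirical distribution of n-grams.

    `tokens` may be a string (characters) or a list of words.
    """
    # Slide a window of length n over the sequence; tuples are hashable
    # for both strings and lists.
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    total = len(grams)
    counts = Counter(grams)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def average_joint_entropy(tokens, n):
    """Joint entropy of n consecutive tokens divided by n (bits per token)."""
    return ngram_entropy(tokens, n) / n


def conditional_entropy(tokens, n):
    """Entropy of a token given the preceding n-1 tokens: H(n) - H(n-1)."""
    if n == 1:
        return ngram_entropy(tokens, 1)
    return ngram_entropy(tokens, n) - ngram_entropy(tokens, n - 1)


if __name__ == "__main__":
    # Hypothetical toy data; a real study would use a large corpus and
    # the output of a word segmenter.
    sample_chars = "香港的天气很好香港的交通很方便"
    sample_words = ["香港", "的", "天气", "很", "好",
                    "香港", "的", "交通", "很", "方便"]

    print("character entropy:", ngram_entropy(sample_chars, 1))
    print("word entropy:", ngram_entropy(sample_words, 1))
    for order in (1, 2, 3):
        print(order,
              average_joint_entropy(sample_chars, order),
              conditional_entropy(sample_chars, order))
```

Because the word list is an input, changing how the same text is segmented changes the word entropy estimate, which illustrates why segmentation quality matters for the word-level figures.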

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tsou, B.K., Lai, T.B.Y., Chow, Kp. (2005). Comparing Entropies within the Chinese Language. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_49

  • DOI: https://doi.org/10.1007/978-3-540-30211-7_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24475-2

  • Online ISBN: 978-3-540-30211-7

  • eBook Packages: Computer Science, Computer Science (R0)
