Chinese New Words Detection Using Mutual Information

  • Conference paper
Trustworthy Computing and Services (ISCTCS 2012)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 320)

Abstract

New word detection is one of the most important problems in Chinese information processing. In new event detection especially, new words reflect current trends in hot events and public opinion. With the rapid growth of the Internet, existing lexicon-based approaches can no longer deliver adequate effectiveness and efficiency. In this paper, we propose a novel method for detecting new words in domain-specific fields based on Mutual Information. First, we introduce a framework for new word detection built on the mathematical properties of Mutual Information. Then, we propose a new method that measures Mutual Information distance between words rather than between characters. Comprehensive experiments on the People’s Daily corpus show that our approach matches practice well.
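
As a rough illustration of the word-level idea, the sketch below scores adjacent word pairs with standard pointwise mutual information, PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The abstract does not give the paper's actual distance measure, so the formula, the function names `adjacent_pmi` and `candidate_new_words`, and the `threshold` and `min_count` parameters are all assumptions made for illustration. The input is assumed to be text that has already been segmented into words.

```python
import math
from collections import Counter


def adjacent_pmi(tokens):
    """Score each adjacent token pair with pointwise mutual information.

    Probabilities are plain maximum-likelihood estimates from the token list.
    This is the textbook PMI definition, not necessarily the paper's measure.
    """
    if len(tokens) < 2:
        return {}
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bigrams
        p_x = unigrams[x] / n_unigrams
        p_y = unigrams[y] / n_unigrams
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


def candidate_new_words(tokens, threshold=1.0, min_count=2):
    """Return adjacent pairs whose PMI exceeds `threshold`, highest first.

    `threshold` and `min_count` are hypothetical tuning knobs; pairs seen
    fewer than `min_count` times are skipped because PMI is unreliable
    for rare events.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = adjacent_pmi(tokens)
    kept = [
        (pair, score)
        for pair, score in scores.items()
        if bigrams[pair] >= min_count and score > threshold
    ]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy segmented input: "微" and "博" always occur together.
    segmented = ["微", "博", "很", "火", "微", "博", "用户", "很", "多"]
    for pair, score in candidate_new_words(segmented):
        print("".join(pair), round(score, 3))
```

Pairs that co-occur far more often than their individual frequencies predict receive a high score and become candidates for merging into a single new word. A real system would also need smoothing and a lexicon filter to discard pairs that are already known words.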


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liang, Z., Xu, B., Zhao, J., Jia, Y., Zhou, B. (2013). Chinese New Words Detection Using Mutual Information. In: Yuan, Y., Wu, X., Lu, Y. (eds) Trustworthy Computing and Services. ISCTCS 2012. Communications in Computer and Information Science, vol 320. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35795-4_43

  • DOI: https://doi.org/10.1007/978-3-642-35795-4_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35794-7

  • Online ISBN: 978-3-642-35795-4

  • eBook Packages: Computer Science, Computer Science (R0)
