Chinese New Words Detection Using Mutual Information

  • Zheng Liang
  • Bingying Xu
  • Jie Zhao
  • Yan Jia
  • Bin Zhou
Part of the Communications in Computer and Information Science book series (CCIS, volume 320)

Abstract

New word detection is one of the most important problems in Chinese information processing. Especially in new event detection, new words reflect current trends in hot events and public opinion. With the rapid development of the Internet, existing lexicon-based approaches can no longer deliver the required effectiveness and efficiency. In this paper, we propose a novel method based on mutual information for detecting new words in domain-specific fields. First, we introduce a framework for new word detection built on the mathematical properties of mutual information. Then, we propose a new method that measures mutual information distance at the word level rather than the character level. Comprehensive experiments on the People’s Daily corpus show that our approach matches practice well.
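As an illustration of the general idea only: the abstract does not state the paper's scoring formula, so the sketch below uses standard pointwise mutual information (PMI) between adjacent tokens of an already segmented text. A high PMI suggests that two tokens co-occur more often than chance and may form a new word. The function name `adjacent_pmi` and the toy token list are assumptions made for this example, not part of the paper.

```python
import math
from collections import Counter


def adjacent_pmi(tokens):
    """Return {(w1, w2): PMI in bits} for each adjacent token pair.

    Assumption: plain maximum-likelihood estimates, no smoothing.
    This is textbook PMI, not necessarily the measure used in the paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi          # joint probability of the pair
        p_x = unigrams[w1] / n_uni   # marginal probability of w1
        p_y = unigrams[w2] / n_uni   # marginal probability of w2
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy, hand-segmented input (hypothetical data).
tokens = ["微", "博", "很", "火", "微", "博", "用户", "很", "多"]
scores = adjacent_pmi(tokens)

# Pairs ranked from most to least strongly associated.
candidates = sorted(scores, key=scores.get, reverse=True)
```

On real data the estimates would come from a large corpus, and rare pairs would need a frequency cutoff or smoothing, since PMI is known to overrate low-count events.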

Keywords

new word detection, mutual information, measure metric, natural language

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Zheng Liang (1)
  • Bingying Xu (1)
  • Jie Zhao (2)
  • Yan Jia (1)
  • Bin Zhou (1)
  1. Institute of Software, Department of Computer, National University of Defense Technology, Changsha, China
  2. Electromagnetic Spectrum Management Center of Nanjing Military Region, Nanjing, China
