Abstract
New word detection is one of the most important problems in Chinese information processing. In new event detection especially, new words reflect current trends in hot events and public opinion. With the rapid development of the Internet, existing lexicon-based methods can no longer deliver adequate effectiveness and efficiency. In this paper, we propose a novel method based on Mutual Information for detecting new words in domain-specific fields. First, we introduce a framework for new word detection built on the mathematical properties of Mutual Information. Then, we propose a new method that measures Mutual Information distance using words, rather than characters, as the unit. Comprehensive experiments on the People’s Daily corpus show that our approach performs well in practice.
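The abstract does not give the paper's exact Mutual Information distance, so the sketch below is only an illustration under stated assumptions. It uses standard pointwise mutual information (PMI), log2(p(x, y) / (p(x) p(y))), between adjacent units. A pair that co-occurs far more often than chance, plus a minimum frequency, is treated as a candidate new word. The function names `adjacent_pmi` and `candidate_new_words`, the cutoffs, and the toy corpus are all hypothetical. Passing pre-segmented word lists approximates the word-level (rather than character-level) measurement the abstract mentions. Passing plain strings would score character pairs instead. This is not the authors' method.

```python
import math
from collections import Counter


def adjacent_pmi(sentences):
    """Score every adjacent pair of units by pointwise mutual information.

    Each sentence is a sequence of units: a list of words (word level) or a
    plain string (character level). Returns (scores, pair_counts).
    """
    unit_counts = Counter()
    pair_counts = Counter()
    for units in sentences:
        unit_counts.update(units)
        # zip pairs each unit with its right-hand neighbour
        pair_counts.update(zip(units, units[1:]))
    n_units = sum(unit_counts.values())
    n_pairs = sum(pair_counts.values())
    scores = {}
    for (x, y), count in pair_counts.items():
        p_xy = count / n_pairs
        p_x = unit_counts[x] / n_units
        p_y = unit_counts[y] / n_units
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores, pair_counts


def candidate_new_words(sentences, min_count=2, threshold=1.0):
    """Hypothetical filter: keep frequent, high-PMI pairs and join them."""
    scores, pair_counts = adjacent_pmi(sentences)
    return sorted(
        ("".join(pair), round(score, 3))
        for pair, score in scores.items()
        if pair_counts[pair] >= min_count and score >= threshold
    )


# Toy word-segmented corpus (invented for illustration): an old lexicon
# has split the newer term "云计算" (cloud computing) into "云" + "计算".
corpus = [
    ["云", "计算", "发展", "迅速"],
    ["云", "计算", "平台"],
    ["发展", "云", "计算"],
    ["平台", "发展", "很", "快"],
]

if __name__ == "__main__":
    print(candidate_new_words(corpus))
```

On this toy input only ("云", "计算") passes both cutoffs. The rare pair ("很", "快") has an even higher PMI but occurs once, so `min_count` drops it. The frequency cutoff is there because PMI alone tends to overrate rare co-occurrences.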
Keywords
- new word detection
- mutual information
- measure metric
- natural language
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liang, Z., Xu, B., Zhao, J., Jia, Y., Zhou, B. (2013). Chinese New Words Detection Using Mutual Information. In: Yuan, Y., Wu, X., Lu, Y. (eds) Trustworthy Computing and Services. ISCTCS 2012. Communications in Computer and Information Science, vol 320. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35795-4_43
DOI: https://doi.org/10.1007/978-3-642-35795-4_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35794-7
Online ISBN: 978-3-642-35795-4
eBook Packages: Computer Science, Computer Science (R0)