Chinese New Words Detection Using Mutual Information
New word detection is one of the most important problems in Chinese information processing. Especially in new event detection, new words reflect current trends in hot events and public opinion. With the fast development of the Internet, existing lexicon-based approaches can no longer deliver the required effectiveness and efficiency. In this paper, we propose a novel method to detect new words in domain-specific fields based on Mutual Information. First, we introduce a framework for detecting new words that builds on the mathematical properties of Mutual Information. Then, we propose a new method that measures the Mutual Information distance between words instead of between characters. Comprehensive experimental studies on the People's Daily corpus show that our approach agrees well with practice.
Keywords: new word detection, mutual information, measure metric, natural language
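The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea of word-level Mutual Information, not the authors' method. It assumes the standard pointwise mutual information, log2(p(x, y) / (p(x) p(y))), estimated from raw counts over adjacent word pairs in text that has already been segmented. The function name pmi_scores and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(sentences):
    """Return the pointwise mutual information of each adjacent word pair.

    `sentences` is a list of token lists (already segmented into words).
    Probabilities are plain relative frequencies; this is an assumption,
    since the abstract does not specify an estimator or any smoothing.
    """
    unigrams = Counter()
    bigrams = Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    # Hypothetical pre-segmented toy corpus.
    corpus = [
        ["云", "计算", "平台"],
        ["云", "计算", "服务"],
        ["平台", "服务", "云"],
    ]
    ranked = sorted(pmi_scores(corpus).items(), key=lambda kv: kv[1], reverse=True)
    for pair, score in ranked:
        print("".join(pair), round(score, 3))
```

Pairs that co-occur more often than their individual frequencies predict receive higher scores and are natural candidates for merging into a new word. A practical system would also need a minimum frequency cutoff, because raw PMI tends to overrate rare pairs.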