Abstract
This paper presents a Chinese entity detection and tracking system that takes advantage of character-based models and machine learning approaches. An entity is defined here as the linked set of all its mentions in a text, together with their associated attributes. Because entity mentions of different types normally exhibit quite different linguistic patterns, six separate Conditional Random Field (CRF) models, which incorporate character N-gram and word knowledge features, are built to detect the extent and the head of three types of mentions: named, nominal, and pronominal. For each type of mention, attributes are identified by Support Vector Machine (SVM) classifiers that take mention heads and their context as classification features. Mentions are then merged into a unified entity representation by examining their attributes and connections in a rule-based coreference resolution process. The system is evaluated on the ACE 2005 corpus and achieves competitive results.
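As an illustration of what character N-gram features for a character-based sequence labeler can look like, the following is a minimal sketch, not the feature set used in the paper. The function names, the window size, the padding symbols, and the B/I/O tagging scheme are all assumptions made for this example; the abstract states only that the CRF models use character N-gram and word knowledge features.

```python
from typing import Dict, List, Tuple


def char_ngram_features(chars: List[str], i: int, window: int = 2) -> Dict[str, str]:
    """Build character unigram and bigram features around position i.

    Hypothetical feature template: the window size and the feature
    names are illustrative assumptions, not taken from the paper.
    """
    def at(j: int) -> str:
        # Pad positions that fall outside the sentence.
        if j < 0:
            return "<BOS>"
        if j >= len(chars):
            return "<EOS>"
        return chars[j]

    feats: Dict[str, str] = {}
    # Unigrams C[-window] .. C[+window]
    for off in range(-window, window + 1):
        feats[f"C[{off}]"] = at(i + off)
    # Adjacent bigrams inside the same window
    for off in range(-window, window):
        feats[f"C[{off}]C[{off + 1}]"] = at(i + off) + at(i + off + 1)
    return feats


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Convert per-character B/I/O tags into (start, end) spans, end exclusive.

    The B/I/O scheme is an assumed labeling convention for this sketch.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            if start is None:
                # Tolerate an I tag that has no preceding B.
                start = i
        else:
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


chars = list("李明去了北京")
features = char_ngram_features(chars, 0)
spans = bio_to_spans(["B", "I", "O", "O", "B", "I"])
```

Here `features` holds the feature dictionary for the first character, and `spans` gives the character offsets of the two tagged mentions. In a full system, one such labeler would be trained per mention type and per target (extent or head), which is how the six models in the abstract arise.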
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Qian, D., Li, W., Yuan, C., Lu, Q., Wu, M. (2007). Applying Machine Learning to Chinese Entity Detection and Tracking. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_14
DOI: https://doi.org/10.1007/978-3-540-70939-8_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70938-1
Online ISBN: 978-3-540-70939-8
eBook Packages: Computer Science, Computer Science (R0)