Effective Chinese Organization Name Linking to a List-Like Knowledge Base
Entity linking is widely used in entity retrieval and semantic search. It links mentions in unstructured documents to their representations in a knowledge base (KB). Frequently used KBs (e.g. Wikipedia) usually contain abundant information about each entity, such as properties, name variations and text descriptions, which helps to find candidates and disambiguate the links. In this paper, we link organization names in Chinese documents to a list-like KB. Unlike typical KBs, the records in our KB are simply Chinese organization full names. The many name variations and abbreviations in the documents cannot be directly matched to any organization name in the KB and introduce ambiguity, which makes the linking task difficult. First, we enrich the KB with abbreviations. Using information from Hudong Baike and other sources, we design a pattern-based full name annotation method to help generate abbreviations for all the names in the KB. To resolve the ambiguity problem, we propose a two-stage link generation approach that exploits the co-occurrence of abbreviations and full names in the same document or document cluster, where the full names linked in the first stage constrain the linking of abbreviations in the second stage. We apply our approach to a corpus of police inquiry documents. The experimental results show that our approach is effective and significantly outperforms the one-stage approach in terms of precision and recall.
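The two-stage idea can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function name `link_mentions`, the hand-written abbreviation table, and the fallback rule for abbreviations with a single candidate are all assumptions made for illustration. Stage one links mentions that exactly match a full name in the KB; stage two links abbreviations, preferring candidates whose full name was already linked in the same document or cluster.

```python
def link_mentions(mentions, kb_full_names, abbreviations):
    """Link mentions from one document (or document cluster) to KB full names.

    mentions:       list of mention strings (hypothetical input format)
    kb_full_names:  set of organization full names (the list-like KB)
    abbreviations:  dict mapping an abbreviation to the set of full names
                    it may stand for (the enriched KB; assumed to be given)
    Returns a dict mention -> linked full name, or None if unresolved.
    """
    links = {}

    # Stage 1: mentions that are exact full names link to themselves.
    linked_full = set()
    for m in mentions:
        if m in kb_full_names:
            links[m] = m
            linked_full.add(m)

    # Stage 2: abbreviations are constrained by the full names from stage 1.
    for m in mentions:
        if m in links:
            continue
        candidates = abbreviations.get(m, set())
        constrained = candidates & linked_full
        if len(constrained) == 1:
            # Exactly one candidate co-occurs as a full name: link to it.
            links[m] = next(iter(constrained))
        elif not constrained and len(candidates) == 1:
            # Assumed fallback: an unambiguous abbreviation links directly.
            links[m] = next(iter(candidates))
        else:
            # Still ambiguous or unknown: leave unlinked.
            links[m] = None
    return links


# Illustrative data only; the abbreviation table is written by hand here.
kb = {"上海交通大学", "西安交通大学"}
abbr = {"交大": {"上海交通大学", "西安交通大学"}}

with_context = link_mentions(["上海交通大学", "交大"], kb, abbr)
without_context = link_mentions(["交大"], kb, abbr)
```

In this example the abbreviation "交大" is ambiguous on its own, so `without_context` leaves it unlinked, while in `with_context` the co-occurring full name resolves it. A one-stage approach would have to pick between the two candidates without that constraint.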
Keywords: Knowledge Base; Core Part; Name Entity Recognition; Document Cluster; Count Repeat
This work is funded by The 3rd Research Institute of The Ministry of Public Security through project No: C13601. We thank Tong Ruan for the guidance of the project, and thank Chen Wang for her proofreading.