Research on the Extraction of Wikipedia-Based Chinese-Khmer Named Entity Equivalents

  • Conference paper

Natural Language Processing and Chinese Computing (NLPCC 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9362)

Abstract

Named entity equivalents play a significant role in cross-language information processing. However, because corpus resources are limited, few in-depth studies have addressed the extraction of bilingual Chinese-Khmer named entity equivalents. This paper therefore proposes a Wikipedia-based approach that extracts bilingual Chinese-Khmer named entity equivalents in two ways: through the internal links in Wikipedia, and through the computation of feature similarity. The experimental results show that acquiring entity equivalents through the internal links in Wikipedia works well, with an F value of up to 90.67%. They also show favorable results when the equivalents are acquired through feature similarity computation, indicating that the proposed method is effective.
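The abstract reports an F value but does not define it. As a minimal sketch, assuming it is the usual balanced F-measure (the harmonic mean of precision and recall), it could be computed from extraction counts as follows. The function name `f_value` and the toy counts are hypothetical and are not taken from the paper.

```python
def f_value(true_positives: int, extracted: int, gold: int) -> float:
    """Balanced F-measure (F1), assumed here to be the paper's "F value".

    true_positives: extracted entity pairs that are correct
    extracted:      all entity pairs the system extracted
    gold:           all correct entity pairs in the reference set
    """
    if extracted == 0 or gold == 0:
        return 0.0
    precision = true_positives / extracted
    recall = true_positives / gold
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy counts invented for illustration only (not the paper's data).
score = f_value(true_positives=8, extracted=10, gold=10)
print(f"F value: {score:.2%}")
```

Under this assumption, an F value of 90.67% would mean that precision and recall are both high, since the harmonic mean is pulled toward the lower of the two.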

Author information

Corresponding author

Correspondence to Xin Yan.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Xia, Q., Yan, X., Yu, Z., Gao, S. (2015). Research on the Extraction of Wikipedia-Based Chinese-Khmer Named Entity Equivalents. In: Li, J., Ji, H., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2015. Lecture Notes in Computer Science, vol 9362. Springer, Cham. https://doi.org/10.1007/978-3-319-25207-0_32

  • DOI: https://doi.org/10.1007/978-3-319-25207-0_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25206-3

  • Online ISBN: 978-3-319-25207-0

  • eBook Packages: Computer Science, Computer Science (R0)