Named Entity Recognition Based on Bilingual Co-training

  • Conference paper
Chinese Lexical Semantics (CLSW 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8229)

Abstract

Named entity recognition (NER) is an important task in natural language processing (NLP). In this paper we present a semi-supervised approach to extracting bilingual named entities. The approach starts from a bilingual corpus in which the named entities are extracted independently for each language. A bilingual co-training algorithm is then used to improve the named entity annotation quality, and an iterative process is applied to extract named entity pairs with a higher bilingual conformity ratio. This leads to a significant improvement in monolingual named entity annotation quality for both languages. Experimental results show that the F-measure of Chinese NE annotation improves from 87.17 to 88.28, and that of English NE annotation improves from 80.37 to 81.76.
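The abstract does not spell out the algorithm, so the following is only a minimal sketch of one possible reading of the iterative process, not the authors' implementation. It assumes that the "bilingual conformity ratio" is the share of word-aligned entity candidates on which both monolingual taggers agree, and that an accepted sentence pair fills a missing label on one side from its aligned partner before both taggers are retrained. `LexiconTagger`, `conformity_ratio`, `co_train`, the `threshold` value, and the toy data are all hypothetical names and values introduced for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Tagged = List[Tuple[str, str]]       # [(token, label), ...]
Alignment = List[Tuple[int, int]]    # [(zh_index, en_index), ...]
Pair = Tuple[List[str], List[str], Alignment]


class LexiconTagger:
    """Toy stand-in for a monolingual NER model (hypothetical, not the paper's)."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def train(self, data: List[Tagged]) -> None:
        for sentence in data:
            for token, label in sentence:
                self.counts[token][label] += 1

    def tag(self, tokens: List[str]) -> List[str]:
        # Unknown tokens default to "O" (outside any entity).
        return [
            self.counts[t].most_common(1)[0][0] if t in self.counts else "O"
            for t in tokens
        ]


def conformity_ratio(zh_labels: List[str], en_labels: List[str],
                     alignment: Alignment) -> float:
    """Share of aligned entity candidates with matching labels (assumed definition)."""
    candidates = [(zh_labels[i], en_labels[j]) for i, j in alignment
                  if zh_labels[i] != "O" or en_labels[j] != "O"]
    if not candidates:
        return 0.0
    return sum(z == e for z, e in candidates) / len(candidates)


def co_train(zh_tagger: LexiconTagger, en_tagger: LexiconTagger,
             parallel: List[Pair], threshold: float = 0.5, rounds: int = 3) -> None:
    pool = list(parallel)
    for _ in range(rounds):
        remaining = []
        for zh, en, alignment in pool:
            zh_labels, en_labels = zh_tagger.tag(zh), en_tagger.tag(en)
            if conformity_ratio(zh_labels, en_labels, alignment) < threshold:
                remaining.append((zh, en, alignment))
                continue
            # Fill a missing label ("O") on one side from its aligned partner.
            for i, j in alignment:
                if zh_labels[i] == "O":
                    zh_labels[i] = en_labels[j]
                elif en_labels[j] == "O":
                    en_labels[j] = zh_labels[i]
            zh_tagger.train([list(zip(zh, zh_labels))])
            en_tagger.train([list(zip(en, en_labels))])
        if len(remaining) == len(pool):
            break  # no sentence pair was accepted in this round
        pool = remaining


# Toy data: the English tagger has never seen "Zhang".
zh_tagger, en_tagger = LexiconTagger(), LexiconTagger()
zh_tagger.train([[("张伟", "PER"), ("北京", "LOC")]])
en_tagger.train([[("Beijing", "LOC")]])
parallel = [(["张伟", "访问", "北京"], ["Zhang", "visited", "Beijing"],
             [(0, 0), (1, 1), (2, 2)])]

before = en_tagger.tag(["Zhang"])
co_train(zh_tagger, en_tagger, parallel, threshold=0.5)
after = en_tagger.tag(["Zhang"])
print(before, after)
```

In this toy run the English tagger picks up the PER label for "Zhang" from the aligned Chinese token. A real system would replace `LexiconTagger` with trained sequence models and obtain the alignments from a word aligner; the threshold trades the amount of newly labeled data against the noise it introduces.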

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, Y., Huang, H., Zhao, X., Shi, S. (2013). Named Entity Recognition Based on Bilingual Co-training. In: Liu, P., Su, Q. (eds) Chinese Lexical Semantics. CLSW 2013. Lecture Notes in Computer Science, vol 8229. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45185-0_50

  • DOI: https://doi.org/10.1007/978-3-642-45185-0_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45184-3

  • Online ISBN: 978-3-642-45185-0

  • eBook Packages: Computer Science, Computer Science (R0)
