Named Entity Recognition Based on Bilingual Co-training

Li, Yegang; Huang, Heyan; Zhao, Xingjian; Shi, Shumin

doi:10.1007/978-3-642-45185-0_50

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8229)

Included in the following conference series:

Workshop on Chinese Lexical Semantics

Abstract

Named entity recognition (NER) is an important task in natural language processing (NLP). In this paper we present a semi-supervised approach to extracting bilingual named entities. It starts from a bilingual corpus in which the named entities are extracted independently for each language. A bilingual co-training algorithm is then used to improve the named entity annotation quality, and an iterative process is applied to extract named entity pairs with a higher bilingual conformity ratio. This leads to a significant improvement in the monolingual named entity annotation quality for both languages. Experimental results show that the F-measure of the annotation improves from 87.17 to 88.28 for Chinese named entities and from 80.37 to 81.76 for English named entities.
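
The iterative selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the bilingual conformity ratio, so the definition used here (the share of entities on both sides that have a counterpart in a translation lexicon) is an assumption, and the names `conformity_ratio`, `co_train`, `retrain`, `lexicon`, and the threshold value are all hypothetical.

```python
from typing import Callable, List, Set, Tuple

Pair = Tuple[str, str]                # (Chinese sentence, English sentence)
Tagger = Callable[[str], Set[str]]    # sentence -> set of entity strings
Lexicon = Set[Tuple[str, str]]        # known (Chinese, English) entity pairs


def conformity_ratio(zh_ents: Set[str], en_ents: Set[str], lexicon: Lexicon) -> float:
    """Assumed definition: fraction of entities on both sides that have a
    counterpart on the other side according to the lexicon."""
    total = len(zh_ents) + len(en_ents)
    if total == 0:
        return 0.0
    links = [(z, e) for z in zh_ents for e in en_ents if (z, e) in lexicon]
    matched = len({z for z, _ in links}) + len({e for _, e in links})
    return matched / total


def co_train(corpus: List[Pair], tag_zh: Tagger, tag_en: Tagger,
             retrain: Callable, lexicon: Lexicon,
             threshold: float = 0.8, rounds: int = 3) -> Tuple[Tagger, Tagger]:
    """Tag each side independently, keep sentence pairs whose annotations agree
    across languages, and retrain both taggers on the kept pairs."""
    for _ in range(rounds):
        selected = []
        for zh, en in corpus:
            zh_ents, en_ents = tag_zh(zh), tag_en(en)
            if conformity_ratio(zh_ents, en_ents, lexicon) >= threshold:
                selected.append((zh, en, zh_ents, en_ents))
        if not selected:
            break  # nothing confident enough to learn from
        # retrain is a caller-supplied, hypothetical hook returning new taggers
        tag_zh, tag_en = retrain(selected, tag_zh, tag_en)
    return tag_zh, tag_en
```

In this sketch each tagger acts as the second "view" for the other: a sentence pair is added to the training pool only when the two independent annotations agree, which is the general idea behind co-training on parallel text.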

Author information

Authors and Affiliations

School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Yegang Li, Heyan Huang, Xingjian Zhao & Shumin Shi
Beijing Engineering Applications Research Center of High Volume Language Information Processing and Cloud Computing, Beijing Institute of Technology, Beijing, China
Yegang Li, Heyan Huang, Xingjian Zhao & Shumin Shi
Department of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong, China
Yegang Li


Editor information

Editors and Affiliations

Applied Language Research Institute, Beijing Language and Culture University, No. 15 Xueyuan Road, Haidian District, 100083, Beijing, China
Pengyuan Liu
School of Foreign Languages, Peking University, No. 5, Yiheyuan Road, Haidian District, 100871, Beijing, China
Qi Su

Cite this paper

Li, Y., Huang, H., Zhao, X., Shi, S. (2013). Named Entity Recognition Based on Bilingual Co-training. In: Liu, P., Su, Q. (eds) Chinese Lexical Semantics. CLSW 2013. Lecture Notes in Computer Science, vol 8229. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45185-0_50

DOI: https://doi.org/10.1007/978-3-642-45185-0_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45184-3
Online ISBN: 978-3-642-45185-0
eBook Packages: Computer Science, Computer Science (R0)
