Extraction of Name and Transliteration in Monolingual and Parallel Corpora

Lin, Tracy; Wu, Jian-Cheng; Chang, Jason S.

doi:10.1007/978-3-540-30194-3_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3265)

Included in the following conference series:

Conference of the Association for Machine Translation in the Americas

Abstract

Named entities in free text pose a challenge for text analysis in Machine Translation and Cross-Language Information Retrieval. These phrases are often transliterated into another language with a different sound inventory and writing system, and named entities found in free text are often not listed in bilingual dictionaries. Although it is possible to identify and translate named entities on the fly without a list of proper names and transliterations, an extensive list of existing transliterations helps ensure a high precision rate. We use a seed list of proper names and transliterations to train a Machine Transliteration Model. With this model, proper names and their transliterations can be extracted from monolingual or parallel corpora with high precision and recall.
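The abstract describes training a transliteration model on a seed list and then using it to pick out name/transliteration pairs. The chapter body is not reproduced here, so the following is only a toy sketch under stated assumptions, not the authors' model. It assumes each target symbol (a romanized syllable standing in for a Chinese character) aligns monotonically to a contiguous chunk of 1-3 English letters, estimates P(symbol | chunk) with add-alpha smoothing, and re-estimates counts with hard (Viterbi) EM. The class `ToyTransliterationModel`, the function `extract_pairs`, the seed data, and the threshold are all hypothetical.

```python
import math
from collections import defaultdict

MAX_CHUNK = 3  # assumption: one target symbol covers 1-3 source letters


class ToyTransliterationModel:
    """Hypothetical sketch: P(target symbol | source chunk) with add-alpha smoothing."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.pair_counts = defaultdict(float)
        self.chunk_counts = defaultdict(float)
        self.targets = set()

    def logp(self, chunk, sym):
        vocab = max(len(self.targets), 1)
        num = self.pair_counts.get((chunk, sym), 0.0) + self.alpha
        den = self.chunk_counts.get(chunk, 0.0) + self.alpha * vocab
        return math.log(num / den)

    def align(self, src, tgt):
        """Best monotone alignment of each target symbol to a contiguous chunk of src."""
        n, m = len(src), len(tgt)
        neg = float("-inf")
        if m == 0:
            return neg, []
        best = [[neg] * (m + 1) for _ in range(n + 1)]
        back = [[0] * (m + 1) for _ in range(n + 1)]
        best[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                for k in range(1, min(MAX_CHUNK, i) + 1):
                    prev = best[i - k][j - 1]
                    if prev == neg:
                        continue
                    s = prev + self.logp(src[i - k:i], tgt[j - 1])
                    if s > best[i][j]:
                        best[i][j], back[i][j] = s, k
        if best[n][m] == neg:
            return neg, []
        pairs, i = [], n
        for j in range(m, 0, -1):  # walk the back-pointers from the end
            k = back[i][j]
            pairs.append((src[i - k:i], tgt[j - 1]))
            i -= k
        return best[n][m], pairs[::-1]

    def train(self, seed_pairs, iterations=5):
        self.targets = {sym for _, tgt in seed_pairs for sym in tgt}
        self.pair_counts, self.chunk_counts = defaultdict(float), defaultdict(float)
        # Initialisation: spread each symbol evenly over every short substring.
        for src, tgt in seed_pairs:
            share = 1.0 / len(tgt)
            for start in range(len(src)):
                for k in range(1, min(MAX_CHUNK, len(src) - start) + 1):
                    chunk = src[start:start + k]
                    for sym in tgt:
                        self.pair_counts[(chunk, sym)] += share
                        self.chunk_counts[chunk] += share
        # Hard (Viterbi) EM: re-count only the chunks in each best alignment.
        for _ in range(iterations):
            pairs, chunks = defaultdict(float), defaultdict(float)
            for src, tgt in seed_pairs:
                _, alignment = self.align(src, tgt)
                for chunk, sym in alignment:
                    pairs[(chunk, sym)] += 1.0
                    chunks[chunk] += 1.0
            self.pair_counts, self.chunk_counts = pairs, chunks

    def score(self, src, tgt):
        """Per-symbol log-probability of the best alignment (higher is better)."""
        best, _ = self.align(src, tgt)
        return best / len(tgt) if tgt else float("-inf")


def extract_pairs(model, candidates, threshold):
    """Keep candidate (name, transliteration) pairs scoring at or above threshold."""
    return [(s, t) for s, t in candidates if model.score(s, t) >= threshold]


# Toy seed list: romanized syllables stand in for Chinese characters.
seed = [("mali", ["ma", "li"]), ("lima", ["li", "ma"]),
        ("malo", ["ma", "lo"]), ("loma", ["lo", "ma"])]
model = ToyTransliterationModel()
model.train(seed)
print(model.align("mali", ["ma", "li"])[1])  # [('ma', 'ma'), ('li', 'li')]
```

After training, an unseen name such as `("limalo", ["li", "ma", "lo"])` still scores well because its chunks were learned from the seed list, while a mismatched candidate like `("mali", ["lo", "ma"])` scores poorly and is filtered out by `extract_pairs`. How candidates are proposed from monolingual or parallel text (for example, pairing capitalized English words with nearby target-language strings) is left as an assumption here.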

Author information

Authors and Affiliations

Department of Communication Engineering, National Chiao Tung University, 1001, Ta Hsueh Road, Hsinchu, Taiwan, ROC
Tracy Lin
Department of Computer Science, National Tsing Hua University, 101, Sec. 2, Kuang Fu Road, Hsinchu, Taiwan, ROC
Jian-Cheng Wu & Jason S. Chang

Editor information

Editors and Affiliations

Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Avenue, 15213, Pittsburgh, PA, USA
Robert E. Frederking
Intelligence Technology Innovation Center, 20505, Washington, D.C., USA
Kathryn B. Taylor

About this paper

Cite this paper

Lin, T., Wu, JC., Chang, J.S. (2004). Extraction of Name and Transliteration in Monolingual and Parallel Corpora. In: Frederking, R.E., Taylor, K.B. (eds) Machine Translation: From Real Users to Research. AMTA 2004. Lecture Notes in Computer Science, vol 3265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30194-3_20

DOI: https://doi.org/10.1007/978-3-540-30194-3_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23300-8
Online ISBN: 978-3-540-30194-3
eBook Packages: Springer Book Archive
