Abstract
Named entities in free text pose a challenge to text analysis in Machine Translation and Cross-Language Information Retrieval. These phrases are often transliterated into another language with a different sound inventory and writing system, and named entities found in free text are often missing from bilingual dictionaries. Although named entities can be identified and translated on the fly without a list of proper names and transliterations, an extensive list of existing transliterations will certainly ensure a high precision rate. We use a seed list of proper names and transliterations to train a Machine Transliteration Model. With this model, proper names and their transliterations can be extracted from monolingual or parallel corpora with high precision and recall.
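The abstract does not spell out the form of the Machine Transliteration Model. As a hedged illustration only, the sketch below trains an IBM Model 1 style character translation table with EM on a toy seed list of English names and Chinese transliterations. It then ranks candidate strings for a name by average per-character log-probability. The seed pairs, the function names `train_char_model`, `score_pair` and `best_transliteration`, the letter-to-character granularity, and the scoring rule are all assumptions made for this example. They are not the authors' model.

```python
import math

def train_char_model(pairs, iterations=10):
    """Hypothetical sketch: estimate t[(f, e)] = P(target char f | source letter e)
    with IBM Model 1 style EM over seed pairs (source_name, transliteration)."""
    # Initialise every co-occurring (target char, source letter) pair uniformly.
    t = {}
    for src, tgt in pairs:
        for f in tgt:
            for e in src:
                t[(f, e)] = 1.0
    for _ in range(iterations):
        count, total = {}, {}
        for src, tgt in pairs:
            for f in tgt:
                # E-step: share each target char among all source letter positions.
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] = count.get((f, e), 0.0) + frac
                    total[e] = total.get(e, 0.0) + frac
        # M-step: renormalise expected counts per source letter.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t

def score_pair(t, src, tgt, floor=1e-9):
    """Average per-character log-probability of tgt given src (higher is better).
    Unseen (char, letter) pairs get probability 0, softened by `floor`."""
    logp = 0.0
    for f in tgt:
        p = sum(t.get((f, e), 0.0) for e in src) / len(src)
        logp += math.log(p + floor)
    return logp / len(tgt)

def best_transliteration(t, name, candidates):
    """Pick the candidate string the model scores highest for `name`."""
    return max(candidates, key=lambda c: score_pair(t, name, c))

# Toy seed list (illustrative only, not the paper's data).
seed = [("lincoln", "林肯"), ("kennedy", "肯尼迪"),
        ("clinton", "克林頓"), ("brown", "布朗")]
model = train_char_model(seed)
print(best_transliteration(model, "lincoln", ["東京", "林肯"]))  # → 林肯
```

Scoring by average rather than total log-probability keeps candidates of different lengths roughly comparable. A real extractor would also need to propose candidate spans from the corpus and apply a threshold, which this sketch leaves out.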
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lin, T., Wu, JC., Chang, J.S. (2004). Extraction of Name and Transliteration in Monolingual and Parallel Corpora. In: Frederking, R.E., Taylor, K.B. (eds) Machine Translation: From Real Users to Research. AMTA 2004. Lecture Notes in Computer Science, vol 3265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30194-3_20
DOI: https://doi.org/10.1007/978-3-540-30194-3_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23300-8
Online ISBN: 978-3-540-30194-3
eBook Packages: Springer Book Archive