Extraction of Name and Transliteration in Monolingual and Parallel Corpora

  • Conference paper
Machine Translation: From Real Users to Research (AMTA 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3265)

Abstract

Named entities in free text pose a challenge to text analysis in Machine Translation and Cross-Language Information Retrieval. These phrases are often transliterated into another language with a different sound inventory and writing system, and they are often not listed in bilingual dictionaries. Although it is possible to identify and translate named entities on the fly without a list of proper names and transliterations, an extensive list of existing transliterations ensures a high precision rate. We use a seed list of proper names and transliterations to train a machine transliteration model. With the model, it is possible to extract proper names and their transliterations from monolingual or parallel corpora with high precision and recall.
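
To make the seed-list idea concrete, here is a minimal sketch in Python. It is not the model described in the paper: it substitutes a simple EM-trained unit-alignment model in the style of IBM Model 1, and the seed pairs, the hand-made chunking of the romanized names, and the names `SEED_PAIRS`, `train_model1`, and `score` are all assumptions made for illustration.

```python
from collections import defaultdict
import math

# Toy seed list (illustrative assumption): place names split by hand into
# romanized chunks, paired with the characters of a common Chinese rendering.
SEED_PAIRS = [
    (["lon", "don"], ["倫", "敦"]),
    (["ber", "lin"], ["柏", "林"]),
    (["ro", "ma"], ["羅", "馬"]),
    (["li", "ma"], ["利", "馬"]),
    (["ma", "li"], ["馬", "利"]),
    (["ma", "ni", "la"], ["馬", "尼", "拉"]),
    (["pa", "na", "ma"], ["巴", "拿", "馬"]),
]


def train_model1(pairs, iterations=10):
    """Estimate t[(target_unit, source_unit)] with IBM Model 1 style EM."""
    # Start with equal weight for every unit pair that co-occurs in a seed pair.
    t = {(tu, su): 1.0 for src, tgt in pairs for su in src for tu in tgt}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: spread each target unit's mass over the source units.
        for src, tgt in pairs:
            for tu in tgt:
                z = sum(t[(tu, su)] for su in src)
                for su in src:
                    c = t[(tu, su)] / z
                    count[(tu, su)] += c
                    total[su] += c
        # M-step: renormalize so probabilities sum to 1 per source unit.
        t = {key: count[key] / total[key[1]] for key in count}
    return t


def score(t, src, tgt, floor=1e-6):
    """Length-normalized log-probability of tgt given src under the model."""
    logp = 0.0
    for tu in tgt:
        p = sum(t.get((tu, su), 0.0) for su in src) / len(src)
        logp += math.log(max(p, floor))  # floor avoids log(0) for unseen pairs
    return logp / len(tgt)


if __name__ == "__main__":
    model = train_model1(SEED_PAIRS)
    # Compare a plausible candidate against an implausible one.
    print(score(model, ["pa", "ma"], ["巴", "馬"]))
    print(score(model, ["pa", "ma"], ["林", "敦"]))
```

In an extraction setting, a score of this kind could be thresholded to decide whether a candidate name and a candidate string in the other language form a transliteration pair; the threshold and the way candidates are generated are left open here.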


Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lin, T., Wu, JC., Chang, J.S. (2004). Extraction of Name and Transliteration in Monolingual and Parallel Corpora. In: Frederking, R.E., Taylor, K.B. (eds) Machine Translation: From Real Users to Research. AMTA 2004. Lecture Notes in Computer Science, vol 3265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30194-3_20

  • DOI: https://doi.org/10.1007/978-3-540-30194-3_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23300-8

  • Online ISBN: 978-3-540-30194-3

  • eBook Packages: Springer Book Archive
