Similarity-based model for transliteration

Fattah, Mohamed Abdel; Ren, Fuji

doi:10.1007/978-0-387-76483-2_17

Mohamed Abdel Fattah & Fuji Ren

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 11)

Abstract

A significant proportion of out-of-vocabulary (OOV) words are named entities and technical terms. Typical analyses find that around 50% of OOV words are named entities, yet these can be the most important words in a query. For example, in the list of queries for the TREC 2001 cross-language track, all 25 queries contained proper names. Cross-language retrieval performance (average precision) dropped by more than 50% when the named entities in the queries were not translated. One way to deal with OOV words when the two languages have different alphabets is to transliterate the unknown words, that is, to render them in the orthography of the second language. Transliteration is the process of representing words of one language using the alphabet of another language. In the present study, we present different approaches for extracting transliterated proper-noun pairs from parallel corpora, based on different similarity measures between the English and the romanized Arabic proper nouns under consideration. The strength of our new system is that it works well for low-frequency proper-noun pairs. We evaluate the new approaches using two different English–Arabic parallel corpora. Most of our results outperform previously published results in terms of precision, recall, and F-measure.
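
The abstract does not say which similarity measures the chapter uses. As a rough illustration of the general idea only, the sketch below scores an English proper noun against romanized Arabic candidates with a normalized edit distance and keeps the best candidate above a threshold. The choice of edit distance, the function names, the threshold of 0.6, and the sample words are all assumptions made for this example, not details taken from the chapter.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(english: str, romanized: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means identical strings."""
    a, b = english.lower(), romanized.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def best_match(english, candidates, threshold=0.6):
    """Return (candidate, score) for the best-scoring candidate.

    The candidate is None when no score reaches the (assumed) threshold.
    """
    scored = [(similarity(english, c), c) for c in candidates]
    score, word = max(scored, key=lambda t: t[0], default=(0.0, None))
    return (word, score) if score >= threshold else (None, score)


if __name__ == "__main__":
    # Hypothetical romanized Arabic tokens from one aligned sentence.
    print(best_match("Mohamed", ["muhammad", "alqahira", "kitab"]))
```

In a parallel corpus, the candidate list would be the romanized tokens of the sentence aligned with the English one. Because each pair is scored on its spelling alone, a pair that occurs only once can still be matched, which is in the spirit of the low-frequency strength the abstract claims.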

Acknowledgment

This research has been partially supported by the Japan Society for the Promotion of Science (JSPS), Grant No. 07077, and the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research (B), 19300029.

Author information

Authors and Affiliations

Faculty of Engineering, University of Tokushima, 2-1 Minamijosanjima Tokushima, 770-8506, Tokushima, Japan
Mohamed Abdel Fattah
FIE, Helwan University, Cairo, Egypt
Mohamed Abdel Fattah
School of Information Engineering, Beijing University of Posts & Telecommunications, 100088, Beijing, China
Fuji Ren

Corresponding author

Correspondence to Mohamed Abdel Fattah.

About this chapter

Cite this chapter

Fattah, M.A., Ren, F. (2009). Similarity-based model for transliteration. In: Mastorakis, N., Sakellaris, J. (eds) Advances in Numerical Methods. Lecture Notes in Electrical Engineering, vol 11. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-76483-2_17

DOI: https://doi.org/10.1007/978-0-387-76483-2_17
Published: 29 May 2009
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-76482-5
Online ISBN: 978-0-387-76483-2
eBook Packages: Engineering, Engineering (R0)
