Direct Combination of Spelling and Pronunciation Information for Robust Back-Transliteration

Bilac, Slaven; Tanaka, Hozumi

doi:10.1007/978-3-540-30586-6_44


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3406)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics


Abstract

Transliterating words and names from one language to another is a frequent and highly productive phenomenon. For example, the English word cache is transliterated into Japanese as キャッシュ “kyasshu”. Transliteration is lossy, since important distinctions are not always preserved in the process. Hence, automatically converting transliterated words back into their original form is a real challenge. Nonetheless, due to its wide applicability in machine translation (MT) and cross-language information retrieval (CLIR), it is an interesting problem from a practical point of view.

In this paper, we demonstrate that back-transliteration accuracy can be improved by directly combining grapheme-based (i.e. spelling) and phoneme-based (i.e. pronunciation) information. Rather than producing back-transliterations from the grapheme and phoneme models independently and then interpolating the results, we propose a method of first combining the sets of allowed rewrites (i.e. edits) and then calculating the back-transliterations using the combined set. Evaluation on both Japanese and Chinese transliterations shows that direct combination increases robustness and positively affects back-transliteration accuracy.
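The idea of combining rewrite sets before decoding can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how the sets are merged or searched, so the interpolation weight, the rewrite tables, the use of romanized input, and the function names `combine_rewrites` and `best_back_transliteration` are all assumptions made for illustration. Each toy table covers only part of the input, so neither model alone can produce a candidate, while the merged set can.

```python
from collections import defaultdict


def combine_rewrites(grapheme, phoneme, weight=0.5):
    """Merge two rewrite tables {(src, tgt): prob} by linear interpolation.

    `weight` is a hypothetical mixing parameter, not a value from the paper.
    """
    combined = defaultdict(float)
    for edit, p in grapheme.items():
        combined[edit] += weight * p
    for edit, p in phoneme.items():
        combined[edit] += (1.0 - weight) * p
    return dict(combined)


def best_back_transliteration(word, rewrites):
    """Viterbi-style search over segmentations of `word`.

    Returns (score, output) for the highest-scoring sequence of rewrites
    that covers the whole word, or (0.0, "") if no sequence covers it.
    """
    by_src = defaultdict(list)
    for (src, tgt), p in rewrites.items():
        by_src[src].append((tgt, p))
    max_len = max((len(src) for src in by_src), default=0)

    # best[i] holds the best (score, output) for the prefix word[:i].
    best = [(0.0, "")] * (len(word) + 1)
    best[0] = (1.0, "")
    for i in range(len(word)):
        score_prev, out_prev = best[i]
        if score_prev == 0.0:
            continue  # prefix word[:i] is unreachable
        for j in range(i + 1, min(len(word), i + max_len) + 1):
            for tgt, p in by_src.get(word[i:j], []):
                if score_prev * p > best[j][0]:
                    best[j] = (score_prev * p, out_prev + tgt)
    return best[len(word)]


# Toy rewrite tables (made-up probabilities) over romanized input.
grapheme_edits = {("kya", "ca"): 0.6, ("kya", "ka"): 0.4}
phoneme_edits = {("sshu", "che"): 0.7, ("sshu", "sh"): 0.3}

combined_edits = combine_rewrites(grapheme_edits, phoneme_edits)
score, candidate = best_back_transliteration("kyasshu", combined_edits)
print(candidate)
```

With these toy tables, decoding "kyasshu" with either table alone finds no full cover of the input, whereas the combined set can chain a spelling-derived rewrite with a pronunciation-derived one. That is the kind of robustness gain the abstract attributes to direct combination, shown here only under the stated assumptions.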



Author information

Authors and Affiliations

Tokyo Institute of Technology, Ookayama 2-12-1, Meguro, 152-8552, Tokyo, Japan
Slaven Bilac & Hozumi Tanaka


Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh


About this paper

Cite this paper

Bilac, S., Tanaka, H. (2005). Direct Combination of Spelling and Pronunciation Information for Robust Back-Transliteration. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_44


DOI: https://doi.org/10.1007/978-3-540-30586-6_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24523-0
Online ISBN: 978-3-540-30586-6
eBook Packages: Computer Science, Computer Science (R0)
