Direct Combination of Spelling and Pronunciation Information for Robust Back-Transliteration

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3406)

Abstract

Transliterating words and names from one language to another is a frequent and highly productive phenomenon. For example, the English word cache is transliterated into Japanese as キャッシュ “kyasshu”. Transliteration is lossy, since important distinctions are not always preserved in the process. Hence, automatically converting transliterated words back into their original form is a real challenge. Nonetheless, because of its wide applicability in machine translation (MT) and cross-language information retrieval (CLIR), it is an interesting problem from a practical point of view.

In this paper, we demonstrate that back-transliteration accuracy can be improved by directly combining grapheme-based (i.e. spelling) and phoneme-based (i.e. pronunciation) information. Rather than producing back-transliterations from the grapheme and phoneme models independently and then interpolating the results, we propose a method that first combines the sets of allowed rewrites (i.e. edits) and then calculates the back-transliterations using the combined set. Evaluation on both Japanese and Chinese transliterations shows that direct combination increases robustness and positively affects back-transliteration accuracy.
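
The idea of combining rewrite sets before scoring can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the input is a romanized transliteration, the rewrite tables and their probabilities are invented, the merge rule (keep the higher probability per rewrite) is only one plausible choice, and the names combine, best_log_prob and rank are hypothetical. The sketch shows a case where neither table alone can derive the original word, but the combined table can.

```python
import math

# Hypothetical rewrite tables: (original_substring, transliterated_substring)
# -> probability. All values are invented for illustration only.
GRAPHEME_EDITS = {("c", "k"): 0.6, ("a", "ya"): 0.2, ("e", "u"): 0.1}
PHONEME_EDITS = {("che", "sshu"): 0.4, ("sh", "ssh"): 0.3}


def combine(*tables):
    """Merge rewrite tables, keeping the highest probability per rewrite.

    The max rule is an assumption; a weighted sum would work the same way.
    """
    merged = {}
    for table in tables:
        for edit, p in table.items():
            merged[edit] = max(p, merged.get(edit, 0.0))
    return merged


def best_log_prob(source, target, edits):
    """Best log-probability of segmenting target/source into allowed rewrites.

    source is the romanized transliteration, target a candidate original word.
    Returns -inf when no sequence of rewrites covers both strings.
    """
    n, m = len(source), len(target)
    best = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == -math.inf:
                continue
            for (t, s), p in edits.items():
                # Apply the rewrite if both sides match at the current offsets.
                if source.startswith(s, i) and target.startswith(t, j):
                    ni, nj = i + len(s), j + len(t)
                    best[ni][nj] = max(best[ni][nj], best[i][j] + math.log(p))
    return best[n][m]


def rank(source, candidates, edits):
    """Sort candidate original words by descending score, dropping misses."""
    scored = [(best_log_prob(source, c, edits), c) for c in candidates]
    return [c for score, c in sorted(scored, reverse=True) if score > -math.inf]


COMBINED = combine(GRAPHEME_EDITS, PHONEME_EDITS)
print(rank("kyasshu", ["cache", "cash", "catch"], GRAPHEME_EDITS))
print(rank("kyasshu", ["cache", "cash", "catch"], PHONEME_EDITS))
print(rank("kyasshu", ["cache", "cash", "catch"], COMBINED))
```

In this toy setting, interpolating the outputs of the two separate models could not recover "cache", because each model on its own returns no candidate at all; the combined table succeeds by using grapheme rewrites for "kya" and a phoneme rewrite for "sshu". A real system would learn the rewrite probabilities from aligned training pairs and would add a language model over the candidate words.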

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bilac, S., Tanaka, H. (2005). Direct Combination of Spelling and Pronunciation Information for Robust Back-Transliteration. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_44

  • DOI: https://doi.org/10.1007/978-3-540-30586-6_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24523-0

  • Online ISBN: 978-3-540-30586-6

  • eBook Packages: Computer Science, Computer Science (R0)
