Finding the Right Words: An Analysis of Not-Translated Words in Machine Translation

  • Conference paper
Machine Translation and the Information Soup (AMTA 1998)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1529)

Abstract

A not-translated word (NTW) is a token that a machine translation (MT) system is unable to translate and therefore leaves untranslated in its output. The number of NTWs in a document is used as one measure in the evaluation of MT systems. Many MT developers agree that, to reduce the number of NTWs, designers must increase the size or coverage of the system's lexicon to include these untranslated tokens, so that the system can handle them in future processing. While we accept this method of enhancing MT capabilities, our assessment of NTWs in real-world documents produced surprising results. Our study examined the NTW output of two commercially available MT systems (Systran and Globalink) and found that lexical coverage played a relatively small role in the words marked as not translated. In fact, 45% of the tokens in the NTW list failed to translate for reasons other than being valid source-language words missing from the MT lexicon. For instance, e-mail addresses, words already in the target language, and acronyms were all marked as not-translated words. This paper presents our analysis of NTWs and uses these results to argue that, in addition to lexicon enhancement, MT systems could benefit from more sophisticated pre- and postprocessing of real-world documents in order to weed out such NTWs.
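The abstract does not say how tokens were categorized. As a rough illustration of the kind of preprocessing filter it argues for, here is a minimal sketch that sorts NTW tokens into the three non-lexical categories named in the abstract (e-mail addresses, acronyms, and words already in the target language) and counts everything else as a lexicon gap. The regular expressions, the `classify_ntw` and `non_lexical_share` helpers, and the toy target-language lexicon are all hypothetical assumptions for illustration, not the authors' method or data.

```python
import re
from collections import Counter

# Hypothetical patterns (assumptions): the paper's actual criteria are not given here.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")
ACRONYM_RE = re.compile(r"^[A-Z]{2,}s?$")  # e.g. "NATO", "URLs"


def classify_ntw(token: str, target_lexicon: set[str]) -> str:
    """Assign one not-translated token to a coarse category (illustrative only)."""
    if EMAIL_RE.match(token):
        return "email"
    if ACRONYM_RE.match(token):
        return "acronym"
    if token.lower() in target_lexicon:
        # Already a target-language word, so there was nothing to translate.
        return "target_language"
    # Remaining tokens are treated as genuine gaps in the source lexicon.
    return "lexicon_gap"


def non_lexical_share(tokens: list[str], target_lexicon: set[str]) -> float:
    """Fraction of NTWs whose category is something other than a lexicon gap."""
    counts = Counter(classify_ntw(t, target_lexicon) for t in tokens)
    total = sum(counts.values())
    return 0.0 if total == 0 else 1 - counts["lexicon_gap"] / total


if __name__ == "__main__":
    # Toy English target lexicon for a hypothetical French-to-English run.
    english = {"the", "software", "computer"}
    ntws = ["jdoe@example.com", "NATO", "software", "ordinateur"]
    print([classify_ntw(t, english) for t in ntws])
    # prints ['email', 'acronym', 'target_language', 'lexicon_gap']
    print(non_lexical_share(ntws, english))
    # prints 0.75
```

The checks run in a fixed order so that, for example, an upper-case token is treated as an acronym before any lexicon lookup; a real filter would need richer rules (URLs, numbers, proper names, misspellings) and a full target-language dictionary.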

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Reeder, F., Loehr, D. (1998). Finding the Right Words: An Analysis of Not-Translated Words in Machine Translation. In: Farwell, D., Gerber, L., Hovy, E. (eds) Machine Translation and the Information Soup. AMTA 1998. Lecture Notes in Computer Science, vol 1529. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49478-2_32

  • DOI: https://doi.org/10.1007/3-540-49478-2_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65259-5

  • Online ISBN: 978-3-540-49478-2

  • eBook Packages: Springer Book Archive
