Abstract
A not-translated word (NTW) is a token that a machine translation (MT) system is unable to translate and therefore leaves untranslated in the output. The number of NTWs in a document is one measure used in evaluating MT systems. Many MT developers agree that reducing the number of NTWs requires increasing the size or coverage of the lexicon to include the untranslated tokens, so that the system can handle them in future processing. While we accept this method for enhancing MT capabilities, our assessment of NTWs in real-world documents produced surprising results. We examined the NTW output from two commercially available MT systems (Systran and Globalink) and found that lexical coverage played a relatively small role in the words marked as not translated. In fact, 45% of the tokens in the list failed to translate for reasons other than being valid source-language words missing from the MT lexicon. For instance, e-mail addresses, words already in the target language, and acronyms were marked as not-translated words. This paper presents our analysis of NTWs and uses the results to argue that, in addition to lexicon enhancement, MT systems could benefit from more sophisticated pre- and postprocessing of real-world documents to weed out such NTWs.
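As a rough illustration of the kind of preprocessing the abstract argues for, the following Python sketch sorts NTW tokens into the three non-lexical categories named above (e-mail addresses, acronyms, words already in the target language) and treats everything else as a possible lexicon gap. It is a minimal sketch under stated assumptions, not the authors' method: the regular expressions, the category labels, the function names `classify_ntw` and `summarize`, and the `target_vocab` word set are all hypothetical and introduced here only for illustration.

```python
import re

# Hypothetical patterns for illustration only; the paper does not
# specify how such tokens should be detected.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")
ACRONYM_RE = re.compile(r"^(?:[A-Z]\.?){2,}$")  # e.g. NATO, U.S.


def classify_ntw(token, target_vocab):
    """Assign a not-translated token to a coarse category.

    target_vocab is an assumed set of lowercase target-language words.
    """
    if EMAIL_RE.match(token):
        return "email"
    if ACRONYM_RE.match(token):
        return "acronym"
    if token.lower() in target_vocab:
        return "target_language"
    # Anything else may be a genuine source-language word
    # that is missing from the MT lexicon.
    return "possible_lexicon_gap"


def summarize(tokens, target_vocab):
    """Count how many NTW tokens fall into each category."""
    counts = {}
    for token in tokens:
        category = classify_ntw(token, target_vocab)
        counts[category] = counts.get(category, 0) + 1
    return counts
```

Calling `summarize(ntw_tokens, target_vocab)` on an NTW list gives a per-category tally, which is one way to estimate how much of the list lexicon growth alone could address. A real system would need much richer detection (misspellings, proper names, and so on) than these two patterns.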
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Reeder, F., Loehr, D. (1998). Finding the Right Words: An Analysis of Not-Translated Words in Machine Translation. In: Farwell, D., Gerber, L., Hovy, E. (eds) Machine Translation and the Information Soup. AMTA 1998. Lecture Notes in Computer Science, vol 1529. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49478-2_32
DOI: https://doi.org/10.1007/3-540-49478-2_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65259-5
Online ISBN: 978-3-540-49478-2
eBook Packages: Springer Book Archive