Learning Rules to Improve a Machine Translation System

  • David Kauchak
  • Charles Elkan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2837)


In this paper we show how to learn rules to improve the performance of a machine translation system. Given a system consisting of two translation functions (one from language A to language B and one from B to A), training text is translated from A to B and back again to A. Using these two translations, differences in knowledge between the two translation functions are identified, and rules are learned to improve the functions. Context-independent rules are learned where the information suggests only a single possible translation for a word. When there are multiple alternative translations for a word, a likelihood ratio test is used to identify words that co-occur significantly with each alternative. These words are then used as context in context-dependent rules. Applied to the Pan American Health Organization corpus of 20,084 sentences, the learned rules improve the understandability of the translation produced by the SDL International engine on 78% of sentences, with high precision.
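
The context-selection step can be illustrated with a minimal sketch. The abstract does not give the exact form of the likelihood ratio test, so the code below assumes the common log-likelihood ratio (G²) statistic over a 2x2 table of sentence counts. The function names, the representation of sentences as sets of tokens, and the threshold of 3.84 (the 5% critical value of a chi-squared distribution with one degree of freedom) are all assumptions made for illustration, not details taken from the paper.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of counts.

    k11: sentences with translation A that contain the candidate word
    k12: sentences with translation A that do not contain it
    k21: sentences with translation B that contain the candidate word
    k22: sentences with translation B that do not contain it
    """
    def xlogx_sum(*counts):
        # Sum of k * ln(k), treating 0 * ln(0) as 0.
        return sum(k * math.log(k) for k in counts if k > 0)

    n = k11 + k12 + k21 + k22
    cells = xlogx_sum(k11, k12, k21, k22)
    rows = xlogx_sum(k11 + k12, k21 + k22)
    cols = xlogx_sum(k11 + k21, k12 + k22)
    total = xlogx_sum(n)
    return 2.0 * (cells - rows - cols + total)


def context_words(sentences_a, sentences_b, threshold=3.84):
    """Map each significantly associated word to the translation it favors.

    sentences_a / sentences_b: lists of token sets for the sentences in which
    an ambiguous word received translation A or translation B (hypothetical
    input format). The threshold is an assumed significance cutoff.
    """
    if not sentences_a or not sentences_b:
        return {}
    vocab = set().union(*sentences_a, *sentences_b)
    selected = {}
    for word in vocab:
        k11 = sum(1 for s in sentences_a if word in s)
        k21 = sum(1 for s in sentences_b if word in s)
        k12 = len(sentences_a) - k11
        k22 = len(sentences_b) - k21
        if log_likelihood_ratio(k11, k12, k21, k22) >= threshold:
            # Assign the word to the alternative where it is relatively more frequent.
            favors_a = k11 / len(sentences_a) > k21 / len(sentences_b)
            selected[word] = "a" if favors_a else "b"
    return selected


if __name__ == "__main__":
    a = [{"the", "money"}] * 5
    b = [{"the", "river"}] * 5
    print(context_words(a, b))
```

In this toy input, a word that appears in every sentence of both groups scores zero and is ignored, while words confined to one group pass the cutoff and become candidate context for a context-dependent rule.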


Keywords: Machine Translation, Word List, Ambiguity Resolution, Translation System, Context Word



Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • David Kauchak (1)
  • Charles Elkan (1)
  1. Department of Computer Science, University of California, San Diego, La Jolla, USA
