Machine Learning Applied to Rule-Based Machine Translation

  • Annette Rios
  • Anne Göhring
Part of the Theory and Applications of Natural Language Processing book series (NLP)


Lexical and morphological ambiguities present a serious challenge in rule-based machine translation (RBMT). This chapter describes an approach to resolving morphologically ambiguous verb forms when a rule-based decision is not possible because of parsing or tagging errors. The rule-based core system has a set of rules that decide, based on context information, which verb form should be generated in the target language. However, if the parse tree is incorrect, part of the context information may be missing and the rules cannot make a safe decision. In this case, we use a classifier to assign a verb form. We tested the classifier on a set of four texts: the share of correct verb forms in the translation rose from 78.68% with purely rule-based disambiguation to 95.11% with the hybrid approach.
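The fallback logic described above can be illustrated with a minimal sketch. It is not the chapter's implementation: the feature names (`lemma`, `clause_type`), the labels `FORM_A` and `FORM_B`, the toy rules, and the `MajorityClassifier` stand-in are all hypothetical and chosen only to show the control flow, in which the rules decide whenever the context is available and a trained classifier takes over otherwise.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical context features taken from a (possibly faulty) parse tree.
Context = Dict[str, Optional[str]]


def rule_based_form(ctx: Context) -> Optional[str]:
    """Toy rules: return a verb form, or None when no safe decision is possible."""
    clause_type = ctx.get("clause_type")
    if clause_type == "main":
        return "FORM_A"
    if clause_type == "subordinate":
        return "FORM_B"
    # Context missing (e.g. after a parsing or tagging error): abstain.
    return None


class MajorityClassifier:
    """Stand-in for a trained classifier: most frequent label per lemma."""

    def __init__(self) -> None:
        self.by_lemma = defaultdict(Counter)
        self.overall = Counter()

    def fit(self, examples: Iterable[Tuple[Context, str]]) -> "MajorityClassifier":
        for ctx, label in examples:
            self.by_lemma[ctx.get("lemma")][label] += 1
            self.overall[label] += 1
        return self

    def predict(self, ctx: Context) -> str:
        # Fall back to the overall majority label for unseen lemmas.
        # Assumes fit() was called with at least one example.
        counts = self.by_lemma.get(ctx.get("lemma")) or self.overall
        return counts.most_common(1)[0][0]


def choose_verb_form(ctx: Context, classifier: MajorityClassifier) -> Tuple[str, str]:
    """Hybrid decision: rules first, classifier only when the rules abstain."""
    form = rule_based_form(ctx)
    if form is not None:
        return form, "rule"
    return classifier.predict(ctx), "classifier"


clf = MajorityClassifier().fit([
    ({"lemma": "v1"}, "FORM_B"),
    ({"lemma": "v1"}, "FORM_B"),
    ({"lemma": "v2"}, "FORM_A"),
])
print(choose_verb_form({"lemma": "v1", "clause_type": "main"}, clf))  # ('FORM_A', 'rule')
print(choose_verb_form({"lemma": "v1", "clause_type": None}, clf))    # ('FORM_B', 'classifier')
```

Keeping the rules in front means the classifier only sees the cases in which the rules lack the context they need, which matches the division of labour the abstract describes.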


Keywords (machine-generated, not supplied by the authors): Relative clause, Parse tree, Head noun, Main clause, Subordinate clause



This research is funded by the Swiss National Science Foundation under grant 100015_132219/1.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland
