Factored Translation between Brazilian Portuguese and English

  • Helena de Medeiros Caseli
  • Israel Aono Nunes
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6404)


Factored translation is an extension of the state-of-the-art phrase-based statistical machine translation (PB-SMT) approach. Its main difference is that a word is represented not only as a token (its surface form) but as a vector of factors such as lemma, part-of-speech, or morphological/syntactic tags. In this paper we present experiments in training and testing factored translation models on Brazilian Portuguese and English texts. Using part-of-speech and morphological information, the factored models outperformed the baseline (a PB-SMT system), but no comparable gain was achieved when flat syntactic tags were used.
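
The idea of a word as a vector of factors can be made concrete with a small data structure. The sketch below is illustrative only and is not the authors' pipeline. It assumes tokens are written as `surface|lemma|pos|morph`. The `|` separator follows the convention the Moses toolkit uses for factored corpora, but the factor order, the tag values, and the names `FactoredWord`, `parse_factored_sentence` and `project` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical factor layout, assumed for illustration: surface|lemma|pos|morph
FACTOR_NAMES = ("surface", "lemma", "pos", "morph")


@dataclass(frozen=True)
class FactoredWord:
    """A word represented as a vector of factors rather than a single token."""
    surface: str
    lemma: str
    pos: str
    morph: str


def parse_factored_sentence(line: str, sep: str = "|") -> List[FactoredWord]:
    """Split a whitespace-tokenized sentence whose tokens carry sep-delimited factors."""
    words = []
    for token in line.split():
        parts = token.split(sep)
        if len(parts) != len(FACTOR_NAMES):
            raise ValueError(
                f"expected {len(FACTOR_NAMES)} factors in {token!r}, got {len(parts)}"
            )
        words.append(FactoredWord(*parts))
    return words


def project(words: List[FactoredWord], factor: str) -> List[str]:
    """Return a single factor per word, e.g. the lemma sequence, as plain tokens."""
    return [getattr(w, factor) for w in words]


# Usage: a Brazilian Portuguese noun phrase with made-up tags.
sentence = parse_factored_sentence("as|o|DET|f.pl casas|casa|N|f.pl")
print(project(sentence, "lemma"))  # ['o', 'casa']
print(project(sentence, "pos"))    # ['DET', 'N']
```

Projecting onto the lemma factor maps both "casa" and "casas" to the same token "casa". A factored model can use this kind of shared representation to generalize across inflected forms that a surface-only PB-SMT system treats as unrelated words.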







Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Helena de Medeiros Caseli (1)
  • Israel Aono Nunes (1)
  1. Department of Computer Science, Federal University of São Carlos (UFSCar), São Carlos, Brazil
