Enriching Statistical Translation Models Using a Domain-Independent Multilingual Lexical Knowledge Base

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5449)

Abstract

This paper presents a method for improving phrase-based Statistical Machine Translation systems by enriching the original translation model with information derived from a multilingual lexical knowledge base. The proposed method exploits the Multilingual Central Repository (a group of linked WordNets for different languages) as a domain-independent knowledge base to provide translation models with new candidate translations for a large set of lexical tokens. Translation probabilities for these tokens are estimated using a set of simple heuristics based on WordNet topology and local context. During decoding, these probabilities are softly integrated so that they can interact with the other statistical models. We have applied this type of domain-independent translation modeling to several translation tasks, obtaining a moderate but significant and consistent improvement in translation quality according to a number of standard automatic evaluation metrics. The improvement is especially notable when moving to a very different domain, such as the translation of Biblical texts.
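
As a rough illustration of the idea in the abstract, the following Python sketch looks up candidate translations through shared synset identifiers, turns them into probabilities, and combines them with another model's score. Everything in it is an assumption made for illustration: the toy lexicon, the names `SOURCE_SENSES`, `TARGET_LEMMAS`, `translation_probs` and `loglinear_score`, the sense-rank weighting (a stand-in for the paper's heuristics, which the abstract does not specify), and the log-linear combination (a common way to integrate models softly in phrase-based systems, not necessarily the one used here).

```python
import math

# Toy data, invented for illustration: source words map to synset ids
# (most frequent sense first), and synset ids map to target-language lemmas.
SOURCE_SENSES = {"bank": ["syn-1", "syn-2"]}
TARGET_LEMMAS = {"syn-1": ["banco"], "syn-2": ["orilla", "ribera"]}


def translation_probs(word):
    """Return P(target | word) under a hypothetical sense-rank heuristic."""
    scores = {}
    for rank, synset in enumerate(SOURCE_SENSES.get(word, []), start=1):
        lemmas = TARGET_LEMMAS.get(synset, [])
        for lemma in lemmas:
            # Assumption: a sense at rank r carries weight 1/r, shared
            # evenly among the target lemmas of its synset.
            scores[lemma] = scores.get(lemma, 0.0) + 1.0 / (rank * len(lemmas))
    total = sum(scores.values())
    return {lemma: s / total for lemma, s in scores.items()} if total else {}


def loglinear_score(feature_probs, weights):
    """Weighted sum of log-probabilities, one term per model."""
    return sum(weights[name] * math.log(p) for name, p in feature_probs.items())


probs = translation_probs("bank")
# Hypothetical feature values and weights; in a real system the weights
# would be tuned on held-out data.
score = loglinear_score(
    {"phrase_table": 0.4, "knowledge_base": probs["banco"]},
    {"phrase_table": 1.0, "knowledge_base": 0.5},
)
```

Because the knowledge-base probability enters as one weighted term among several, a candidate it favours can still be outvoted by the other models, which is one plausible reading of "softly integrated".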

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

García, M., Giménez, J., Màrquez, L. (2009). Enriching Statistical Translation Models Using a Domain-Independent Multilingual Lexical Knowledge Base. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2009. Lecture Notes in Computer Science, vol 5449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00382-0_25

  • DOI: https://doi.org/10.1007/978-3-642-00382-0_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00381-3

  • Online ISBN: 978-3-642-00382-0

  • eBook Packages: Computer Science, Computer Science (R0)
