A Machine-Learning Framework for Hybrid Machine Translation

Federmann, Christian

doi:10.1007/978-3-642-33347-7_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7526)

Included in the following conference series:

Annual Conference on Artificial Intelligence

Abstract

We present a Machine-Learning-based framework for hybrid Machine Translation. Our approach combines translation output from several black-box source systems. We define an extensible, total order on translation output and use it to decompose the n-best translations into pairwise system comparisons. Using joint, binarised feature vectors, we train an SVM-based classifier and show how its classification output can be used to generate hybrid translations on the sentence level. Evaluations using automated metrics show promising results. An interesting finding of our experiments is that our approach allows us to leverage good translations from otherwise bad systems, because the combination decision is taken on the sentence level instead of the corpus level. We conclude by summarising our findings and by giving an outlook on future work, e.g., on probabilistic classification or the integration of manual judgements.
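
The pipeline described in the abstract (total order, pairwise decomposition, joint binarised feature vectors, classifier, sentence-level selection) can be illustrated with a minimal sketch. It uses only the Python standard library and is not the author's implementation: the per-feature "A is greater than B" binarisation, the win-counting aggregation in select_hybrid, and all function names are assumptions made for illustration. The trained SVM is replaced by a hypothetical toy classifier, and the feature values are invented.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical per-system feature vector for one source sentence.
Features = Sequence[float]


def joint_binarised_vector(fa: Features, fb: Features) -> List[int]:
    """Encode the pair (A, B) as one vector: for each feature, 1 if A's value
    exceeds B's, else 0. This binarisation scheme is an assumption."""
    return [1 if a > b else 0 for a, b in zip(fa, fb)]


def pairwise_training_examples(ranked: Sequence[Features]) -> List[Tuple[List[int], int]]:
    """Decompose one ranked n-best list (best first, i.e. already sorted by
    the total order) into pairwise examples. Label +1: first system is better."""
    examples = []
    for i, j in combinations(range(len(ranked)), 2):
        # combinations yields i < j, so system i is ranked above system j.
        examples.append((joint_binarised_vector(ranked[i], ranked[j]), +1))
        examples.append((joint_binarised_vector(ranked[j], ranked[i]), -1))
    return examples


def select_hybrid(candidates: Dict[str, Features],
                  classify: Callable[[List[int]], int]) -> str:
    """Choose one system's output for a single sentence by counting pairwise
    wins (a simple vote; this aggregation scheme is an assumption)."""
    wins = {name: 0 for name in candidates}
    for a, b in combinations(candidates, 2):
        if classify(joint_binarised_vector(candidates[a], candidates[b])) > 0:
            wins[a] += 1
        else:
            wins[b] += 1
    # max() returns the first maximum, so ties follow insertion order.
    return max(wins, key=wins.get)


# Hypothetical stand-in for a trained SVM: prefer A if it wins most features.
def toy_classifier(v: List[int]) -> int:
    return 1 if 2 * sum(v) > len(v) else -1


sentence = {"sys1": [0.2, 0.5, 0.1],
            "sys2": [0.9, 0.8, 0.3],
            "sys3": [0.4, 0.1, 0.7]}
print(select_hybrid(sentence, toy_classifier))  # prints "sys2"
```

Because the choice is made independently for each sentence, a system that loses on most sentences can still contribute its output wherever it wins its pairwise comparisons, which is the effect the abstract reports. In practice, the examples from pairwise_training_examples would be used to train a real classifier (the paper uses an SVM) in place of toy_classifier.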

Author information

Authors and Affiliations

Language Technology Lab, German Research Center for Artificial Intelligence, Stuhlsatzenhausweg 3, D-66123, Saarbrücken, Germany
Christian Federmann

Editor information

Editors and Affiliations

Institute of Artificial Intelligence, University of Ulm, 89069, Ulm, Germany
Birte Glimm
Saarland University and German Research Center for Artificial Intelligence (DFKI), Stuhlsatzenhausweg 3, 66123, Saarbrücken, Germany
Antonio Krüger

About this paper

Cite this paper

Federmann, C. (2012). A Machine-Learning Framework for Hybrid Machine Translation. In: Glimm, B., Krüger, A. (eds) KI 2012: Advances in Artificial Intelligence. KI 2012. Lecture Notes in Computer Science, vol 7526. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33347-7_4

DOI: https://doi.org/10.1007/978-3-642-33347-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33346-0
Online ISBN: 978-3-642-33347-7
eBook Packages: Computer Science, Computer Science (R0)