Abstract
We present a Machine-Learning-based framework for hybrid Machine Translation. Our approach combines translation output from several black-box source systems. We define an extensible, total order on translation output and use it to decompose the n-best translations into pairwise system comparisons. Using joint, binarised feature vectors, we train an SVM-based classifier and show how its classification output can be used to generate hybrid translations at the sentence level. Evaluation using automated metrics shows promising results. An interesting finding of our experiments is that our approach makes it possible to leverage good translations from otherwise bad systems, because the combination decision is taken at the sentence level rather than the corpus level. We conclude by summarising our findings and by giving an outlook on future work, e.g., on probabilistic classification or the integration of manual judgements.
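The sentence-level selection described above can be illustrated with a minimal Python sketch. It is not the implementation from the paper: the names `select_system` and `hybrid_translate`, the win-counting rule, and the alphabetical tie-break are assumptions made for illustration. Feature extraction, binarisation, and the SVM itself are not shown; the pairwise classifier is passed in as an arbitrary callable.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]
# Hypothetical interface: returns +1 if the first candidate is judged better
# than the second, and -1 otherwise. In the paper this role is played by an
# SVM-based classifier over joint feature vectors; here it is any callable.
PairwiseClassifier = Callable[[Features, Features], int]


def select_system(candidates: Dict[str, Features],
                  classify: PairwiseClassifier) -> str:
    """Pick one system for a single sentence by counting pairwise wins."""
    wins = {name: 0 for name in candidates}
    # Decompose the candidate set into all pairwise system comparisons.
    for a, b in combinations(sorted(candidates), 2):
        if classify(candidates[a], candidates[b]) > 0:
            wins[a] += 1
        else:
            wins[b] += 1
    # Assumption: ties are broken by alphabetical system name.
    return max(sorted(wins), key=lambda name: wins[name])


def hybrid_translate(sentences: List[Dict[str, Tuple[str, Features]]],
                     classify: PairwiseClassifier) -> List[str]:
    """Build a hybrid translation by choosing a winner for every sentence."""
    output = []
    for candidates in sentences:
        features = {name: feats for name, (_, feats) in candidates.items()}
        winner = select_system(features, classify)
        output.append(candidates[winner][0])
    return output


# Toy stand-in for a trained classifier: prefers the larger feature sum.
def toy_classify(x: Features, y: Features) -> int:
    return 1 if sum(x) > sum(y) else -1


# Made-up example data: system names, translations, and feature values.
example = [
    {"sysA": ("good a1", [0.9, 0.8]),
     "sysB": ("bad b1", [0.1, 0.2]),
     "sysC": ("mid c1", [0.5, 0.5])},
    {"sysA": ("bad a2", [0.1, 0.1]),
     "sysB": ("good b2", [0.7, 0.9]),
     "sysC": ("mid c2", [0.4, 0.4])},
]
hybrid = hybrid_translate(example, toy_classify)
```

In this toy data, `sysB` loses on the first sentence but is selected on the second, which mirrors the abstract's observation that a sentence-level decision can still use good translations from a system that is weaker overall.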
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Federmann, C. (2012). A Machine-Learning Framework for Hybrid Machine Translation. In: Glimm, B., Krüger, A. (eds) KI 2012: Advances in Artificial Intelligence. KI 2012. Lecture Notes in Computer Science, vol 7526. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33347-7_4
DOI: https://doi.org/10.1007/978-3-642-33347-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33346-0
Online ISBN: 978-3-642-33347-7
eBook Packages: Computer Science, Computer Science (R0)