Quality Estimation for English-Hungarian Machine Translation Systems with Optimized Semantic Features

Yang, Zijian Győző; Laki, László János; Siklósi, Borbála

doi:10.1007/978-3-319-75487-1_8

Conference paper
First Online: 21 March 2018

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9624)

Abstract

Estimating the quality of machine translation (MT) output at run-time is an important task. Standard automatic evaluation methods rely on reference translations, so they cannot evaluate MT output in real time, and their results correlate very poorly with human evaluation for translations from English to Hungarian. Quality estimation addresses this problem by treating translation quality as a prediction task in which features are extracted from the source and translated sentences only. In this study, we implement quality estimation for English-Hungarian. First, we create a corpus for English-Hungarian quality estimation that contains human judgements of Hungarian translations. Using these human evaluation scores, we describe, evaluate and optimize different quality estimation models. We also develop 27 new semantic features using WordNet and word embedding models, and build feature sets optimized for Hungarian that produce better results than the baseline feature set.
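
The idea of quality estimation as a prediction task can be illustrated with a minimal sketch. It is not the implementation described in the paper: the two features (length ratio and an embedding-based similarity), the toy embedding table `TOY_EMBEDDINGS`, the assumption that English and Hungarian words share one vector space, and the human scores are all hypothetical and chosen only to show the mechanics. Features are computed from the source and translated sentences alone, and a simple linear model is fitted to human judgements.

```python
import math

# Hypothetical toy embeddings; assumes both languages share one vector space.
TOY_EMBEDDINGS = {
    "dog": [1.0, 0.0], "kutya": [1.0, 0.0],
    "runs": [0.0, 1.0], "fut": [0.0, 1.0],
    "asztal": [-1.0, 0.0],
}
DIM = 2

def tokenize(sentence):
    return sentence.lower().split()

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def mean_vector(tokens, embeddings, dim):
    # Average the vectors of the tokens that have an embedding.
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(component) / len(vectors) for component in zip(*vectors)]

def extract_features(source, translation, embeddings, dim):
    # Uses only the source and the translation; no reference translation.
    src, tgt = tokenize(source), tokenize(translation)
    length_ratio = len(tgt) / len(src) if src else 0.0
    similarity = cosine(mean_vector(src, embeddings, dim),
                        mean_vector(tgt, embeddings, dim))
    return [length_ratio, similarity]

def fit_linear(X, y, lr=0.1, epochs=5000):
    # Plain batch gradient descent on mean squared error.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(model, x):
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

# Hypothetical training data: (source, translation, human score on a 1-5 scale).
PAIRS = [
    ("the dog runs", "a kutya fut", 5.0),
    ("the dog runs", "kutya", 3.0),
    ("the dog runs", "asztal", 1.0),
]
X = [extract_features(s, t, TOY_EMBEDDINGS, DIM) for s, t, _ in PAIRS]
y = [score for _, _, score in PAIRS]
model = fit_linear(X, y)
```

Calling `predict(model, extract_features(source, translation, TOY_EMBEDDINGS, DIM))` then yields an estimated score for a new sentence pair. A real system would use a much larger feature set and a stronger learner, but the data flow stays the same: sentence pair, features, predicted quality.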

Author information

Authors and Affiliations

Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Práter str. 50/A, Budapest, 1083, Hungary
Zijian Győző Yang & Borbála Siklósi
MTA-PPKE Hungarian Language Technology Research Group, Práter str. 50/A, Budapest, 1083, Hungary
László János Laki

Corresponding author

Correspondence to Zijian Győző Yang.

Editor information

Editors and Affiliations

CIC, Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Yang, Z.G., Laki, L.J., Siklósi, B. (2018). Quality Estimation for English-Hungarian Machine Translation Systems with Optimized Semantic Features. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9624. Springer, Cham. https://doi.org/10.1007/978-3-319-75487-1_8

DOI: https://doi.org/10.1007/978-3-319-75487-1_8
Published: 21 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75486-4
Online ISBN: 978-3-319-75487-1
eBook Packages: Computer Science, Computer Science (R0)