Quality Estimation for English-Hungarian Machine Translation Systems with Optimized Semantic Features

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9624)

Abstract

Estimating the quality of machine translation (MT) output at run-time is an important task. Standard automatic evaluation methods rely on reference translations, so they cannot evaluate MT output in real time, and for English-to-Hungarian translation their scores correlate very poorly with human judgements. Quality estimation addresses this problem by treating translation quality as a prediction task in which features are extracted from the source and translated sentences only, without reference translations. In this study, we implement quality estimation for English-Hungarian. First, we create a corpus that contains Hungarian human judgements. Using these human evaluation scores, we describe, evaluate and optimize different quality estimation models. We also develop 27 new semantic features using WordNet and word embedding models, and then build feature sets optimized for Hungarian, which produce better results than the baseline feature set.
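To make the idea of reference-free feature extraction concrete, the following is a minimal sketch in Python. It is not the authors' feature set: the function names (`tokenize`, `mean_vector`, `cosine`, `qe_features`), the three surface features, and the toy two-dimensional embeddings are all hypothetical, and the sketch assumes that source and target word vectors live in a shared space, which the abstract does not state. It only illustrates how features can be computed from a source sentence and its translation alone.

```python
import math


def tokenize(sentence):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return sentence.lower().split()


def mean_vector(tokens, embeddings):
    """Average the vectors of the tokens found in the embedding table."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def qe_features(source, translation, src_emb, tgt_emb):
    """Compute illustrative features from the sentence pair only (no reference)."""
    src, tgt = tokenize(source), tokenize(translation)
    src_vec = mean_vector(src, src_emb)
    tgt_vec = mean_vector(tgt, tgt_emb)
    if src_vec is not None and tgt_vec is not None:
        similarity = cosine(src_vec, tgt_vec)
    else:
        similarity = 0.0  # no known words on one side
    return {
        "src_len": len(src),
        "tgt_len": len(tgt),
        "len_ratio": len(tgt) / max(len(src), 1),
        "emb_similarity": similarity,
    }


# Toy embeddings (assumed to share one vector space; purely illustrative).
TOY_EN = {"cat": [1.0, 0.0], "sleeps": [0.0, 1.0]}
TOY_HU = {"macska": [1.0, 0.0], "alszik": [0.0, 1.0]}

features = qe_features("The cat sleeps", "A macska alszik", TOY_EN, TOY_HU)
print(features)
```

In a full system, feature vectors like this one would be paired with human evaluation scores from the corpus and used to train a regression or classification model that predicts a quality score for unseen translations.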

Notes

  1. http://www.statmt.org/wmt15/quality-estimation-task.html
  2. http://nlpg.itk.ppke.hu/node/65
  3. https://github.com/danielfrg/word2vec

Author information

Correspondence to Zijian Győző Yang.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Yang, Z.G., Laki, L.J., Siklósi, B. (2018). Quality Estimation for English-Hungarian Machine Translation Systems with Optimized Semantic Features. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9624. Springer, Cham. https://doi.org/10.1007/978-3-319-75487-1_8

  • DOI: https://doi.org/10.1007/978-3-319-75487-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75486-4

  • Online ISBN: 978-3-319-75487-1

  • eBook Packages: Computer Science, Computer Science (R0)
