Statistical Machine Translation from Slovenian to English Using Reduced Morphology

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5603)

Abstract

This paper describes a study of word-based statistical machine translation for the language pair Slovenian-English. The main problem in dealing with Slovenian is data sparsity, which leads to error-prone translations. The aim of the work is to define an approach that reduces the inflectional morphology of Slovenian for translation into a less inflected language. The reduction is performed by a Differential Evolution algorithm, a member of the family of Evolutionary Algorithms that is widely used for global optimization problems. The experiments were carried out on the freely available parallel English-Slovenian SVEZ-IJS corpus, which is lemmatised and annotated with morpho-syntactic description (MSD) tags. A set of baseline experiments is described and compared with experiments on reduced MSD tags. The paper reports an improvement in translation results over using words, lemmas, and words with full morpho-syntactic annotation.
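To make the idea of reduced MSD tags concrete, the following is a minimal sketch and not the authors' implementation. It assumes positional MSD tags in the MULTEXT-East style, where each character encodes one attribute, and it assumes a reduction can be expressed as a boolean mask over tag positions. The toy tokens, the mask, and the function names `reduce_msd` and `annotate` are hypothetical. A mask of this kind is one plausible form for the candidate solutions that an optimizer such as Differential Evolution could search over; how the paper actually encodes and evaluates reductions is not stated in the abstract.

```python
from typing import List, Sequence, Tuple


def reduce_msd(msd: str, keep: Sequence[bool]) -> str:
    """Keep only the MSD positions whose mask entry is True.

    Positions beyond the end of the mask are dropped.
    """
    return "".join(ch for ch, k in zip(msd, keep) if k)


def annotate(tokens: List[Tuple[str, str]], keep: Sequence[bool]) -> List[str]:
    """Build 'lemma|reduced-MSD' units from (lemma, MSD) pairs."""
    return [f"{lemma}|{reduce_msd(msd, keep)}" for lemma, msd in tokens]


# Toy (lemma, MSD) pairs; illustrative only, not taken from the SVEZ-IJS corpus.
# Assumed position order: category, type, gender, number, case.
tokens = [
    ("hiša", "Ncfsn"),
    ("hiša", "Ncfsg"),
    ("hiša", "Ncfsa"),
    ("hiša", "Ncfpn"),
    ("mesto", "Ncnsn"),
    ("mesto", "Ncnsl"),
]

# Full annotation keeps every position.
full_mask = [True, True, True, True, True]
# Hypothetical reduction: keep only category and number, which English also marks.
reduced_mask = [True, False, False, True, False]

full_vocab = set(annotate(tokens, full_mask))
reduced_vocab = set(annotate(tokens, reduced_mask))

print(len(full_vocab), len(reduced_vocab))
```

In this toy sample the six distinct fully annotated units collapse to three after reduction, which illustrates how dropping attributes that the target language does not express can shrink the source vocabulary and ease data sparsity.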

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Maučec, M.S., Brest, J. (2009). Statistical Machine Translation from Slovenian to English Using Reduced Morphology. In: Vetulani, Z., Uszkoreit, H. (eds) Human Language Technology. Challenges of the Information Society. LTC 2007. Lecture Notes in Computer Science, vol 5603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04235-5_39

  • DOI: https://doi.org/10.1007/978-3-642-04235-5_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04234-8

  • Online ISBN: 978-3-642-04235-5

  • eBook Packages: Computer Science, Computer Science (R0)
