Abstract
This paper describes a study of word-based statistical machine translation for the Slovenian-English language pair. The main problem with Slovenian is data sparsity, which leads to error-prone translations. The aim of the work is to define an approach that reduces the inflectional morphology of Slovenian for translation into a less inflected language. The reduction is performed by a Differential Evolution algorithm, an evolutionary algorithm widely used for global optimization problems. The experiments were carried out on the freely available parallel English-Slovenian SVEZ-IJS corpus, which is lemmatised and annotated with morpho-syntactic description (MSD) tags. A set of baseline experiments is described and compared with experiments on reduced MSD tags. The paper reports improved translation results compared with using words, lemmas, and fully morpho-syntactically annotated words.
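The abstract does not state how a tag reduction is encoded for the optimizer, so the following is only a minimal sketch under stated assumptions. It pairs the classic DE/rand/1/bin scheme of Storn and Price (mutation, binomial crossover, greedy selection) with a hypothetical decoding in which each real-valued gene decides, via a 0.5 threshold, whether one MSD attribute position is kept. The function names `de_rand_1_bin` and `reduce_msd`, the threshold, and the parameter values (F = 0.5, CR = 0.9, population 20) are illustrative assumptions, not the authors' settings. In the paper's setting the fitness would be a translation-quality score obtained after retraining on the reduced corpus. That is far too expensive to show here, so a sphere function stands in as a sanity check.

```python
import random


def de_rand_1_bin(fitness, dim, lo=0.0, hi=1.0, pop_size=20, f=0.5, cr=0.9,
                  generations=200, seed=0):
    """Minimise `fitness` over the box [lo, hi]^dim with classic DE/rand/1/bin."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(x) for x in pop]
    for _ in range(generations):
        next_pop, next_scores = pop[:], scores[:]
        for i in range(pop_size):
            # Three distinct donor vectors, none equal to the target vector i.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Mutation v = x_r1 + F * (x_r2 - x_r3), clipped back into the box.
            mutant = [min(hi, max(lo, pop[r1][k] + f * (pop[r2][k] - pop[r3][k])))
                      for k in range(dim)]
            # Binomial crossover; index j_rand always takes the mutant's gene.
            j_rand = rng.randrange(dim)
            trial = [mutant[k] if (k == j_rand or rng.random() < cr) else pop[i][k]
                     for k in range(dim)]
            # Greedy one-to-one selection: keep the trial if it is no worse.
            score = fitness(trial)
            if score <= scores[i]:
                next_pop[i], next_scores[i] = trial, score
        pop, scores = next_pop, next_scores
    best = min(range(pop_size), key=lambda j: scores[j])
    return pop[best], scores[best]


def reduce_msd(msd, vector, threshold=0.5):
    """HYPOTHETICAL decoding: keep attribute k of an MSD tag iff vector[k] > threshold.

    The category letter (position 0) is always kept; dropped attributes become '-',
    and trailing '-' are stripped, following the MULTEXT-East notation.
    """
    attrs = [c if k < len(vector) and vector[k] > threshold else "-"
             for k, c in enumerate(msd[1:])]
    return (msd[0] + "".join(attrs)).rstrip("-")


if __name__ == "__main__":
    # Sanity check on the sphere function, whose minimum 0 lies at the origin.
    x, fx = de_rand_1_bin(lambda v: sum(t * t for t in v), dim=5, lo=-5.0, hi=5.0)
    print(fx)
    # Noun, common, masculine, singular, nominative: keep type and number only.
    print(reduce_msd("Ncmsn", [0.9, 0.1, 0.8, 0.2]))  # prints "Nc-s"
```

With this encoding, a DE vector of length equal to the number of attribute positions maps directly to one candidate reduction of the tag set; a real setup would apply the decoded mask to every tag in the corpus before retraining and scoring.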
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Maučec, M.S., Brest, J. (2009). Statistical Machine Translation from Slovenian to English Using Reduced Morphology. In: Vetulani, Z., Uszkoreit, H. (eds) Human Language Technology. Challenges of the Information Society. LTC 2007. Lecture Notes in Computer Science, vol. 5603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04235-5_39
DOI: https://doi.org/10.1007/978-3-642-04235-5_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04234-8
Online ISBN: 978-3-642-04235-5
eBook Packages: Computer Science, Computer Science (R0)