A Comparison of Lithuanian Morphological Analyzers

  • Jurgita Kapočiūtė-Dzikienė
  • Erika Rimkutė
  • Loic Boizou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10415)

Abstract

In this paper we present comparative research disclosing the strengths and weaknesses of the two most popular publicly available Lithuanian morphological analyzers, Lemuoklis and Semantika.lt. Their performance in lemmatization, part-of-speech tagging, and fine-grained annotation of morphological categories (such as case, gender, and tense) was evaluated on a morphologically annotated gold-standard corpus covering four domains: administrative, fiction, scientific, and periodical texts. Semantika.lt significantly outperformed Lemuoklis by ~1.7%, ~2.5%, and ~8.1% on the lemmatization, part-of-speech tagging, and fine-grained annotation tasks, achieving accuracies of ~98.0%, ~95.3%, and ~86.8%, respectively.

Semantika.lt was also superior on the administrative, fiction, and periodical texts; however, Lemuoklis yielded similar performance on the scientific texts and even surpassed Semantika.lt in the fine-grained annotation task.
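
As an illustration of how per-task accuracy of this kind can be computed, the following is a minimal sketch in Python, not the authors' evaluation code. The `Annotation` type, the tag strings, and the toy tokens are hypothetical. The rule that a fine-grained annotation counts as correct only when the part of speech also matches is an assumption; the abstract does not specify the scoring scheme.

```python
from typing import Dict, List, NamedTuple


class Annotation(NamedTuple):
    """One token's analysis (hypothetical layout, for illustration only)."""
    lemma: str
    pos: str
    features: str  # fine-grained categories, e.g. "masc|sg|gen"


def accuracy(gold: List[Annotation], predicted: List[Annotation]) -> Dict[str, float]:
    """Return accuracy per task over token-aligned annotations."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and token-aligned")
    correct = {"lemma": 0, "pos": 0, "fine": 0}
    for g, p in zip(gold, predicted):
        correct["lemma"] += g.lemma == p.lemma
        correct["pos"] += g.pos == p.pos
        # Assumption: fine-grained is correct only if POS and all features match.
        correct["fine"] += g.pos == p.pos and g.features == p.features
    return {task: count / len(gold) for task, count in correct.items()}


# Toy gold standard and analyzer output (invented data, not from the paper).
gold = [
    Annotation("namas", "noun", "masc|sg|gen"),
    Annotation("eiti", "verb", "pres|3"),
    Annotation("gražus", "adj", "masc|sg|nom"),
    Annotation("ir", "conj", ""),
]
predicted = [
    Annotation("namas", "noun", "masc|sg|gen"),
    Annotation("eiti", "verb", "pres|3"),
    Annotation("gražus", "adj", "masc|sg|acc"),  # wrong case
    Annotation("ir", "part", ""),                # wrong part of speech
]

scores = accuracy(gold, predicted)
print(scores)
```

Running the same function on each analyzer's output over the same gold tokens gives the per-task figures whose differences are being compared.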

Keywords

Lithuanian morphological analysers; Gold-standard corpus; Experimental evaluation; The Lithuanian language

Notes

Acknowledgments

The authors thank the researchers from LLC Fotonija, especially Virginijus Dadurkevičius, for providing information about the Semantika.lt morphological analyzer.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Jurgita Kapočiūtė-Dzikienė (1), email author
  • Erika Rimkutė (2)
  • Loic Boizou (2)
  1. Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania
  2. Centre of Computational Linguistics, Vytautas Magnus University, Kaunas, Lithuania
