Data-Driven Morphological Analysis and Disambiguation for Kazakh

  • Olzhas Makhambetov (email author)
  • Aibek Makazhanov
  • Islam Sabyrgaliyev
  • Zhandos Yessenbayev
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9041)

Abstract

We propose a method for morphological analysis and disambiguation for the Kazakh language that accounts for both inflectional and derivational morphology, including derivation that is not fully productive. The method is data-driven and does not require manually written rules. We leverage so-called “transition chains”, which help prune false segmentations while keeping correct ones. At the disambiguation step we use a standard HMM-based approach. Evaluating our method against open-source solutions on several data sets, we show that it achieves better or on-par performance. We also provide an extensive error analysis that sheds light on common problems in morphological disambiguation of the language.
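The abstract describes the disambiguation step only as a standard HMM-based approach, so the following is a minimal sketch of what such a step could look like, not the authors' implementation. It assumes that the analyzer has already produced a non-empty list of candidate tags for each token, and it picks the most probable tag sequence with Viterbi decoding. The function name `disambiguate`, the tags `N` and `V`, and all probabilities are hypothetical toy values chosen for illustration.

```python
import math

# Toy transition probabilities P(current tag | previous tag); "<s>" marks
# the sentence start. All numbers are invented for illustration only.
TRANS = {
    ("<s>", "N"): 0.7, ("<s>", "V"): 0.3,
    ("N", "N"): 0.2, ("N", "V"): 0.8,
    ("V", "N"): 0.6, ("V", "V"): 0.4,
}

# Candidate analyses per token, as an analyzer might return them
# (hypothetical: two ambiguous tokens, each either noun or verb).
CANDIDATES = [["N", "V"], ["N", "V"]]


def trans_logp(prev, cur):
    """Log transition probability log P(cur | prev)."""
    return math.log(TRANS[(prev, cur)])


def emit_logp(position, tag):
    """Log emission probability of the token at `position` given `tag`.
    Uniform here, so only the transitions decide the outcome."""
    return math.log(0.5)


def disambiguate(candidates, trans_logp, emit_logp, start="<s>"):
    """Viterbi decoding restricted to each token's candidate tags."""
    # best[i][tag] = (log score of the best path ending in tag, backpointer)
    best = [{}]
    for tag in candidates[0]:
        best[0][tag] = (trans_logp(start, tag) + emit_logp(0, tag), None)
    for i in range(1, len(candidates)):
        best.append({})
        for tag in candidates[i]:
            score, prev = max(
                (best[i - 1][p][0] + trans_logp(p, tag), p)
                for p in candidates[i - 1]
            )
            best[i][tag] = (score + emit_logp(i, tag), prev)
    # Follow backpointers from the best final tag to recover the path.
    last = max(best[-1], key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(candidates) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]


print(disambiguate(CANDIDATES, trans_logp, emit_logp))
```

Restricting the search to the analyzer's candidates keeps decoding cheap: the cost per token depends on the number of surviving analyses rather than on the size of the full tag set, which is one reason pruning false segmentations before disambiguation is useful.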

Keywords

Computational Linguistics, Transition Chain, Derivational Morphology, Open Source Solution, Disambiguation Method
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Olzhas Makhambetov (1), email author
  • Aibek Makazhanov (1)
  • Islam Sabyrgaliyev (1)
  • Zhandos Yessenbayev (1)
  1. Nazarbayev University Research and Innovation System, Astana, Kazakhstan
