Abstract
We consider a method for constructing a statistical tagger for automated morphological tagging of Russian-language texts. In this method, each word is assigned a tag that contains its part of speech and the full set of its morphological characteristics. We use the set of morphological characteristics adopted in the SynTagRus corpus, whose material was used to train the tagger. The tagger is based on the support vector machine (SVM) approach. The resulting tagger has proven efficient and has shown high tagging quality.
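The abstract does not specify the feature set or the training procedure. The Python sketch below shows, under stated assumptions, what an SVM tagger with compound tags (part of speech plus all morphological values in one label) might look like. Everything concrete in it is hypothetical: the tag format (values joined by "."), the feature template (word form, 2- and 3-letter suffixes, neighbouring words), the hyperparameters and the toy sentences. None of these are the SynTagRus tagset or the authors' configuration. Training uses plain stochastic subgradient descent on the L2-regularized hinge loss, one binary linear SVM per tag, rather than a dedicated SVM solver.

```python
import random
from collections import defaultdict


def compound_tag(pos, feats):
    """Join a POS label and its morphological values into one tag (hypothetical format)."""
    return ".".join([pos, *feats])


def features(words, i):
    """Sparse binary features for token i: the word, its suffixes and its neighbours."""
    w = words[i].lower()
    prev = words[i - 1].lower() if i > 0 else "<s>"
    nxt = words[i + 1].lower() if i + 1 < len(words) else "</s>"
    return ["bias", "w=" + w, "suf2=" + w[-2:], "suf3=" + w[-3:],
            "prev=" + prev, "next=" + nxt]


class LinearSVMTagger:
    """One-vs-rest linear SVM trained by SGD on the L2-regularized hinge loss."""

    def __init__(self, eta=0.1, lam=1e-4, epochs=50, seed=0):
        self.eta, self.lam, self.epochs, self.seed = eta, lam, epochs, seed
        self.tags, self.w = [], {}

    def _score(self, tag, feats):
        wt = self.w[tag]
        return sum(wt.get(f, 0.0) for f in feats)

    def fit(self, sentences):
        """sentences: list of sentences, each a list of (word, tag) pairs."""
        data = []
        for sent in sentences:
            words = [w for w, _ in sent]
            data += [(features(words, i), t) for i, (_, t) in enumerate(sent)]
        self.tags = sorted({t for _, t in data})
        self.w = {t: defaultdict(float) for t in self.tags}
        rng = random.Random(self.seed)
        decay = 1.0 - self.eta * self.lam  # gradient step on (lam / 2) * ||w||^2
        for _ in range(self.epochs):
            rng.shuffle(data)
            for feats, gold in data:
                for t in self.tags:  # one binary SVM per compound tag
                    y = 1.0 if t == gold else -1.0
                    margin = y * self._score(t, feats)
                    wt = self.w[t]
                    for f in wt:
                        wt[f] *= decay
                    if margin < 1.0:  # hinge loss is active inside the margin
                        for f in feats:
                            wt[f] += self.eta * y
        return self

    def tag(self, words):
        """Give each word the compound tag whose binary SVM scores highest."""
        result = []
        for i in range(len(words)):
            feats = features(words, i)
            result.append(max(self.tags, key=lambda t: self._score(t, feats)))
        return result


# Toy training data with hypothetical tags (not real SynTagRus annotation).
TRAIN = [
    [("мама", compound_tag("S", ["fem", "sg", "nom"])),
     ("мыла", compound_tag("V", ["past", "sg", "fem"])),
     ("раму", compound_tag("S", ["fem", "sg", "acc"]))],
    [("папа", compound_tag("S", ["masc", "sg", "nom"])),
     ("читал", compound_tag("V", ["past", "sg", "masc"])),
     ("книгу", compound_tag("S", ["fem", "sg", "acc"]))],
]

tagger = LinearSVMTagger().fit(TRAIN)
print(tagger.tag(["мама", "мыла", "раму"]))
```

Treating each full tag as a single class keeps decoding to one argmax per word, but every observed combination of morphological values becomes its own classifier, so the number of classes grows with the richness of the tagset. Words absent from training are handled in this sketch only through the suffix and context features.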
Additional information
Original Russian Text © V.V. Petrochenkov, A.O. Kazennikov, 2013, published in Avtomatika i Telemekhanika, 2013, No. 10, pp. 154–165.
About this article
Cite this article
Petrochenkov, V.V., Kazennikov, A.O. A statistical tagger for morphological tagging of Russian language texts. Autom Remote Control 74, 1724–1732 (2013). https://doi.org/10.1134/S0005117913100123