Abstract
This paper evaluates the morphological tagging of Russian texts of the 19th century and determines why some words remained unknown to the tagger, i.e. were left without lemmas and grammatical features. The investigation showed that the main sources of unknown words were: 1) incompleteness of the tagger dictionary, particularly for 19th-century vocabulary; 2) failure to tag word-formation derivatives; 3) problems with some inflection models of Old Russian; 4) insufficient graphemic analysis; 5) the inability of taggers to process multiword units. The results provide a baseline for improving the premorphological processing of Russian texts and for developing more sophisticated approaches to morphological analysis.
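To make the notion of an "unknown word" concrete, the following is a minimal sketch of how an unknown-word rate could be measured, with and without a premorphological normalization step. It is not the authors' tagger: the toy lexicon, the function names, and the simplified mapping from pre-reform to modern spelling are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy lexicon: word form -> (lemma, grammatical features).
# A real tagger dictionary would be far larger; this is illustration only.
TOY_LEXICON = {
    "дом": ("дом", "NOUN,masc,sg,nom"),
    "дома": ("дом", "NOUN,masc,sg,gen"),
    "читал": ("читать", "VERB,past,masc,sg"),
}


def normalize_prereform(token):
    """Assumed premorphological step: a deliberately simplified mapping
    from pre-reform spelling to modern spelling."""
    token = token.lower()
    for old, new in (("ѣ", "е"), ("і", "и"), ("ѳ", "ф"), ("ѵ", "и")):
        token = token.replace(old, new)
    # Drop the word-final hard sign, which modern spelling omits.
    if token.endswith("ъ"):
        token = token[:-1]
    return token


def tag(tokens, lexicon, normalize=None):
    """Look each token up in the lexicon; collect tokens left without analysis."""
    tagged, unknown = [], Counter()
    for tok in tokens:
        key = normalize(tok) if normalize else tok.lower()
        analysis = lexicon.get(key)
        if analysis is None:
            unknown[tok] += 1  # no lemma, no grammatical features
        tagged.append((tok, analysis))
    return tagged, unknown


def unknown_rate(tokens, lexicon, normalize=None):
    """Share of tokens that remained unknown to the lookup."""
    _, unknown = tag(tokens, lexicon, normalize)
    return sum(unknown.values()) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    sample = ["домъ", "читалъ", "дома", "помѣщикъ"]
    print(unknown_rate(sample, TOY_LEXICON))
    print(unknown_rate(sample, TOY_LEXICON, normalize_prereform))
```

In this toy setting, tokens that fail only because of spelling become known after normalization, while a word absent from the lexicon stays unknown. This separates the graphemic cause from dictionary incompleteness; the other causes listed in the abstract (derivatives, inflection models, multiword units) are not modelled here.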
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zakharov, V., Volkov, S. (2004). Morphological Tagging of Russian Texts of the XIXth Century. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2004. Lecture Notes in Computer Science, vol 3206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30120-2_30
DOI: https://doi.org/10.1007/978-3-540-30120-2_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23049-6
Online ISBN: 978-3-540-30120-2
eBook Packages: Springer Book Archive