Abstract
This paper addresses automatic classification of questions in Russian, a natural early step in building a question answering system. We developed a typology of Russian questions using interrogative particles, pronouns and word order as the main features. A corpus of 2008 questions was manually compiled and annotated according to this typology. We used two class sets: a fine-grained one with 23 classes and a coarse-grained one with 14 classes. For classification, the training data was represented as character bi-/trigrams and word uni-/bi-/trigrams. We tested several widely used machine-learning methods (logistic regression, support vector machines, naïve Bayes) against a regular expression baseline on a held-out test corpus annotated by an external expert. The best results were achieved by an SVM classifier with a linear kernel, which reached an accuracy of 65.3% (fine-grained) and 68.7% (coarse-grained), while the regular expression baseline showed 52.7% accuracy.
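The feature representation and the regular expression baseline mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the patterns, and the class labels (`PERSON`, `LOCATION`, `TIME`, `YES_NO`, `OTHER`) are hypothetical placeholders and do not reproduce the 23- or 14-class typology of the paper.

```python
import re
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return all overlapping word n-grams, each joined with a space."""
    words = re.findall(r"\w+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def extract_features(question):
    """Count character bi-/trigrams and word uni-/bi-/trigrams.

    Keys are prefixed ("c2:", "w1:", ...) so that n-grams of
    different kinds never collide in the same feature space.
    """
    feats = Counter()
    for n in (2, 3):
        feats.update(f"c{n}:{g}" for g in char_ngrams(question, n))
    for n in (1, 2, 3):
        feats.update(f"w{n}:{g}" for g in word_ngrams(question, n))
    return feats


# Hypothetical rules and labels for illustration only; they are
# NOT the patterns or the classes used in the paper.
BASELINE_RULES = [
    (re.compile(r"^\s*(кто|кого|кому|кем)\b", re.IGNORECASE), "PERSON"),
    (re.compile(r"^\s*(где|куда|откуда)\b", re.IGNORECASE), "LOCATION"),
    (re.compile(r"^\s*когда\b", re.IGNORECASE), "TIME"),
    (re.compile(r"\bли\b", re.IGNORECASE), "YES_NO"),
]


def regex_baseline(question, default="OTHER"):
    """Return the label of the first matching rule, else the default."""
    for pattern, label in BASELINE_RULES:
        if pattern.search(question):
            return label
    return default
```

In a setup like the one the abstract describes, feature counts of this kind would be passed to a learned classifier such as a linear-kernel SVM, while the rule list plays the role of the baseline.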
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Nikolaev, K., Malafeev, A. (2018). Russian-Language Question Classification: A New Typology and First Results. In: van der Aalst, W., et al. Analysis of Images, Social Networks and Texts. AIST 2017. Lecture Notes in Computer Science, vol 10716. Springer, Cham. https://doi.org/10.1007/978-3-319-73013-4_7
DOI: https://doi.org/10.1007/978-3-319-73013-4_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73012-7
Online ISBN: 978-3-319-73013-4
eBook Packages: Computer Science, Computer Science (R0)