Abstract
In this paper, we introduce a kernel-based approach to question classification. We employ a kernel function based on latent semantic information acquired from Wikipedia, which allows external semantic knowledge to be incorporated into the supervised learning process. Combining this knowledge with a bag-of-words approach by means of composite kernels yields a highly effective question classifier. Because the semantic information is acquired from unlabeled text, the system can be easily adapted to different languages and domains. We tested it on a parallel corpus of English and Spanish questions.
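The idea of combining a bag-of-words kernel with a latent semantic kernel can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus stands in for Wikipedia, the helper names (`latent_projection`, `composite_kernel`) are hypothetical, and the choice of cosine normalization and an unweighted sum is an assumption made for illustration. The latent space is obtained here with a plain truncated SVD of a document-term matrix built from unlabeled text.

```python
import re
import numpy as np

def tokenize(text):
    # Lowercased word tokens; \w keeps accented letters (e.g. Spanish).
    return re.findall(r"\w+", text.lower())

def build_vocab(texts):
    # Map each distinct token to a column index.
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def bow_matrix(texts, vocab):
    # Rows are texts, columns are term counts.
    X = np.zeros((len(texts), len(vocab)))
    for i, text in enumerate(texts):
        for tok in tokenize(text):
            j = vocab.get(tok)
            if j is not None:
                X[i, j] += 1.0
    return X

def unit_rows(X):
    # Scale each row to unit length; all-zero rows are left as zeros.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0
    return X / norms

def latent_projection(unlabeled_docs, vocab, k):
    # Truncated SVD of the document-term matrix of the unlabeled corpus.
    # The top-k right singular vectors map terms into a k-dim latent space.
    D = bow_matrix(unlabeled_docs, vocab)
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[:k].T  # shape: (n_terms, k)

def composite_kernel(A, B, vocab, P):
    # Sum of two cosine-normalized kernels: bag-of-words and latent semantic.
    # A sum of valid kernels is itself a valid kernel.
    Xa, Xb = bow_matrix(A, vocab), bow_matrix(B, vocab)
    k_bow = unit_rows(Xa) @ unit_rows(Xb).T
    k_lsa = unit_rows(Xa @ P) @ unit_rows(Xb @ P).T
    return k_bow + k_lsa

# Toy unlabeled text standing in for a large corpus such as Wikipedia.
unlabeled = [
    "Madrid is the capital city of Spain",
    "Rome is the capital city of Italy",
    "The river Ebro flows through Spain",
    "Cervantes was a writer born in Spain",
    "Dante was a writer born in Italy",
]
questions = [
    "What is the capital of Spain?",
    "Which city is the capital of Italy?",
    "Who was Cervantes?",
]

vocab = build_vocab(unlabeled + questions)
P = latent_projection(unlabeled, vocab, k=2)
K = composite_kernel(questions, questions, vocab, P)
```

The resulting matrix `K` is a Gram matrix over the questions and could be passed to any SVM implementation that accepts precomputed kernels. Because the projection `P` is estimated from unlabeled text only, swapping in a corpus in another language would not require any change to the labeled data pipeline.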
This research has been partially funded by the Spanish Government under project TEXT-MESS 2.0 (TIN2009-13391-C04-01) and Prometeo (PROMETEO/2009/199), and by the Italian Ministry of University and Research and the Autonomous Province of Trento under project ITCH (RBIN045PXH).
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tomás, D., Giuliano, C. (2011). Exploiting Unlabeled Data for Question Classification. In: Muñoz, R., Montoyo, A., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2011. Lecture Notes in Computer Science, vol 6716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22327-3_13
DOI: https://doi.org/10.1007/978-3-642-22327-3_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22326-6
Online ISBN: 978-3-642-22327-3
eBook Packages: Computer Science (R0)