Exploiting Unlabeled Data for Question Classification

Tomás, David; Giuliano, Claudio

doi:10.1007/978-3-642-22327-3_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6716)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

Abstract

In this paper, we introduce a kernel-based approach to question classification. We employ a kernel function based on latent semantic information acquired from Wikipedia, which allows external semantic knowledge to be incorporated into the supervised learning process. Combining this knowledge with a bag-of-words approach by means of composite kernels yields a highly effective question classifier. Because the semantic information is acquired from unlabeled text, our system can be easily adapted to different languages and domains. We tested it on a parallel corpus of English and Spanish questions.
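The idea of a composite kernel can be illustrated with a minimal sketch. The abstract does not give the kernel definitions, so everything below is an assumption made for illustration: the latent semantic space is obtained with a truncated SVD of a term-document matrix built from unlabeled text (in the spirit of latent semantic analysis), the two kernels are cosine-normalized, and they are combined by a plain sum. The function names, the toy tokenizer, and the parameter `k` are hypothetical and are not taken from the paper.

```python
import numpy as np

def tokenize(text):
    # Toy tokenizer (assumption): lowercase, drop question marks, split on spaces.
    return text.lower().replace("?", " ").split()

def build_vocab(texts):
    # Map each distinct token to a column index.
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def bow_matrix(texts, vocab):
    # Rows are texts, columns are vocabulary terms (raw counts).
    X = np.zeros((len(texts), len(vocab)))
    for i, text in enumerate(texts):
        for tok in tokenize(text):
            if tok in vocab:
                X[i, vocab[tok]] += 1.0
    return X

def normalize_kernel(K):
    # Cosine normalization of a Gram matrix: K[i, j] / sqrt(K[i, i] * K[j, j]).
    d = np.sqrt(np.diag(K))
    d[d == 0] = 1.0  # leave all-zero rows at zero instead of dividing by zero
    return K / np.outer(d, d)

def latent_projection(unlabeled_docs, vocab, k):
    # Truncated SVD of the terms-by-documents matrix of the unlabeled corpus.
    D = bow_matrix(unlabeled_docs, vocab).T
    U, s, _ = np.linalg.svd(D, full_matrices=False)
    # Keep the top-k left singular vectors; zero out directions with no variance.
    return U[:, :k] * (s[:k] > 1e-10)

def composite_kernel(questions, unlabeled_docs, k=2):
    # Sum of a bag-of-words kernel and a latent semantic kernel (assumed combination).
    vocab = build_vocab(list(unlabeled_docs) + list(questions))
    X = bow_matrix(questions, vocab)
    K_bow = normalize_kernel(X @ X.T)
    Z = X @ latent_projection(unlabeled_docs, vocab, k)  # questions in latent space
    K_ls = normalize_kernel(Z @ Z.T)
    return K_bow + K_ls
```

Because the sum of two positive semidefinite kernels is again a valid kernel, a Gram matrix built this way could be handed to any learner that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Only the unlabeled corpus has to change to move to another language or domain.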

This research has been partially funded by the Spanish Government under project TEXT-MESS 2.0 (TIN2009-13391-C04-01) and Prometeo (PROMETEO/2009/199), and by the Italian Ministry of University and Research and by the Autonomous Province of Trento under project ITCH (RBIN045PXH).

Author information

Authors and Affiliations

Department of Software and Computing Systems, University of Alicante, Spain
David Tomás
Human Language Technology Group, FBK-Irst, Italy
Claudio Giuliano

Editor information

Editors and Affiliations

School of Computing, University of Alicante, 03080, Alicante, Spain
Rafael Muñoz
Department of Software and Computing Systems, University of Alicante, Aptdo. de Correos 99, 03080, Alicante, Spain
Andrés Montoyo
CNAM- Laboratoire Cédric, 292 Rue St. Martin, 75141, Paris Cedex 03, France
Elisabeth Métais

Cite this paper

Tomás, D., Giuliano, C. (2011). Exploiting Unlabeled Data for Question Classification. In: Muñoz, R., Montoyo, A., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2011. Lecture Notes in Computer Science, vol 6716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22327-3_13

DOI: https://doi.org/10.1007/978-3-642-22327-3_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22326-6
Online ISBN: 978-3-642-22327-3
eBook Packages: Computer Science, Computer Science (R0)
