Improving Subtree-Based Question Classification Classifiers with Word-Cluster Models

Nguyen, Le Minh; Shimazu, Akira

doi:10.1007/978-3-642-22327-3_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6716)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

Abstract

Question classification has been recognized as a very important step for many natural language applications (e.g., question answering). Subtree mining has been shown to be helpful for the question classification problem [10]. The authors empirically showed that subtree features obtained by subtree mining were able to improve the performance of question classification for boosting and maximum entropy models. In this paper, our first goal is to investigate whether subtree mining features are also useful for structured support vector machines. Secondly, to make the proposed models more robust, we combine subtree features with word-cluster models obtained from a large collection of text documents. Experimental results show that the use of word-cluster models with subtree mining can significantly improve the performance of the proposed question classification models.
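
As a rough illustration of the word-cluster idea only, the sketch below augments the lexical features of a question with cluster-prefix features. It is not the authors' implementation: the toy table `TOY_CLUSTERS`, the function `cluster_features`, the bit-string cluster encoding, and the prefix lengths are all hypothetical choices made for this example, and subtree features are not modeled here.

```python
from typing import Dict, List, Sequence

# Hypothetical word-to-cluster table. In practice such a table would be
# induced from a large unlabeled corpus; the bit-string encoding used here
# is an assumption for illustration, not something the abstract specifies.
TOY_CLUSTERS: Dict[str, str] = {
    "city": "0110",
    "country": "0111",
    "who": "1010",
    "what": "1011",
}


def cluster_features(
    tokens: Sequence[str],
    clusters: Dict[str, str],
    prefix_lengths: Sequence[int] = (2, 4),
) -> List[str]:
    """Return one lexical feature per token plus cluster-prefix features."""
    features: List[str] = []
    for token in tokens:
        word = token.lower()
        features.append(f"w={word}")
        bits = clusters.get(word)
        if bits is None:
            continue  # unseen word: only the lexical feature is emitted
        for n in prefix_lengths:
            # Shorter prefixes give coarser clusters, longer ones finer clusters.
            features.append(f"c{n}={bits[:n]}")
    return features


if __name__ == "__main__":
    print(cluster_features(["What", "city"], TOY_CLUSTERS))
```

Under these assumptions, words that are rare in the labeled questions can still share cluster features with frequent words, which is one plausible reading of how cluster information could make a classifier more robust.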

References

Berger, A., Pietra, S.D., Pietra, V.D.: A maximum entropy approach to natural language processing. Computational Linguistics 22(1) (1996)
Brown, P.F., Della Pietra, V.J., de Souza, P.V., Lai, J.C., Mercer, R.L.: Class-Based n-gram Models of Natural Language. Computational Linguistics 18(4), 467–479 (1992)
Charniak, E.: A Maximum-Entropy Inspired Parser. In: Proc. ACL (2001)
Charniak, E., Blaheta, D., Ge, N., Hall, K., Hale, J., Johnson, M.: BLLIP 1987-1989 WSJ Corpus Release 1. Linguistic Data Consortium (2000)
Carlson, A., Cumby, C., Roth, D.: The SNoW learning architecture, Technical report UIUC-DCS-R-99-2101, UIUC Computer Science Department (1999)
Hacioglu, K., Ward, W.: Question classification with Support vector machines and error correcting codes. In: Proceedings of NAACL-HLT 2003, pp. 28–30 (2003)
Kudo, T., Maeda, E., Matsumoto, Y.: An Application of Boosting to Graph Classification. In: Proceedings NIPS (2004)
Li, X., Roth, D.: Learning question classifiers. In: Proceedings of the 19th International Conference on Computational Linguistics, pp. 556–562 (2002)
Liang, P., Collins, M.: Semi-supervised learning for natural language. Master's thesis, MIT (2005)
Nguyen, M.L., Shimazu, A., Nguyen, T.T.: Subtree mining for question classification problem. In: Proceedings IJCAI 2007, pp. 1695–1700 (2007)
Morishita, S.: Computing optimal hypotheses efficiently for boosting. In: Arikawa, S., Shinohara, A. (eds.) Progress in Discovery Science. LNCS (LNAI), vol. 2281, pp. 471–481. Springer, Heidelberg (2002)
Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y.: Support Vector Machine Learning for Interdependent and Structured Output Spaces. In: Proceedings ICML 2004 (2004)
Zhang, D., Lee, W.S.: Question classification using Support vector machine. In: Proceedings of ACM SIGIR-2003, pp. 26–33 (2003)
Zaki, M.J.: Efficiently Mining Frequent Trees in a Forest. In: Proceedings 8th ACM SIGKDD 2002 (2002)
Schapire, R.E.: A brief introduction to boosting. In: Proceedings of IJCAI 1999 (1999)
Radev, D.R., Fan, W., Qi, H., Wu, H., Grewal, A.: Probabilistic Question Answering from the Web. In: Proceedings of WWW (2002)
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)
Voorhees, E.: Overview of the TREC 2001 Question Answering Track. In: Proceedings of TREC 2001, pp. 157–165. NIST, Gaithersburg (2001)
Ray, S.K., Singh, S., Joshi, B.P.: A semantic approach for question classification using WordNet and Wikipedia. Pattern Recognition Letters 31(13), 1935–1943 (2010)
Huang, Z., Thint, M., Qin, Z.: Question classification using head words and their hypernyms. In: Proceedings EMNLP 2008, pp. 927–936 (2008)

Author information

Authors and Affiliations

Japan Advanced Institute of Science and Technology, School of Information Science, Japan
Le Minh Nguyen & Akira Shimazu

Editor information

Editors and Affiliations

School of Computing, University of Alicante, 03080, Alicante, Spain
Rafael Muñoz
Department of Software and Computing Systems, University of Alicante, Aptdo. de Correos 99, 03080, Alicante, Spain
Andrés Montoyo
CNAM- Laboratoire Cédric, 292 Rue St. Martin, 75141, Paris Cedex 03, France
Elisabeth Métais

About this paper

Cite this paper

Nguyen, L.M., Shimazu, A. (2011). Improving Subtree-Based Question Classification Classifiers with Word-Cluster Models. In: Muñoz, R., Montoyo, A., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2011. Lecture Notes in Computer Science, vol 6716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22327-3_8

DOI: https://doi.org/10.1007/978-3-642-22327-3_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22326-6
Online ISBN: 978-3-642-22327-3
eBook Packages: Computer Science, Computer Science (R0)
