Improving Subtree-Based Question Classification Classifiers with Word-Cluster Models

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6716)

Abstract

Question classification is recognized as an important step in many natural language applications (e.g., question answering). Subtree mining has been shown to be helpful for the question classification problem [10]: the authors of [10] showed empirically that subtree features obtained by subtree mining improve the performance of question classification with boosting and maximum entropy models. In this paper, our first goal is to investigate whether subtree mining features are also useful for structured support vector machines. Second, to make the proposed models more robust, we combine subtree features with word-cluster models obtained from a large collection of text documents. Experimental results show that using word-cluster models together with subtree mining significantly improves the performance of the proposed question classification models.
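
As a rough illustration of the idea in the abstract, the sketch below merges subtree features with word-cluster features into one feature vector for a question. It is not the authors' implementation: the abstract does not specify the feature scheme, so the cluster map, the bit-string prefix lengths, the feature names, and the example subtree string are all hypothetical. The subtree features are assumed to have been mined elsewhere.

```python
from collections import Counter

# Hypothetical toy word-to-cluster map. Real bit strings would come from a
# clustering run (e.g. class-based n-gram clustering [2]) over a large
# unlabeled corpus.
WORD_CLUSTERS = {
    "city": "0110",
    "country": "0111",
    "who": "1010",
    "what": "1011",
}


def cluster_features(tokens, prefix_lengths=(2, 4)):
    """Count cluster-prefix features for the tokens that have a cluster."""
    feats = Counter()
    for tok in tokens:
        bits = WORD_CLUSTERS.get(tok.lower())
        if bits is None:
            continue  # word not in the cluster map: no cluster feature
        for n in prefix_lengths:
            feats[f"cluster{n}={bits[:n]}"] += 1
    return feats


def combine_features(subtree_feats, tokens):
    """Merge pre-mined subtree features with word-cluster features."""
    combined = Counter(subtree_feats)
    combined.update(cluster_features(tokens))
    return combined


# Subtree features are assumed to be given as strings with counts.
example = combine_features(
    {"subtree=(WHNP (WDT what))": 1},
    ["What", "city", "hosted", "the", "games"],
)
```

Shorter prefixes group words into coarser clusters, so using more than one prefix length lets a classifier back off from a rare word to a broader class. The combined counts could then be passed to any classifier that accepts sparse features.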

References

  1. Berger, A., Pietra, S.D., Pietra, V.D.: A maximum entropy approach to natural language processing. Computational Linguistics 22(1) (1996)

  2. Brown, P.F., Della Pietra, V.J., de Souza, P.V., Lai, J.C., Mercer, R.L.: Class-Based n-gram Models of Natural Language. Computational Linguistics 18(4), 467–479 (1992)

  3. Charniak, E.: A Maximum-Entropy Inspired Parser. In: Proc. ACL (2001)

  4. Charniak, E., Blaheta, D., Ge, N., Hall, K., Hale, J., Johnson, M.: BLLIP 1987-1989 WSJ Corpus Release 1. Linguistic Data Consortium (2000)

  5. Carlson, A., Cumby, C., Roth, D.: The SNoW learning architecture, Technical report UIUC-DCS-R-99-2101, UIUC Computer Science Department (1999)

  6. Kadri, H., Wayne, W.: Question classification with Support vector machines and error correcting codes. In: Proceedings of NAACL-HLT 2003, pp. 28–30 (2003)

  7. Kudo, T., Maeda, E., Matsumoto, Y.: An Application of Boosting to Graph Classification. In: Proceedings NIPS (2004)

  8. Li, X., Roth, D.: Learning question classifiers. In: Proceedings of the 19th International Conference on Computational Linguistics, pp. 556–562 (2002)

  9. Liang, P., Collins, M.: Semi-supervised learning for natural language. Master thesis, MIT (2005)

  10. Nguyen, M.L., Shimazu, A., Nguyen, T.T.: Subtree mining for question classification problem. In: Proceedings IJCAI 2007, pp. 1695–1700 (2007)

  11. Morishita, S.: Computing optimal hypotheses efficiently for boosting. In: Arikawa, S., Shinohara, A. (eds.) Progress in Discovery Science. LNCS (LNAI), vol. 2281, pp. 471–481. Springer, Heidelberg (2002)

  12. Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y.: Support Vector Machine Learning for Interdependent and Structured Output Spaces. In: Proceedings ICML 2004 (2004)

  13. Zhang, D., Lee, W.S.: Question classification using Support vector machine. In: Proceedings of ACM SIGIR-2003, pp. 26–33 (2003)

  14. Zaki, M.J.: Efficiently Mining Frequent Trees in a Forest. In: Proceedings 8th ACM SIGKDD 2002 (2002)

  15. Schapire: A brief introduction to boosting. In: Proceedings of IJCAI 1999 (1999)

  16. Radev, D.R., Fan, W., Qi, H., Wu, H., Grewal, A.: Probabilistic Question Answering from the Web. In: Proceedings of WWW (2002)

  17. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, N.Y (1995)

  18. Voorhees, E.: Overview of the TREC 2001 Question Answering Track. In: Proceedings of TREC 2001, pp. 157–165. NIST, Gaithersburg (2001)

  19. Ray, S.K., Singh, S., Joshi, B.P.: A semantic approach for question classification using WordNet and Wikipedia. Pattern Recognition Letters 31(13), 1935–1943 (2010)

  20. Huang, Z., Thint, M., Kin, Z.: Question classification using head words and their hypernyms. In: Proceedings EMNLP 2008, pp. 927–936 (2008)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nguyen, L.M., Shimazu, A. (2011). Improving Subtree-Based Question Classification Classifiers with Word-Cluster Models. In: Muñoz, R., Montoyo, A., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2011. Lecture Notes in Computer Science, vol 6716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22327-3_8

  • DOI: https://doi.org/10.1007/978-3-642-22327-3_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22326-6

  • Online ISBN: 978-3-642-22327-3

  • eBook Packages: Computer Science, Computer Science (R0)
