Refined Feature Extraction for Chinese Question Classification in CQA

  • Conference paper
Testbeds and Research Infrastructure: Development of Networks and Communities (TridentCom 2014)

Abstract

Community-based Question Answering (CQA) services, such as Baidu Zhidao, have attracted increasing attention in recent years. In these services, users voluntarily post questions and receive answers from other members of the community. The question classification module of a CQA system plays an important role in understanding user intent, and it helps the system identify similar questions and retrieve candidate answers. However, because questions are usually short sentences, little semantic information can be obtained from them. This paper proposes a refined feature extraction method for question classification. The method uses Wikipedia to expand the semantic knowledge of each sentence and extracts features step by step to compensate for this lack of semantic information. Experimental results on 714,582 Chinese questions crawled from Baidu Knows (Baidu Zhidao) show that the proposed method effectively improves the performance of question classification in CQA.
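The abstract describes the Wikipedia expansion step only at a high level. The following is a minimal sketch, not the authors' method, assuming a precomputed mapping from (already segmented) words to Wikipedia concept titles. The `TOY_WIKI_CONCEPTS` table, `expand_features`, `concept_weight`, and `shared_features` are all hypothetical names introduced for illustration. It shows the general idea of appending concept features to a short question's bag of words, so that two questions with no words in common can still share features.

```python
from collections import Counter

# Hypothetical toy mapping from words to Wikipedia concept (article) titles.
# A real system would derive such a mapping from Wikipedia itself; this table
# is invented purely for illustration. Tokens are assumed to be pre-segmented.
TOY_WIKI_CONCEPTS = {
    "iphone": ["Smartphone", "Apple Inc."],
    "galaxy": ["Smartphone", "Samsung Electronics"],
    "battery": ["Electric battery"],
    "charger": ["Battery charger", "Electric battery"],
}


def expand_features(tokens, concept_map, concept_weight=0.5):
    """Build a sparse feature vector: original words plus weighted concepts.

    Each word contributes a "word:" feature with weight 1.0, and each
    Wikipedia concept it maps to contributes a "wiki:" feature with
    weight `concept_weight` (a hypothetical down-weighting parameter).
    """
    features = Counter()
    for tok in tokens:
        features["word:" + tok] += 1.0
        for concept in concept_map.get(tok, []):
            features["wiki:" + concept] += concept_weight
    return features


def shared_features(a, b):
    """Return the sorted feature names present in both vectors."""
    return sorted(set(a) & set(b))


q1 = expand_features(["iphone", "battery"], TOY_WIKI_CONCEPTS)
q2 = expand_features(["galaxy", "charger"], TOY_WIKI_CONCEPTS)
print(shared_features(q1, q2))  # ['wiki:Electric battery', 'wiki:Smartphone']
```

The two toy questions share no word features, but after expansion they share `wiki:Electric battery` and `wiki:Smartphone`. Down-weighting concept features through `concept_weight` is one illustrative way to keep expanded features from overwhelming the original words.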

Author information

Corresponding author

Correspondence to Lei Su.

Copyright information

© 2014 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Su, L., Yang, B., Qi, X., Xian, Y. (2014). Refined Feature Extraction for Chinese Question Classification in CQA. In: Leung, V., Chen, M., Wan, J., Zhang, Y. (eds) Testbeds and Research Infrastructure: Development of Networks and Communities. TridentCom 2014. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 137. Springer, Cham. https://doi.org/10.1007/978-3-319-13326-3_30

  • DOI: https://doi.org/10.1007/978-3-319-13326-3_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13325-6

  • Online ISBN: 978-3-319-13326-3

  • eBook Packages: Computer Science, Computer Science (R0)