Refined Feature Extraction for Chinese Question Classification in CQA

Su, Lei; Yang, Bin; Qi, Xiangxiang; Xian, Yantuan

doi:10.1007/978-3-319-13326-3_30

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 137)

Included in the following conference series:

International Conference on Testbeds and Research Infrastructures

Abstract

Community-based Question Answering (CQA) services such as Baidu Zhidao have attracted increasing attention in recent years. On these services, users voluntarily post questions and receive answers from other members of the community. The question classification module of a CQA system plays an important role in understanding user intent, which helps the system identify similar questions and retrieve candidate answers. However, because questions are short, little semantic information can be extracted from them. This paper proposes a refined feature extraction method for question classification. The method uses Wikipedia to expand the semantic knowledge of sentences and extracts features step by step to compensate for the lack of semantic information. Experimental results on 714,582 Chinese questions crawled from Baidu Knows show that the proposed method effectively improves the performance of question classification in CQA.
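
The general idea of enriching a short question with external concepts before extracting features can be illustrated with a minimal sketch. This is not the authors' implementation: the lookup table `TOY_CONCEPTS`, the helper names `expand_tokens` and `bag_of_features`, and the example question are all hypothetical, and the input is assumed to be already word-segmented. A real system would derive the concept lookup from Wikipedia rather than a hand-written dictionary.

```python
from collections import Counter

# Hypothetical stand-in for a Wikipedia-derived lookup: term -> related concepts.
# The entries are invented for illustration only.
TOY_CONCEPTS = {
    "python": ["programming language", "software"],
    "aspirin": ["medication", "health"],
}


def expand_tokens(tokens, concepts):
    """Return the tokens followed by the related concepts of every known token."""
    expanded = list(tokens)
    for tok in tokens:
        expanded.extend(concepts.get(tok, []))
    return expanded


def bag_of_features(tokens):
    """Count each token so the result can be passed to a standard classifier."""
    return Counter(tokens)


# Assumed to be already word-segmented; a Chinese question would need a segmenter first.
question = ["how", "install", "python"]
features = bag_of_features(expand_tokens(question, TOY_CONCEPTS))
print(features)
```

The expanded bag contains the original words plus the looked-up concepts, so two short questions that share no surface words can still share features such as "software".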

Author information

Authors and Affiliations

School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650093, China
Lei Su, Bin Yang, Xiangxiang Qi & Yantuan Xian

Corresponding author

Correspondence to Lei Su.

Editor information

Editors and Affiliations

Electrical and Computer Engineering, The University of British Columbia, Vancouver, British Columbia, Canada
Victor C.M. Leung
Huazhong University of Science and Technology, Wuhan City, China
Min Chen
School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou, China
Jiafu Wan
Huazhong University of Science and Technology, Wuhan, China
Yin Zhang

About this paper

Cite this paper

Su, L., Yang, B., Qi, X., Xian, Y. (2014). Refined Feature Extraction for Chinese Question Classification in CQA. In: Leung, V., Chen, M., Wan, J., Zhang, Y. (eds) Testbeds and Research Infrastructure: Development of Networks and Communities. TridentCom 2014. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 137. Springer, Cham. https://doi.org/10.1007/978-3-319-13326-3_30

DOI: https://doi.org/10.1007/978-3-319-13326-3_30
Published: 26 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13325-6
Online ISBN: 978-3-319-13326-3
eBook Packages: Computer Science, Computer Science (R0)
