Abstract
Questions in community question answering (CQA) are complex and irregular, and a typical collection contains a few labeled questions alongside many unlabeled ones. Question classification in CQA has therefore become a research hotspot in recent years. In this paper, we propose to classify CQA questions with the label propagation algorithm (LPA), a graph-based method in which nodes represent the labeled and unlabeled sample questions and edges represent the distance between them; classification is achieved by propagating node labels across the graph. In experiments on corpora from “Baidu Knows”, LPA achieves higher classification accuracy than the KNN and SVM algorithms, which use only the labeled samples, and also higher accuracy than the SVM-based Bootstrapping algorithm, which uses both labeled and unlabeled samples.
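The graph-based idea can be illustrated with a minimal sketch of generic label propagation. It is not the authors' implementation: the abstract does not say how the distance between questions is measured, so the Euclidean distance on toy 2-D vectors, the Gaussian edge weighting, and the names `propagate_labels`, `sigma`, and `n_iter` are all assumptions made for illustration.

```python
import math


def propagate_labels(points, labels, n_classes, sigma=1.0, n_iter=200):
    """Generic label propagation sketch (hypothetical helper, not the paper's code).

    points : list of feature vectors, one per sample question (assumed representation)
    labels : class index for labeled samples, None for unlabeled ones
    """
    n = len(points)

    # Edge weights: a smaller distance gives a larger weight (Gaussian kernel, assumed).
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights[i][j] = math.exp(-d2 / sigma ** 2)

    def one_hot(i):
        return [1.0 if c == labels[i] else 0.0 for c in range(n_classes)]

    # Labeled nodes start one-hot; unlabeled nodes start uniform.
    dist = [
        one_hot(i) if labels[i] is not None else [1.0 / n_classes] * n_classes
        for i in range(n)
    ]

    for _ in range(n_iter):
        new_dist = []
        for i in range(n):
            total = sum(weights[i])
            if labels[i] is not None:
                # Clamp labeled nodes so their labels never change.
                new_dist.append(one_hot(i))
            elif total == 0.0:
                # Isolated node: nothing to propagate from, keep its distribution.
                new_dist.append(dist[i])
            else:
                # Unlabeled node: weighted average of its neighbours' distributions.
                new_dist.append([
                    sum(weights[i][j] * dist[j][c] for j in range(n)) / total
                    for c in range(n_classes)
                ])
        dist = new_dist

    # Each node takes the class with the highest propagated mass.
    return [max(range(n_classes), key=lambda c: dist[i][c]) for i in range(n)]


# Toy data: two well-separated clusters with one labeled sample each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (3.0, 3.0), (3.2, 2.9), (2.9, 3.1)]
labels = [0, None, None, 1, None, None]
predicted = propagate_labels(points, labels, n_classes=2)
print(predicted)
```

In this toy setting each unlabeled point inherits the label of the labeled point in its own cluster, which is the behaviour that lets a few labeled questions inform many unlabeled ones.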
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Chen, J., Su, L., Li, Y., Shu, P. (2015). Label Propagation for Question Classification in CQA. In: Tan, Y., Shi, Y., Buarque, F., Gelbukh, A., Das, S., Engelbrecht, A. (eds) Advances in Swarm and Computational Intelligence. ICSI 2015. Lecture Notes in Computer Science, vol 9141. Springer, Cham. https://doi.org/10.1007/978-3-319-20472-7_36
DOI: https://doi.org/10.1007/978-3-319-20472-7_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20471-0
Online ISBN: 978-3-319-20472-7
eBook Packages: Computer Science (R0)