Abstract
This paper presents a novel model for ranking similar questions on collaborative question-answering platforms. The approach integrates a regression stage that relates topics derived from questions to those derived from question-answer pairs. This helps to avoid problems caused by differences between the vocabulary used in questions and in answers, and by the tendency of questions to be shorter than answers. The model is shown to outperform translation methods and topic modelling without regression on several real-world datasets.
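The abstract describes a regression stage that maps topics derived from a question onto topics derived from question-answer pairs before ranking. The sketch below shows that idea under explicit assumptions. It assumes topic distributions (for example from LDA) are already computed. It uses a multinomial-logit (softmax) regression fitted by plain gradient descent on cross-entropy as the mapping, and ranks archive entries by Hellinger distance. These modelling choices, the toy data, and the function names `fit_topic_regression`, `predict`, `hellinger` and `rank_archive` are hypothetical illustrations, not the authors' published implementation.

```python
import math

def softmax(z):
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, b, q):
    # Map a question topic distribution q to a predicted QA-pair topic distribution.
    return softmax([sum(w * x for w, x in zip(row, q)) + bj for row, bj in zip(W, b)])

def fit_topic_regression(q_topics, qa_topics, lr=0.5, epochs=500):
    """Hypothetical sketch: fit W, b so that softmax(W q + b) approximates the
    QA-pair topic distribution, minimising cross-entropy with soft targets."""
    k_in, k_out = len(q_topics[0]), len(qa_topics[0])
    W = [[0.0] * k_in for _ in range(k_out)]
    b = [0.0] * k_out
    n = len(q_topics)
    for _ in range(epochs):
        gW = [[0.0] * k_in for _ in range(k_out)]
        gb = [0.0] * k_out
        for q, t in zip(q_topics, qa_topics):
            p = predict(W, b, q)
            for j in range(k_out):
                d = p[j] - t[j]  # gradient of cross-entropy w.r.t. logit j
                gb[j] += d
                for i in range(k_in):
                    gW[j][i] += d * q[i]
        # Batch gradient-descent step, averaged over the training pairs.
        for j in range(k_out):
            b[j] -= lr * gb[j] / n
            for i in range(k_in):
                W[j][i] -= lr * gW[j][i] / n
    return W, b

def hellinger(p, q):
    # Hellinger distance between two discrete distributions (0 = identical).
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(c)) ** 2 for a, c in zip(p, q)))

def rank_archive(query_topics, archive_qa_topics, W, b):
    # Map the query into QA-pair topic space, then sort archive indices by distance.
    mapped = predict(W, b, query_topics)
    scored = [(hellinger(mapped, t), idx) for idx, t in enumerate(archive_qa_topics)]
    return [idx for _, idx in sorted(scored)]

# Toy usage with hypothetical 2-topic distributions.
train_q = [[0.9, 0.1], [0.1, 0.9]]
train_qa = [[0.2, 0.8], [0.8, 0.2]]
W, b = fit_topic_regression(train_q, train_qa)
archive = [[0.8, 0.2], [0.2, 0.8]]
print(rank_archive([0.85, 0.15], archive, W, b))  # → [1, 0]
```

In this toy setup the learned mapping swaps topic weight between the two spaces, so the query is ranked closest to the archive entry whose QA-pair topics differ from its own question topics. Matching question topics directly against QA-pair topics, without the regression, would rank the entries the other way.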
Notes
1. Available from http://webscope.sandbox.yahoo.com/catalog.php?datatype=l.
2. Available from https://archive.org/details/stackexchange.
3. Mistakes in the questions are original to the data.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Chahuara, P., Lampert, T., Gançarski, P. (2016). Retrieving and Ranking Similar Questions from Question-Answer Archives Using Topic Modelling and Topic Distribution Regression. In: Fuhr, N., Kovács, L., Risse, T., Nejdl, W. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2016. Lecture Notes in Computer Science, vol 9819. Springer, Cham. https://doi.org/10.1007/978-3-319-43997-6_4
DOI: https://doi.org/10.1007/978-3-319-43997-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43996-9
Online ISBN: 978-3-319-43997-6
eBook Packages: Computer Science, Computer Science (R0)