Retrieving and Ranking Similar Questions from Question-Answer Archives Using Topic Modelling and Topic Distribution Regression

Chahuara, Pedro; Lampert, Thomas; Gançarski, Pierre

doi:10.1007/978-3-319-43997-6_4

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9819)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

This paper presents a novel model for ranking similar questions within collaborative question-answer platforms. The approach integrates a regression stage that relates topics derived from questions to those derived from question-answer pairs. This helps to avoid problems caused by differences in the vocabulary used in questions and answers, and by the tendency for questions to be shorter than answers. The model is shown to outperform translation methods and topic modelling (without regression) on several real-world datasets.
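
As a rough illustration of the regression idea described in the abstract, the sketch below maps question topic distributions onto question-answer pair topic distributions and then ranks archived pairs against the predicted distribution. It is a minimal sketch under stated assumptions and not the authors' implementation: the topic distributions are synthetic Dirichlet samples rather than the output of a topic model, the regression is an ordinary least-squares fit to log proportions followed by a softmax, the ranking uses cosine similarity, and all function names are hypothetical.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_topic_regression(theta_q, theta_qa, eps=1e-9):
    """Least-squares map from question topics to log QA-pair topics.

    Assumption: a linear fit on log proportions stands in for whatever
    regression model the paper actually uses. No intercept column is added
    because each row of theta_q already sums to one.
    """
    targets = np.log(theta_qa + eps)
    weights, *_ = np.linalg.lstsq(theta_q, targets, rcond=None)
    return weights


def predict_qa_topics(theta_q, weights):
    """Predict a QA-pair topic distribution for each question distribution."""
    return softmax(theta_q @ weights)


def rank_archive(query_theta_q, archive_theta_qa, weights):
    """Return archive indices sorted by cosine similarity, best match first."""
    pred = predict_qa_topics(query_theta_q[None, :], weights)[0]
    norms = np.linalg.norm(archive_theta_qa, axis=1) * np.linalg.norm(pred)
    sims = (archive_theta_qa @ pred) / norms
    return np.argsort(-sims)


# Synthetic stand-ins for topic-model output (assumption: 5 question topics,
# 8 QA-pair topics, 50 archived question-answer pairs).
rng = np.random.default_rng(0)
theta_q = rng.dirichlet(np.ones(5), size=50)
theta_qa = rng.dirichlet(np.ones(8), size=50)

W = fit_topic_regression(theta_q, theta_qa)
ranking = rank_archive(theta_q[0], theta_qa, W)
```

Here `ranking` lists archive indices from most to least similar to the first question. With real data, `theta_q` and `theta_qa` would come from topic models inferred on questions and on question-answer pairs respectively.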

Notes

1. Available from http://webscope.sandbox.yahoo.com/catalog.php?datatype=l.
2. Available from https://archive.org/details/stackexchange.
3. Mistakes in the questions are original to the data.

Author information

Authors and Affiliations

Laboratoire ICube, Université de Strasbourg, Strasbourg, France
Pedro Chahuara, Thomas Lampert & Pierre Gançarski
Xerox Research Centre Europe, Meylan, France
Pedro Chahuara
Laboratoire Quantup, Strasbourg, France
Thomas Lampert

Corresponding author

Correspondence to Thomas Lampert.

Editor information

Editors and Affiliations

Universität Duisburg-Essen, Duisburg, Germany
Norbert Fuhr
Hungarian Academy of Science, Budapest, Hungary
László Kovács
Leibniz Universität Hannover, Hannover, Germany
Thomas Risse
Leibniz Universität Hannover, Hannover, Germany
Wolfgang Nejdl

Cite this paper

Chahuara, P., Lampert, T., Gançarski, P. (2016). Retrieving and Ranking Similar Questions from Question-Answer Archives Using Topic Modelling and Topic Distribution Regression. In: Fuhr, N., Kovács, L., Risse, T., Nejdl, W. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2016. Lecture Notes in Computer Science, vol 9819. Springer, Cham. https://doi.org/10.1007/978-3-319-43997-6_4

DOI: https://doi.org/10.1007/978-3-319-43997-6_4
Published: 10 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43996-9
Online ISBN: 978-3-319-43997-6
eBook Packages: Computer Science, Computer Science (R0)