Retrieving and Ranking Similar Questions from Question-Answer Archives Using Topic Modelling and Topic Distribution Regression

  • Conference paper
Research and Advanced Technology for Digital Libraries (TPDL 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9819)

Abstract

This paper presents a novel model for ranking similar questions on collaborative question-answer platforms. The approach integrates a regression stage that relates topics derived from questions to those derived from question-answer pairs. This helps to avoid problems caused by differences in the vocabulary used in questions and answers, and by the tendency for questions to be shorter than answers. The model is shown to outperform translation methods and topic modelling (without regression) on several real-world datasets.
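
The idea of regressing from question topics to question-answer pair topics can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the regression model or the similarity measure, so the softmax regression, the cosine similarity, the function names, and the toy topic distributions below are all assumptions made for illustration. In practice the distributions would come from a topic model fitted to the archive. The toy data deliberately swaps topic indices between the two spaces, since topics learned from questions need not line up with topics learned from question-answer pairs.

```python
import math

def softmax(logits):
    """Turn arbitrary scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, question_topics):
    """Map a question-topic distribution into the QA-pair topic space."""
    logits = [sum(w * x for w, x in zip(row, question_topics)) for row in weights]
    return softmax(logits)

def fit(questions, pairs, epochs=500, lr=0.5):
    """Assumed regression stage: softmax regression trained by gradient descent
    on the cross-entropy between predicted and observed QA-pair topics."""
    n_out, n_in = len(pairs[0]), len(questions[0])
    weights = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(epochs):
        for x, y in zip(questions, pairs):
            p = predict(weights, x)
            for j in range(n_out):
                grad = p[j] - y[j]  # cross-entropy gradient w.r.t. logit j
                for i in range(n_in):
                    weights[j][i] -= lr * grad * x[i]
    return weights

def cosine(a, b):
    """Cosine similarity between two topic vectors (an assumed ranking score)."""
    dot = sum(u * v for u, v in zip(a, b))
    norm = math.sqrt(sum(u * u for u in a)) * math.sqrt(sum(v * v for v in b))
    return dot / norm

def rank(weights, query_topics, archive_topics):
    """Return archive indices ordered from most to least similar to the query."""
    mapped = predict(weights, query_topics)
    scored = [(cosine(mapped, t), idx) for idx, t in enumerate(archive_topics)]
    return [idx for _, idx in sorted(scored, reverse=True)]

# Hypothetical training data: topic distributions of questions and of the
# question-answer pairs they belong to (topic indices swapped on purpose).
train_questions = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
train_pairs = [[0.1, 0.9], [0.9, 0.1], [0.2, 0.8], [0.8, 0.2]]
W = fit(train_questions, train_pairs)

# Hypothetical archive of QA-pair topic distributions and a new query question.
archive = [[0.9, 0.1], [0.1, 0.9]]
ranking = rank(W, [0.9, 0.1], archive)
print(ranking)
```

Comparing the query's raw topic vector with the archive would favour the first entry here. The regression step is what lets the short query be matched in the topic space of the longer question-answer pairs.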

Notes

  1. Available from http://webscope.sandbox.yahoo.com/catalog.php?datatype=l.

  2. Available from https://archive.org/details/stackexchange.

  3. Mistakes in the questions are original to the data.

Author information

Corresponding author

Correspondence to Thomas Lampert.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Chahuara, P., Lampert, T., Gançarski, P. (2016). Retrieving and Ranking Similar Questions from Question-Answer Archives Using Topic Modelling and Topic Distribution Regression. In: Fuhr, N., Kovács, L., Risse, T., Nejdl, W. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2016. Lecture Notes in Computer Science, vol 9819. Springer, Cham. https://doi.org/10.1007/978-3-319-43997-6_4

  • DOI: https://doi.org/10.1007/978-3-319-43997-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43996-9

  • Online ISBN: 978-3-319-43997-6

  • eBook Packages: Computer Science, Computer Science (R0)
