An Empirical Study of SLDA for Information Retrieval

Ma, Dashun; Rao, Lan; Wang, Ting

doi:10.1007/978-3-642-25631-8_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7097)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

A common limitation of many language modeling approaches is that retrieval scores are based mainly on exact matching of terms in queries and documents, ignoring the semantic relations among terms. Latent Dirichlet Allocation (LDA) is an approach that tries to capture the semantic dependencies among words. However, when used as a document representation, LDA has had no successful applications in information retrieval (IR). In this paper, we propose a single-document-based LDA (SLDA) document model for IR. Evaluation on four TREC collections shows that the SLDA document modeling method is comparable to state-of-the-art language modeling approaches, and that it offers a novel way to use the LDA model to improve retrieval performance.
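
The exact-matching limitation mentioned in the abstract can be illustrated with a minimal sketch of query-likelihood scoring with Dirichlet smoothing, a standard language-modeling baseline. This is not the SLDA model proposed in the paper; the function name `query_likelihood`, the toy documents, and the value of `mu` are assumptions chosen only for illustration.

```python
import math
from collections import Counter

def query_likelihood(query, doc, collection, mu=2000.0):
    """Log query likelihood of `query` under a Dirichlet-smoothed
    unigram model of `doc` (illustrative sketch, not the paper's model)."""
    doc_tf = Counter(doc)            # term counts in the document
    coll_tf = Counter(collection)    # term counts in the whole collection
    coll_len = len(collection)
    score = 0.0
    for term in query:
        p_coll = coll_tf[term] / coll_len
        # Smoothed estimate: document count plus mu pseudo-counts
        # distributed according to the collection model.
        p = (doc_tf[term] + mu * p_coll) / (len(doc) + mu)
        if p == 0.0:
            return float("-inf")     # term unseen anywhere in the collection
        score += math.log(p)
    return score

# Two toy documents that differ only in one synonymous term.
doc_a = "the car was repaired at the garage".split()
doc_b = "the automobile was repaired at the garage".split()
collection = doc_a + doc_b

query = ["car"]
score_a = query_likelihood(query, doc_a, collection, mu=10.0)
score_b = query_likelihood(query, doc_b, collection, mu=10.0)
print(score_a, score_b)
```

Here `doc_b` scores lower than `doc_a` solely because it says "automobile" rather than "car": the model sees only surface term overlap, which is the gap that topic models such as LDA aim to close.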



Author information

Authors and Affiliations

College of Computer, National University of Defense Technology, 410073, Changsha, Hunan, P.R. China
Dashun Ma & Ting Wang
College of Humanities and Social Sciences, National University of Defense Technology, 410073, Changsha, Hunan, P.R. China
Lan Rao

Editor information

Editors and Affiliations

Faculty of Computer Science and Engineering, University of Wollongong, Dubai Knowledge Village, P.O. Box 20182, Dubai, United Arab Emirates
Mohamed Vall Mohamed Salem
Faculty of Engineering and IT, Dubai International Academic City, Block 11, 1st and 2nd Floor, P.O. Box 345015, Dubai, United Arab Emirates
Khaled Shaalan
Faculty of Computer Science and Engineering, University of Wollongong, Dubai Knowledge Village, P.O. Box 20183, Dubai, United Arab Emirates
Farhad Oroumchian
Department of Electrical and Computer Engineering, University of Tehran, Faculty of Engineering, North Kargar Street, P.O. Box 14395-515, Tehran, Iran
Azadeh Shakery
Faculty of Computer Science and Engineering, University of Wollongong, Dubai Knowledge Village, P.O. Box 20183, Dubai, United Arab Emirates
Halim Khelalfa

Cite this paper

Ma, D., Rao, L., Wang, T. (2011). An Empirical Study of SLDA for Information Retrieval. In: Salem, M.V.M., Shaalan, K., Oroumchian, F., Shakery, A., Khelalfa, H. (eds) Information Retrieval Technology. AIRS 2011. Lecture Notes in Computer Science, vol 7097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25631-8_8

DOI: https://doi.org/10.1007/978-3-642-25631-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25630-1
Online ISBN: 978-3-642-25631-8
eBook Packages: Computer Science, Computer Science (R0)
