Abstract
Relevance-based language models operate by estimating the probabilities of observing words in documents relevant (or pseudo-relevant) to a topic. However, these models assume that if a document is relevant to a topic, then all tokens in the document are relevant to that topic, which can limit model robustness and effectiveness. In this study, we propose a Latent Dirichlet relevance model that relaxes this assumption. Our approach derives from current research on Latent Dirichlet Allocation (LDA) topic models. LDA has been extensively explored, especially for discovering a set of topics from a corpus. LDA itself, however, has a limitation that is also addressed in our work: topics generated by LDA from a corpus are synthetic, i.e., they do not necessarily correspond to topics identified by humans for the same corpus. In contrast, our model explicitly considers the relevance relationships between documents and given topics (queries). Unlike standard LDA, our model is therefore directly applicable to goals such as relevance feedback for query modification and text classification, where topics (classes and queries) are provided upfront. Thus, although the focus of our paper is on improving relevance-based language models, our approach in effect bridges relevance-based language models and LDA, addressing limitations of both.
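The estimation step the abstract describes — building a word distribution for a topic from (pseudo-)relevant documents — can be sketched as follows. This is a minimal, illustrative implementation of the classical relevance model (in the style of Lavrenko and Croft's RM1) that the paper builds on, not the proposed Latent Dirichlet relevance model itself; the Dirichlet smoothing parameter and function names are assumptions for the sketch.

```python
from collections import Counter

def relevance_model(query, docs, mu=2000):
    """Estimate P(w | R) in RM1 fashion:
    P(w|R) is proportional to sum_D P(w|D) * P(Q|D), where each document
    language model P(.|D) is Dirichlet-smoothed against the collection,
    and docs is the set of (pseudo-)relevant documents for the query."""
    # Collection statistics used for smoothing
    coll = Counter()
    for d in docs:
        coll.update(d)
    coll_len = sum(coll.values())

    def p_w_given_d(w, tf, dlen):
        # Dirichlet-smoothed document language model P(w|D)
        return (tf.get(w, 0) + mu * coll[w] / coll_len) / (dlen + mu)

    rm = Counter()
    for d in docs:
        tf, dlen = Counter(d), len(d)
        # Query likelihood P(Q|D) under the same smoothed model;
        # note every token in D contributes, which is exactly the
        # all-tokens-are-relevant assumption the paper relaxes.
        p_q = 1.0
        for q in query:
            p_q *= p_w_given_d(q, tf, dlen)
        for w in coll:
            rm[w] += p_w_given_d(w, tf, dlen) * p_q

    total = sum(rm.values())
    return {w: p / total for w, p in rm.items()}
```

Because every token of a relevant document contributes to `rm`, off-topic tokens in relevant documents leak into the topic model — the limitation that motivates the Latent Dirichlet relevance model proposed in the paper.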
© 2009 Springer-Verlag Berlin Heidelberg
Ha-Thuc, V., Srinivasan, P. (2009). A Latent Dirichlet Framework for Relevance Modeling. In: Lee, G.G., et al. Information Retrieval Technology. AIRS 2009. Lecture Notes in Computer Science, vol 5839. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04769-5_2
DOI: https://doi.org/10.1007/978-3-642-04769-5_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04768-8
Online ISBN: 978-3-642-04769-5
eBook Packages: Computer Science (R0)