A Latent Dirichlet Framework for Relevance Modeling

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5839)

Abstract

Relevance-based language models operate by estimating the probabilities of observing words in documents relevant (or pseudo-relevant) to a topic. However, these models assume that if a document is relevant to a topic, then all tokens in the document are relevant to that topic. This assumption could limit model robustness and effectiveness. In this study, we propose a Latent Dirichlet relevance model, which relaxes this assumption. Our approach derives from current research on Latent Dirichlet Allocation (LDA) topic models. LDA has been explored extensively, especially for discovering a set of topics from a corpus. LDA itself, however, has a limitation that our work also addresses: topics generated by LDA from a corpus are synthetic, i.e., they do not necessarily correspond to topics identified by humans for the same corpus. In contrast, our model explicitly considers the relevance relationships between documents and given topics (queries). Thus, unlike standard LDA, our model is directly applicable to goals such as relevance feedback for query modification and text classification, where topics (classes and queries) are provided upfront. Although the focus of our paper is on improving relevance-based language models, in effect our approach bridges relevance-based language models and LDA, addressing limitations of both.
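The difference between crediting every token of a relevant document to the topic and relaxing that assumption can be illustrated with a toy sketch. This is not the paper's Latent Dirichlet relevance model, which builds on LDA. As a simpler stand-in it uses a two-component mixture fitted by EM, a standard model-based feedback technique. It assumes uniform document weights and a known background (corpus) distribution. The function names, the mixture weight `lam`, and the toy data are hypothetical.

```python
from collections import Counter


def relevance_model(relevant_docs):
    """Simplified relevance-model estimate of P(w|R): the average of each
    relevant document's maximum-likelihood unigram model, with uniform
    document weights. Every token of a relevant document counts toward R."""
    p_r = Counter()
    for doc in relevant_docs:
        counts = Counter(doc)
        for w, c in counts.items():
            p_r[w] += (c / len(doc)) / len(relevant_docs)
    return dict(p_r)


def mixture_feedback(relevant_docs, background, lam=0.5, iters=200):
    """Two-component mixture fitted by EM (hypothetical stand-in, not the
    paper's model). Each token occurrence is drawn from the relevance topic
    theta_R with probability (1 - lam), or from a fixed background model with
    probability lam. Tokens the background explains well are therefore not
    fully credited to R."""
    counts = Counter(w for doc in relevant_docs for w in doc)
    vocab = list(counts)
    theta = {w: 1.0 / len(vocab) for w in vocab}  # uniform initialisation
    for _ in range(iters):
        # E-step: posterior probability that an occurrence of w came from theta_R
        resp = {}
        for w in vocab:
            from_r = (1 - lam) * theta[w]
            from_bg = lam * background.get(w, 1e-12)  # tiny floor for unseen words
            resp[w] = from_r / (from_r + from_bg)
        # M-step: renormalise the expected counts attributed to theta_R
        expected = {w: counts[w] * resp[w] for w in vocab}
        total = sum(expected.values())
        theta = {w: v / total for w, v in expected.items()}
    return theta


# Toy pseudo-relevant documents and a hypothetical background (corpus) model.
docs = [["lda", "topic", "the", "the"], ["topic", "model", "the", "the"]]
background = {"the": 0.6, "lda": 0.1, "topic": 0.1, "model": 0.2}

rm = relevance_model(docs)                  # rm["the"] == 0.5
theta = mixture_feedback(docs, background)  # theta["the"] converges toward 0.4
```

With these numbers the first estimate gives P(the|R) = 0.5, because every occurrence of "the" is credited to the topic. The mixture attributes part of that frequent word to the background. Since lam = 0.5 here, its maximum-likelihood solution is theta_w = 2 × (empirical frequency of w) − background(w), which gives 0.4 for "the".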

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ha-Thuc, V., Srinivasan, P. (2009). A Latent Dirichlet Framework for Relevance Modeling. In: Lee, G.G., et al. Information Retrieval Technology. AIRS 2009. Lecture Notes in Computer Science, vol 5839. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04769-5_2

  • DOI: https://doi.org/10.1007/978-3-642-04769-5_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04768-8

  • Online ISBN: 978-3-642-04769-5

  • eBook Packages: Computer Science, Computer Science (R0)
