A Unified Framework for Monolingual and Cross-Lingual Relevance Modeling Based on Probabilistic Topic Models

Vulić, Ivan; Moens, Marie-Francine

doi:10.1007/978-3-642-36973-5_9

Ivan Vulić & Marie-Francine Moens

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7814)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

We explore the potential of probabilistic topic modeling within the relevance modeling framework for both monolingual and cross-lingual ad-hoc retrieval. Multilingual topic models provide a way to represent documents in a structured and coherent way, regardless of their actual language, by means of language-independent concepts, that is, cross-lingual topics. We show how to integrate the topical knowledge into a unified relevance modeling framework in order to build quality retrieval models in monolingual and cross-lingual contexts. The proposed modeling framework processes all documents uniformly and does not make any conceptual distinction between monolingual and cross-lingual modeling. Our results obtained from the experiments conducted on the standard CLEF test collections reveal that fusing the topical knowledge and relevance modeling leads to building monolingual and cross-lingual retrieval models that outperform several strong baselines. We show that the topical knowledge coming from a general Web-generated corpus boosts retrieval scores. Additionally, we show that within this framework the estimation of cross-lingual relevance models may be performed by exploiting only a general non-parallel corpus.
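
To make the idea of a topic-based relevance model concrete, the following is a minimal sketch and not the estimator from the paper itself. It assumes a generic RM1-style estimate in which each document language model is a mixture over shared topics, with a uniform document prior. The function names, the toy distributions, and the labels `phi_en` and `phi_nl` are all hypothetical. Because the topics are shared across languages, the same function covers the monolingual case (pass the same per-topic word distributions twice) and the cross-lingual case (pass query-language and document-language distributions).

```python
from math import prod


def doc_word_prob(word_id, theta_d, phi):
    """P(w | d) = sum_k P(w | z_k) * P(z_k | d) under a topic model."""
    return sum(phi[k][word_id] * theta_d[k] for k in range(len(theta_d)))


def relevance_model(query_ids, theta, phi_query, phi_target):
    """RM1-style estimate of P(w | R_Q) over the target vocabulary.

    theta[d][k]      : P(z_k | d), per-document topic distribution
    phi_query[k][w]  : P(w | z_k) in the query language
    phi_target[k][w] : P(w | z_k) in the document language
    Assumes a uniform prior over documents (an illustrative choice).
    """
    vocab_size = len(phi_target[0])
    scores = [0.0] * vocab_size
    for theta_d in theta:
        # Query likelihood under the topic-based document model.
        q_lik = prod(doc_word_prob(q, theta_d, phi_query) for q in query_ids)
        for w in range(vocab_size):
            scores[w] += doc_word_prob(w, theta_d, phi_target) * q_lik
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else scores


# Toy data (made up): 2 documents, 2 shared topics, 3 words per language.
theta = [[0.9, 0.1], [0.2, 0.8]]
phi_en = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]  # hypothetical query language
phi_nl = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]  # hypothetical document language

cross_lingual = relevance_model([0], theta, phi_en, phi_nl)
monolingual = relevance_model([0], theta, phi_en, phi_en)
```

In this sketch no bilingual dictionary or translation table appears anywhere: the only bridge between the two vocabularies is the shared topic space, which is the property the abstract attributes to cross-lingual topics.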

Author information

Authors and Affiliations

Department of Computer Science, KU Leuven, Belgium
Ivan Vulić & Marie-Francine Moens

Editor information

Editors and Affiliations

Yandex, Leo Tolstoy, 16, 119021, Moscow, Russia
Pavel Serdyukov & Ilya Segalovich
Kontur Labs and Ural Federal University, Fonvizina 3-27, 620078, Yekaterinburg, Russia
Pavel Braslavski
National Research University Higher School of Economics (HSE), Pokrovskii bd 11, 109028, Moscow, Russia
Sergei O. Kuznetsov
University of Amsterdam, Turfdraagsterpad 9, 1012 XT, Amsterdam, The Netherlands
Jaap Kamps
Knowledge Media Institute, The Open University, Walton Hall, MK7 6AA, Milton Keynes, UK
Stefan Rüger
Mathematics & Computer Science Department, Emory University, 400 Dowman Drive, 30329, Atlanta, GA, USA
Eugene Agichtein
Department of Computer Science, University College London, Gower Street, WC1E 6BT, London, UK
Emine Yilmaz

About this paper

Cite this paper

Vulić, I., Moens, MF. (2013). A Unified Framework for Monolingual and Cross-Lingual Relevance Modeling Based on Probabilistic Topic Models. In: Serdyukov, P., et al. Advances in Information Retrieval. ECIR 2013. Lecture Notes in Computer Science, vol 7814. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36973-5_9

DOI: https://doi.org/10.1007/978-3-642-36973-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36972-8
Online ISBN: 978-3-642-36973-5
eBook Packages: Computer Science (R0)
