Abstract
The KL divergence framework, an extension of the language modeling approach, faces a critical problem in estimating the query model, the probabilistic model that encodes the user's information need. In particular, at initial retrieval it is difficult to expand the query model using term co-occurrence, because two-dimensional statistics such as a term co-occurrence matrix must be constructed offline. For large collections, building such a matrix makes the time and space costs prohibitive. This paper proposes an effective method for constructing co-occurrence statistics by employing a parsimonious translation model. A parsimonious translation model is a compact version of a translation model in which only a very small number of parameters have non-zero probabilities. It greatly reduces the number of terms retained in each document, so that co-occurrence statistics can be computed in tractable time. Experimental results show that the query model derived from the parsimonious translation model significantly improves on the performance of the baseline language modeling approach.
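To make the idea concrete, here is a minimal Python sketch under stated assumptions. Each document model is made parsimonious with the EM procedure of Hiemstra, Robertson and Zaragoza (SIGIR 2004), which shifts probability mass away from terms the collection model already explains and prunes terms below a threshold. The translation estimator t(w|q) ∝ Σ_D P(w|D) P(q|D) (uniform document prior) and the interpolated query model are illustrative assumptions, not necessarily the paper's exact formulas; the function names and the values of `lam`, `threshold` and `alpha` are hypothetical.

```python
from collections import Counter, defaultdict


def parsimonious_model(doc_tf, p_coll, lam=0.1, iters=20, threshold=1e-3):
    """Parsimonious document model via EM (after Hiemstra et al., SIGIR 2004).

    doc_tf: term -> frequency in one document
    p_coll: term -> background probability P(t|C)
    lam: weight of the document model in the mixture (assumed value)
    threshold: terms below this probability are pruned (assumed value)
    """
    total = sum(doc_tf.values())
    p_doc = {t: tf / total for t, tf in doc_tf.items()}  # start from maximum likelihood
    for _ in range(iters):
        # E-step: expected count of each term attributed to the document model
        e = {t: doc_tf[t] * lam * p / (lam * p + (1 - lam) * p_coll[t])
             for t, p in p_doc.items()}
        z = sum(e.values())
        # M-step: renormalise and prune; keep everything if pruning would empty the model
        kept = {t: v for t, v in e.items() if v / z >= threshold} or e
        z = sum(kept.values())
        p_doc = {t: v / z for t, v in kept.items()}
    return p_doc


def translation_model(doc_models):
    """Hypothetical estimator: t(w|q) proportional to sum_D P(w|D) P(q|D),
    assuming a uniform document prior."""
    co = defaultdict(Counter)
    for p_doc in doc_models:
        # pairwise loop over the (few) terms kept in each pruned model
        for q, pq in p_doc.items():
            for w, pw in p_doc.items():
                co[q][w] += pq * pw
    trans = {}
    for q, row in co.items():
        z = sum(row.values())
        trans[q] = {w: v / z for w, v in row.items()}
    return trans


def expanded_query_model(query_terms, trans, alpha=0.5):
    """P(w|Q) = alpha * P_ml(w|Q) + (1 - alpha) * sum_q t(w|q) P_ml(q|Q)."""
    q_tf = Counter(query_terms)
    n = len(query_terms)
    model = Counter()
    for q, c in q_tf.items():
        p_q = c / n
        model[q] += alpha * p_q
        # a query term absent from every document model translates only to itself
        for w, t in trans.get(q, {q: 1.0}).items():
            model[w] += (1 - alpha) * p_q * t
    return dict(model)


# Toy usage with made-up counts
docs = [
    Counter({"parsimonious": 3, "model": 4, "the": 10}),
    Counter({"translation": 2, "model": 3, "the": 8}),
]
coll = sum(docs, Counter())
n_coll = sum(coll.values())
p_coll = {t: c / n_coll for t, c in coll.items()}
models = [parsimonious_model(d, p_coll) for d in docs]
trans = translation_model(models)
query_model = expanded_query_model(["model"], trans)
print(sorted(query_model.items(), key=lambda kv: -kv[1]))
```

The pairwise loop in `translation_model` costs time proportional to the square of the number of terms kept per document, which is why pruning each document model first is what keeps co-occurrence estimation tractable in this sketch.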
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Na, SH., Kang, IS., Kang, SJ., Lee, JH. (2005). Estimation of Query Model from Parsimonious Translation Model. In: Myaeng, S.H., Zhou, M., Wong, KF., Zhang, HJ. (eds) Information Retrieval Technology. AIRS 2004. Lecture Notes in Computer Science, vol 3411. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31871-2_21
DOI: https://doi.org/10.1007/978-3-540-31871-2_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25065-4
Online ISBN: 978-3-540-31871-2
eBook Packages: Computer Science, Computer Science (R0)