Abstract
In information retrieval, the word-mismatch problem is a critical issue. Several techniques have been developed to address it, including query expansion, cluster-based retrieval, and dimensionality reduction. This paper presents an empirical study of two of these techniques, query expansion and cluster-based retrieval. We examine the effect of parsimony in query expansion and the effect of the clustering algorithm in cluster-based retrieval. In addition, we compare query expansion with cluster-based retrieval and evaluate their combinations in terms of retrieval performance. Based on experiments on seven NTCIR and TREC test collections, we conclude that 1) query expansion using parsimony performs well, 2) cluster-based retrieval with agglomerative clustering outperforms cluster-based retrieval with partitioning clustering, 3) query expansion is generally more effective than cluster-based retrieval in resolving the word-mismatch problem, and 4) their combinations are effective when each method on its own significantly improves on baseline performance.
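As a rough illustration of the word-mismatch problem and of how query expansion can reduce it in a language-modeling setting, the sketch below scores three toy documents with a smoothed unigram model. It is not the method evaluated in the paper: the documents, the interpolation weight `lam`, and the expansion weights are all hypothetical values chosen for illustration, and the helper names `unigram_lm`, `score`, and `rank` are assumptions.

```python
import math
from collections import Counter

def unigram_lm(tokens):
    """Maximum-likelihood unigram model over a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def score(query_model, doc_tokens, collection_lm, lam=0.5):
    """Sum over query terms of P(w|Q) * log P(w|D), where P(w|D) is
    linearly interpolated with the collection model (lam is assumed)."""
    doc_lm = unigram_lm(doc_tokens)
    total = 0.0
    for w, p_q in query_model.items():
        p_d = (1 - lam) * doc_lm.get(w, 0.0) + lam * collection_lm.get(w, 0.0)
        total += p_q * math.log(p_d)
    return total

# Toy collection (hypothetical). d1 is about cars but never uses the word "car".
docs = {
    "d1": "automobile engine repair automobile".split(),
    "d2": "car rental prices".split(),
    "d3": "garden flower seeds".split(),
}
collection_lm = unigram_lm([w for tokens in docs.values() for w in tokens])

original = {"car": 1.0}
# Hypothetical expanded query model; the weights are illustrative only.
expanded = {"car": 0.6, "automobile": 0.4}

def rank(query_model):
    """Document ids sorted by descending score."""
    return sorted(docs, key=lambda d: score(query_model, docs[d], collection_lm),
                  reverse=True)

for name, qm in [("original", original), ("expanded", expanded)]:
    print(name, [(d, round(score(qm, docs[d], collection_lm), 3)) for d in rank(qm)])
```

With the original query, d1 cannot be distinguished from the unrelated d3, because neither contains the query term. With the expanded query model, d1 scores above d3. Cluster-based retrieval approaches the same problem from the document side, for example by smoothing a document model with the model of its cluster rather than by changing the query.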
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Na, SH., Kang, IS., Roh, JE., Lee, JH. (2005). An Empirical Study of Query Expansion and Cluster-Based Retrieval in Language Modeling Approach. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_21
DOI: https://doi.org/10.1007/11562382_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29186-2
Online ISBN: 978-3-540-32001-2
eBook Packages: Computer Science (R0)