An Empirical Study of Query Expansion and Cluster-Based Retrieval in Language Modeling Approach

Na, Seung-Hoon; Kang, In-Su; Roh, Ji-Eun; Lee, Jong-Hyeok

doi:10.1007/11562382_21

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3689)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

In information retrieval, the word mismatch problem is a critical issue. Several techniques have been developed to address it, including query expansion, cluster-based retrieval, and dimensionality reduction. This paper presents an empirical study of two of these techniques, query expansion and cluster-based retrieval. We examine the effect of parsimony in query expansion and the effect of the clustering algorithm in cluster-based retrieval. We also compare query expansion with cluster-based retrieval and evaluate their combinations in terms of retrieval performance. From experiments on seven NTCIR and TREC test collections, we conclude that 1) query expansion using parsimony performs well, 2) cluster-based retrieval with agglomerative clustering outperforms cluster-based retrieval with partitioning clustering, 3) query expansion is generally more effective than cluster-based retrieval in resolving the word mismatch problem, and 4) their combinations are effective when each method significantly improves baseline performance.
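As a rough illustration of the word mismatch problem in a language modeling setting, the sketch below scores documents by query likelihood and shows how adding a related term to the query can separate two documents that the original query cannot tell apart. It is not the method evaluated in the paper: the Jelinek-Mercer smoothing, the toy documents, the function name `lm_score`, and the hand-picked expansion term are all assumptions made for the example.

```python
import math
from collections import Counter

def lm_score(query_terms, doc_tokens, collection_tokens, lam=0.5):
    """Log query likelihood with Jelinek-Mercer smoothing (assumed for this sketch)."""
    doc_tf = Counter(doc_tokens)
    col_tf = Counter(collection_tokens)
    score = 0.0
    for term in query_terms:
        p_doc = doc_tf[term] / len(doc_tokens)          # document model
        p_col = col_tf[term] / len(collection_tokens)   # collection model
        p = (1 - lam) * p_doc + lam * p_col
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        score += math.log(p)
    return score

# Toy collection (hypothetical data). d1 is about cars but never uses the word "car".
d1 = ["automobile", "engine", "repair"]
d2 = ["stock", "market", "report"]
d3 = ["car", "automobile", "dealer"]
collection = d1 + d2 + d3

# Original query: neither d1 nor d2 contains "car", so both receive the same score.
query = ["car"]
before_d1 = lm_score(query, d1, collection)
before_d2 = lm_score(query, d2, collection)

# Expanded query: the extra term is hand-picked here, not estimated from feedback.
expanded = ["car", "automobile"]
after_d1 = lm_score(expanded, d1, collection)
after_d2 = lm_score(expanded, d2, collection)

print(before_d1 == before_d2, after_d1 > after_d2)
```

With the original query the two documents tie because only the collection model contributes. After expansion, the document that shares vocabulary with the added term ranks higher.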

Author information

Authors and Affiliations

Division of Electrical and Computer Engineering, POSTECH, AITrc, Republic of Korea
Seung-Hoon Na, In-Su Kang, Ji-Eun Roh & Jong-Hyeok Lee

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31, Hyoja-dong, Nam-gu, 790-784, Pohang, Korea
Gary Geunbae Lee
Computer and Communication Media Research, NEC Corp., Miyazaki 4-1-1, Miyamae-ku, 216-8555, Kawasaki, Japan
Akio Yamada
Human-Computer Communications Laboratory, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong
Helen Meng
School of Engineering, Information and Communications University, 119, Munjiro, Yuseong-gu, 305-732, Daejeon, Korea
Sung Hyon Myaeng

Cite this paper

Na, SH., Kang, IS., Roh, JE., Lee, JH. (2005). An Empirical Study of Query Expansion and Cluster-Based Retrieval in Language Modeling Approach. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_21

DOI: https://doi.org/10.1007/11562382_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29186-2
Online ISBN: 978-3-540-32001-2
eBook Packages: Computer Science, Computer Science (R0)
