Some Formal Analysis of Rocchio’s Similarity-Based Relevance Feedback Algorithm

Chen, Zhixiang; Zhu, Binhai

doi:10.1007/3-540-40996-3_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1969)

Included in the following conference series:

International Symposium on Algorithms and Computation

Abstract

Rocchio’s similarity-based relevance feedback algorithm, one of the most important query reformulation methods in information retrieval, is essentially an adaptive algorithm for supervised learning from examples. In spite of its popularity in various applications, there is little rigorous analysis of its learning complexity in the literature. In this paper we show that in the Boolean vector space model, if the initial query vector is the zero vector, then for any of the four typical similarity measures (inner product, Dice coefficient, cosine coefficient, and Jaccard coefficient), Rocchio’s similarity-based relevance feedback algorithm makes at least n mistakes when used to search for a collection of documents represented by a monotone disjunction of at most k relevant features (or terms) over the n-dimensional Boolean vector space {0,1}ⁿ. When an arbitrary initial query vector in {0,1}ⁿ is used, it makes at least (n + k − 3)/2 mistakes when searching for the same collection of documents. These linear lower bounds are independent of the choice of the threshold and of the coefficients that the algorithm may use to update its query vector and make its classifications.
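The following Python sketch makes the setting concrete. It simulates an online, mistake-driven variant of Rocchio's algorithm over {0,1}ⁿ using the four similarity measures named above. It is an illustration under stated assumptions, not the authors' formulation. The abstract does not give the update rule, so the sketch assumes a perceptron-style update: add β·x to the query after missing a relevant document, and subtract γ·x after a false positive. It also assumes the common vector-space forms of the Dice, cosine, and Jaccard coefficients, which reduce to the usual set-based definitions on 0/1 vectors. The names `rocchio_mistakes` and `disjunction` and the parameters `theta`, `beta`, and `gamma` are hypothetical. The sketch does not reproduce the adversarial document sequences behind the paper's lower bounds.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Similarity = Callable[[Vector, Vector], float]


def dot(q: Vector, x: Vector) -> float:
    return sum(a * b for a, b in zip(q, x))


# Similarity measures in their usual vector-space form (an assumption; the
# paper's exact definitions may differ). Each returns 0.0 when its
# denominator vanishes, e.g. for the zero query vector.
def inner_product(q: Vector, x: Vector) -> float:
    return dot(q, x)


def dice(q: Vector, x: Vector) -> float:
    denom = dot(q, q) + dot(x, x)
    return 2 * dot(q, x) / denom if denom else 0.0


def cosine(q: Vector, x: Vector) -> float:
    denom = math.sqrt(dot(q, q)) * math.sqrt(dot(x, x))
    return dot(q, x) / denom if denom else 0.0


def jaccard(q: Vector, x: Vector) -> float:
    denom = dot(q, q) + dot(x, x) - dot(q, x)
    return dot(q, x) / denom if denom else 0.0


def disjunction(relevant: Sequence[int]) -> Callable[[Vector], bool]:
    """Hypothetical target: a document is relevant iff it contains any relevant term."""
    return lambda x: any(x[i] for i in relevant)


def rocchio_mistakes(
    docs: Sequence[Vector],
    labels: Sequence[bool],
    sim: Similarity,
    n: int,
    theta: float,
    beta: float = 1.0,
    gamma: float = 1.0,
    q0: Optional[Vector] = None,
) -> Tuple[int, List[float]]:
    """Run an assumed mistake-driven Rocchio update; return (mistakes, final query)."""
    q = [float(v) for v in q0] if q0 is not None else [0.0] * n
    mistakes = 0
    for x, relevant in zip(docs, labels):
        predicted = sim(q, x) >= theta  # classify as relevant at or above the threshold
        if predicted != relevant:
            mistakes += 1
            # Move toward a missed relevant document, away from a false positive.
            step = beta if relevant else -gamma
            q = [qi + step * xi for qi, xi in zip(q, x)]
    return mistakes, q


if __name__ == "__main__":
    n = 6
    docs = [[1 if j == i else 0 for j in range(n)] for i in range(n)]  # unit vectors
    labels = [disjunction([0, 1])(x) for x in docs]  # k = 2 relevant terms
    print(rocchio_mistakes(docs, labels, inner_product, n, theta=0.5))
    # -> (2, [1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
```

With the zero initial query, every similarity evaluates to 0, so the first predictions depend only on the threshold. With θ = 0.5, the sketch misses each of the k = 2 relevant unit vectors. With θ = 0, it instead accepts, and is then corrected on, each of the n − k = 4 irrelevant ones.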

Author information

Authors and Affiliations

Department of Computer Science, University of Texas-Pan American, 78539, Edinburg, TX, USA
Zhixiang Chen
Department of Computer Science, Montana State University, 59717, Bozeman, MT, USA
Binhai Zhu

Editor information

Editors and Affiliations

Karlsruhe University, Germany
Gerhard Goos
Cornell University, NY, USA
Juris Hartmanis
Utrecht University, The Netherlands
Jan van Leeuwen
Academia Sinica, Institute of Information Science, 128 Academia Road, Section 2, Nankang, 115, Taipei, Taiwan, R.O.C.
D. T. Lee
Department of Computer Science and Akamai Technologies, University of Illinois at Urbana-Champaign, 500 Technology Square, 02139, Cambridge, MA, USA
Shang-Hua Teng

About this paper

Cite this paper

Chen, Z., Zhu, B. (2000). Some Formal Analysis of Rocchio’s Similarity-Based Relevance Feedback Algorithm. In: Goos, G., Hartmanis, J., van Leeuwen, J., Lee, D.T., Teng, SH. (eds) Algorithms and Computation. ISAAC 2000. Lecture Notes in Computer Science, vol 1969. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-40996-3_10

DOI: https://doi.org/10.1007/3-540-40996-3_10
Published: 29 January 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41255-7
Online ISBN: 978-3-540-40996-0
eBook Packages: Springer Book Archive