Abstract
Rocchio’s similarity-based relevance feedback algorithm, one of the most important query reformation methods in information retrieval, is essentially an adaptive supervised learning algorithm from examples. In spite of its popularity in various applications, there is little rigorous analysis of its learning complexity in the literature. In this paper we show that in the Boolean vector space model, if the initial query vector is 0, then for any of the four typical similarities (inner product, Dice coefficient, cosine coefficient, and Jaccard coefficient), Rocchio’s similarity-based relevance feedback algorithm makes at least n mistakes when used to search for a collection of documents represented by a monotone disjunction of at most k relevant features (or terms) over the n-dimensional Boolean vector space {0,1}^n. When an arbitrary initial query vector in {0,1}^n is used, it makes at least (n+k−3)/2 mistakes to search for the same collection of documents. The linear lower bounds are independent of the choices of the threshold and coefficients that the algorithm may use in updating its query vector and making its classification.
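The setting described in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so everything below is an assumption: the four similarities use their common vector-space definitions, the update rule is a generic mistake-driven Rocchio-style rule (add a misclassified relevant document to the query, subtract a misclassified irrelevant one), and the names `rocchio_run`, `theta`, `alpha`, and `beta` are hypothetical. The toy run shows only the mechanics of counting mistakes; it does not reproduce the paper's lower-bound argument.

```python
import math

def inner(q, x):
    # Inner product similarity.
    return sum(a * b for a, b in zip(q, x))

def dice(q, x):
    # Dice coefficient in its common vector form (assumed definition).
    denom = inner(q, q) + inner(x, x)
    return 2 * inner(q, x) / denom if denom else 0.0

def cosine(q, x):
    # Cosine coefficient; defined as 0 when either vector is all zeros.
    denom = math.sqrt(inner(q, q) * inner(x, x))
    return inner(q, x) / denom if denom else 0.0

def jaccard(q, x):
    # Jaccard coefficient in its common vector form (assumed definition).
    denom = inner(q, q) + inner(x, x) - inner(q, x)
    return inner(q, x) / denom if denom else 0.0

def is_relevant(x, relevant_features):
    # Target concept: a monotone disjunction over the relevant features.
    return any(x[i] == 1 for i in relevant_features)

def rocchio_run(docs, relevant_features, sim, q0,
                theta=0.5, alpha=1.0, beta=1.0):
    # Hypothetical mistake-driven loop: classify each document by
    # thresholding its similarity to the query, and update the query
    # only when the classification disagrees with the true label.
    q = list(q0)
    mistakes = 0
    for x in docs:
        predicted = sim(q, x) >= theta
        actual = is_relevant(x, relevant_features)
        if predicted != actual:
            mistakes += 1
            step = alpha if actual else -beta
            q = [qi + step * xi for qi, xi in zip(q, x)]
    return q, mistakes

# Toy run: n = 4, relevant features {0, 1}, zero initial query,
# documents are the four unit vectors of {0,1}^4.
n = 4
docs = [[1 if i == j else 0 for i in range(n)] for j in range(n)]
final_q, mistakes = rocchio_run(docs, {0, 1}, inner, [0] * n)
print(final_q, mistakes)
```

Passing `dice`, `cosine`, or `jaccard` as `sim` swaps the similarity without changing the loop. The threshold and the coefficients are left as parameters because the abstract states that the lower bounds hold regardless of how they are chosen.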
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, Z., Zhu, B. (2000). Some Formal Analysis of Rocchio’s Similarity-Based Relevance Feedback Algorithm. In: Goos, G., Hartmanis, J., van Leeuwen, J., Lee, D.T., Teng, SH. (eds) Algorithms and Computation. ISAAC 2000. Lecture Notes in Computer Science, vol 1969. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-40996-3_10
DOI: https://doi.org/10.1007/3-540-40996-3_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41255-7
Online ISBN: 978-3-540-40996-0
eBook Packages: Springer Book Archive