
Some Formal Analysis of Rocchio’s Similarity-Based Relevance Feedback Algorithm

  • Conference paper
Algorithms and Computation (ISAAC 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1969)


Abstract

Rocchio’s similarity-based relevance feedback algorithm, one of the most important query reformulation methods in information retrieval, is essentially an adaptive supervised learning algorithm that learns from examples. Despite its popularity in various applications, there is little rigorous analysis of its learning complexity in the literature. In this paper we show that in the Boolean vector space model, if the initial query vector is 0, then for any of the four typical similarities (inner product, Dice coefficient, cosine coefficient, and Jaccard coefficient), Rocchio’s similarity-based relevance feedback algorithm makes at least $n$ mistakes when used to search for a collection of documents represented by a monotone disjunction of at most $k$ relevant features (or terms) over the $n$-dimensional Boolean vector space $\{0,1\}^n$. When an arbitrary initial query vector in $\{0,1\}^n$ is used, it makes at least $(n+k-3)/2$ mistakes to search for the same collection of documents. These linear lower bounds are independent of the choices of the threshold and coefficients that the algorithm may use in updating its query vector and making its classification.
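To make the setting of the abstract concrete, here is a minimal sketch of an online, mistake-driven Rocchio learner over Boolean document vectors. It assumes a common form of the algorithm, not necessarily the paper's exact formulation: a document $x$ is retrieved iff $\mathrm{sim}(q, x) \ge \theta$; after a missed relevant document the query is updated to $\alpha q + \beta x$, and after a wrongly retrieved non-relevant document to $\alpha q - \gamma x$. The function names and the parameters `theta`, `alpha`, `beta`, `gamma` are illustrative choices, and the Dice and Jaccard formulas use the usual squared-norm generalizations so they remain defined once the query vector is no longer Boolean. The target concept is a monotone disjunction over a hypothetical set of relevant feature indices.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Similarity = Callable[[Vector, Vector], float]


def inner_product(q: Vector, x: Vector) -> float:
    return float(sum(a * b for a, b in zip(q, x)))


def _sq_norm(v: Vector) -> float:
    # For a Boolean vector this is simply the number of 1s.
    return float(sum(a * a for a in v))


def dice(q: Vector, x: Vector) -> float:
    # 2 q.x / (|q|^2 + |x|^2); defined as 0 when both vectors are zero.
    denom = _sq_norm(q) + _sq_norm(x)
    return 2.0 * inner_product(q, x) / denom if denom else 0.0


def cosine(q: Vector, x: Vector) -> float:
    # q.x / (|q| |x|); defined as 0 when either vector is zero (e.g. the initial query 0).
    denom = math.sqrt(_sq_norm(q) * _sq_norm(x))
    return inner_product(q, x) / denom if denom else 0.0


def jaccard(q: Vector, x: Vector) -> float:
    # q.x / (|q|^2 + |x|^2 - q.x), the Tanimoto form of the Jaccard coefficient.
    ip = inner_product(q, x)
    denom = _sq_norm(q) + _sq_norm(x) - ip
    return ip / denom if denom else 0.0


def monotone_disjunction(relevant: Sequence[int]) -> Callable[[Vector], bool]:
    """Hypothetical target: x is relevant iff x_i = 1 for some relevant index i."""
    return lambda x: any(x[i] == 1 for i in relevant)


def rocchio_online(
    examples: Sequence[Vector],
    target: Callable[[Vector], bool],
    sim: Similarity,
    theta: float,
    alpha: float = 1.0,
    beta: float = 1.0,
    gamma: float = 1.0,
    q0: Optional[Vector] = None,
) -> Tuple[List[float], int]:
    """Run the assumed mistake-driven Rocchio update; return (final query, mistakes)."""
    n = len(q0) if q0 is not None else len(examples[0])
    q: List[float] = [float(v) for v in q0] if q0 is not None else [0.0] * n
    mistakes = 0
    for x in examples:
        predicted = sim(q, x) >= theta  # retrieve x iff similarity reaches the threshold
        actual = target(x)
        if predicted != actual:
            mistakes += 1
            # Missed relevant document: move q toward x; wrongly retrieved one: move away.
            step = beta if actual else -gamma
            q = [alpha * qi + step * xi for qi, xi in zip(q, x)]
    return q, mistakes


if __name__ == "__main__":
    target = monotone_disjunction([0])  # n = 4, only feature 0 is relevant (k = 1)
    stream = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
    for name, sim in [("inner", inner_product), ("dice", dice),
                      ("cosine", cosine), ("jaccard", jaccard)]:
        q, m = rocchio_online(stream, target, sim, theta=0.5)
        print(name, m, q)
```

With this four-document stream and $\theta = 0.5$, each of the four similarities makes 2 mistakes (one missed relevant document, then one false retrieval) and ends with $q = (1, 0, 0, 0)$. A short, benign stream like this one does not exhibit the paper's lower bounds; those bounds concern adversarially chosen example sequences, which this sketch does not attempt to construct.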




Copyright information

© Springer-Verlag Berlin Heidelberg 2000

About this paper

Cite this paper

Chen, Z., Zhu, B. (2000). Some Formal Analysis of Rocchio’s Similarity-Based Relevance Feedback Algorithm. In: Goos, G., Hartmanis, J., van Leeuwen, J., Lee, D.T., Teng, SH. (eds) Algorithms and Computation. ISAAC 2000. Lecture Notes in Computer Science, vol 1969. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-40996-3_10


  • DOI: https://doi.org/10.1007/3-540-40996-3_10


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41255-7

  • Online ISBN: 978-3-540-40996-0

  • eBook Packages: Springer Book Archive
