Combining Families of Information Retrieval Algorithms Using Metalearning

  • Michael Cornelson
  • Ed Greengrass
  • Robert L. Grossman
  • Ron Karidi
  • Daniel Shnidman

Abstract

This chapter describes experiments that use metalearning to combine families of information retrieval (IR) algorithms obtained by varying the normalizations and similarity functions. By metalearning, we mean the following simple idea: a family of IR algorithms is applied to a corpus of documents whose relevance is known, producing a learning set. A machine learning algorithm is then applied to this learning set to produce a classifier that combines the different IR algorithms. In experiments with TREC-3 data, we were able to significantly improve precision at the same level of recall using this technique. Most prior work in this area has combined different IR algorithms using various averaging schemes or a fixed combining function. In metalearning, by contrast, the combining function is itself a statistical model, which in general depends on the document, the query, and the scores produced by the component IR algorithms.
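The metalearning idea in the abstract can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' system: the toy corpus and relevance labels, the three term-weighting schemes (raw tf, log tf, smoothed tf-idf), the two similarity functions (dot product and cosine), the log-length features, and the use of logistic regression as the metalearner are all assumptions chosen for brevity. Each (weighting, similarity) pair stands in for one member of the family of IR algorithms; the scores it assigns to every (query, document) pair, together with the known relevance label, form the learning set on which the combining classifier is trained.

```python
import math
from collections import Counter

# Hypothetical toy corpus, queries and relevance judgments (not TREC data).
DOCS = {
    "d1": "metalearning combines classifiers trained on scores",
    "d2": "information retrieval ranks documents for a query",
    "d3": "cosine similarity between query and document vectors",
    "d4": "cooking recipes with fresh vegetables",
}
QUERIES = {"q1": "information retrieval query", "q2": "combining classifiers"}
QRELS = {("q1", "d2"): 1, ("q1", "d3"): 1, ("q2", "d1"): 1}  # 1 = relevant

def tokens(text):
    return text.lower().split()

N = len(DOCS)
DF = Counter(t for text in DOCS.values() for t in set(tokens(text)))

# Axis 1 of the family: term-weighting (normalization) schemes.
def raw_tf(tf, term):
    return float(tf)

def log_tf(tf, term):
    return 1.0 + math.log(tf)

def tf_idf(tf, term):
    return tf * math.log((N + 1) / (DF.get(term, 0) + 1))  # smoothed idf

def vector(text, weight):
    return {t: weight(tf, t) for t, tf in Counter(tokens(text)).items()}

# Axis 2 of the family: similarity functions.
def dot(u, v):
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

# The family: every (weighting, similarity) pair is one IR algorithm.
FAMILY = [(w, s) for w in (raw_tf, log_tf, tf_idf) for s in (dot, cosine)]

def features(qid, did):
    q, d = QUERIES[qid], DOCS[did]
    scores = [sim(vector(q, w), vector(d, w)) for w, sim in FAMILY]
    # Query/document properties let the combiner depend on more than scores.
    return scores + [math.log(1 + len(tokens(q))), math.log(1 + len(tokens(d)))]

# Learning set: one labelled feature vector per (query, document) pair.
X, y = [], []
for qid in QUERIES:
    for did in DOCS:
        X.append(features(qid, did))
        y.append(QRELS.get((qid, did), 0))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def meta_score(w, x):
    # zip stops at len(x), so the bias w[-1] is added separately.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Metalearner stand-in: logistic regression by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)  # last entry is the bias
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = meta_score(w, x) - t
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def comb_sum(x):
    # Fixed combining function for contrast: unweighted sum of component scores.
    return sum(x[:len(FAMILY)])

W = train_logistic(X, y)
for qid in QUERIES:
    ranked = sorted(DOCS, key=lambda d: meta_score(W, features(qid, d)), reverse=True)
    print(qid, ranked)
```

Unlike `comb_sum`, which weights every family member equally and ignores the query and document, the learned weights in `W` can discount weak members and use the length features; this is the sense in which the combining function is itself a statistical model. The sketch trains and ranks on the same eight pairs only to stay short.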

Keywords

Feature Vector, Information Retrieval, Similarity Metrics, Information Retrieval System, Distinct Term

Copyright information

© Springer Science+Business Media New York 2004