
The Racing Algorithm: Model Selection for Lazy Learners

Chapter in: Lazy Learning

Abstract

Given a set of models and some training data, we would like to find the model that best describes the data. Finding the model with the lowest generalization error is computationally expensive, especially when the number of test points or the number of candidate models is large. Optimization techniques such as hill climbing or genetic algorithms can help, but they may settle on a model that is arbitrarily worse than the best one, or may be inapplicable because no distance metric exists on the space of discrete models. In this paper we develop a technique called "racing" that tests the set of models in parallel, quickly discards models that are clearly inferior, and concentrates the computational effort on differentiating among the better ones. Racing is especially suitable for selecting among lazy learners, since training requires negligible expense and incremental testing via leave-one-out cross-validation is efficient. We use racing to select among various lazy learning algorithms and to find relevant features in applications ranging from robot juggling to lesion detection in MRI scans.
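The racing scheme described above can be sketched in a few lines. The following is a minimal, illustrative Python implementation, not the authors' code: it races candidate models with incremental leave-one-out cross-validation and uses a Hoeffding-style confidence interval to discard any model whose optimistic (lower) error bound is already worse than the current best model's pessimistic (upper) bound. The union-bound term in `eps`, the clipping of errors to `[0, error_bound]`, and the toy k-NN candidates are all assumptions made for the sketch.

```python
import math
import random

def hoeffding_race(models, points, targets, delta=0.05, error_bound=1.0):
    """Race models via incremental leave-one-out cross-validation.

    models: dict mapping name -> predict(train_xs, train_ys, query).
    Per-point errors are clipped to [0, error_bound] so the Hoeffding
    bound applies. After each test point, discard every model whose
    lower error bound exceeds the best model's upper error bound.
    """
    alive = dict.fromkeys(models, 0.0)  # name -> summed error so far
    n = len(points)
    for t in range(1, n + 1):
        i = t - 1
        # Leave point i out; every surviving model predicts it.
        train_x = points[:i] + points[i + 1:]
        train_y = targets[:i] + targets[i + 1:]
        for name in alive:
            pred = models[name](train_x, train_y, points[i])
            alive[name] += min(abs(pred - targets[i]), error_bound)
        # Hoeffding confidence radius after t trials (union bound
        # over all models and all n elimination rounds).
        eps = error_bound * math.sqrt(
            math.log(2 * len(models) * n / delta) / (2 * t))
        means = {name: s / t for name, s in alive.items()}
        best = min(means.values())
        alive = {name: s for name, s in alive.items()
                 if means[name] - eps <= best + eps}
        if len(alive) == 1:
            break
    return min(alive, key=lambda name: alive[name] )

def make_knn(k):
    """A toy 1-D k-nearest-neighbour regressor (a lazy learner)."""
    def predict(xs, ys, q):
        idx = sorted(range(len(xs)), key=lambda j: abs(xs[j] - q))[:k]
        return sum(ys[j] for j in idx) / k
    return predict

random.seed(0)
xs = [random.uniform(0.0, 1.0) for _ in range(60)]
ys = [2.0 * x + random.gauss(0.0, 0.05) for x in xs]
models = {f"knn-{k}": make_knn(k) for k in (1, 3, 5, 15, 30)}
winner = hoeffding_race(models, xs, ys)
```

Because lazy learners have no training phase, adding one more leave-one-out test point costs only a single prediction per surviving model, which is exactly why racing pairs well with them.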




Copyright information

© 1997 Springer Science+Business Media Dordrecht

Cite this chapter

Maron, O., Moore, A.W. (1997). The Racing Algorithm: Model Selection for Lazy Learners. In: Aha, D.W. (eds) Lazy Learning. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2053-3_8


  • Print ISBN: 978-90-481-4860-8

  • Online ISBN: 978-94-017-2053-3
