The Racing Algorithm: Model Selection for Lazy Learners

Maron, Oded; Moore, Andrew W.

doi:10.1007/978-94-017-2053-3_8

Abstract

Given a set of models and some training data, we would like to find the model that best describes the data. Finding the model with the lowest generalization error is a computationally expensive process, especially if the number of testing points is high or if the number of models is large. Optimization techniques such as hill climbing or genetic algorithms are helpful but can end up with a model that is arbitrarily worse than the best one or cannot be used because there is no distance metric on the space of discrete models. In this paper we develop a technique called “racing” that tests the set of models in parallel, quickly discards those models that are clearly inferior and concentrates the computational effort on differentiating among the better models. Racing is especially suitable for selecting among lazy learners since training requires negligible expense, and incremental testing using leave-one-out cross validation is efficient. We use racing to select among various lazy learning algorithms and to find relevant features in applications ranging from robot juggling to lesion detection in MRI scans.
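The racing loop described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the chapter's implementation: the abstract does not say which statistical test decides when a model is "clearly inferior", so a Hoeffding confidence interval on the mean leave-one-out error is assumed here. The names `hoeffding_race`, `make_knn`, `abs_loss`, and the toy data set are hypothetical.

```python
import math
import random


def make_knn(k):
    """Return a lazy k-nearest-neighbour regressor for 1-D inputs (hypothetical helper)."""
    def predict(i, points):
        # Leave-one-out: predict points[i] from every other point.
        x = points[i][0]
        others = [p for j, p in enumerate(points) if j != i]
        nearest = sorted(others, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict


def abs_loss(prediction, target):
    """Absolute error; bounded by 1 here because all targets lie in [0, 1]."""
    return abs(prediction - target)


def hoeffding_race(models, points, loss, bound=1.0, delta=0.05, seed=0):
    """Race all models in parallel and return the names of the survivors.

    Assumption: errors lie in [0, bound], so after n test points the true
    mean error is within eps of the running mean with probability at least
    1 - delta, where eps = bound * sqrt(ln(2 / delta) / (2 n)).
    """
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)
    totals = {name: 0.0 for name in models}  # running error sums of live models
    for n, i in enumerate(order, start=1):
        # Test every surviving model on the same held-out point.
        for name in totals:
            totals[name] += loss(models[name](i, points), points[i][1])
        eps = bound * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
        best_upper = min(t / n for t in totals.values()) + eps
        # Discard a model once its lower bound exceeds the best upper bound.
        totals = {name: t for name, t in totals.items() if t / n - eps <= best_upper}
    return sorted(totals, key=totals.get)


# Toy data: y = x on an evenly spaced grid in [0, 1).
points = [(i / 400, i / 400) for i in range(400)]
models = {
    "knn-1": make_knn(1),
    "knn-3": make_knn(3),
    "knn-all": make_knn(len(points) - 1),  # predicts the global mean
}
survivors = hoeffding_race(models, points, abs_loss)
print(survivors)
```

On this toy problem the model that averages all remaining points is discarded once its interval separates from the best one, while the two small-k models stay in the race because their errors are too close to tell apart. That is the behaviour the abstract describes: effort is spent only on differentiating among the better models.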

Author information

Authors and Affiliations

M.I.T. Artificial Intelligence Lab, NE45-755, 545 Technology Square, Cambridge, MA, 02139, USA
Oded Maron
Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, 15213, USA
Andrew W. Moore

Editor information

Editors and Affiliations

Navy Center for Applied Research in Artificial Intelligence, Naval Research Laboratory, Washington, D.C., USA
David W. Aha

About this chapter

Cite this chapter

Maron, O., Moore, A.W. (1997). The Racing Algorithm: Model Selection for Lazy Learners. In: Aha, D.W. (eds) Lazy Learning. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2053-3_8

DOI: https://doi.org/10.1007/978-94-017-2053-3_8
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-4860-8
Online ISBN: 978-94-017-2053-3
eBook Packages: Springer Book Archive
