Abstract
Given a set of models and some training data, we would like to find the model that best describes the data. Finding the model with the lowest generalization error is computationally expensive, especially if the number of test points or the number of models is large. Optimization techniques such as hill climbing or genetic algorithms are helpful, but they can end up with a model that is arbitrarily worse than the best one, and they cannot be used at all when there is no distance metric on the space of discrete models. In this paper we develop a technique called “racing” that tests the set of models in parallel, quickly discards those models that are clearly inferior, and concentrates the computational effort on differentiating among the better models. Racing is especially suitable for selecting among lazy learners, since training requires negligible expense and incremental testing using leave-one-out cross-validation is efficient. We use racing to select among various lazy learning algorithms and to find relevant features in applications ranging from robot juggling to lesion detection in MRI scans.
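The racing idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which statistical test decides that a model is "clearly inferior", so the sketch assumes a Hoeffding-style confidence interval on each model's running mean error. The function names (`hoeffding_epsilon`, `race`, `knn_loo_error`), the parameters `error_range`, `delta` and `seed`, and the use of k-nearest-neighbour regression as the lazy learner are all illustrative assumptions, not the authors' implementation. Applying `delta` at every step without a correction across models and steps is a further simplification.

```python
import math
import random


def hoeffding_epsilon(n, error_range, delta):
    """Half-width of a two-sided Hoeffding interval after n bounded samples."""
    return error_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def knn_loo_error(k, points, i, error_range=1.0):
    """Leave-one-out absolute error of k-NN regression at point i.

    A lazy learner needs no training: the prediction for point i is just the
    mean target of its k nearest neighbours among the other points.
    Assumes at least two points and k >= 1.
    """
    x_i, y_i = points[i]
    others = [p for j, p in enumerate(points) if j != i]
    nearest = sorted(others, key=lambda p: abs(p[0] - x_i))[:k]
    prediction = sum(y for _, y in nearest) / len(nearest)
    # Clip so the error stays inside the range the bound assumes.
    return min(abs(prediction - y_i), error_range)


def race(models, points, loo_error, error_range=1.0, delta=0.05, seed=None):
    """Race models in parallel on leave-one-out errors.

    models    -- dict: name -> model description (here, a value of k)
    points    -- list of (x, y) pairs
    loo_error -- function(model, points, i) -> error in [0, error_range]
    Returns a dict: surviving model name -> mean error over the points tested.
    """
    if not points:
        raise ValueError("need at least one test point")
    totals = {name: 0.0 for name in models}
    alive = set(models)
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # visit test points in random order

    n = 0
    for n, i in enumerate(order, start=1):
        # Test every surviving model on the same held-out point.
        for name in alive:
            totals[name] += loo_error(models[name], points, i)
        eps = hoeffding_epsilon(n, error_range, delta)
        # Upper bound on the error of the currently best model.
        best_upper = min(totals[name] / n for name in alive) + eps
        # Drop models whose lower bound already exceeds that upper bound.
        alive = {name for name in alive if totals[name] / n - eps <= best_upper}
        if len(alive) == 1:
            break
    return {name: totals[name] / n for name in alive}
```

To race, say, three neighbourhood sizes, one could call `race({"k=1": 1, "k=3": 3, "k=9": 9}, points, knn_loo_error)`. Models that are statistically indistinguishable may all survive to the end, in which case the caller picks among them by mean error.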
© 1997 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Maron, O., Moore, A.W. (1997). The Racing Algorithm: Model Selection for Lazy Learners. In: Aha, D.W. (eds) Lazy Learning. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2053-3_8
DOI: https://doi.org/10.1007/978-94-017-2053-3_8
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-4860-8
Online ISBN: 978-94-017-2053-3
eBook Packages: Springer Book Archive