New Learning Paradigms in Soft Computing, pp. 97–136

# Lazy Learning: A Local Method for Supervised Learning

## Abstract

The traditional approach to supervised learning is *global* modeling, which describes the relationship between input and output with a single analytical function over the whole input domain. Global modeling is appealing because, even for huge datasets, a parametric model can be stored in a small amount of memory. Moreover, evaluating the parametric model requires only a short program that runs in little time.
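
To make the contrast between global and lazy (local) modeling concrete, here is a minimal sketch under simplifying assumptions: a scalar input, a straight-line model, and pure Python. The global model fits one line over the whole domain and keeps only its two parameters; the lazy model stores every sample and, only when a query arrives, fits a line to the query's k nearest neighbours weighted by a tricube kernel. The class names, the default `k`, the tricube kernel, and the rule that sets the bandwidth from the k-th neighbour distance are illustrative choices, not necessarily the procedure used in this chapter.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # one training sample (x, y) with a scalar input x


def weighted_line_fit(data: List[Point], weights: List[float]) -> Tuple[float, float]:
    """Weighted least-squares fit of y = a + b*x; returns the pair (a, b)."""
    sw = sum(weights)
    mx = sum(w * x for w, (x, _) in zip(weights, data)) / sw
    my = sum(w * y for w, (_, y) in zip(weights, data)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, (x, _) in zip(weights, data))
    sxy = sum(w * (x - mx) * (y - my) for w, (x, y) in zip(weights, data))
    # If all inputs coincide the slope is undefined: fall back to a constant (the weighted mean).
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


class GlobalLinearModel:
    """Global modeling: one line fitted once over the whole input domain."""

    def __init__(self, data: List[Point]) -> None:
        # Only the two parameters are kept; the training data can be discarded.
        self.a, self.b = weighted_line_fit(data, [1.0] * len(data))

    def predict(self, xq: float) -> float:
        return self.a + self.b * xq


class LazyLocalLinearModel:
    """Lazy learning (illustrative): store the data, fit a local line only at query time."""

    def __init__(self, data: List[Point], k: int = 5) -> None:
        self.data = list(data)  # the whole training set is kept in memory
        self.k = k              # hypothetical parameter: neighbours used per local fit

    def predict(self, xq: float) -> float:
        # The k training points closest to the query point.
        nbrs = sorted(self.data, key=lambda p: abs(p[0] - xq))[: self.k]
        # Bandwidth: slightly wider than the k-th neighbour distance, so every
        # neighbour gets a strictly positive weight (1.0 if all distances are zero).
        h = 1.1 * max(abs(x - xq) for x, _ in nbrs) or 1.0
        # Tricube kernel: the weight decreases smoothly with distance from the query.
        weights = [(1.0 - (abs(x - xq) / h) ** 3) ** 3 for x, _ in nbrs]
        a, b = weighted_line_fit(nbrs, weights)
        return a + b * xq


if __name__ == "__main__":
    samples = [(float(x), float(x * x)) for x in range(11)]  # y = x^2 on x = 0..10
    print("global:", GlobalLinearModel(samples).predict(5.0))
    print("lazy:  ", LazyLocalLinearModel(samples, k=3).predict(5.0))
```

On samples of y = x² for x = 0, 1, ..., 10, the global line is y = -15 + 10x and predicts 35 at x = 5, whereas the lazy model with k = 3 returns about 25.03, close to the true value 25. The trade-off mirrors the abstract: the global model keeps two numbers and evaluates instantly, while the lazy model keeps all 11 samples and does its fitting at query time.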

## Keywords

Local Model, Feed Forward Neural Network, Query Point, Bandwidth Selection, Local Linear Regression
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

## Copyright information

© Springer-Verlag Berlin Heidelberg 2002