Lazy Learning: A Logical Method for Supervised Learning

  • Chapter

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 84)

Abstract

The traditional approach to supervised learning is global modeling, which describes the relationship between the input and the output with a single analytical function over the whole input domain. What makes global modeling appealing is that, even for huge datasets, a parametric model can be stored in a small amount of memory. Moreover, evaluating the parametric model requires only a short program that executes quickly.
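
The lazy alternative developed in this chapter takes the opposite route: instead of fitting one parametric model in advance, it stores the dataset and fits a simple model locally around each query point at prediction time. As a rough illustration of this trade-off, here is a minimal sketch in Python, assuming a one-dimensional input and a plain k-nearest-neighbor local linear fit; the function names and parameters are illustrative, and this is not the authors' Lazy Learning Toolbox.

```python
import numpy as np

def global_fit(X, y, degree=2):
    # Global modeling: a single parametric model (here a polynomial)
    # fitted once over the whole input domain. Only the coefficient
    # vector must be stored, however large the dataset.
    return np.polyfit(X, y, degree)

def lazy_predict(X, y, x_query, k=10):
    # Lazy (local) modeling: defer all fitting to query time. Select
    # the k training points nearest the query and fit a local linear
    # model to them alone; no global model is ever built or stored.
    idx = np.argsort(np.abs(X - x_query))[:k]
    coeffs = np.polyfit(X[idx], y[idx], deg=1)
    return np.polyval(coeffs, x_query)

# Toy data: a noisy sine wave over the input domain.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3.0, 3.0, 200))
y = np.sin(X) + 0.1 * rng.standard_normal(200)

theta = global_fit(X, y)            # compact, fixed-size model
print(np.polyval(theta, 0.5))       # cheap global evaluation
print(lazy_predict(X, y, 0.5))      # query-time local fit
```

The global fit pays its modeling cost once and keeps only a few coefficients, matching the memory and evaluation-time advantages described above; the lazy predictor keeps the whole dataset and pays a small fitting cost at every query, in exchange for adapting to the local behavior of the underlying function.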

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bontempi, G., Birattari, M., Bersini, H. (2002). Lazy Learning: A Logical Method for Supervised Learning. In: Jain, L.C., Kacprzyk, J. (eds) New Learning Paradigms in Soft Computing. Studies in Fuzziness and Soft Computing, vol 84. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1803-1_4

  • DOI: https://doi.org/10.1007/978-3-7908-1803-1_4

  • Publisher Name: Physica, Heidelberg

  • Print ISBN: 978-3-7908-2499-5

  • Online ISBN: 978-3-7908-1803-1

  • eBook Packages: Springer Book Archive
