Bayes optimal instance-based learning

  • Petri Kontkanen
  • Petri Myllymäki
  • Tomi Silander
  • Henry Tirri
Bayesian Networks
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1398)


In this paper we present a probabilistic formalization of the instance-based learning approach. In our Bayesian framework, moving from the construction of an explicit hypothesis to a data-driven instance-based learning approach is equivalent to averaging over all the (possibly infinitely many) individual models. The general Bayesian instance-based learning framework described in this paper can be applied with any set of assumptions defining a parametric model family, and to any discrete prediction task where the number of simultaneously predicted attributes is small, which includes, for example, all classification tasks prevalent in the machine learning literature. To illustrate the use of the suggested general framework in practice, we show how the approach can be implemented in the special case with the strong independence assumptions underlying the so-called Naive Bayes classifier. The resulting Bayesian instance-based classifier is validated empirically with public domain data sets, and the results are compared to the performance of the traditional Naive Bayes classifier. The results suggest that the Bayesian instance-based learning approach yields better results than the traditional Naive Bayes classifier, especially in cases where the amount of training data is small.
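To make the special case concrete: under the Naive Bayes independence assumptions with Dirichlet priors on the discrete parameters, averaging over all parameter values (rather than committing to one fitted model) yields a closed-form posterior predictive in which every probability becomes a smoothed count ratio. The sketch below illustrates that general idea only; it is not the authors' exact formulation, and the function and parameter names (`predict`, `alpha`, `num_values`) are illustrative. A uniform Dirichlet(alpha) prior is assumed for every categorical distribution.

```python
from collections import Counter, defaultdict

def predict(train, x, num_values, alpha=1.0):
    """Posterior predictive class distribution for a discrete Naive Bayes
    model, with the parameters marginalized out under uniform
    Dirichlet(alpha) priors (illustrative sketch, not the paper's method).

    train      -- list of (attribute_vector, class_label) pairs
    x          -- attribute vector to classify
    num_values -- num_values[i] = number of distinct values of attribute i
    """
    classes = sorted({c for _, c in train})
    n = len(train)
    n_c = Counter(c for _, c in train)          # class counts
    n_civ = defaultdict(Counter)                # n_civ[c][(i, v)]: value counts per class
    for vec, c in train:
        for i, v in enumerate(vec):
            n_civ[c][(i, v)] += 1
    scores = {}
    for c in classes:
        # Marginalized class prior: (n_c + alpha) / (n + K * alpha)
        p = (n_c[c] + alpha) / (n + len(classes) * alpha)
        # Marginalized per-attribute likelihoods, one smoothed ratio each
        for i, v in enumerate(x):
            p *= (n_civ[c][(i, v)] + alpha) / (n_c[c] + num_values[i] * alpha)
        scores[c] = p
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}
```

Because the smoothing terms dominate when the counts are small, the averaged predictor stays well-calibrated on little data, which is consistent with the abstract's observation that the benefit over the traditional (maximum-likelihood) Naive Bayes classifier is largest for small training sets.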


Keywords: Bayesian Network, Predictive Distribution, Model Family, Feedforward Neural Network Model, Sixth International Workshop
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


References

  1. D. Aha. A Study of Instance-Based Algorithms for Supervised Learning Tasks: Mathematical, Empirical, and Psychological Observations. PhD thesis, University of California, Irvine, 1990.
  2. D. Aha, editor. Lazy Learning. Kluwer Academic Publishers, Dordrecht, 1997. Reprinted from Artificial Intelligence Review, 11:1–5.
  3. K. Ali and M. Pazzani. Error reduction through learning multiple descriptions. Machine Learning, 24(3):173–202, September 1997.
  4. C. Atkeson. Memory-based approaches to approximating continuous functions. In M. Casdagli and S. Eubank, editors, Nonlinear Modeling and Forecasting, Proceedings Volume XII in the Santa Fe Institute Studies in the Sciences of Complexity. Addison Wesley, New York, NY, 1992.
  5. C. Atkeson, A. Moore, and S. Schaal. Locally weighted learning. In Aha [2], pages 11–73.
  6. J.O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer-Verlag, New York, 1985.
  7. G. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9:309–347, 1992.
  8. M.H. DeGroot. Optimal Statistical Decisions. McGraw-Hill, 1970.
  9. B.S. Everitt and D.J. Hand. Finite Mixture Distributions. Chapman and Hall, London, 1981.
  10. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors. Advances in Knowledge Discovery and Data Mining. MIT Press, Cambridge, MA, 1996.
  11. D. Fisher. Noise-tolerant conceptual clustering. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 825–830, Detroit, Michigan, 1989.
  12. D. Fisher and D. Talbert. Inference using probabilistic concept trees. In Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, pages 191–202, Ft. Lauderdale, Florida, January 1997.
  13. J.H. Friedman. Flexible metric nearest neighbor classification. Unpublished manuscript, available by anonymous ftp from Stanford Research Institute (Menlo Park, CA), 1994.
  14. A. Gelman, J. Carlin, H. Stern, and D. Rubin. Bayesian Data Analysis. Chapman & Hall, 1995.
  15. D. Heckerman, D. Geiger, and D.M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3):197–243, September 1995.
  16. S. Kasif, S. Salzberg, D. Waltz, J. Rachlin, and D. Aha. Towards a better understanding of memory-based reasoning systems. In Proceedings of the Eleventh International Machine Learning Conference, pages 242–250, New Brunswick, NJ, 1994. Morgan Kaufmann Publishers.
  17. P. Kontkanen, P. Myllymäki, T. Silander, H. Tirri, and P. Grünwald. Comparing predictive inference methods for discrete domains. In Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, pages 311–318, Ft. Lauderdale, Florida, January 1997. Also: NeuroCOLT Technical Report NCTR-97-004.
  18. P. Kontkanen, P. Myllymäki, T. Silander, H. Tirri, and P. Grünwald. On predictive distributions and Bayesian networks. In W. Daelemans, P. Flach, and A. van den Bosch, editors, Proceedings of the Seventh Belgian-Dutch Conference on Machine Learning (BeNeLearn'97), pages 59–68, Tilburg, the Netherlands, October 1997.
  19. P. Kontkanen, P. Myllymäki, and H. Tirri. Comparing Bayesian model class selection criteria by discrete finite mixtures. In D. Dowe, K. Korb, and J. Oliver, editors, Information, Statistics and Induction in Science (Proceedings of the ISIS'96 Conference, Melbourne, Australia, August 1996), pages 364–374. World Scientific, Singapore.
  20. P. Kontkanen, P. Myllymäki, and H. Tirri. Experimenting with the Cheeseman-Stutz evidence approximation for predictive modeling and data mining. In D. Dankel, editor, Proceedings of the Tenth International FLAIRS Conference, pages 204–211, Daytona Beach, Florida, May 1997.
  21. D. MacKay. Bayesian Methods for Adaptive Models. PhD thesis, California Institute of Technology, 1992.
  22. D. Madigan, A. Raftery, C. Volinsky, and J. Hoeting. Bayesian model averaging. In AAAI Workshop on Integrating Multiple Learned Models, 1996.
  23. D. Michie, D.J. Spiegelhalter, and C.C. Taylor, editors. Machine Learning, Neural and Statistical Classification. Ellis Horwood, London, 1994.
  24. A. Moore. Acquisition of dynamic control knowledge for a robotic manipulator. In Seventh International Machine Learning Workshop. Morgan Kaufmann, 1990.
  25. P. Myllymäki and H. Tirri. Bayesian case-based reasoning with neural networks. In Proceedings of the IEEE International Conference on Neural Networks, volume 1, pages 422–427, San Francisco, March 1993. IEEE, Piscataway, NJ.
  26. P. Myllymäki and H. Tirri. Massively parallel case-based reasoning with probabilistic similarity metrics. In S. Wess, K.-D. Althoff, and M. Richter, editors, Topics in Case-Based Reasoning, volume 837 of Lecture Notes in Artificial Intelligence, pages 144–154. Springer-Verlag, 1994.
  27. J. Rissanen. Stochastic Complexity in Statistical Inquiry. World Scientific Publishing Company, New Jersey, 1989.
  28. J. Rissanen. Fisher information and stochastic complexity. IEEE Transactions on Information Theory, 42(1):40–47, January 1996.
  29. C. Stanfill and D. Waltz. Toward memory-based reasoning. Communications of the ACM, 29(12):1213–1228, 1986.
  30. K. Ting and R. Cameron-Jones. Exploring a framework for instance based learning and Naive Bayes classifiers. In Proceedings of the Seventh Australian Joint Conference on Artificial Intelligence, pages 100–107, 1994.
  31. H. Tirri, P. Kontkanen, and P. Myllymäki. A Bayesian framework for case-based reasoning. In I. Smith and B. Faltings, editors, Advances in Case-Based Reasoning, volume 1168 of Lecture Notes in Artificial Intelligence, pages 413–427. Springer-Verlag, Berlin Heidelberg, November 1996.
  32. H. Tirri, P. Kontkanen, and P. Myllymäki. Probabilistic instance-based learning. In L. Saitta, editor, Machine Learning: Proceedings of the Thirteenth International Conference, pages 507–515. Morgan Kaufmann Publishers, 1996.
  33. D.M. Titterington, A.F.M. Smith, and U.E. Makov. Statistical Analysis of Finite Mixture Distributions. John Wiley & Sons, New York, 1985.
  34. D. Wettschereck, D. Aha, and T. Mohri. A review and empirical evaluation of feature-weighting methods for a class of lazy learning algorithms. In Aha [2], pages 273–314.

Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Petri Kontkanen (1)
  • Petri Myllymäki (1)
  • Tomi Silander (1)
  • Henry Tirri (1)

  1. Complex Systems Computation Group (CoSCo), Department of Computer Science, P.O. Box 26, FIN-00014 University of Helsinki, Finland
