A Decomposition of Classes via Clustering to Explain and Improve Naive Bayes

  • Ricardo Vilalta
  • Irina Rish
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2837)


We propose a method to improve the probability estimates produced by Naive Bayes. The product-distribution approximation yields poor class-conditional probabilities when the examples of a class spread into multiple regions of the input space. Our approach applies a clustering algorithm to the subset of examples belonging to each class, and treats each resulting cluster as a class of its own. Experiments on 26 real-world datasets show a significant improvement in performance when the class-decomposition process is applied, particularly when the mean number of clusters per class is large.
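The abstract's idea can be sketched in a few lines: split each class into clusters, fit a Naive Bayes model over the sub-classes, and sum sub-class posteriors back to the original classes at prediction time. The sketch below is illustrative only, not the authors' exact procedure; the function names, the simple k-means routine, and the per-feature Gaussian Naive Bayes model are assumptions made for the example.

```python
# Illustrative sketch of class decomposition for Naive Bayes (not the
# paper's exact algorithm): cluster each class's examples with k-means,
# treat each cluster as a sub-class, fit a Gaussian NB model over the
# sub-classes, and aggregate sub-class posteriors per original class.
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means; returns a cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels

def fit_decomposed_nb(X, y, k=2):
    """Fit one diagonal Gaussian per sub-class (cluster within a class)."""
    subs = []  # (original class, feature means, feature variances, prior)
    n = len(X)
    for c in np.unique(y):
        Xc = X[y == c]
        labels = kmeans(Xc, min(k, len(Xc)))
        for j in np.unique(labels):
            Xj = Xc[labels == j]
            subs.append((c, Xj.mean(0), Xj.var(0) + 1e-6, len(Xj) / n))
    return subs

def predict(subs, X):
    """Sum each class's sub-class posteriors, then pick the argmax class."""
    classes = sorted({c for c, *_ in subs})
    scores = np.zeros((len(X), len(classes)))
    for c, mu, var, prior in subs:
        # Product of 1-D Gaussians, computed as a sum of log-densities.
        ll = -0.5 * (((X - mu) ** 2) / var + np.log(2 * np.pi * var)).sum(1)
        scores[:, classes.index(c)] += prior * np.exp(ll)
    return np.array(classes)[scores.argmax(1)]
```

A natural usage example is an XOR-style layout, where each class occupies two separated regions: a single product distribution per class fits both regions with one Gaussian and fails, whereas decomposing each class into two clusters lets each sub-class model one region.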


Keywords: Cluster Algorithm, Discriminant Function, Input Space, Product Distribution, Product Approximation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Ricardo Vilalta: Department of Computer Science, University of Houston, Houston, USA
  • Irina Rish: IBM T.J. Watson Research Center, Hawthorne, USA
