Abstract
We investigate why discretization can be effective in naive-Bayes learning. We prove a theorem that identifies particular conditions under which discretization will result in naive-Bayes classifiers delivering the same probability estimates as would be obtained if the correct probability density functions were employed. We discuss the factors that might affect naive-Bayes classification error under discretization. We suggest that the use of different discretization techniques can affect the classification bias and variance of the generated classifiers. We argue that by properly managing discretization bias and variance, we can effectively reduce naive-Bayes classification error.
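As a rough illustration of the setting the abstract describes, the sketch below discretizes a single numeric attribute and uses the interval frequencies in place of a probability density function when forming naive-Bayes class probabilities. It is not the paper's theorem or method: equal-width discretization, Laplace smoothing, the toy data, and all function names are assumptions chosen only to keep the example small.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def equal_width_cuts(values, k):
    """Return k-1 cut points splitting [min, max] into k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def fit(xs, ys, cuts):
    """Count classes and (class, interval) pairs for one discretized attribute."""
    class_counts = Counter(ys)
    interval_counts = defaultdict(Counter)
    for x, y in zip(xs, ys):
        # bisect_right maps a value to the index of the interval containing it.
        interval_counts[y][bisect_right(cuts, x)] += 1
    return class_counts, interval_counts


def posterior(x, cuts, class_counts, interval_counts):
    """Estimate P(class | interval containing x).

    P(interval | class) is Laplace-smoothed (an assumption of this sketch).
    """
    n = sum(class_counts.values())
    k = len(cuts) + 1  # number of intervals
    b = bisect_right(cuts, x)
    scores = {
        c: (class_counts[c] / n) * (interval_counts[c][b] + 1) / (class_counts[c] + k)
        for c in class_counts
    }
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Hypothetical toy data: one numeric attribute, two classes.
xs = [1.0, 1.2, 1.9, 2.1, 3.8, 4.0, 4.4, 5.0]
ys = ["a", "a", "a", "a", "b", "b", "b", "b"]

cuts = equal_width_cuts(xs, 2)
class_counts, interval_counts = fit(xs, ys, cuts)
probs = posterior(1.5, cuts, class_counts, interval_counts)
```

Changing the number of intervals `k` changes how many training instances support each interval estimate. The sketch only exposes that knob; it does not measure the resulting effect on classification bias or variance.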
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, Y., Webb, G.I. (2003). On Why Discretization Works for Naive-Bayes Classifiers. In: Gedeon, T.D., Fung, L.C.C. (eds) AI 2003: Advances in Artificial Intelligence. AI 2003. Lecture Notes in Computer Science, vol 2903. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24581-0_37
DOI: https://doi.org/10.1007/978-3-540-24581-0_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20646-0
Online ISBN: 978-3-540-24581-0
eBook Packages: Springer Book Archive