On Concentration of Discrete Distributions with Applications to Supervised Learning of Classifiers

Ekdahl, Magnus; Koski, Timo

doi:10.1007/978-3-540-73499-4_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4571)

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition

Abstract

Computational procedures using independence assumptions in various forms are popular in machine learning, although checks on empirical data have given inconclusive results about their impact. Some theoretical understanding of when they work is available, but a definite answer seems to be lacking. This paper derives distributions that maximize the statewise difference to the respective product of marginals. These distributions are, in a sense, the worst distributions for predicting an outcome of the data-generating mechanism by independence. We also restrict the scope of new theoretical results by showing explicitly that, depending on context, independent ('Naïve') classifiers can be as bad as tossing coins. Regardless of this, independence may beat the generating model in learning supervised classification, and we explicitly provide one such scenario.
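As a rough illustration of the quantity mentioned in the abstract, the sketch below computes the largest statewise gap between a discrete joint distribution and the product of its marginals. It assumes that "statewise difference" means max over states x of |P(x) - ∏ P_i(x_i)|; that reading, the function names, and the two toy distributions are assumptions made for illustration and are not taken from the paper.

```python
from itertools import product
from math import prod


def marginals(joint, d):
    """Return the d one-dimensional marginals of a joint pmf.

    `joint` maps state tuples of length d to probabilities.
    """
    margs = [dict() for _ in range(d)]
    for state, p in joint.items():
        for i, value in enumerate(state):
            margs[i][value] = margs[i].get(value, 0.0) + p
    return margs


def max_statewise_difference(joint):
    """Largest |P(x) - prod_i P_i(x_i)| over all states x (assumed definition)."""
    d = len(next(iter(joint)))
    margs = marginals(joint, d)
    # Enumerate every state in the product of the marginal supports,
    # including states the joint assigns probability zero.
    supports = [sorted(m) for m in margs]
    return max(
        abs(joint.get(state, 0.0) - prod(margs[i][v] for i, v in enumerate(state)))
        for state in product(*supports)
    )


# Hypothetical toy distributions on two binary variables.
perfectly_dependent = {(0, 0): 0.5, (1, 1): 0.5}
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}

print(max_statewise_difference(perfectly_dependent))  # 0.25
print(max_statewise_difference(independent))          # 0.0
```

For the perfectly dependent pair, every state differs from the independence approximation by 0.25, while the uniform product distribution has no gap at all. Whether 0.25 is the maximum attainable for this state space is a question the paper addresses; the sketch only evaluates a given distribution.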

Author information

Authors and Affiliations

Department of Mathematics, Linköpings University, SE-581 83 Linköping, Sweden
Magnus Ekdahl & Timo Koski

Editor information

Petra Perner

About this paper

Cite this paper

Ekdahl, M., Koski, T. (2007). On Concentration of Discrete Distributions with Applications to Supervised Learning of Classifiers. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science, vol 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_2

DOI: https://doi.org/10.1007/978-3-540-73499-4_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73498-7
Online ISBN: 978-3-540-73499-4
eBook Packages: Computer Science, Computer Science (R0)