On Concentration of Discrete Distributions with Applications to Supervised Learning of Classifiers

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4571)

Abstract

Computational procedures using independence assumptions in various forms are popular in machine learning, although checks on empirical data have given inconclusive results about their impact. Some theoretical understanding of when they work is available, but a definite answer seems to be lacking. This paper derives distributions that maximize the statewise difference to the respective product of marginals. These distributions are, in a sense, the worst distributions for predicting an outcome of the data-generating mechanism by independence. We also restrict the scope of new theoretical results by showing explicitly that, depending on context, independent (‘Naïve’) classifiers can be as bad as tossing coins. Regardless of this, independence may beat the generating model in learning supervised classification, and we explicitly provide one such scenario.
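
As a rough illustration of two notions mentioned in the abstract, the sketch below computes the largest statewise gap between a small joint distribution and the product of its marginals, and then shows a textbook XOR setting in which a naive (independence-based) classifier assigns posterior 1/2 to every input. The toy distributions and all function names are assumptions chosen for illustration; they are not the constructions derived in the paper.

```python
from itertools import product


def marginals(joint, d):
    """One-dimensional marginals of a joint pmf over binary d-tuples."""
    margs = [{0: 0.0, 1: 0.0} for _ in range(d)]
    for x, p in joint.items():
        for i, xi in enumerate(x):
            margs[i][xi] += p
    return margs


def product_of_marginals(joint, d):
    """Independence approximation: the product of the marginals, per state."""
    margs = marginals(joint, d)
    approx = {}
    for x in product((0, 1), repeat=d):
        q = 1.0
        for i, xi in enumerate(x):
            q *= margs[i][xi]
        approx[x] = q
    return approx


def max_statewise_difference(joint, d):
    """max over states x of |P(x) - prod_i P_i(x_i)|."""
    approx = product_of_marginals(joint, d)
    return max(abs(joint.get(x, 0.0) - approx[x]) for x in approx)


def naive_posterior_class1(x):
    """Naive posterior of class 1 when class = x1 XOR x2 (hypothetical toy
    setting): features are uniform and both classes have prior 1/2."""
    likelihood = {}
    for c in (0, 1):
        states = [s for s in product((0, 1), repeat=2) if (s[0] ^ s[1]) == c]
        joint_c = {s: 1.0 / len(states) for s in states}
        # The naive classifier replaces P(x | c) by its product of marginals.
        likelihood[c] = product_of_marginals(joint_c, 2)[x]
    prior = 0.5
    return prior * likelihood[1] / (prior * likelihood[0] + prior * likelihood[1])


# Two perfectly correlated binary variables (illustrative choice).
correlated = {(0, 0): 0.5, (1, 1): 0.5}
gap = max_statewise_difference(correlated, 2)
posteriors = {x: naive_posterior_class1(x) for x in product((0, 1), repeat=2)}
print(gap, posteriors)
```

In the XOR setting the class-conditional marginals are identical for both classes, so the independence approximation carries no information about the class, and the resulting decision is no better than a coin toss.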



Editor information

Petra Perner

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ekdahl, M., Koski, T. (2007). On Concentration of Discrete Distributions with Applications to Supervised Learning of Classifiers. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science, vol 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_2

  • DOI: https://doi.org/10.1007/978-3-540-73499-4_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73498-7

  • Online ISBN: 978-3-540-73499-4

  • eBook Packages: Computer Science, Computer Science (R0)
