Certainty upon Empirical Distributions

  • Conference paper
New Frontiers in Applied Data Mining (PAKDD 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7104)

Abstract

We address the problem of assessing the information conveyed by a finite discrete probability distribution, within the context of knowledge discovery. Our approach is based on two main axiomatic intuitions: (i) the minimum information is given in the case of a uniform distribution, and (ii) knowledge is akin to a notion of richness, related to the dimension of the distribution. From this perspective, we define a statistic that has a clear interpretation as a measure of certainty, and we build a plausible hypothesis that offers a comprehensible insight into knowledge, with a consistent algebraic structure. This includes a native value for the uncertainty related to unseen events. We then compare our approach with entropy-based measures. Finally, by implementing our measure in a decision tree induction algorithm, we empirically validate its behavior relative to entropy. We conclude that the contributions of our measure are significant and should lead to more robust models.
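The abstract does not define the proposed certainty statistic, so it cannot be reproduced here. As a point of reference only, the following minimal sketch illustrates the entropy baseline that the abstract compares against: Shannon entropy of an empirical distribution, and the entropy-based information gain commonly used to score splits in decision tree induction. The function names (`empirical_distribution`, `shannon_entropy`, `information_gain`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def empirical_distribution(labels):
    """Relative frequencies of the observed labels (assumed helper)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(parent_labels, partitions):
    """Entropy reduction from splitting parent_labels into partitions.

    This is the standard entropy-based split score, NOT the paper's measure.
    """
    n = len(parent_labels)
    parent = shannon_entropy(empirical_distribution(parent_labels).values())
    children = sum(
        len(part) / n * shannon_entropy(empirical_distribution(part).values())
        for part in partitions
        if part  # ignore empty partitions
    )
    return parent - children


if __name__ == "__main__":
    # Uniform distribution over four outcomes: maximum uncertainty.
    print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))
    # A split that separates the two classes perfectly.
    print(information_gain(["a", "a", "b", "b"], [["a", "a"], ["b", "b"]]))
```

Under entropy, the uniform distribution is the case of maximum uncertainty, which corresponds to intuition (i) above. Note that the entropy of an empirical distribution assigns no weight to outcomes that were never observed, whereas the abstract states that the proposed measure includes a native value for unseen events.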

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Garriga, J. (2012). Certainty upon Empirical Distributions. In: Cao, L., Huang, J.Z., Bailey, J., Koh, Y.S., Luo, J. (eds) New Frontiers in Applied Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 7104. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28320-8_24

  • DOI: https://doi.org/10.1007/978-3-642-28320-8_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28319-2

  • Online ISBN: 978-3-642-28320-8

  • eBook Packages: Computer Science, Computer Science (R0)
