Discovering Surprising Patterns by Detecting Occurrences of Simpson’s Paradox

  • Carem C. Fabris
  • Alex A. Freitas
Conference paper

Abstract

This paper addresses the discovery of surprising patterns. Recently, several authors have addressed the task of discovering surprising prediction rules. However, we do not focus on prediction rules, but rather on a quite different kind of pattern, namely the occurrence of Simpson’s paradox. Intuitively, the fact that this is a paradox suggests that it has a great potential to be a surprising pattern for the user. With this motivation, we make the detection of Simpson’s paradox the central goal of a data mining algorithm explicitly designed to discover surprising patterns. We present computational results showing surprising occurrences of the paradox in some public-domain data sets. In addition, we propose a method for ranking the discovered instances of the paradox in decreasing order of estimated degree of surprisingness.
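To make the target pattern concrete, the following is a minimal sketch of a Simpson's paradox check on stratified success counts. It is not the algorithm from the paper, which this abstract does not specify. The function names `simpsons_paradox` and `reversal_magnitude`, the data layout, and the scoring formula are all assumptions introduced for illustration. The counts are the widely cited kidney-stone treatment example and are not taken from the paper's data sets.

```python
from typing import Dict, List, Tuple

# counts[stratum][group] = (successes, total); this layout is an assumption.
Counts = Dict[str, Dict[str, Tuple[int, int]]]

# Widely cited illustrative counts (kidney-stone treatments), not from the paper.
KIDNEY_STONES: Counts = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def rate_differences(counts: Counts, a: str, b: str) -> Tuple[float, List[float]]:
    """Return (overall rate(a) - rate(b), list of per-stratum differences)."""
    per_stratum = []
    totals = {a: [0, 0], b: [0, 0]}
    for groups in counts.values():
        sa, na = groups[a]
        sb, nb = groups[b]
        per_stratum.append(sa / na - sb / nb)
        totals[a][0] += sa
        totals[a][1] += na
        totals[b][0] += sb
        totals[b][1] += nb
    overall = totals[a][0] / totals[a][1] - totals[b][0] / totals[b][1]
    return overall, per_stratum


def simpsons_paradox(counts: Counts, a: str, b: str) -> bool:
    """True if every stratum strictly reverses the aggregated comparison."""
    overall, per_stratum = rate_differences(counts, a, b)
    if overall == 0:
        return False
    return all(d * overall < 0 for d in per_stratum)


def reversal_magnitude(counts: Counts, a: str, b: str) -> float:
    """Hypothetical ranking score (NOT the paper's surprisingness measure):
    the size of the aggregated difference plus the mean size of the
    per-stratum differences."""
    overall, per_stratum = rate_differences(counts, a, b)
    mean_stratum = sum(abs(d) for d in per_stratum) / len(per_stratum)
    return abs(overall) + mean_stratum


if __name__ == "__main__":
    print(simpsons_paradox(KIDNEY_STONES, "A", "B"))
    print(reversal_magnitude(KIDNEY_STONES, "A", "B"))
```

In this example treatment A has the higher success rate within both the small-stone and large-stone strata, yet treatment B has the higher rate once the strata are pooled, so the check reports a paradox. Detected instances could then be sorted by a score such as `reversal_magnitude` in decreasing order, in the spirit of the ranking step the abstract mentions.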

Keywords

Hepatitis, Tuberculosis

Copyright information

© Springer-Verlag London Limited 2000

Authors and Affiliations

  • Carem C. Fabris (1)
  • Alex A. Freitas (2)
  1. CEFET-PR, CPGEI, Curitiba-PR, Brazil
  2. PUC-PR, PPGIA-CCET, Curitiba-PR, Brazil
