Discovering Surprising Patterns by Detecting Occurrences of Simpson’s Paradox
This paper addresses the discovery of surprising patterns. Several authors have recently addressed the task of discovering surprising prediction rules. We focus not on prediction rules but on a quite different kind of pattern: occurrences of Simpson’s paradox. Intuitively, the fact that this is a paradox suggests that it has great potential to surprise the user. With this motivation, we make the detection of Simpson’s paradox the central goal of a data mining algorithm explicitly designed to discover surprising patterns. We present computational results showing surprising occurrences of the paradox in some public-domain data sets. In addition, we propose a method for ranking the discovered instances of the paradox in decreasing order of estimated degree of surprisingness.
Keywords: Data Mining, Knowledge Discovery, Prediction Rule, Categorical Attribute, Central Goal
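To make the pattern concrete, the following is a minimal sketch of how one occurrence of Simpson’s paradox can be checked for: the direction of the difference between two groups is the same in every stratum of a partitioning attribute, yet reverses once the strata are pooled. This is not the algorithm of the paper, whose details are not given in the abstract. The function name `is_simpsons_paradox`, the data layout, and the counts (the commonly cited kidney-stone textbook example) are assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical layout: counts[stratum][group] = (successes, total)
Counts = Dict[str, Dict[str, Tuple[int, int]]]


def is_simpsons_paradox(counts: Counts, g1: str, g2: str) -> bool:
    """Return True if g1 vs. g2 shows a reversal between strata and the pooled data."""
    # Success-rate difference (g1 minus g2) within each stratum.
    stratum_diffs = [
        c[g1][0] / c[g1][1] - c[g2][0] / c[g2][1] for c in counts.values()
    ]

    # Pool successes and totals over all strata, then compare pooled rates.
    s1 = sum(c[g1][0] for c in counts.values())
    n1 = sum(c[g1][1] for c in counts.values())
    s2 = sum(c[g2][0] for c in counts.values())
    n2 = sum(c[g2][1] for c in counts.values())
    pooled_diff = s1 / n1 - s2 / n2

    # Paradox: every stratum agrees on one direction, the pooled data on the other.
    all_positive = all(d > 0 for d in stratum_diffs)
    all_negative = all(d < 0 for d in stratum_diffs)
    return (all_positive and pooled_diff < 0) or (all_negative and pooled_diff > 0)


# Illustrative counts: treatment A does better in each stratum, worse overall.
KIDNEY_STONES: Counts = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

# A control table with no reversal: A is better in each stratum and overall.
NO_REVERSAL: Counts = {
    "small": {"A": (90, 100), "B": (80, 100)},
    "large": {"A": (70, 100), "B": (60, 100)},
}

if __name__ == "__main__":
    print(is_simpsons_paradox(KIDNEY_STONES, "A", "B"))
    print(is_simpsons_paradox(NO_REVERSAL, "A", "B"))
```

A mining procedure along these lines would presumably apply such a check to many combinations of group attribute and partitioning attribute in a data set. How the paper enumerates those combinations and scores surprisingness is not specified here, so the sketch stops at detecting a single instance.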