Exceptional Contrast Set Mining: Moving Beyond the Deluge of the Obvious

  • Conference paper
AI 2016: Advances in Artificial Intelligence (AI 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9992)

Abstract

Data scientists, with access to fast-growing data and computing power, constantly look for algorithms with greater detection power to discover “novel” knowledge. More often than not, however, these algorithms return too many outputs that are either highly speculative or simply confirm what domain experts already know. To escape this dilemma, we need algorithms that move beyond obvious association analyses and leverage domain analytic objectives (a.k.a. KPIs) to look for higher-order connections. We propose a new technique, Exceptional Contrast Set Mining, that first gathers a succinct collection of affirmative contrast sets based on the principle of redundant information elimination, and then discovers exceptional contrast sets that contradict the affirmative ones. The algorithm has been successfully applied to several analytic consulting projects. In particular, during an analysis of a state-wide cancer registry, it discovered a surprising regional difference in breast cancer screening.
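
The two-stage idea in the abstract can be illustrated with a toy sketch. It is not the authors' algorithm: it assumes a binary KPI, two groups, and a simple reading of "contradict" in which a refined pattern reverses the sign of the KPI gap between the groups. The function names, the `min_gap` threshold, and the data are hypothetical, and the sketch omits both significance testing and redundancy elimination.

```python
def kpi_rate(rows, pattern, group):
    """KPI rate among rows of `group` that satisfy every condition in `pattern`."""
    matched = [
        r for r in rows
        if r["group"] == group and all(r[a] == v for a, v in pattern.items())
    ]
    if not matched:
        return None  # no coverage, so no rate can be computed
    return sum(r["kpi"] for r in matched) / len(matched)


def contrast(rows, pattern, g1, g2):
    """Difference in KPI rate between g1 and g2 under `pattern` (None if undefined)."""
    r1, r2 = kpi_rate(rows, pattern, g1), kpi_rate(rows, pattern, g2)
    if r1 is None or r2 is None:
        return None
    return r1 - r2


def find_exceptions(rows, base, candidates, g1, g2, min_gap):
    """Refinements of `base` whose contrast is large and opposite in sign.

    Assumption: `base` counts as "affirmative" when |contrast| >= min_gap,
    and a refinement counts as "exceptional" when its contrast is also at
    least min_gap in magnitude but points the other way.
    """
    base_gap = contrast(rows, base, g1, g2)
    if base_gap is None or abs(base_gap) < min_gap:
        return []
    found = []
    for attr, value in candidates:
        if attr in base:
            continue  # only add conditions on attributes not already fixed
        refined = {**base, attr: value}
        gap = contrast(rows, refined, g1, g2)
        if gap is not None and abs(gap) >= min_gap and gap * base_gap < 0:
            found.append((refined, gap))
    return found


def make_rows(group, region, outcomes):
    """Build hypothetical records, one per KPI outcome."""
    return [{"group": group, "region": region, "kpi": k} for k in outcomes]


# Hypothetical data: group A does better overall, but worse in the south.
rows = (
    make_rows("A", "north", [1, 1, 1, 1])
    + make_rows("A", "south", [0, 0])
    + make_rows("B", "north", [0, 0, 0, 1])
    + make_rows("B", "south", [1, 1])
)

exceptions = find_exceptions(
    rows, {}, [("region", "north"), ("region", "south")], "A", "B", min_gap=0.1
)
print(exceptions)
```

On this toy data the overall gap favours group A, while the refinement `region = south` reverses it, so only that refinement is reported. A realistic implementation would also need a statistical test with multiple-comparison control before calling any gap a finding.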

Acknowledgment

This work is partially supported by the Telstra-Deakin Centre of Excellence in Big Data and Machine Learning.

Author information

Corresponding author

Correspondence to Dang Nguyen.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Nguyen, D., Luo, W., Phung, D., Venkatesh, S. (2016). Exceptional Contrast Set Mining: Moving Beyond the Deluge of the Obvious. In: Kang, B.H., Bai, Q. (eds) AI 2016: Advances in Artificial Intelligence. AI 2016. Lecture Notes in Computer Science, vol 9992. Springer, Cham. https://doi.org/10.1007/978-3-319-50127-7_39

  • DOI: https://doi.org/10.1007/978-3-319-50127-7_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50126-0

  • Online ISBN: 978-3-319-50127-7

  • eBook Packages: Computer Science (R0)
