Subgroup Mining

  • Conference paper

Part of the book series: International Centre for Mechanical Sciences (CISM, volume 408)

Abstract

Statistical findings on subgroups are among the most popular and simplest forms of knowledge we encounter in all domains of science, business, and even daily life. We read or hear such messages as: the lung cancer mortality rate has increased considerably for women during the last 10 years; the unemployment rate is disproportionately high for young men with a low educational level; the potential for violence is highest for males between 14 and 18. In this paper, we first compare knowledge expressed by subgroup patterns with other popular knowledge types of Knowledge Discovery in Databases (KDD), introduce types of description languages for subgroups, and summarize general pattern classes for subgroup deviations and associations. A deviation pattern describes a deviating behavior of a target variable in a subgroup. Deviation patterns rely on statistical tests and thus capture knowledge about a subgroup in the form of a verified (alternative) hypothesis on the distribution of a target variable. The search for deviating subgroups is organized in two phases. In a first, brute-force search phase, alternative search heuristics can be applied to find a set of deviating subgroups. In a second, refinement phase, redundancy elimination operators identify a system of subgroups. We discuss the role of tests for subgroup mining, introduce specializations of the general deviation pattern, summarize search approaches, and deal with navigation and visualization operations that support an analyst in interactively constructing a best system of deviating subgroups.
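
The two-phase search outlined in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the subgroup descriptions are conjunctions of attribute=value conditions, the test is a binomial-style z statistic for a binary target, the redundancy rule drops a subgroup when a more general one scores at least as high, and the data set is synthetic. The names `z_score`, `search`, and `refine` are hypothetical.

```python
from itertools import combinations
from math import sqrt

def covers(desc, row):
    """True if the row satisfies every attribute=value condition in desc."""
    return all(row[attr] == value for attr, value in desc)

def z_score(rows, desc, target):
    """Deviation of the target share in the subgroup from the overall share."""
    p0 = sum(1 for r in rows if r[target]) / len(rows)
    members = [r for r in rows if covers(desc, r)]
    n = len(members)
    if n == 0 or p0 in (0.0, 1.0):
        return 0.0
    p = sum(1 for r in members if r[target]) / n
    return (p - p0) * sqrt(n) / sqrt(p0 * (1 - p0))

def search(rows, attributes, target, max_depth=2, min_z=2.0):
    """Phase 1: enumerate all conjunctions up to max_depth, keep deviating ones."""
    selectors = sorted({(a, r[a]) for r in rows for a in attributes})
    found = {}
    for depth in range(1, max_depth + 1):
        for combo in combinations(selectors, depth):
            if len({attr for attr, _ in combo}) < depth:
                continue  # two conditions on the same attribute contradict
            desc = frozenset(combo)
            z = z_score(rows, desc, target)
            if z >= min_z:
                found[desc] = z
    return found

def refine(found):
    """Phase 2: drop a subgroup if a strictly more general one scores as high."""
    return {
        desc: z
        for desc, z in found.items()
        if not any(other < desc and found[other] >= z for other in found)
    }

def make_rows(sex, age, size, positives):
    """Build `size` synthetic rows, the first `positives` with the target set."""
    return [{"sex": sex, "age": age, "unemployed": i < positives}
            for i in range(size)]

# Synthetic data, invented purely for this illustration.
rows = (make_rows("m", "young", 20, 12) + make_rows("f", "young", 20, 10)
        + make_rows("m", "old", 30, 3) + make_rows("f", "old", 30, 3))

found = search(rows, ["sex", "age"], "unemployed")
kept = refine(found)
for desc, z in kept.items():
    print(sorted(desc), round(z, 2))
```

On this toy data, the first phase finds the subgroup of young people and both of its specializations by sex; the refinement step keeps only the more general subgroup, because neither specialization deviates more strongly than it does.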

Copyright information

© 2000 Springer-Verlag Wien

About this paper

Cite this paper

Klösgen, W. (2000). Subgroup Mining. In: Della Riccia, G., Kruse, R., Lenz, HJ. (eds) Computational Intelligence in Data Mining. International Centre for Mechanical Sciences, vol 408. Springer, Vienna. https://doi.org/10.1007/978-3-7091-2588-5_2

  • DOI: https://doi.org/10.1007/978-3-7091-2588-5_2

  • Publisher Name: Springer, Vienna

  • Print ISBN: 978-3-211-83326-1

  • Online ISBN: 978-3-7091-2588-5

  • eBook Packages: Springer Book Archive
