Clustering Overview and Applications

Synonyms

Unsupervised learning

Definition

Clustering is the assignment of objects to groups of similar objects (clusters). The objects are typically described as vectors of features (also called attributes), which can be numerical (scalar) or categorical. The assignment can be hard, where each object belongs to exactly one cluster, or fuzzy, where an object can belong to several clusters, each with a degree of membership (often interpreted as a probability). Clusters can overlap, though typically they are disjoint. A distance measure is a function that quantifies how dissimilar two objects are; the smaller the distance, the more similar the objects.
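The difference between hard and fuzzy assignment can be illustrated with a minimal sketch. It assumes numerical feature vectors, Euclidean distance as the distance measure, and clusters represented by a single center each. The fuzzy memberships use the fuzzy c-means membership formula [2] with fuzzifier m = 2, which is only one of several possible choices. The function names and the example centers are hypothetical and chosen for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two numerical feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hard_assign(obj, centers):
    """Hard assignment: index of the single closest cluster center."""
    return min(range(len(centers)), key=lambda i: euclidean(obj, centers[i]))

def fuzzy_memberships(obj, centers, m=2.0):
    """Fuzzy assignment: one membership degree per cluster, summing to 1.

    Follows the fuzzy c-means membership formula; m > 1 is the fuzzifier
    (m = 2 is an assumption made for this sketch).
    """
    dists = [euclidean(obj, c) for c in centers]
    # An object lying exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]

centers = [(0.0, 0.0), (4.0, 0.0)]  # hypothetical cluster centers
obj = (1.0, 0.0)                    # hypothetical object to assign

label = hard_assign(obj, centers)
memberships = fuzzy_memberships(obj, centers)
print(label, memberships)
```

Under hard assignment the object receives a single cluster index. Under fuzzy assignment it receives one weight per cluster, with the larger weight going to the closer center.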

Historical Background

Clustering is one of the most useful tasks in data analysis. Its goal is to discover groups of similar objects and to identify interesting patterns in the data. Typically, the clustering problem is about partitioning a given data set into groups (clusters) such that the data points in a cluster are more similar to each other than to points in different clusters [4, 8]. For example, consider a retail...

Recommended Reading

  1. Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1998. p. 94–105.


  2. Bezdek JC, Ehrlich R, Full W. FCM: the fuzzy c-means clustering algorithm. Comput Geosci. 1984;10(2–3):191–203.


  3. Ester M, Kriegel H-P, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining; 1996. p. 226–31.


  4. Everitt BS, Landau S, Leese M. Cluster analysis. London: Hodder Arnold; 2001.


  5. Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R. Advances in knowledge discovery and data mining. Menlo Park: AAAI Press; 1996.


  6. Han J, Kamber M. Data mining: concepts and techniques. San Francisco: Morgan Kaufmann Publishers; 2001.


  7. Huang Z. A fast clustering algorithm to cluster very large categorical data sets in data mining. In: Proceedings of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery; 1997.


  8. Jain AK, Murty MN, Flynn PJ. Data clustering: a review. ACM Comput Surv. 1999;31(3):264–323.


  9. Karypis G, Han E-H, Kumar V. CHAMELEON: a hierarchical clustering algorithm using dynamic modeling. IEEE Comput. 1999;32(8):68–75.


  10. MacQueen JB. Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability; 1967. p. 281–97.


  11. Mitchell T. Machine learning. New York: McGraw-Hill; 1997.


  12. Ng R, Han J. Efficient and effective clustering methods for spatial data mining. In: Proceedings of the 20th International Conference on Very Large Data Bases; 1994. p. 144–55.


  13. Theodoridis S, Koutroumbas K. Pattern recognition. New York: Academic; 1999.


  14. Vazirgiannis M, Halkidi M, Gunopulos D. Uncertainty handling and quality assessment in data mining. New York: Springer; 2003.


  15. Wang W, Yang J, Muntz R. STING: a statistical information grid approach to spatial data mining. In: Proceedings of the 23rd International Conference on Very Large Data Bases; 1997. p. 186–95.


  16. Zhang T, Ramakrishnan R, Livny M. BIRCH: an efficient data clustering method for very large databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1996. p. 103–14.


Author information

Corresponding author

Correspondence to Dimitrios Gunopulos.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Cite this entry

Gunopulos, D. (2018). Clustering Overview and Applications. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_602
