Clustering Overview and Applications
Clustering is the assignment of objects to groups of similar objects (clusters). The objects are typically described as vectors of features (also called attributes). Attributes can be numerical (scalar) or categorical. The assignment can be hard, where each object belongs to exactly one cluster, or fuzzy, where an object can belong to several clusters, each with a certain probability or degree of membership. The clusters can overlap, though typically they are disjoint. A distance measure is a function that quantifies how similar two objects are: the smaller the distance, the more similar the objects.
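To make the hard/fuzzy distinction concrete, here is a minimal sketch that assumes numerical feature vectors, Euclidean distance as the distance measure, and a fixed set of hypothetical cluster centers. The fuzzy memberships use the fuzzy c-means membership formula with fuzzifier `m`, which is one common choice among several and is assumed here for illustration; `hard_assign` and `fuzzy_assign` are illustrative names, not an established API.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two numerical feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hard_assign(point, centers):
    """Hard assignment: index of the single nearest cluster center."""
    dists = [euclidean(point, c) for c in centers]
    return dists.index(min(dists))


def fuzzy_assign(point, centers, m=2.0):
    """Fuzzy assignment: a membership degree for every cluster, summing to 1.

    Assumes the fuzzy c-means formula u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)),
    with fuzzifier m > 1.
    """
    dists = [euclidean(point, c) for c in centers]
    # A point lying exactly on one or more centers is shared equally among them.
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        share = 1.0 / len(zeros)
        return [share if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in dists) for di in dists]


centers = [(0.0, 0.0), (4.0, 0.0)]  # hypothetical cluster centers
print(hard_assign((1.0, 0.0), centers))
print(fuzzy_assign((1.0, 0.0), centers))
```

For the point (1, 0), the hard assignment picks cluster 0 only, while the fuzzy assignment gives memberships of about 0.9 and 0.1, reflecting that the point is much closer to the first center but not entirely unrelated to the second.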
Clustering is one of the most useful tasks in data analysis. The goal of clustering is to discover groups of similar objects and to identify interesting patterns in the data. Typically, the clustering problem is about partitioning a given data set into groups (clusters) such that the data points in a cluster are more similar to each other than to points in other clusters [4, 8]. For example, consider a retail...
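The partitioning criterion can be checked directly on a labeled data set. The sketch below, again assuming Euclidean distance on numerical vectors, compares the average distance between pairs of points in the same cluster with the average distance between pairs in different clusters. This pairwise comparison is an illustrative diagnostic chosen for this example, not a standard clustering-quality index, and the name `intra_inter` is hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two numerical feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(values):
    """Arithmetic mean, or NaN when there are no values to average."""
    return sum(values) / len(values) if values else float("nan")


def intra_inter(points, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance).

    A partition matches the clustering goal when the first value is
    clearly smaller than the second.
    """
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        # Same label: the pair lies inside one cluster; otherwise it spans two.
        (intra if lp == lq else inter).append(euclidean(p, q))
    return _mean(intra), _mean(inter)


pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(intra_inter(pts, [0, 0, 1, 1]))  # nearby points grouped together
print(intra_inter(pts, [0, 1, 0, 1]))  # nearby points split apart
```

Grouping the two nearby pairs together yields an average intra-cluster distance of 1.0 against an inter-cluster average of about 7.09, whereas the crossed labeling reverses the inequality (about 7.07 within clusters versus about 4.05 between them), so only the first labeling satisfies the criterion.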
- 1. Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1998. p. 94–105.
- 3. Ester M, Kriegel H-P, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining; 1996. p. 226–31.
- 5. Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R. Advances in knowledge discovery and data mining. Menlo Park: AAAI Press; 1996.
- 7. Huang Z. A fast clustering algorithm to cluster very large categorical data sets in data mining. In: Proceedings of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery; 1997.
- 10. MacQueen JB. Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability; 1967. p. 281–97.
- 12. Ng R, Han J. Efficient and effective clustering methods for spatial data mining. In: Proceedings of the 20th International Conference on Very Large Data Bases; 1994. p. 144–55.
- 13. Theodoridis S, Koutroumbas K. Pattern recognition. New York: Academic; 1999.
- 15. Wang W, Yang J, Muntz R. STING: a statistical information grid approach to spatial data mining. In: Proceedings of the 23rd International Conference on Very Large Data Bases; 1997. p. 186–95.
- 16. Zhang T, Ramakrishnan R, Livny M. BIRCH: an efficient data clustering method for very large databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1996. p. 103–14.