Cluster Analysis of Data with Reduced Dimensionality: An Empirical Study
Cluster analysis is an important high-level data mining procedure that can be used to identify meaningful groups of objects within large data sets. Various dimension reduction methods are used to reduce the complexity of data before further processing. The lower-dimensional projections of the original data sets can be seen as simplified models of the original data. In this paper, several clustering algorithms are applied to low-dimensional projections of complex data sets and compared with each other. The properties and quality of the clusterings obtained by each method are evaluated, and the suitability of each method for processing reduced data sets is assessed.
Keywords: Clustering, Metric multidimensional scaling, Sammon’s projection, Affinity propagation, Mean shift, DBSCAN
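The workflow summarised above (reduce the data first, then cluster its low-dimensional projection) can be illustrated with a small, standard-library-only sketch. It is not the authors' implementation. It assumes classical (Torgerson) metric MDS, computed here by power iteration with deflation instead of a full eigensolver, followed by a plain DBSCAN. The function names `classical_mds` and `dbscan`, the parameter values (`k`, `eps`, `min_pts`) and the toy data are hypothetical. Sammon's projection, affinity propagation, mean shift and the paper's quality measures are not shown.

```python
import math
import random


def classical_mds(points, k=2, iters=500, seed=0):
    """Embed points in k dimensions with classical (Torgerson) metric MDS.

    Assumption: power iteration with deflation replaces a full eigensolver
    so that the sketch needs only the standard library.
    """
    n = len(points)
    # Squared Euclidean distances between the original points.
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    # Double centering: B = -1/2 * J D2 J with J = I - (1/n) * ones.
    # D2 is symmetric, so row means equal column means.
    mean = [sum(r) / n for r in d2]
    grand = sum(mean) / n
    b = [[-0.5 * (d2[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]

    def matvec(m, v):
        return [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for dim in range(k):
        # Power iteration for the largest remaining eigenpair of B.
        v = [rng.random() - 0.5 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # nothing left to extract
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(b, v)))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][dim] = v[i] * scale
        # Deflate B so the next pass finds the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def dbscan(points, eps, min_pts):
    """Return a cluster id per point (0, 1, ...); noise is labelled -1."""
    n = len(points)

    def region(i):
        # The eps-neighbourhood includes the point itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = region(j)
            if len(more) >= min_pts:  # j is a core point: keep expanding
                queue.extend(more)
    return labels


# Hypothetical toy data: two tight groups of 8 points in 5-D, centred at 0 and 10.
offsets = [[0.3 * ((i + d) % 3 - 1) for d in range(5)] for i in range(8)]
data = [[c + o for o in off] for c in (0.0, 10.0) for off in offsets]

projected = classical_mds(data, k=2)              # 5-D -> 2-D projection
labels = dbscan(projected, eps=5.0, min_pts=3)    # cluster the projection
print(labels)
```

For Euclidean input, classical MDS amounts to an orthogonal projection of the centred data onto its leading principal axes, so distances within each group can only shrink. In this toy case the direction separating the two groups carries most of the variance, so the first MDS axis keeps them apart. On real data, `eps` and `min_pts` would have to be tuned to the scale of the projection.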
This work was supported by the IT4Innovations Centre of Excellence project (CZ.1.05/1.1.00/02.0070), funded by the European Regional Development Fund and the national budget of the Czech Republic via the Research and Development for Innovations Operational Programme and by Project SP2015/146 of the Student Grant System, VŠB—Technical University of Ostrava.