What Can Fuzzy Cluster Analysis Contribute to Clustering of High-Dimensional Data?
Cluster analysis of high-dimensional data has attracted special interest in recent years. The term high-dimensional data can refer to a larger number of attributes (20 or more), as often occurs in database tables. But it can also mean that we have to deal with thousands of attributes, as in genomics or proteomics data, where thousands of genes or proteins are measured and are treated as attributes in some analysis tasks.
A main reason why cluster analysis of high-dimensional data differs from clustering low-dimensional data is the concentration of norm phenomenon. Roughly speaking, it states that the distances between randomly distributed points become more and more similar as the dimension grows, so that the relative differences between them shrink.
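The concentration of norm effect can be illustrated with a small simulation. The following is a minimal sketch, not taken from the paper: it assumes points drawn uniformly from the unit hypercube and Euclidean distance, and the function name `relative_contrast`, the sample size and the chosen dimensions are illustrative assumptions.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for the Euclidean distances from one
    random query point to n_points random points.

    Assumption: all points are drawn uniformly from the unit hypercube.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


# Illustrative dimensions: low, moderate ("database table"), and very high.
contrasts = {dim: relative_contrast(dim) for dim in (2, 20, 1000)}
for dim, contrast in contrasts.items():
    print(f"dim={dim:5d}  relative contrast={contrast:.3f}")
```

Under these assumptions the relative contrast shrinks as the dimension grows: the nearest and the farthest point end up at almost the same distance from the query point.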
On the one hand, fuzzy cluster analysis has been shown to be less sensitive to initialisation than, for instance, the classical k-means algorithm. On the other hand, standard fuzzy clustering is more strongly affected by the concentration of norm phenomenon and tends to fail easily in high dimensions. Here we review why fuzzy clustering has special problems with high-dimensional data and how these can be remedied by modifying the fuzzifier concept. We also describe a recently introduced approach based on correlation, as well as an attribute selection fuzzy clustering technique that can be applied when clusters can only be found in lower dimensions.
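To see why nearly equal distances are a problem for standard fuzzy clustering, it helps to look at the usual fuzzy c-means membership formula, u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)), where d_i is the distance of a data point to cluster centre i and m > 1 is the fuzzifier. The sketch below is only an illustration: the distance values are hypothetical, and lowering m is shown as one simple way of changing the fuzzifier, not necessarily the modification reviewed in the paper.

```python
def fcm_memberships(distances, m=2.0):
    """Standard fuzzy c-means membership degrees of one data point, given
    its strictly positive distances to each cluster centre and fuzzifier m > 1."""
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# Hypothetical distances of one point to three cluster centres.
low_dim = [1.0, 5.0, 5.0]      # clearly closest to centre 0
high_dim = [12.0, 13.0, 13.0]  # distances have concentrated

u_low = fcm_memberships(low_dim)                 # m = 2
u_high = fcm_memberships(high_dim)               # m = 2
u_high_sharp = fcm_memberships(high_dim, m=1.1)  # fuzzifier closer to 1

for name, u in [("low_dim, m=2", u_low),
                ("high_dim, m=2", u_high),
                ("high_dim, m=1.1", u_high_sharp)]:
    print(name, [round(x, 3) for x in u])
```

With m = 2 the membership degrees for the concentrated distances are all close to 1/3, so the point is hardly assigned to any particular cluster, while a fuzzifier closer to 1 restores a clearer preference for the nearest centre.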
Keywords: Cluster Centre, Fuzzy Cluster, Membership Degree, Subspace Cluster, Fuzzy Cluster Analysis