
What Can Fuzzy Cluster Analysis Contribute to Clustering of High-Dimensional Data?

  • Frank Klawonn
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8256)

Abstract

Cluster analysis of high-dimensional data has become of special interest in recent years. The term high-dimensional data can refer to a moderately large number of attributes (20 or more), as they often occur in database tables. But high-dimensional data can also mean that we have to deal with thousands of attributes, as in genomics or proteomics, where thousands of genes or proteins are measured and, in some analysis tasks, treated as attributes.

A main reason why cluster analysis of high-dimensional data differs from clustering low-dimensional data is the concentration of norm phenomenon. Roughly speaking, it states that in higher dimensions the distances between randomly distributed points become more and more similar relative to their size.
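The concentration of norm can be observed in a small simulation. The sketch below measures the relative contrast (d_max - d_min) / d_min, one common way to quantify the effect, of the Euclidean distances from a random query point to uniformly distributed points. The uniform unit-cube data, the Euclidean distance and the function name `relative_contrast` are assumptions made for illustration only.

```python
import math
import random


def relative_contrast(dim, n_points=1000, seed=0):
    """Relative contrast (d_max - d_min) / d_min of the Euclidean distances
    from one random query point to n_points points, all drawn uniformly
    from the unit cube [0, 1]^dim (illustrative assumption)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast shrinks as the dimension grows: distances concentrate.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:4d}  relative contrast={relative_contrast(dim):.3f}")
```

The exact values depend on the seed and on `n_points`, but the contrast should fall steadily with `dim`: in two dimensions the nearest point is many times closer than the farthest, while in a thousand dimensions all points lie at nearly the same distance.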

On the one hand, fuzzy cluster analysis has been shown to be less sensitive to initialisation than, for instance, the classical k-means algorithm. On the other hand, standard fuzzy clustering is more strongly affected by the concentration of norm phenomenon and tends to fail easily in high dimensions. Here we review why fuzzy clustering has special problems with high-dimensional data and how these can be amended by modifying the fuzzifier concept. We also describe a recently introduced approach based on correlation, and an attribute selection fuzzy clustering technique that can be applied when clusters can only be found in lower-dimensional subspaces.
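One way to see why standard fuzzy clustering suffers from concentrated distances is the membership update of fuzzy c-means. The sketch below implements the standard (unmodified) fuzzy c-means membership formula for a single data point with fuzzifier m, assuming Euclidean distances and m = 2 by default; it is not the modified fuzzifier discussed in the paper, and the function name `fcm_memberships` is hypothetical. When a point's distances to all c prototypes are nearly equal, every membership degree drifts towards 1/c.

```python
def fcm_memberships(distances, m=2.0):
    """Membership degrees of one data point in the standard fuzzy c-means
    update, given its distances to the c cluster prototypes:
        u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    zeros = [i for i, d in enumerate(distances) if d == 0.0]
    if zeros:
        # The point coincides with one or more prototypes: share the
        # membership equally among those prototypes.
        return [1.0 / len(zeros) if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
            for d_i in distances]


if __name__ == "__main__":
    # Clearly different distances, as is typical in low dimensions.
    print([round(u, 3) for u in fcm_memberships([1.0, 2.0, 3.0])])
    # prints [0.735, 0.184, 0.082]
    # Concentrated distances, as is typical in high dimensions.
    print([round(u, 3) for u in fcm_memberships([10.0, 10.2, 10.4])])
    # prints [0.347, 0.333, 0.32]
```

Since fuzzy c-means computes each prototype as a mean of the data weighted by u^m, nearly uniform memberships give every data point almost the same weight for every prototype, which tends to pull all prototypes toward the overall mean of the data.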




Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Frank Klawonn (1, 2)
  1. Bioinformatics & Statistics, Helmholtz-Centre for Infection Research, Braunschweig, Germany
  2. Department of Computer Science, Ostfalia University of Applied Sciences, Wolfenbuettel, Germany
