Comparing Fuzzy Clusterings in High Dimensionality

  • Stefano Rovetta
  • Francesco Masulli
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7627)

Abstract

Due to the specificity of clustering, a problem that is intrinsically ill-posed, there are several approaches to comparing clusterings. Comparison of clusterings obtained in different conditions is often the only affordable evaluation strategy, due to the lack of a ground truth. In this chapter we address a class of dimensionality-independent methods which can be applied in the presence of a high-dimensional input space. Specifically, we review some generalizations of this class of methods to the case of fuzzy clustering, in several variants.
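As an illustration of why such comparisons need not depend on the input dimensionality, the following minimal sketch computes two pair-counting indices, the Rand index and the Jaccard index, for crisp clusterings. It assumes that pair-counting indices are representative of the class of methods the abstract refers to, which the keywords suggest but the abstract does not state. The function names are hypothetical, and the fuzzy generalizations reviewed in the chapter are not reproduced here. The code only ever looks at cluster labels, never at the data vectors themselves.

```python
from itertools import combinations


def pair_counts(labels_a, labels_b):
    """Classify every pair of points by whether each clustering groups it together.

    Returns (n11, n10, n01, n00): pairs co-clustered in both, in A only,
    in B only, and in neither clustering.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same points")
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            n11 += 1
        elif same_a:
            n10 += 1
        elif same_b:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def rand_index(labels_a, labels_b):
    """Fraction of pairs on which the two clusterings agree."""
    n11, n10, n01, n00 = pair_counts(labels_a, labels_b)
    total = n11 + n10 + n01 + n00
    # With fewer than two points there are no pairs; treat as full agreement.
    return (n11 + n00) / total if total else 1.0


def jaccard_index(labels_a, labels_b):
    """Like the Rand index, but ignoring pairs separated in both clusterings."""
    n11, n10, n01, _ = pair_counts(labels_a, labels_b)
    denom = n11 + n10 + n01
    # If no pair is co-clustered anywhere, treat as full agreement.
    return n11 / denom if denom else 1.0


if __name__ == "__main__":
    a = [0, 0, 1, 1]
    b = [0, 1, 1, 1]
    print(rand_index(a, b), jaccard_index(a, b))
```

Both indices are invariant to how the clusters are named, so relabelled copies of the same partition compare as identical. The cost is quadratic in the number of points but does not depend on the number of features.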

Keywords

Jaccard Index, Rand Index, Fuzzy Partition, Adjusted Rand Index, Possibilistic Approach

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. DIBRIS – Dipartimento di Informatica, Bioingegneria, Robotica e Ingegneria dei Sistemi, Università di Genova, Genova, Italy
  2. Center for Biotechnology, Temple University, Philadelphia, USA
