MaxMin Linear Initialization for Fuzzy C-Means

  • Aybüke Öztürk
  • Stéphane Lallich
  • Jérôme Darmont
  • Sylvie Yona Waksman
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10934)

Abstract

Clustering is an extensive research area in data science. Its aim is to discover groups and identify interesting patterns in datasets. Crisp (hard) clustering assumes that each data point belongs to one and only one cluster. This assumption is inadequate when data points may belong to several clusters, as is the case in text categorization, so more flexible clustering is needed. Fuzzy clustering methods, in which each data point can belong to several clusters, are an interesting alternative. Yet seeding iterative fuzzy algorithms so that they achieve high-quality clustering remains an issue. In this paper, we propose a new linear and efficient initialization algorithm, MaxMin Linear, to deal with this problem. We then validate our theoretical results through extensive experiments on a variety of numerical real-world and artificial datasets. We also test several validity indices, including a new validity index that we propose, Transformed Standardized Fuzzy Difference (TSFD).

Keywords

Clustering; Fuzzy C-Means; Seeding; Initialization; Maxmin linear method; Validity indices

Notes

Acknowledgements

This project is supported by the Rhône Alpes Region’s ARC 5: “Cultures, Sciences, Sociétés et Médiations” through A. Öztürk’s Ph.D. grant.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Aybüke Öztürk (1, 2), corresponding author
  • Stéphane Lallich (1)
  • Jérôme Darmont (1)
  • Sylvie Yona Waksman (2)

  1. ERIC EA 3083, Université de Lyon, Lyon 2, Bron Cedex, France
  2. ArAr UMR 5138, Université de Lyon, Lyon 2, Lyon Cedex 7, France