Abstract
We shall now formulate in general terms the basic or “generic” problem of cluster analysis, and then discuss the consequences of this formulation.
Notes
- 1.
Strictly speaking, given the meaning of step i of the procedure, we operate not on the matrix d of distances between individual objects but on the matrix D of distances between clusters. At the beginning of the procedure D is identical to d, and it is subsequently transformed.
- 2.
Since we by no means intend to survey the methods, only a few selected, representative references are given for the methods considered. Here just three general references are cited: Florek, Łukaszewicz, Perkal, Steinhaus, and Zubrzycki (1956), where the origins of the so-called “single-linkage” algorithm can be found, and Lance and Williams (1966, 1967), who developed a more general theory of agglomerative clustering procedures.
- 3.
Here the seminal references are, first of all, Steinhaus (1956). This is Steinhaus again: see the preceding footnote to appreciate the contribution of this Polish mathematician from the Lwów school of mathematics, which was largely founded by Stefan Banach. Lloyd (1957) came soon afterwards, but his work likewise did not gain wide attention at first. Forgy (1965), Ball and Hall (1965), and MacQueen (1967) followed. The fuzzy-set-based version of the general k-means method, which became enormously popular, was formulated by Bezdek (1981).
- 4.
We stop here for two reasons. This is not meant to be a survey, and few proper clustering methods exist outside the paradigms mentioned. For instance, so-called spectral clustering is essentially a dimension-reduction technique that, in practice, is coupled with other, proper clustering methods.
- 5.
- 6.
Note that in this context we refer only to those partitioning or clustering criteria that are called “internal” (see, e.g., Rendón, Abundez, Arizmendi, & Quiroz, 2011), i.e. criteria that evaluate a partition using the data alone. The so-called “external” criteria compare a partition with a reference labelling. They therefore verify the classification capabilities of the respective methods and do not address clustering performance as such.
- 7.
This supposition does, of course, hold when we deal with a definite, very narrow class of data sets, e.g. when we can assume that all clusters correspond to some Gaussian distributions.
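Notes 1 and 2 refer to agglomerative procedures. In these, the cluster-distance matrix D starts out equal to d and is transformed at each step, following the general scheme of Lance and Williams. The minimal Python sketch below illustrates this under several assumptions:

- The function name `agglomerate` and the table `LANCE_WILLIAMS` are hypothetical.
- Only the single-, complete- and average-linkage coefficients of the Lance–Williams recurrence are included. That recurrence is d(k, i∪j) = α_i d(k,i) + α_j d(k,j) + β d(i,j) + γ |d(k,i) − d(k,j)|.
- No attempt at efficiency is made, since every step scans all pairs.

```python
import numpy as np

# Lance-Williams coefficients (alpha_i, alpha_j, beta, gamma) as functions of the
# cluster sizes n_i, n_j, n_k. n_k is unused by these three linkages but is
# needed by others (e.g. Ward's), so it is kept in the signature.
LANCE_WILLIAMS = {
    "single":   lambda ni, nj, nk: (0.5, 0.5, 0.0, -0.5),
    "complete": lambda ni, nj, nk: (0.5, 0.5, 0.0, 0.5),
    "average":  lambda ni, nj, nk: (ni / (ni + nj), nj / (ni + nj), 0.0, 0.0),
}


def agglomerate(d, linkage="single"):
    """Hypothetical helper: merge clusters pairwise, return [(i, j, distance), ...]."""
    D = np.array(d, dtype=float)           # at the start, D is identical to d
    sizes = {i: 1 for i in range(D.shape[0])}  # active clusters and their sizes
    merges = []
    while len(sizes) > 1:
        active = sorted(sizes)
        # find the closest pair of active clusters in D
        i, j = min(((a, b) for a in active for b in active if a < b),
                   key=lambda p: D[p[0], p[1]])
        dij = D[i, j]
        merges.append((i, j, dij))
        ni, nj = sizes[i], sizes[j]
        for k in active:
            if k in (i, j):
                continue
            ai, aj, beta, gamma = LANCE_WILLIAMS[linkage](ni, nj, sizes[k])
            # Lance-Williams update: distance from k to the merged cluster i U j
            D[i, k] = D[k, i] = (ai * D[k, i] + aj * D[k, j] + beta * dij
                                 + gamma * abs(D[k, i] - D[k, j]))
        sizes[i] = ni + nj                 # cluster i now stands for i U j
        del sizes[j]                       # cluster j is retired
    return merges
```

Take four objects at positions 0, 1, 5 and 7 on a line. Single linkage merges objects 0 and 1 at distance 1, then objects 2 and 3 at distance 2, and finally the two groups at 4. Complete linkage performs the same first two merges but joins the groups at 7.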
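Note 3 concerns the k-means family. The following is a minimal sketch of the alternating assignment/update scheme usually associated with Lloyd (1957), assuming Euclidean data held in a NumPy array. The function name `lloyd_kmeans`, its parameters `n_iter` and `seed`, and the random choice of k data points as initial centres (often called Forgy's method) are illustrative assumptions. The sketch is not meant to reproduce any of the cited papers exactly.

```python
import numpy as np


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical helper: alternate assignment and centroid-update steps."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # assumed initialisation: k distinct data points picked at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: each object goes to its nearest centre (squared Euclidean)
        dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist2.argmin(axis=1)
        # update step: each centre moves to the mean of its assigned objects;
        # a centre that receives no objects keeps its previous position
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break                          # no centre moved: converged
        centers = new_centers
    return labels, centers
```

Each iteration of this loop cannot increase the within-cluster sum of squared distances. It therefore stops at a local optimum, which may depend on the initial centres.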
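Note 4 characterises spectral clustering as a dimension-reduction step followed by a proper clustering method. The sketch below shows only that reduction step, in one common formulation: eigenvectors of the symmetric normalised graph Laplacian, with row normalisation as in the variant of Ng, Jordan and Weiss. The Gaussian affinity, the bandwidth `sigma` and the function name `spectral_embedding` are assumptions made for illustration.

```python
import numpy as np


def spectral_embedding(X, k, sigma=1.0):
    """Hypothetical helper: map objects to k spectral coordinates."""
    X = np.asarray(X, dtype=float)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2.0 * sigma ** 2))    # assumed Gaussian affinity
    np.fill_diagonal(W, 0.0)                 # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # symmetric normalised Laplacian L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    U = vecs[:, :k]                          # eigenvectors of the k smallest
    # row-normalise so that each object becomes a unit vector in R^k
    return U / np.linalg.norm(U, axis=1, keepdims=True)
```

For well-separated groups, the rows belonging to one group nearly coincide, so the actual partition is left to a conventional method. For example, the rows of the returned array could be passed to k-means.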
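Note 6 restricts attention to "internal" criteria, which judge a partition from the data alone. As one illustration, chosen here by us rather than taken from the text, the sketch below computes the mean silhouette width. The function name `silhouette` is hypothetical. The sketch assumes Euclidean distances and at least two clusters, and scores singletons as 0, which is a common convention.

```python
import numpy as np


def silhouette(X, labels):
    """Hypothetical helper: mean silhouette width (needs at least two clusters)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                      # exclude the object itself
        if not same.any():                   # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = dist[i, same].mean()             # mean distance within own cluster
        b = min(dist[i, labels == c].mean()  # mean distance to the nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

Consider the points 0, 1, 10 and 11 on a line. The partition {0, 1}, {10, 11} scores about 0.90, whereas the crossed partition {0, 10}, {1, 11} scores -0.45. No reference labelling is involved in either value.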
References
Bação, F., Lobo, V., & Painho, M. (2005). Self-organizing maps as substitutes for k-means clustering. In V. S. Sunderam et al. (Eds.), ICCS 2005 (LNCS 3516, pp. 476–483).
Ball, G., & Hall, D. (1965). ISODATA, a novel method of data analysis and pattern classification. Technical report NTIS AD 699616. Stanford Research Institute, Stanford, CA.
Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press.
Chiu, S. L. (1994). Fuzzy model identification based on cluster estimation. Journal of Intelligent & Fuzzy Systems, 2, 267–278.
Ester, M., Kriegel, H.-P., Sander, J., & Xu, X.-W. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In E. Simoudis, J. Han, & U. M. Fayyad (Eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96) (pp. 226–231). AAAI Press.
Florek, K., Łukaszewicz, J., Perkal, J., Steinhaus, H., & Zubrzycki, S. (1956). Taksonomia Wrocławska (The Wrocław Taxonomy; in Polish). Przegląd Antropologiczny, 17.
Forgy, E. W. (1965). Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometric Society Meeting, Riverside, California, 1965. Abstract in Biometrics (1965) 21, 768.
Kohonen, T. (2001). Self-organizing maps. Berlin-Heidelberg: Springer.
Lance, G. N., & Williams, W. T. (1966). A generalized sorting strategy for computer classifications. Nature, 212, 218.
Lance, G. N., & Williams, W. T. (1967). A general theory of classificatory sorting strategies: 1. Hierarchical systems. The Computer Journal, 9, 373–380.
Lindsten, F., Ohlsson, H., & Ljung, L. (2011). Just relax and come clustering! A convexification of k-means clustering. Technical Report, Automatic Control, Linköping University, LiTH-ISY-R-2992.
Lloyd, S. P. (1957). Least squares quantization in PCM. Bell Telephone Labs Memorandum, Murray Hill, NJ; reprinted in IEEE Transactions on Information Theory, IT-28(2), 129–137 (1982).
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In L. M. Le Cam & J. Neyman (Eds.), Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability 1965/66 (vol. I, pp. 281–297). University of California Press, Berkeley.
Owsiński, J. W. (1981). Intuition versus formalization: local and global criteria of grouping. Control and Cybernetics, 10(1–2), 73–88.
Owsiński, J. W. (2004a). Group opinion structure: The ideal structures, their relevance and effective use. In D. Baier & K.-D. Wernecke (Eds.), Innovations in Classification, Data Science, and Information Systems. Proceedings of the 27th Annual GfKl Conference, University of Cottbus, March 12–14, 2003 (pp. 471–481). Springer, Heidelberg-Berlin.
Owsiński, J. W., & Milczewski, M. (2010). Rekursja w problemie regionalizacji (Recursion in the regionalisation problem; in Polish). In J. W. Owsiński (Ed.), Analiza systemów przestrzennych. Wybrane zagadnienia (Analysis of spatial systems: Selected issues; in Polish). Badania Systemowe (vol. 6, pp. 47–587). Instytut Badań Systemowych PAN, Warszawa.
Rendón, E., Abundez, I., Arizmendi, A., & Quiroz, E. M. (2011). Internal versus external cluster validation indexes. International Journal of Computers and Communications, 5(1), 27–34.
Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344(6191), 1492–1496.
Rota, G.-C. (1964). The number of partitions of a set. The American Mathematical Monthly, 71(5), 498–504.
Steinhaus, H. (1956). Sur la division des corps matériels en parties (On the division of material bodies into parts; in French). Bulletin de l’Académie Polonaise des Sciences, IV (Cl. III), 801–804.
Tremolières, R. (1979). The percolation method for an efficient grouping of data. Pattern Recognition, 11.
Tremolières, R. (1981). Introduction aux fonctions de densité d’inertie (Introduction to inertia density functions; in French) (p. 234). IAE: Université Aix-Marseille, WP.
Vendramin, L., Campello, R. J. G. B., & Hruschka, E. R. (2010). Relative clustering validity criteria: A comparative overview. Wiley InterScience. https://doi.org/10.1002/sam.10080.
Yager, R. R., & Filev, D. P. (1994). Approximate clustering via the mountain method. IEEE Transactions on Systems, Man, and Cybernetics, 24, 1279–1284.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Owsiński, J.W. (2020). The Problem of Cluster Analysis. In: Data Analysis in Bi-partial Perspective: Clustering and Beyond. Studies in Computational Intelligence, vol 818. Springer, Cham. https://doi.org/10.1007/978-3-030-13389-4_2
DOI: https://doi.org/10.1007/978-3-030-13389-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13388-7
Online ISBN: 978-3-030-13389-4
eBook Packages: Intelligent Technologies and Robotics (R0)