The Problem of Cluster Analysis

Owsiński, Jan W.

doi:10.1007/978-3-030-13389-4_2

Part of the book series: Studies in Computational Intelligence (SCI, volume 818)

Abstract

We shall now formulate in general terms the basic or “generic” problem of cluster analysis, and then discuss the consequences of this formulation.

Notes

1. Actually, given the sense of step i of the procedure, we do not deal with the matrix of distances between individual objects, d, but with the matrix of distances between clusters, D, which is identical with d at the beginning of the procedure and is then transformed.
2. Since we by no means intend a survey of methods, only some selected, telling references shall be given for the methods considered. In this case, just three generic references will be mentioned: Florek, Łukaszewicz, Perkal, Steinhaus, and Zubrzycki (1956), where the origins of the so-called “single-linkage” algorithm can be found, and Lance and Williams (1966, 1967), who developed a more general theory of agglomerative clustering procedures.
3. Here, the seminal references are, first of all, Steinhaus (1956), again (see the preceding note to appreciate the contribution of this Polish mathematician from the Lwów school of mathematics, largely founded by Stefan Banach); then comes Lloyd (1957), soon afterwards but similarly not ‘piercing’; and then Forgy (1965), Ball and Hall (1965), and MacQueen (1967). The fuzzy-set based version of the general k-means method, which became enormously popular, was formulated by Bezdek (1981).
4. We stop here, since this is not really a survey, but also because not so many proper clustering methods exist outside of the paradigms mentioned. Thus, for instance, the so-called spectral clustering is actually a dimension reduction technique, which is, in practice, coupled with other, proper clustering methods.
5. The apparently highly intuitive formulation, “a cluster is a set of points x_i such that all the distances between them are smaller than between any of them and any point outside of this set”, is analysed and criticized, in particular, in Owsiński (1981, 2004a).
6. Note that in this context we refer only to those of the partitioning or clustering criteria alluded to that are called “internal” (see, e.g., Rendón, Abundez, Arizmendi, & Quiroz, 2011), since the ones called “external” actually verify the classification capabilities of the respective methods and do not address the clustering performance as such.
7. This supposition is, of course, true when we deal with a definite, very narrow class of data sets, e.g. when we can assume that all clusters correspond to some Gaussian distribution functions.
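
The remark in Note 1 can be illustrated with a minimal sketch. It assumes single linkage (the algorithm mentioned in Note 2) as the merging rule; the function name `single_linkage_merges` and the toy data are hypothetical and are not taken from the chapter, whose own procedure is not shown in this preview. The matrix D starts as a copy of the object-level distances d and is then transformed after every merge.

```python
def single_linkage_merges(d):
    """Agglomerate objects by single linkage and return the merge history.

    d is a symmetric matrix (list of lists) of distances between objects.
    Each history entry is (members of cluster p, members of cluster q, distance).
    """
    n = len(d)
    # At the start, the cluster-level matrix D is identical with d.
    D = {(i, j): float(d[i][j]) for i in range(n) for j in range(n) if i != j}
    clusters = {i: [i] for i in range(n)}  # cluster label -> member objects
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters (labels p < q).
        p, q = min(
            ((a, b) for a in clusters for b in clusters if a < b),
            key=lambda ab: D[ab],
        )
        merges.append((clusters[p], clusters[q], D[p, q]))
        # Transform D: under single linkage, the distance from the merged
        # cluster to any other cluster r is the smaller of the two old ones.
        for r in clusters:
            if r not in (p, q):
                D[p, r] = D[r, p] = min(D[p, r], D[q, r])
        # The merged cluster keeps label p; label q disappears.
        clusters[p] = clusters[p] + clusters.pop(q)
    return merges


# Toy example: four points on a line at positions 0, 1, 3 and 7.
points = [0, 1, 3, 7]
d = [[abs(a - b) for b in points] for a in points]
for left, right, dist in single_linkage_merges(d):
    print(left, right, dist)
```

Other agglomerative rules differ only in the line that updates D, which is the point of the general scheme of Lance and Williams (1966, 1967) cited in Note 2.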

References

Bação, F., Lobo, V., & Painho, M. (2005). Self-organizing maps as substitutes for k-means clustering. In V. S. Sunderam et al. (Eds.), ICCS 2005 (LNCS 3516, pp. 476–483).
Ball, G., & Hall, D. (1965). ISODATA, a novel method of data analysis and pattern classification. Technical report NTIS AD 699616. Stanford Research Institute, Stanford, CA.
Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press.
Chiu, S. L. (1994). Fuzzy model identification based on cluster estimation. Journal of Intelligent & Fuzzy Systems, 2, 267–278.
Ester, M., Kriegel, H.-P., Sander, J., & Xu, X.-W. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In E. Simoudis, J. Han, & U. M. Fayyad (Eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96) (pp. 226–231). AAAI Press.
Florek, K., Łukaszewicz, J., Perkal, J., Steinhaus, H., & Zubrzycki, S. (1956). Taksonomia Wrocławska (The Wrocław Taxonomy; in Polish). Przegląd Antropologiczny, 17.
Forgy, E. W. (1965). Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometric Society Meeting, Riverside, California, 1965. Abstract in Biometrics (1965) 21, 768.
Kohonen, T. (2001). Self-organizing maps. Berlin-Heidelberg: Springer.
Lance, G. N., & Williams, W. T. (1966). A generalized sorting strategy for computer classifications. Nature, 212, 218.
Lance, G. N., & Williams, W. T. (1967). A general theory of classification sorting strategies. 1. Hierarchical systems. The Computer Journal, 9, 373–380.
Lindsten, F., Ohlsson, H., & Ljung, L. (2011). Just relax and come clustering! A convexification of k-means clustering. Technical Report, Automatic Control, Linköping University, LiTH-ISY-R-2992.
Lloyd, S. P. (1957). Least squares quantization in PCM. Bell Telephone Labs Memorandum, Murray Hill, NJ; reprinted in IEEE Transactions on Information Theory, IT-28 (1982), 2, 129–137.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In L. M. LeCam & J. Neyman (Eds.), Proceedings 5th Berkeley Symposium on Mathematical Statistics and Probability 1965/66 (vol. I, pp. 281–297). University of California Press, Berkeley.
Owsiński, J. W. (1981). Intuition versus formalization: local and global criteria of grouping. Control and Cybernetics, 10(1–2), 73–88.
Owsiński, J. W. (2004a). Group opinion structure: The ideal structures, their relevance and effective use. In D. Baier & K.-D. Wernecke (Eds.), Innovations in Classification, Data Science, and Information Systems. Proceedings 27th Annual GfKl Conference, University of Cottbus, March 12–14, 2003 (pp. 471–481). Springer, Heidelberg-Berlin.
Owsiński, J. W., & Milczewski, M. (2010). Rekursja w problemie regionalizacji (Recursion in the regionalisation problem; in Polish). In J. W. Owsiński (Ed.), Analiza systemów przestrzennych. Wybrane zagadnienia. Badania Systemowe (vol. 6, pp. 47–587). Instytut Badań Systemowych PAN, Warszawa.
Rendón, E., Abundez, I., Arizmendi, A., & Quiroz, E. M. (2011). Internal versus external cluster validation indexes. International Journal of Computers and Communications, 5(1), 27–34.
Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344, 1492.
Rota, G.-C. (1964). The number of partitions of a set. The American Mathematical Monthly, 71(5), 498–504.
Steinhaus, H. (1956). Sur la division des corps matériels en parties. Bulletin de l’Académie Polonaise des Sciences, IV (C1.III), 801–804.
Tremolières, R. (1979). The percolation method for an efficient grouping of data. Pattern Recognition, 11.
Tremolières, R. (1981). Introduction aux fonctions de densité d’inertie (p. 234). IAE: Université Aix-Marseille, WP.
Vendramin, L., Campello, R. J. G. B., & Hruschka, E. R. (2010). Relative clustering validity criteria: A comparative overview. Wiley InterScience. https://doi.org/10.1002/sam.10080
Yager, R. R., & Filev, D. P. (1994). Approximate clustering via the mountain method. IEEE Transactions on Systems, Man, and Cybernetics, 24, 1279–1284.

Author information

Authors and Affiliations

Polish Academy of Sciences, Systems Research Institute, Warsaw, Poland
Jan W. Owsiński

Corresponding author

Correspondence to Jan W. Owsiński.

About this chapter

Cite this chapter

Owsiński, J.W. (2020). The Problem of Cluster Analysis. In: Data Analysis in Bi-partial Perspective: Clustering and Beyond. Studies in Computational Intelligence, vol 818. Springer, Cham. https://doi.org/10.1007/978-3-030-13389-4_2

DOI: https://doi.org/10.1007/978-3-030-13389-4_2
Published: 24 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13388-7
Online ISBN: 978-3-030-13389-4
eBook Packages: Intelligent Technologies and Robotics (R0)
