Part of the book series: Studies in Computational Intelligence (SCI, volume 818)

Abstract

We shall now formulate in general terms the basic or “generic” problem of cluster analysis, and then discuss the consequences of this formulation.

Notes

  1. Strictly speaking, at step i of the procedure we do not deal with the matrix d of distances between individual objects, but with the matrix D of distances between clusters; D is identical to d at the beginning of the procedure and is then transformed at each subsequent step.

  2. Since we by no means intend a survey of methods, only a few selected, representative references are given for the methods considered. In this case, just three generic references are mentioned: Florek, Łukaszewicz, Perkal, Steinhaus, and Zubrzycki (1956), where the origins of the so-called “single-linkage” algorithm can be found, and Lance and Williams (1966, 1967), who developed a more general theory of agglomerative clustering procedures.

  3. Here, the seminal references are, first of all, Steinhaus (1956), again(!) (see the preceding note to appreciate the contribution of this Polish mathematician from the Lwów school of mathematics, largely founded by Stefan Banach); then comes Lloyd (1957), soon afterwards, though his work similarly failed to break through at the time; and then Forgy (1965), Ball and Hall (1965), and MacQueen (1967). The fuzzy-set-based version of the general k-means method, which became enormously popular, was formulated by Bezdek (1981).

  4. We stop here, not only because this is not really a survey, but also because not many proper clustering methods exist outside the paradigms mentioned. Thus, for instance, so-called spectral clustering is essentially a dimension-reduction technique which, in practice, is coupled with other, proper clustering methods.

  5. The seemingly highly intuitive formulation “a cluster is a set of points x_i such that all the distances between them are smaller than between any of them and any point outside of this set” is analysed and criticized, in particular, in Owsiński (1981, 2004a).

  6. Note that in this context we refer only to those partitioning or clustering criteria that are called “internal” (see, e.g., Rendón, Abundez, Arizmendi, & Quiroz, 2011), since the “external” ones actually verify the classification capabilities of the respective methods and do not address clustering performance as such.

  7. This supposition is, of course, true when we deal with a definite, very narrow class of data sets, e.g., when we can assume that all clusters correspond to some Gaussian distribution functions.
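Notes 1 and 2 refer to agglomerative procedures that start from the object distance matrix d, work with a cluster distance matrix D that is transformed at every step, and are covered by the general theory of Lance and Williams (1966, 1967). The following is a minimal Python sketch of such a procedure, not the author's own formulation. It assumes a symmetric distance matrix, implements the Lance–Williams recurrence with the coefficients of single and complete linkage only, and breaks ties arbitrarily; the names `agglomerate` and `LINKAGES` are hypothetical.

```python
from itertools import combinations

# Lance-Williams coefficients (alpha_i, alpha_j, beta, gamma). Only two linkages
# with size-independent coefficients are included in this sketch.
LINKAGES = {
    "single": (0.5, 0.5, 0.0, -0.5),   # recurrence reduces to min(D(k,i), D(k,j))
    "complete": (0.5, 0.5, 0.0, 0.5),  # recurrence reduces to max(D(k,i), D(k,j))
}


def agglomerate(d, linkage="single"):
    """Agglomerative clustering driven by a symmetric object distance matrix d.

    D holds distances between the currently active clusters. It starts as a copy
    of d (every object is its own cluster) and is transformed after each merge by
        D(k, i+j) = a_i D(k,i) + a_j D(k,j) + b D(i,j) + g |D(k,i) - D(k,j)|.
    Returns the merges as (cluster, cluster, merge distance) triples.
    """
    a_i, a_j, beta, gamma = LINKAGES[linkage]
    n = len(d)
    clusters = {i: frozenset([i]) for i in range(n)}  # cluster id -> member objects
    D = {frozenset((i, j)): d[i][j] for i, j in combinations(range(n), 2)}
    merges, next_id = [], n
    while len(clusters) > 1:
        pair = min(D, key=D.get)  # closest pair of active clusters
        i, j = tuple(pair)
        d_ij = D.pop(pair)
        merges.append((clusters[i], clusters[j], d_ij))
        for k in list(clusters):
            if k in (i, j):
                continue
            d_ki = D.pop(frozenset((k, i)))
            d_kj = D.pop(frozenset((k, j)))
            D[frozenset((k, next_id))] = (
                a_i * d_ki + a_j * d_kj + beta * d_ij + gamma * abs(d_ki - d_kj)
            )
        clusters[next_id] = clusters.pop(i) | clusters.pop(j)  # merged cluster
        next_id += 1
    return merges


if __name__ == "__main__":
    x = [0, 1, 4, 10]  # four points on a line
    d = [[abs(a - b) for b in x] for a in x]
    print([m[2] for m in agglomerate(d, "single")])    # [1, 3.0, 6.0]
    print([m[2] for m in agglomerate(d, "complete")])  # [1, 4.0, 10.0]
```

With gamma = -1/2 the recurrence returns the smaller of the two distances, which is the single-linkage rule; other linkages, such as group average or Ward's method, need coefficients that depend on cluster sizes, which this sketch omits.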
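Note 3 traces the k-means family back to Steinhaus (1956) and Lloyd (1957). The following is a minimal Python sketch of the alternating scheme commonly called Lloyd's algorithm, not a reconstruction of any of the cited papers. It assumes Euclidean distance, initial centroids drawn at random from the data, and a stopping rule based on unchanged assignments; the function `lloyd_kmeans` and its parameters `max_iter` and `seed` are hypothetical.

```python
import math
import random


def lloyd_kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means on a list of coordinate tuples (Euclidean distance).

    Alternates two steps until the assignment stops changing:
      1. assign every point to its nearest centroid;
      2. move every centroid to the mean of the points assigned to it.
    Returns (labels, centroids).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k data points drawn at random
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignment is stable, so the centroids will not move again
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centroids = lloyd_kmeans(pts, 2)
    print(labels, centroids)
```

The result depends on the initial centroids, which is why practical implementations usually restart from several initialisations; the fuzzy variant mentioned in note 3 (Bezdek, 1981) replaces the hard assignment in step 1 with graded memberships and is not covered here.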
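The formulation quoted in note 5 can be stated as a direct test on a distance matrix: the largest distance inside a candidate set must be strictly smaller than the smallest distance from the set to the rest of the data. The following Python sketch of that test is given only for illustration; the name `is_strict_cluster` is hypothetical, and the evenly spaced example is a simple observation, not necessarily the argument made in Owsiński (1981, 2004a).

```python
import math
from itertools import combinations


def is_strict_cluster(d, members):
    """Check the 'intuitive' condition quoted in note 5 on a distance matrix d.

    True iff every distance between two members is strictly smaller than every
    distance between a member and a non-member.
    """
    inside = set(members)
    outside = [j for j in range(len(d)) if j not in inside]
    max_within = max((d[i][j] for i, j in combinations(sorted(inside), 2)), default=0)
    min_between = min((d[i][j] for i in inside for j in outside), default=math.inf)
    return max_within < min_between


if __name__ == "__main__":
    # Two well-separated pairs: both pairs satisfy the condition.
    y = [0, 1, 10, 11]
    e = [[abs(a - b) for b in y] for a in y]
    print(is_strict_cluster(e, [0, 1]), is_strict_cluster(e, [2, 3]))  # True True

    # Evenly spaced points: no proper subset of two or more points qualifies.
    x = [0, 1, 2, 3]
    d = [[abs(a - b) for b in x] for a in x]
    hits = [s for r in (2, 3) for s in combinations(range(4), r) if is_strict_cluster(d, s)]
    print(hits)  # []
```

Singletons and the whole data set always pass the test, so on evenly spaced data the condition admits only these trivial partitions.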

References

  • Bação, F., Lobo, V., & Painho, M. (2005). Self-organizing maps as substitutes for k-means clustering. In V. S. Sunderam et al. (Eds.), ICCS 2005 (LNCS 3516, pp. 476–483).

  • Ball, G., & Hall, D. (1965). ISODATA, a novel method of data analysis and pattern classification. Technical report NTIS AD 699616. Stanford Research Institute, Stanford, CA.

  • Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press.

  • Chiu, S. L. (1994). Fuzzy model identification based on cluster estimation. Journal of Intelligent & Fuzzy Systems, 2, 267–278.

  • Ester, M., Kriegel, H.-P., Sander, J., & Xu, X.-W. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In E. Simoudis, J. Han, & U. M. Fayyad (Eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96) (pp. 226–231). AAAI Press.

  • Florek, K., Łukaszewicz, J., Perkal, J., Steinhaus, H., & Zubrzycki, S. (1956). Taksonomia Wrocławska (The Wrocław Taxonomy; in Polish). Przegląd Antropologiczny, 17.

  • Forgy, E. W. (1965). Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometric Society Meeting, Riverside, California, 1965. Abstract in Biometrics (1965) 21, 768.

  • Kohonen, T. (2001). Self-organizing maps. Berlin-Heidelberg: Springer.

  • Lance, G. N., & Williams, W. T. (1966). A generalized sorting strategy for computer classifications. Nature, 212, 218.

  • Lance, G. N., & Williams, W. T. (1967). A general theory of classification sorting strategies. 1. Hierarchical Systems. The Computer Journal, 9, 373–380.

  • Lindsten, F., Ohlsson, H., & Ljung, L. (2011). Just relax and come clustering! A convexification of k-means clustering. Technical Report, Automatic Control, Linköping University, LiTH-ISY-R-2992.

  • Lloyd, S. P. (1957). Least squares quantization in PCM. Bell Telephone Labs Memorandum, Murray Hill, NJ; reprinted in IEEE Transactions on Information Theory, IT-28(2), 129–137 (1982).

  • MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In L. M. Le Cam & J. Neyman (Eds.), Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability 1965/66 (vol. I, pp. 281–297). University of California Press, Berkeley.

  • Owsiński, J. W. (1981). Intuition versus formalization: local and global criteria of grouping. Control and Cybernetics, 10(1–2), 73–88.

  • Owsiński, J. W. (2004a). Group opinion structure: The ideal structures, their relevance and effective use. In D. Baier & K.-D. Wernecke (Eds.), Innovations in Classification, Data Science, and Information Systems. Proceedings of the 27th Annual GfKl Conference, University of Cottbus, March 12–14, 2003 (pp. 471–481). Springer, Heidelberg-Berlin.

  • Owsiński, J. W., & Milczewski, M. (2010). Rekursja w problemie regionalizacji (Recursion in the regionalisation problem; in Polish). In J. W. Owsiński, (Ed.) Analiza systemów przestrzennych. Wybrane zagadnienia. Badania Systemowe (vol. 6, pp. 47–587). Instytut Badań Systemowych PAN, Warszawa.

  • Rendón, E., Abundez, I., Arizmendi, A., & Quiroz, E. M. (2011). Internal versus external cluster validation indexes. International Journal of Computers and Communications, 5(1), 27–34.

  • Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344, 1492–1496.

  • Rota, G.-C. (1964). The number of partitions of a set. The American Mathematical Monthly, 71(5), 498–504.

  • Steinhaus, H. (1956). Sur la division des corps matériels en parties. Bulletin de l’Académie Polonaise des Sciences, IV (Cl. III), 801–804.

  • Tremolières, R. (1979). The percolation method for an efficient grouping of data. Pattern Recognition, 11.

  • Tremolières, R. (1981). Introduction aux fonctions de densité d’inertie (p. 234). IAE: Université Aix-Marseille, WP.

  • Vendramin, L., Campello, R. J. G. B., & Hruschka, E. R. (2010). Relative clustering validity criteria: A comparative overview. Wiley InterScience. https://doi.org/10.1002/sam.10080.

  • Yager, R. R., & Filev, D. P. (1994). Approximate clustering via the mountain method. IEEE Transactions on Systems, Man, and Cybernetics, 24, 1279–1284.

Author information

Correspondence to Jan W. Owsiński.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Owsiński, J.W. (2020). The Problem of Cluster Analysis. In: Data Analysis in Bi-partial Perspective: Clustering and Beyond. Studies in Computational Intelligence, vol 818. Springer, Cham. https://doi.org/10.1007/978-3-030-13389-4_2
