Divisive and Separate Cluster Structures

  • Boris Mirkin
Part of the Undergraduate Topics in Computer Science book series (UTICS)


This chapter is about dividing a dataset, or a subset of it, into two parts. If both parts are to be clusters, this is referred to as divisive clustering. If just one part is to be a cluster, this will be referred to as separative clustering. Iterative application of divisive clustering builds a binary hierarchy, of which we will be mainly interested in a partition of the dataset. Iterative application of separative clustering builds a set of clusters, possibly overlapping. The first three sections introduce three different approaches to divisive clustering: Ward clustering, spectral clustering and single link clustering. Ward clustering is an extension of K-means clustering driven by the so-called Ward distance between clusters; it is also a natural niche for conceptual clustering, in which every division is made over a single feature to attain immediate interpretability of the hierarchy branches and clusters. Spectral clustering gained popularity with the so-called Normalized Cut approach to divisive clustering. A relaxation of this combinatorial problem turns out to be equivalent to optimizing the Rayleigh quotient of a Laplacian transformation of the similarity matrix under consideration. In fact, other approaches under consideration, such as uniform clustering and semi-average clustering, may also be treated within the spectral approach. Single link clustering formalizes the nearest-neighbor approach and is closely related to graph-theoretic concepts: connected components and maximum spanning trees. One may think of divisive clustering as a process for building a binary hierarchy "top-down", in contrast to agglomerative clustering (Sect. 4.6), which builds a binary hierarchy "bottom-up". The two remaining sections describe two separative clustering approaches, extending popular approaches to this setting.
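To make the Ward criterion mentioned above concrete: the Ward distance between two clusters is the increase in the within-cluster sum of squares that merging (or, read in reverse, splitting) would cause. The following is a minimal sketch, assuming clusters are given as lists of feature vectors; the function name and the tiny example data are illustrative, not taken from the chapter's software.

```python
import numpy as np

def ward_distance(cluster_a, cluster_b):
    """Ward distance between two clusters:
    dw(A, B) = |A| * |B| / (|A| + |B|) * ||c_A - c_B||^2,
    where c_A, c_B are the cluster centroids. This equals the
    increase in the within-cluster sum of squares upon merging."""
    a = np.asarray(cluster_a, dtype=float)
    b = np.asarray(cluster_b, dtype=float)
    c_a, c_b = a.mean(axis=0), b.mean(axis=0)   # centroids
    n_a, n_b = len(a), len(b)
    return n_a * n_b / (n_a + n_b) * np.sum((c_a - c_b) ** 2)

# Two small 1-D clusters with centroids 1.0 and 6.0:
print(ward_distance([[0.0], [2.0]], [[5.0], [7.0]]))  # 25.0
```

In divisive Ward clustering, a candidate split of a cluster is scored by this distance between the two resulting parts, and the split maximizing it is chosen.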
One tries to find a cluster with maximum inner summary similarity over a similarity matrix preprocessed according to the uniform and modularity approaches considered in Sect. 4.6.3. The other applies the encoder-decoder least-squares approach to modeling data by a one-cluster structure. It appears that the criteria emerging within the latter approach are much akin to those described earlier, the summary and semi-average similarities, although their parameters can now be adjusted according to the least-squares framework. The same applies to a distinct direction, the so-called additive clustering approach, which can be usefully applied to the analysis of similarity data.
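The modularity preprocessing referred to above can be sketched as follows: each similarity a_ij is reduced by the "background" term k_i * k_j / T, where k_i are the row totals and T the grand total (as in Newman's modularity), and a candidate cluster is then scored by its summary of preprocessed similarities. A minimal sketch, with function names chosen for illustration rather than taken from the book:

```python
import numpy as np

def modularity_preprocess(a):
    """Subtract the random-interaction background k_i * k_j / T
    from similarity matrix a (Newman's modularity transformation)."""
    a = np.asarray(a, dtype=float)
    k = a.sum(axis=1)          # row totals (degrees)
    t = a.sum()                # grand total
    return a - np.outer(k, k) / t

def summary_similarity(b, cluster):
    """Summary within-cluster similarity over preprocessed matrix b:
    the criterion to be maximized over candidate clusters."""
    idx = np.asarray(cluster)
    return b[np.ix_(idx, idx)].sum()

# A similarity matrix with an obvious two-entity cluster {0, 1}:
b = modularity_preprocess([[1, 1, 0],
                           [1, 1, 0],
                           [0, 0, 1]])
print(summary_similarity(b, [0, 1]))  # 0.8 > 0: a cohesive cluster
```

Note that after this preprocessing the total of all entries is zero, so a positive summary similarity signals above-background cohesion within the candidate cluster.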





Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Data Analysis and Artificial Intelligence, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia
  2. Professor Emeritus, Department of Computer Science and Information Systems, Birkbeck University of London, London, UK
