# Divisive and Separate Cluster Structures

## Abstract

This chapter is about dividing a dataset, or a subset of it, into two parts. If both parts are to be clusters, the process is referred to as divisive clustering; if just one part is to be a cluster, it will be referred to as separative clustering. Iterative application of divisive clustering builds a binary hierarchy, from which we will be interested in extracting a partition of the dataset. Iterative application of separative clustering builds a set of clusters, possibly overlapping. The first three sections introduce three different approaches to divisive clustering: Ward clustering, spectral clustering, and single-link clustering. Ward clustering is an extension of K-means clustering governed by the so-called Ward distance between clusters; it is also a natural niche for conceptual clustering, in which every division is made over a single feature so that the hierarchy branches and clusters are immediately interpretable. Spectral clustering gained popularity with the so-called normalized cut approach to divisive clustering. A relaxation of this combinatorial problem turns out to be equivalent to optimizing the Rayleigh quotient of a Laplacian transformation of the similarity matrix under consideration. In fact, other approaches under consideration, such as uniform clustering and semi-average clustering, may also be treated within the spectral approach. Single-link clustering formalizes the nearest-neighbor approach and is closely related to graph-theoretic concepts: connected components and maximum spanning trees. One may think of divisive clustering as a process that builds a binary hierarchy "top-down", in contrast to agglomerative clustering (Sect. 4.6), which builds a binary hierarchy "bottom-up". The two remaining sections describe two separative clustering approaches as extensions of popular approaches to this setting.
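Before turning to the separative approaches, the spectral relaxation mentioned above can be illustrated by spectral bisection: split the objects by the sign of the Fiedler vector, i.e., the eigenvector for the second-smallest eigenvalue of the Laplacian of the similarity matrix. The following is a minimal sketch, assuming the unnormalized Laplacian; the function name is illustrative, and this is not presented as the chapter's algorithm:

```python
import numpy as np

def spectral_bisection(similarity):
    """Two-way split by the sign of the Fiedler vector of the
    (unnormalized) Laplacian -- a common relaxation of the
    normalized-cut divisive criterion; an illustrative sketch."""
    W = np.asarray(similarity, dtype=float)
    D = np.diag(W.sum(axis=1))   # degree matrix
    L = D - W                    # Laplacian of the similarity matrix
    # Minimizing the Rayleigh quotient x'Lx / x'x over x orthogonal to
    # the all-ones vector is solved by the eigenvector for the
    # second-smallest eigenvalue; eigh returns eigenvalues in
    # ascending order, so that is column 1.
    vals, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    return fiedler >= 0          # boolean mask: one part vs. the other

# two strongly tied pairs, weakly linked across
W = [[0, 1, 0.1, 0],
     [1, 0, 0, 0.1],
     [0.1, 0, 0, 1],
     [0, 0.1, 1, 0]]
mask = spectral_bisection(W)     # separates {0, 1} from {2, 3}
```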
One of them seeks a cluster with maximum inner summary similarity over a similarity matrix preprocessed according to the uniform and modularity approaches considered in Sect. 4.6.3. The other applies the encoder-decoder least-squares approach to modeling the data by a one-cluster structure. It appears that the criteria emerging within the latter approach are much akin to those described earlier, the summary and semi-average similarities, although the parameters can now be adjusted according to the least-squares approach. This also holds for a distinct direction, the so-called additive clustering approach, which can be usefully applied to the analysis of similarity data.
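The summary-similarity criterion can be sketched with a simple greedy heuristic: seed the cluster with the most similar pair, then keep adding any object whose total similarity to the current cluster is positive. This is an illustrative sketch under the assumptions of a zero diagonal and a preprocessed (e.g., mean-subtracted, "uniform") similarity matrix, not the chapter's method:

```python
import numpy as np

def summary_cluster(similarity):
    """Greedy search for a subset S with large inner summary similarity
    sum_{i,j in S} a_ij on a preprocessed similarity matrix with zero
    diagonal.  An illustrative heuristic, not an exact optimizer."""
    A = np.asarray(similarity, dtype=float)
    n = A.shape[0]
    # seed: the pair with the largest off-diagonal similarity
    iu = np.triu_indices(n, k=1)
    best = np.argmax(A[iu])
    S = {iu[0][best], iu[1][best]}
    changed = True
    while changed:
        changed = False
        for k in range(n):
            # with a zero diagonal, adding k changes the summary
            # criterion by 2 * sum_{i in S} a_ki, so a positive total
            # similarity to S means an increase
            if k not in S and A[k, list(S)].sum() > 0:
                S.add(k)
                changed = True
    return sorted(S)

# three mutually similar objects vs. two dissimilar ones
A = [[0, 1, 1, -1, -1],
     [1, 0, 1, -1, -1],
     [1, 1, 0, -1, -1],
     [-1, -1, -1, 0, -1],
     [-1, -1, -1, -1, 0]]
cluster = summary_cluster(A)   # recovers the tight triple {0, 1, 2}
```

Because the matrix is mean-subtracted beforehand, "positive total similarity" already encodes a background threshold, which is what makes a single-cluster stopping rule possible here.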

## References

- L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, *Classification and Regression Trees* (Wadsworth, Belmont, CA, 1984)
- B. Mirkin, *Mathematical Classification and Clustering* (Kluwer Academic Press, 1996)
- B. Mirkin, *Clustering: A Data Recovery Approach* (Chapman & Hall/CRC, 2012)
- F. Murtagh, *Multidimensional Clustering Algorithms* (Physica-Verlag, Vienna, 1985)

## Articles

- O. Borůvka, Příspěvek k řešení otázky ekonomické stavby elektrovodních sítí (Contribution to the solution of a problem of economical construction of electrical networks) (in Czech). Elektronický Obzor **15**, 153–154 (1926)
- D.H. Fisher, Knowledge acquisition via incremental conceptual clustering. Mach. Learn. **2**, 139–172 (1987)
- S. Guattery, G. Miller, On the quality of spectral separators. SIAM J. Matrix Anal. Appl. **19**(3), 701–719 (1998)
- C. Klein, M. Randić, Resistance distance. J. Math. Chem. **12**, 81–95 (1993)
- J.B. Kruskal, On the shortest spanning subtree of a graph and the traveling salesman problem. Proc. Am. Math. Soc. **7**(1), 48–50 (1956)
- G.N. Lance, W.T. Williams, A general theory of classificatory sorting strategies: 1. Hierarchical systems. Comput. J. **9**, 373–380 (1967)
- U. von Luxburg, A tutorial on spectral clustering. Stat. Comput. **17**, 395–416 (2007)
- B. Mirkin, Additive clustering and qualitative factor analysis methods for similarity matrices. J. Classif. **4**, 7–31 (1987); Erratum **6**, 271–272 (1989)
- B. Mirkin, R. Camargo, T. Fenner, G. Loizou, P. Kellam, Similarity clustering of proteins using substantive knowledge and reconstruction of evolutionary gene histories in herpesvirus. Theor. Chem. Acc.: Theory, Comput., Model. **125**(3–6), 569–582 (2010)
- F. Murtagh, G. Downs, P. Contreras, Hierarchical clustering of massive, high dimensional data sets by exploiting ultrametric embedding. SIAM J. Sci. Comput. **30**, 707–730 (2008)
- M.E.J. Newman, Modularity and community structure in networks. PNAS **103**(23), 8577–8582 (2006)
- R.C. Prim, Shortest connection networks and some generalizations. Bell Syst. Tech. J. **36**, 1389–1401 (1957)
- J. Shi, J. Malik, Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. **22**(8), 888–905 (2000)
- R.N. Shepard, P. Arabie, Additive clustering: representation of similarities as combinations of discrete overlapping properties. Psychol. Rev. **86**, 87–123 (1979)
- S.K. Tasoulis, D.K. Tasoulis, V.P. Plagianakos, Enhancing principal direction divisive clustering. Pattern Recogn. **43**, 3391–3411 (2010)
- J.H. Ward Jr., Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. **58**, 236–244 (1963)