Divisive and Separate Cluster Structures
This chapter is about dividing a dataset, or a subset of it, into two parts. If both parts are to be clusters, this is referred to as divisive clustering. If just one part is to be a cluster, this is referred to as separative clustering. Iterative application of divisive clustering builds a binary hierarchy, from which we are interested in deriving a partition of the dataset. Iterative application of separative clustering builds a set of clusters, possibly overlapping. The first three sections introduce three different approaches to divisive clustering: Ward clustering, spectral clustering, and single link clustering. Ward clustering is an extension of K-means clustering governed by the so-called Ward distance between clusters; it is also a natural niche for conceptual clustering, in which every division is made over a single feature so that the hierarchy branches and clusters are immediately interpretable. Spectral clustering gained popularity with the so-called Normalized Cut approach to divisive clustering. A relaxation of this combinatorial problem turns out to be equivalent to optimizing the Rayleigh quotient for a Laplacian transformation of the similarity matrix under consideration. In fact, other approaches considered here, such as uniform clustering and semi-average clustering, may also be treated within the spectral approach. Single link clustering formalizes the nearest neighbor approach and is closely related to the graph-theoretic concepts of connected components and maximum spanning trees. One may think of divisive clustering as a process that builds a binary hierarchy “top-down”, in contrast to agglomerative clustering (Sect. 4.6), which builds a binary hierarchy “bottom-up”. The two remaining sections describe two separative clustering approaches, each extending a popular approach to the single-cluster case.
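The spectral relaxation of the Normalized Cut problem mentioned above can be illustrated with a minimal sketch. It assumes a symmetric, non-negative similarity matrix with positive row sums, and it uses the symmetric normalized Laplacian, a common textbook variant; the Laplacian transformation used in the chapter itself may differ. The function name `normalized_cut_bisection` and the toy matrix are illustrative and do not come from the source.

```python
import numpy as np

def normalized_cut_bisection(A):
    """Split items in two by the sign of the second eigenvector of the
    symmetric normalized Laplacian of the similarity matrix A.

    Assumption: A is symmetric, non-negative, with positive row sums.
    """
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                    # row sums ("degrees")
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # L_sym = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(L)
    # The eigenvector of the second smallest eigenvalue minimizes the
    # Rayleigh quotient among vectors orthogonal to the trivial solution;
    # rescaling by D^{-1/2} gives the relaxed cluster indicator.
    relaxed_indicator = d_inv_sqrt * eigvecs[:, 1]
    return relaxed_indicator >= 0        # boolean part membership

# Toy similarity matrix (hypothetical): items 0-2 are mutually similar,
# items 3-4 are mutually similar, and cross-group similarities are weak.
A = np.array([
    [0.0, 5.0, 4.0, 0.1, 0.1],
    [5.0, 0.0, 6.0, 0.1, 0.1],
    [4.0, 6.0, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.0, 7.0],
    [0.1, 0.1, 0.1, 7.0, 0.0],
])
labels = normalized_cut_bisection(A)
print(labels)
```

The sign of an eigenvector is arbitrary, so which part is labeled `True` may vary; only the grouping is meaningful. Applying the same function recursively to each part would give one possible way to build the top-down binary hierarchy.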
Of these two separative approaches, the first seeks a cluster with maximum summary inner similarity over a similarity matrix preprocessed according to the uniform and modularity approaches considered in Sect. 4.6.3. The second applies the encoder-decoder least-squares approach to modeling data by a one-cluster structure. It turns out that the criteria emerging within the latter approach are closely akin to those described earlier, the summary and semi-average similarities, although their parameters can now be adjusted according to the least-squares approach. The same framework applies to a distinct direction, the so-called additive clustering approach, which is useful for the analysis of similarity data.
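To see why a least-squares one-cluster model leads to criteria of the summary and semi-average type, the following worked derivation may help. It is a sketch under an assumed model, not a quotation from the chapter: the similarity matrix $A = (a_{ij})$ is approximated by $\lambda s_i s_j$, where $s$ is the binary membership vector of the cluster $S$ and $\lambda > 0$ is an intensity weight. The symbols $\lambda$, $s$, and $L$ are notation introduced here for illustration.

```latex
% Assumed one-cluster model: a_{ij} \approx \lambda s_i s_j, with s_i \in \{0,1\}.
% Uses s_i^2 = s_i, so that \sum_{i,j} s_i s_j = |S|^2 = (s^{\top} s)^2.
\begin{align*}
L(\lambda, s) &= \sum_{i,j} \left(a_{ij} - \lambda s_i s_j\right)^2
  = \sum_{i,j} a_{ij}^2 \;-\; 2\lambda\, s^{\top} A s \;+\; \lambda^2 |S|^2 .
\intertext{With $\lambda$ fixed, minimizing $L$ is the same as maximizing a
summary similarity criterion with the threshold $\lambda/2$:}
2\lambda\, s^{\top} A s - \lambda^2 |S|^2
  &= 2\lambda \sum_{i,j \in S} \left(a_{ij} - \tfrac{\lambda}{2}\right).
\intertext{With $\lambda$ optimized for a given $S$, that is,
$\lambda = s^{\top} A s / |S|^2$ (the mean within-cluster similarity),}
L &= \sum_{i,j} a_{ij}^2 \;-\; \left(\frac{s^{\top} A s}{s^{\top} s}\right)^{2} .
\end{align*}
```

Under these assumptions, minimizing the residual amounts to maximizing the Rayleigh quotient $s^{\top} A s / s^{\top} s$, which equals the mean within-cluster similarity multiplied by the cluster size, that is, a criterion of the semi-average type. The threshold in the summary criterion is no longer chosen by the user but follows from the least-squares value of $\lambda$.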