Abstract
The previous chapter introduced the basic data clustering methods. This chapter studies several advanced clustering scenarios, such as the impact of the size, dimensionality, or type of the underlying data. In addition, significant insights can be obtained with advanced supervision methods or with ensemble-based algorithms. In particular, two important aspects of clustering algorithms will be addressed:
Notes
1. It is possible to store the sum of the values in \(\overline {SS}\) across the \(d\) dimensions in lieu of \(\overline {SS}\), without affecting the usability of the cluster feature. This would result in a cluster feature of size \((d+2)\) instead of \((2 \cdot d +1)\).
2. The original BIRCH algorithm proposes to use the pairwise root mean square (RMS) distance between cluster data points as the diameter. This is one possible measure of the intracluster distance. This value can also be shown to be computable from the CF vector as \(\sqrt {\frac { \sum _{i=1}^d ( 2 \cdot m \cdot SS_i - 2\cdot LS_i^2)}{m \cdot (m-1)}}\).
3.
4. See discussion in Chap. 6 about Fig. 6.14.
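The first two notes can be illustrated together with a minimal sketch. It assumes a cluster feature that keeps only the scalar sum of \(SS_i\) over the \(d\) dimensions, which gives the \((d+2)\)-sized representation of the first note, and it evaluates the RMS pairwise diameter of the second note from that compact feature. The names `CompactCF`, `from_points`, `merge`, and `brute_force_diameter` are hypothetical and chosen only for this illustration; they are not taken from the BIRCH paper or from any library.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List, Sequence


@dataclass
class CompactCF:
    """Hypothetical compact cluster feature (ss, ls, m) with d + 2 entries."""
    ss: float        # sum of squared values over all points and all dimensions
    ls: List[float]  # per-dimension linear sum of the points
    m: int           # number of points summarized

    @classmethod
    def from_points(cls, points: Sequence[Sequence[float]]) -> "CompactCF":
        d = len(points[0])
        ss = sum(x * x for p in points for x in p)
        ls = [sum(p[i] for p in points) for i in range(d)]
        return cls(ss, ls, len(points))

    def merge(self, other: "CompactCF") -> "CompactCF":
        # Cluster features are additive, so merging is component-wise addition.
        ls = [a + b for a, b in zip(self.ls, other.ls)]
        return CompactCF(self.ss + other.ss, ls, self.m + other.m)

    def size(self) -> int:
        # One scalar for ss, d entries for ls, one scalar for m.
        return len(self.ls) + 2

    def diameter(self) -> float:
        # RMS pairwise distance: sqrt((2*m*SS - 2*sum_i LS_i^2) / (m*(m-1))),
        # where SS is the sum of SS_i over the dimensions.
        if self.m < 2:
            return 0.0
        ls_sq = sum(v * v for v in self.ls)
        value = (2 * self.m * self.ss - 2 * ls_sq) / (self.m * (self.m - 1))
        return math.sqrt(max(value, 0.0))  # guard against tiny negative round-off


def brute_force_diameter(points: Sequence[Sequence[float]]) -> float:
    """Direct RMS of all pairwise distances, used only as a cross-check."""
    pairs = list(combinations(points, 2))
    total = sum(math.dist(p, q) ** 2 for p, q in pairs)
    return math.sqrt(total / len(pairs))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
    cf = CompactCF.from_points(pts)
    print(cf.size(), cf.diameter(), brute_force_diameter(pts))
```

Because the diameter formula only uses the sum of \(SS_i\) over the dimensions, the scalar second moment is sufficient here, which is consistent with the claim that the compact form does not affect the usability of the cluster feature.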
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Aggarwal, C. (2015). Cluster Analysis: Advanced Concepts. In: Data Mining. Springer, Cham. https://doi.org/10.1007/978-3-319-14142-8_7
DOI: https://doi.org/10.1007/978-3-319-14142-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14141-1
Online ISBN: 978-3-319-14142-8
eBook Packages: Computer Science, Computer Science (R0)