Metric Considerations in Clustering: Implications for Algorithms
Given measurements on p variables for each of n individuals, aspects of the problem of clustering the individuals are considered. Special attention is given to models based upon mixtures of distributions, especially multivariate normal distributions. The relationship between the orientation(s) of the clusters and the nature of the within-cluster covariance matrices is reviewed, as is the inadequacy of transformation to principal components based on the total covariance matrix of the whole (mixed) sample. The nature of certain iterative algorithms is discussed, and the variations that result from allowing different covariance matrices within clusters are studied.
Key words and phrases: cluster analysis, Mahalanobis distance, mixture model, ISODATA, k-means
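The kind of iterative algorithm the abstract refers to can be illustrated with a minimal sketch. It is not the paper's own procedure: it is a generic k-means-style reassignment loop in which Euclidean distance is replaced by Mahalanobis distance, computed either from the pooled within-cluster covariance matrix or from a separate covariance matrix per cluster. The function names (`mahalanobis_sq`, `cluster_by_mahalanobis`), the `separate_cov` flag, the added log-determinant term (taken from the multivariate normal log-density) and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def mahalanobis_sq(X, mean, cov):
    """Squared Mahalanobis distance of each row of X from `mean` under `cov`."""
    diff = X - mean
    # Solve cov @ z = diff.T instead of forming the inverse explicitly.
    z = np.linalg.solve(cov, diff.T)
    return np.einsum("ij,ji->i", diff, z)

def cluster_by_mahalanobis(X, init_labels, k, separate_cov=False, n_iter=20):
    """Hypothetical k-means-style loop using within-cluster covariance metrics."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    labels = np.asarray(init_labels).copy()
    for _ in range(n_iter):
        means, scatters, sizes = [], [], []
        for g in range(k):
            Xg = X[labels == g]
            if len(Xg) <= p:
                raise ValueError(f"cluster {g} has too few points")
            m = Xg.mean(axis=0)
            means.append(m)
            scatters.append((Xg - m).T @ (Xg - m))
            sizes.append(len(Xg))
        # Pooled within-cluster covariance (assumed common to all clusters).
        pooled = sum(scatters) / (n - k)
        crit = np.empty((n, k))
        for g in range(k):
            cov = scatters[g] / (sizes[g] - 1) if separate_cov else pooled
            crit[:, g] = mahalanobis_sq(X, means[g], cov)
            if separate_cov:
                # With unequal covariances the normal log-density adds log|cov|.
                crit[:, g] += np.linalg.slogdet(cov)[1]
        new_labels = crit.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Synthetic example: two clusters elongated along the first axis and
# separated along the second, where the within-cluster spread is small.
rng = np.random.default_rng(0)
true_labels = np.repeat([0, 1], 200)
X = rng.normal(size=(400, 2)) * [5.0, 0.3]
X[true_labels == 1, 1] += 4.0

# Start from a deliberately imperfect partition: every fifth label flipped.
init = true_labels.copy()
init[::5] = 1 - init[::5]

labels_pooled = cluster_by_mahalanobis(X, init, k=2)
labels_separate = cluster_by_mahalanobis(X, init, k=2, separate_cov=True)
```

In this example the largest overall variance lies along the first axis, which carries no information about cluster membership. That is one way to see why principal components of the total covariance matrix can be a poor basis for clustering, whereas a metric built from the within-cluster covariance weights the second axis heavily. Like any reassignment scheme of this kind, the sketch depends on its starting partition and can settle on a poor local solution.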