A New Method for Detecting Influential Observations in Nonhierarchical Cluster Analysis
In this paper we propose a new approach to the exploratory analysis of multivariate clustered data. Our technique is based on a fast forward search algorithm that orders multivariate observations from those most in agreement with a specified clustering structure to those least in agreement with it. Simple graphical displays of a variety of statistics computed during the forward search lead to the identification of multiple outliers and influential observations in nonhierarchical cluster analysis, without being affected by masking and swamping problems. The suggested approach is applied to the convergent K-means method in two examples, using both real and simulated data.
Keywords: cluster validity, forward search, K-means, masking, multivariate outliers
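The ordering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a fixed set of cluster labels (for example from a previous K-means run), uses plain Euclidean distance to the centroid of each observation's own cluster, and starts from coordinate-wise medians as a robust initial fit. The function name `forward_search_order`, the parameter `m0`, and the returned `trace` are hypothetical names chosen for illustration.

```python
import numpy as np


def forward_search_order(X, labels, m0=None):
    """Order observations by agreement with a fixed clustering (simplified sketch).

    Assumptions (not taken from the paper): Euclidean distances, median-based
    start, centroids refitted as means of the current subset at each step.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]
    clusters = np.unique(labels)
    if m0 is None:
        m0 = 2 * len(clusters)  # arbitrary small starting size

    def own_cluster_distances(centroids):
        # Distance of every observation to the centroid of its own cluster.
        d = np.empty(n)
        for k in clusters:
            mask = labels == k
            d[mask] = np.linalg.norm(X[mask] - centroids[k], axis=1)
        return d

    # Robust start: coordinate-wise median of each cluster.
    centroids = {k: np.median(X[labels == k], axis=0) for k in clusters}
    d = own_cluster_distances(centroids)
    subset = np.argsort(d, kind="stable")[:m0]

    entry_order = [int(i) for i in subset]
    seen = set(entry_order)
    trace = []  # smallest distance among observations outside the subset

    for m in range(m0, n):
        # Refit each centroid from the observations currently in the subset.
        for k in clusters:
            members = subset[labels[subset] == k]
            if len(members) > 0:
                centroids[k] = X[members].mean(axis=0)
        d = own_cluster_distances(centroids)

        outside = np.setdiff1d(np.arange(n), subset)
        trace.append(float(d[outside].min()))

        # Grow the subset to the m + 1 observations closest to the current fit.
        subset = np.argsort(d, kind="stable")[: m + 1]
        for i in subset:
            if int(i) not in seen:
                seen.add(int(i))
                entry_order.append(int(i))

    return entry_order, trace


# Two tight clusters plus one distant point labelled as part of cluster 0.
X_demo = [[0, 0], [0.1, 0], [0, 0.1], [-0.1, 0], [0, -0.1],
          [10, 10], [10.1, 10], [10, 10.1], [9.9, 10], [10, 9.9],
          [0, 50]]
labels_demo = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
order, trace = forward_search_order(X_demo, labels_demo)
```

In this toy data set the distant point (index 10) joins the subset only at the final step, and the monitored distance in `trace` jumps sharply at that step. Plotting `trace` against subset size is one simple way to produce the kind of graphical display the abstract refers to.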