A New Method for Detecting Influential Observations in Nonhierarchical Cluster Analysis

  • Andrea Cerioli
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


In this paper we propose a new approach to the exploratory analysis of multivariate clustered data. Our technique is based on a fast forward search algorithm that orders multivariate observations from those most in agreement with a specified clustering structure to those least in agreement with it. Simple graphical displays of a variety of statistics computed during the forward search identify multiple outliers and influential observations in nonhierarchical cluster analysis without being affected by masking and swamping. The suggested approach is applied to the convergent K-means method in two examples, using both real and simulated data.
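The following is a minimal sketch of the forward-search idea summarised above. It is not the author's algorithm, and several choices are assumptions not stated in the abstract. Cluster labels are taken as given, for example from a completed K-means run, and stay fixed. The initial subset holds the `m0` units of each cluster closest to that cluster's coordinatewise median. Agreement is measured by the squared Euclidean distance of each unit to its own cluster mean, recomputed from the current subset. Units are only ever added, although a full forward search may also let units leave the subset. The names `forward_search`, `m0`, `order` and `trace` are hypothetical.

```python
from statistics import median


def sq_dist(a, b):
    """Squared Euclidean distance between two points (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_mean(points):
    """Coordinatewise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def forward_search(data, labels, m0=2):
    """Simplified add-only forward search for fixed cluster labels (hypothetical).

    Returns (order, trace): `order` lists unit indices, starting with the
    initial subset (unranked) followed by units in order of entry; `trace`
    holds the distance of each entering unit, the statistic one would plot.
    """
    clusters = sorted(set(labels))
    subset = set()
    for k in clusters:
        idx = [i for i, lab in enumerate(labels) if lab == k]
        # Assumed robust start: m0 units nearest the coordinatewise median.
        centre = tuple(median(c) for c in zip(*(data[i] for i in idx)))
        idx.sort(key=lambda i: sq_dist(data[i], centre))
        subset.update(idx[:m0])

    order = sorted(subset)  # initial units carry no entry distance
    trace = []
    while len(subset) < len(data):
        # Cluster means estimated from the current subset only.
        means = {
            k: cluster_mean([data[i] for i in subset if labels[i] == k])
            for k in clusters
        }
        outside = [i for i in range(len(data)) if i not in subset]
        dists = {i: sq_dist(data[i], means[labels[i]]) for i in outside}
        # Add the outside unit most in agreement with its cluster (ties by index).
        nxt = min(outside, key=lambda i: (dists[i], i))
        subset.add(nxt)
        order.append(nxt)
        trace.append(dists[nxt])
    return order, trace
```

In this sketch, units that disagree with their assigned cluster enter last. Outliers would show up as a sharp rise at the end of `trace`, which is the kind of pattern the graphical displays described in the abstract are meant to reveal. Recomputing the means from the subset, rather than from all the data, keeps an outlier from distorting the estimates until it enters. This is how a forward search avoids masking.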


Keywords: cluster validity · forward search · K-means · masking · multivariate outliers





Copyright information

© Springer-Verlag Berlin · Heidelberg 1998

Authors and Affiliations

  • Andrea Cerioli, Istituto di Statistica, Università di Parma, Parma, Italy
