Measuring the Influence of Individual Observations and Variables in Cluster Analysis

Cerioli, Andrea

doi:10.1007/978-3-642-60126-2_1

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

In this paper we address some issues in cluster stability. In particular, we study how deleting individual cases and variables affects the results of a (nonhierarchical) cluster analysis. We do not restrict ourselves to computing a single influence measure for each data point or variable; instead, we analyze how individual influence varies as the number of clusters changes. For this purpose we suggest simple deletion diagnostics computed by cross-validation. The suggested approach is applied to real data, and the results are displayed by means of a simple tool of modern multivariate-data visualization. Furthermore, the performance of our diagnostics is assessed through Monte Carlo simulations, both under the null hypothesis of well-behaved data and under the alternative hypothesis of isolated contamination.
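
The abstract does not state the exact form of the deletion diagnostics, so the following is only a minimal sketch of the general idea under explicit assumptions. Here k-means (plain Lloyd iterations with a deterministic farthest-first start) stands in for the nonhierarchical method, and the influence of case i is taken to be one minus the adjusted Rand index between the full-data partition, restricted to the remaining cases, and the partition obtained after deleting case i. The function names `kmeans`, `adjusted_rand` and `deletion_influence` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import comb


def sqdist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm; assumes at least k distinct points."""
    # Farthest-first initialisation keeps the sketch deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        )
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: sqdist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def adjusted_rand(a, b):
    """Adjusted Rand index between two labelings of the same n >= 2 cases."""
    sum_ij = sum(comb(v, 2) for v in Counter(zip(a, b)).values())
    sum_a = sum(comb(v, 2) for v in Counter(a).values())
    sum_b = sum(comb(v, 2) for v in Counter(b).values())
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both partitions are trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def deletion_influence(points, k_values):
    """Map each k to a list of case-deletion influence scores.

    Assumed measure (not from the paper): 1 - ARI between the full-data
    partition restricted to the retained cases and the partition that is
    recomputed after leaving case i out.
    """
    result = {}
    for k in k_values:
        full = kmeans(points, k)
        scores = []
        for i in range(len(points)):
            reduced = points[:i] + points[i + 1:]
            reference = full[:i] + full[i + 1:]
            scores.append(1.0 - adjusted_rand(reference, kmeans(reduced, k)))
        result[k] = scores
    return result
```

Calling `deletion_influence(data, [2, 3, 4])` on a list of coordinate tuples returns one list of scores per number of clusters, so the influence of each case can be followed as k changes. Deleting a variable instead of a case could be handled analogously by dropping one coordinate from every tuple before reclustering, although that variant is not shown here.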

Author information

Authors and Affiliations

Istituto di Statistica, Università di Parma, Via Kennedy 6, 43100, Parma, Italy
Andrea Cerioli

Editor information

Editors and Affiliations

Dipartimento Metodi Quantitativi e Teoria Economica, University “G. D’Annunzio” di Chieti, Viale Pindaro 42, I-65127, Pescara, Italy
Maurizio Vichi
Lehrstuhl für Mathematische Methoden der Wirtschaftswissenschaften, University of Augsburg, D-86135, Augsburg, Germany
Otto Opitz

Cite this paper

Cerioli, A. (1999). Measuring the Influence of Individual Observations and Variables in Cluster Analysis. In: Vichi, M., Opitz, O. (eds) Classification and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-60126-2_1

DOI: https://doi.org/10.1007/978-3-642-60126-2_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65633-3
Online ISBN: 978-3-642-60126-2
eBook Packages: Springer Book Archive