Missing Data in Hierarchical Classification of Variables — a Simulation Study

  • Ana Lorga da Silva
  • Helena Bacelar-Nicolau
  • Gilbert Saporta
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


Extending an earlier study, we examine how missing data affect the hierarchical classification of variables as a function of four factors: the amount of missing data, the imputation technique, the similarity coefficient, and the aggregation criterion. Two imputation methods are used: regression imputation fitted by ordinary least squares (OLS) and an EM algorithm. Similarity matrices are computed with the basic affinity coefficient and with Pearson's correlation coefficient. The aggregation criteria are average linkage, single linkage, and complete linkage. To compare the structures of the resulting hierarchical classifications, we use Spearman's coefficient between the associated ultrametrics. We report simulation experiments for five multivariate normal cases.
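The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' code, and every concrete choice in it is an assumption made for illustration: the covariance matrix, the sample size, the 10% missing-completely-at-random deletion, the simplified OLS imputation (predictors are mean-filled first), and the use of 1 - r as the dissimilarity derived from Pearson's correlation. The EM imputation and the basic affinity coefficient are left out.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr


def ols_impute(X):
    """Fill NaNs column by column with OLS predictions from the other columns.

    Simplification (assumption): predictors are mean-filled beforehand and the
    regression is fitted on the rows where the target column is observed.
    """
    X = X.copy()
    filled = np.where(np.isnan(X), np.nanmean(X, axis=0), X)
    for j in range(X.shape[1]):
        miss = np.isnan(X[:, j])
        if not miss.any():
            continue
        # Design matrix: intercept plus all other (mean-filled) variables.
        A = np.column_stack([np.ones(len(X)), np.delete(filled, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A[~miss], X[~miss, j], rcond=None)
        X[miss, j] = A[miss] @ beta
    return X


def ultrametric(X, method):
    """Condensed cophenetic (ultrametric) distances of a hierarchy on the variables."""
    R = np.corrcoef(X, rowvar=False)      # Pearson similarity between variables
    D = 1.0 - R                           # assumed dissimilarity transform
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method=method)
    return cophenet(Z)


rng = np.random.default_rng(0)
p, n = 6, 200

# Hypothetical covariance: two groups of three variables each.
cov = np.full((p, p), 0.2)
cov[:3, :3] = 0.7
cov[3:, 3:] = 0.7
np.fill_diagonal(cov, 1.0)

X_full = rng.multivariate_normal(np.zeros(p), cov, size=n)

# Delete 10% of the entries completely at random, then impute.
mask = rng.random(X_full.shape) < 0.10
X_miss = X_full.copy()
X_miss[mask] = np.nan
X_imp = ols_impute(X_miss)

# Spearman's coefficient between the ultrametrics of the complete-data
# hierarchy and the imputed-data hierarchy, for each aggregation criterion.
results = {}
for method in ("average", "single", "complete"):
    rho, _ = spearmanr(ultrametric(X_full, method), ultrametric(X_imp, method))
    results[method] = rho
    print(f"{method:>8} linkage: Spearman rho = {rho:.3f}")
```

A value of rho close to 1 indicates that imputation left the hierarchy's structure largely intact. Repeating the run over several missing-data rates and random seeds would give a small-scale analogue of the simulation design summarised above.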


Keywords (machine-generated, not supplied by the authors): Imputation Method; Average Linkage; Single Linkage; Complete Linkage; Hierarchical Classification





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Ana Lorga da Silva (1)
  • Helena Bacelar-Nicolau (2)
  • Gilbert Saporta (3)
  1. Department of Mathematics, ISEG, Technical University, Lisbon, Portugal
  2. FPCE, LEAD, Lisbon University, Lisbon, Portugal
  3. Statistics Department, CNAM, Paris, France
