Missing Data in Hierarchical Classification of Variables — a Simulation Study
We extend an earlier study of the effect of missing data on the hierarchical classification of variables, considering the following factors: the amount of missing data, the imputation technique, the similarity coefficient, and the aggregation criterion. We use two imputation methods: regression imputation fitted by ordinary least squares (OLS) and an EM algorithm. Similarity matrices are computed with the basic affinity coefficient and with Pearson's correlation coefficient. As aggregation criteria we apply average linkage, single linkage and complete linkage. To compare the structure of the resulting hierarchical classifications, we use Spearman's coefficient between the associated ultrametrics. We present simulation experiments in five multivariate normal cases.
Keywords: Imputation Method, Average Linkage, Single Linkage, Complete Linkage, Hierarchical Classification
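The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names and the toy data are hypothetical, the dissimilarity 1 - r derived from Pearson's correlation is an assumption, the regression imputation uses a single predictor for brevity, and the affinity coefficient and the EM algorithm are left out. The sketch builds the ultrametric induced by each aggregation criterion and computes Spearman's coefficient between the ultrametric obtained from complete data and the one obtained after imputation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ols_impute(y, x):
    """Fill None entries of y with predictions from a simple OLS fit of y on x."""
    obs = [(a, b) for a, b in zip(x, y) if b is not None]
    mx = sum(a for a, _ in obs) / len(obs)
    my = sum(b for _, b in obs) / len(obs)
    slope = (sum((a - mx) * (b - my) for a, b in obs)
             / sum((a - mx) ** 2 for a, _ in obs))
    return [my + slope * (a - mx) if b is None else b for a, b in zip(x, y)]


def ultrametric(d, linkage="average"):
    """Agglomerative clustering on a dissimilarity matrix d.

    Returns the ultrametric matrix: entry (i, j) is the level at which
    variables i and j are first merged into the same cluster.
    """
    n = len(d)
    clusters = [[i] for i in range(n)]
    u = [[0.0] * n for _ in range(n)]

    def between(a, b):
        vals = [d[i][j] for i in a for j in b]
        if linkage == "single":
            return min(vals)
        if linkage == "complete":
            return max(vals)
        return sum(vals) / len(vals)  # average linkage

    while len(clusters) > 1:
        # Merge the closest pair of clusters (p < q by construction).
        level, p, q = min(
            (between(clusters[p], clusters[q]), p, q)
            for p, q in combinations(range(len(clusters)), 2)
        )
        for i in clusters[p]:
            for j in clusters[q]:
                u[i][j] = u[j][i] = level
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return u


def ranks(v):
    """Average ranks: tied values share the mean of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman_between(u1, u2):
    """Spearman coefficient between the upper triangles of two ultrametrics."""
    pairs = list(combinations(range(len(u1)), 2))
    a = [u1[i][j] for i, j in pairs]
    b = [u2[i][j] for i, j in pairs]
    return pearson(ranks(a), ranks(b))


def hierarchy(columns, linkage):
    """Ultrametric of the variables, using 1 - r as dissimilarity (assumption)."""
    d = [[1 - pearson(a, b) for b in columns] for a in columns]
    return ultrametric(d, linkage)


# Toy data (hypothetical): four variables observed on six units.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
x3 = [6, 4, 5, 2, 3, 1]
x4 = [2, 5, 1, 6, 3, 4]
complete = [x1, x2, x3, x4]

# Delete two values of x2 and impute them by regression on x1.
x2_missing = [1.2, None, 3.2, 3.8, None, 6.3]
imputed = [x1, ols_impute(x2_missing, x1), x3, x4]

for linkage in ("single", "complete", "average"):
    rho = spearman_between(hierarchy(complete, linkage),
                           hierarchy(imputed, linkage))
    print(linkage, round(rho, 3))
```

A coefficient close to 1 indicates that the hierarchy built from imputed data preserves the merging order of the hierarchy built from complete data. Because ultrametrics contain many tied values, the sketch uses average ranks rather than the shortcut formula for Spearman's coefficient, which assumes no ties.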
- BACELAR-NICOLAU, H. (1981), Contributions to the Study of Comparison Coefficients in Cluster Analysis, Univ. Lisbon.
- BACELAR-NICOLAU, H. (1988), Two probabilistic models for classification of variables in frequency tables, Classif. and Relat. Meth. of Data Analysis, H.H. Bock (ed.), North Holland, pp. 181–186.
- BACELAR-NICOLAU, H. (2000), The Affinity Coefficient, in Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data, H.H. Bock and E. Diday (eds.), Springer, pp. 160–165.
- DEMPSTER, A.P., LAIRD, N.M. and RUBIN, D.B. (1977), Maximum likelihood estimation from incomplete data via the EM algorithm (with discussion), J. R. Statist. Soc. B, 39, pp. 1–38.
- NICOLAU, F.C. and BACELAR-NICOLAU, H. (1998), Some Trends in the Classification of Variables, Data Science, Classification, and Related Methods, C. Hayashi, N. Ohsumi, K. Yajima, Y. Tanaka, H.H. Bock, Y. Baba (eds.), Springer, pp. 89–98.
- SAPORTA, G. (1990), Probabilités, Analyse des Données et Statistique, Editions Technip, Paris.
- SILVA, A.L., BACELAR-NICOLAU, H., SAPORTA, G. and GEADA, M. (2001), Missing Data in Hierarchical Classification: a study with Personality development data, 32nd European Mathematical Psychology / EMPG 2001, pp. 109–110.