Clustering of High-Dimensional and Correlated Data
Finite mixture models are commonly used for density estimation and clustering in a wide range of applications. An attractive feature of this approach to clustering is that it provides a sound statistical framework in which to assess how many clusters there are in the data and whether those clusters are valid. We consider the application of normal mixture models to high-dimensional data of a continuous nature. One way to handle the fitting of normal mixture models to such data is to adopt mixtures of factor analyzers. For extremely high-dimensional data, however, some variable-reduction method needs to be used in conjunction with this model, as in the procedure called EMMIX-GENE. That procedure was developed for the clustering of microarray data in bioinformatics but is applicable to other types of data. We also consider the mixture procedure EMMIX-WIRE (based on mixtures of normal components with random effects), which is suitable for clustering high-dimensional data that may be structured (correlated and replicated), as in longitudinal studies.
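For orientation, the normal mixture density and the covariance restriction imposed by mixtures of factor analyzers can be written as follows. This is a sketch in standard notation; the symbols g, p, q, \pi_i, \mu_i, \Sigma_i, B_i, and D_i are assumptions introduced here, not notation taken from the abstract.

```latex
% g-component normal mixture density for a p-dimensional observation y
f(y) = \sum_{i=1}^{g} \pi_i \, \phi(y; \mu_i, \Sigma_i),
\qquad \pi_i \ge 0, \quad \sum_{i=1}^{g} \pi_i = 1.

% Mixtures of factor analyzers: each component covariance is restricted to
\Sigma_i = B_i B_i^{\top} + D_i \qquad (i = 1, \dots, g),
% where B_i is a p x q matrix of factor loadings (q < p)
% and D_i is a p x p diagonal matrix.
```

With q much smaller than p, each component covariance matrix requires on the order of pq parameters rather than the p(p+1)/2 of an unrestricted matrix, which is what makes fitting feasible when p is large relative to the sample size.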
Keywords: Mixture Model; Component Density; Factor Analyzer Model; Finite Mixture Model; Normal Mixture Model