# Gaussian parsimonious clustering models with covariates and a noise component


## Abstract

We consider model-based clustering methods for continuous, correlated data that account for external information in the form of mixed-type fixed covariates by proposing the MoEClust suite of models. These models allow different subsets of covariates to influence the component weights and/or component densities by modelling the parameters of the mixture as functions of the covariates. A familiar range of constrained eigen-decomposition parameterisations of the component covariance matrices is also accommodated. This paper thus addresses the equivalent aims of including covariates in Gaussian parsimonious clustering models and of incorporating parsimonious covariance structures into all special cases of the Gaussian mixture of experts framework. The MoEClust models demonstrate significant improvement from both perspectives in applications to both univariate and multivariate data sets. Novel extensions to include a uniform noise component for capturing outliers and to address initialisation of the EM algorithm, model selection, and the visualisation of results are also proposed.
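To illustrate the structure the abstract describes, the following is a minimal sketch (not the authors' implementation) of a MoEClust-style density in the univariate case: the component weights depend on a covariate through multinomial logistic gates, the component means depend on the covariate through linear expert regressions, and a uniform noise component over the observed data region captures outliers. All parameter values and function names here are illustrative assumptions.

```python
import numpy as np

def gating_weights(x, gamma):
    """Softmax gating network: gamma holds one (intercept, slope) row per component."""
    scores = gamma[:, 0] + gamma[:, 1] * x   # linear predictor for each component
    scores -= scores.max()                   # subtract max for numerical stability
    w = np.exp(scores)
    return w / w.sum()

def moe_density(y, x, gamma, beta, sigma2, tau0, volume):
    """Mixture-of-experts density with a uniform noise component.

    tau0 is the noise proportion; the Gaussian components share the
    remaining mass 1 - tau0 according to the covariate-dependent gates,
    and the noise density is 1/volume over the observed data region.
    """
    w = gating_weights(x, gamma)
    mu = beta[:, 0] + beta[:, 1] * x         # expert means: linear in the covariate
    gauss = np.exp(-0.5 * (y - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
    return (1 - tau0) * np.dot(w, gauss) + tau0 / volume

# Two components with illustrative gating and expert coefficients
gamma = np.array([[0.0, 0.0], [0.5, 1.0]])
beta = np.array([[-2.0, 0.5], [3.0, -0.5]])
dens = moe_density(y=0.0, x=1.0, gamma=gamma, beta=beta,
                   sigma2=np.array([1.0, 1.0]), tau0=0.05, volume=10.0)
```

In the full model the gates and experts may use different covariate subsets, the responses are multivariate, and the component covariance matrices follow the constrained eigen-decomposition parameterisations; this sketch only conveys how covariates enter the weights and densities.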

## Keywords

Model-based clustering · Mixtures of experts · EM algorithm · Parsimony · Multivariate response · Covariates · Noise component

## Mathematics Subject Classification

62H25 · 62J12

## Notes

### Acknowledgements

This work was supported by the Science Foundation Ireland funded Insight Centre for Data Analytics in University College Dublin under Grant Number SFI/12/RC/2289_P2.

## References

- Andrews JL, McNicholas PD (2012) Model-based clustering, classification, and discriminant analysis via mixtures of multivariate \(t\)-distributions: the \(t\)EIGEN family. Stat Comput 22(5):1021–1029
- Banfield J, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49(3):803–821
- Benaglia T, Chauveau D, Hunter DR, Young D (2009) mixtools: an R package for analyzing finite mixture models. J Stat Softw 32(6):1–29
- Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725
- Bishop CM (2006) Pattern recognition and machine learning. Springer, New York
- Böhning D, Dietz E, Schaub R, Schlattmann P, Lindsay BG (1994) The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Ann Inst Stat Math 46(2):373–388
- Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. Pattern Recogn 28(5):781–793
- Cook RD, Weisberg S (1994) An introduction to regression graphics. Wiley, New York
- Dang UJ, McNicholas PD (2015) Families of parsimonious finite mixtures of regression models. In: Morlini I, Minerva T, Vichi M (eds) Advances in statistical models for data analysis. Studies in classification, data analysis, and knowledge organization. Springer International Publishing, Switzerland, pp 73–84
- Dang UJ, Punzo A, McNicholas PD, Ingrassia S, Browne RP (2017) Multivariate response and parsimony for Gaussian cluster-weighted models. J Classif 34(1):4–34
- Dayton CM, Macready GB (1988) Concomitant-variable latent-class models. J Am Stat Assoc 83(401):173–178
- Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38
- Fraley C, Raftery AE (2007) Bayesian regularization for normal mixture estimation and model-based clustering. J Classif 24(2):155–181
- García-Escudero LA, Gordaliza A, Greselin F, Ingrassia S, Mayo-Iscar A (2018) Eigenvalues and constraints in mixture modeling: geometric and computational issues. Adv Data Anal Classif 12(2):203–233
- Geweke J, Keane M (2007) Smoothly mixing regressions. J Econom 138(1):252–290
- Gormley IC, Murphy TB (2010) Clustering ranked preference data using sociodemographic covariates. In: Hess S, Daly A (eds) Choice modelling: the state-of-the-art and the state-of-practice, chapter 25. Emerald, pp 543–569
- Gormley IC, Murphy TB (2011) Mixture of experts modelling with social science applications. In: Mengersen K, Robert C, Titterington DM (eds) Mixtures: estimation and applications, chapter 9. Wiley, New York, pp 101–121
- Grün B, Leisch F (2007) Fitting finite mixtures of generalized linear regressions in R. Comput Stat Data Anal 51(11):5247–5252
- Grün B, Leisch F (2008) FlexMix version 2: finite mixtures with concomitant variables and varying and constant parameters. J Stat Softw 28(4):1–35
- Hennig C (2000) Identifiability of models for clusterwise linear regression. J Classif 17(2):273–296
- Hennig C, Coretto P (2008) The noise component in model-based cluster analysis. In: Preisach C, Burkhardt H, Schmidt-Thieme L, Decker R (eds) Data analysis, machine learning and applications. Studies in classification, data analysis, and knowledge organization. Springer, Berlin, pp 127–138
- Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218
- Hurn M, Justel A, Robert CP (2003) Estimating mixtures of regressions. J Comput Graph Stat 12(1):55–79
- Ingrassia S, Minotti SC, Vittadini G (2012) Local statistical modeling via the cluster-weighted approach with elliptical distributions. J Classif 29(3):363–401
- Ingrassia S, Punzo A, Vittadini G, Minotti SC (2015) The generalized linear mixed cluster-weighted model. J Classif 32(1):85–113
- Jacobs RA, Jordan MI, Nowlan SJ, Hinton GE (1991) Adaptive mixtures of local experts. Neural Comput 3(1):79–87
- Jordan MI, Jacobs RA (1994) Hierarchical mixtures of experts and the EM algorithm. Neural Comput 6(2):181–214
- Lamont AE, Vermunt JK, Van Horn ML (2016) Regression mixture models: does modeling the covariance between independent variables and latent classes improve the results? Multivariate Behav Res 51(1):35–52
- Lebret R, Iovleff S, Langrognet F, Biernacki C, Celeux G, Govaert G (2015) Rmixmod: the R package of the model-based unsupervised, supervised, and semi-supervised classification mixmod library. J Stat Softw 67(6):1–29
- Mahalanobis PC (1936) On the generalised distance in statistics. Proc Natl Inst Sci India 2(1):49–55
- Mazza A, Punzo A, Ingrassia S (2018) flexCWM: a flexible framework for cluster-weighted models. J Stat Softw, pp 1–27
- McCullagh P, Nelder J (1983) Generalized linear models. Chapman and Hall, London
- McNicholas PD, Murphy TB (2008) Parsimonious Gaussian mixture models. Stat Comput 18(3):285–296
- McParland D, Gormley IC (2016) Model based clustering for mixed data: clustMD. Adv Data Anal Classif 10(2):155–169
- Murphy K, Murphy TB (2019) MoEClust: Gaussian parsimonious clustering models with covariates and a noise component. R package version 1.2.3. https://cran.r-project.org/package=MoEClust
- Ning H, Hu Y, Huang TS (2008) Efficient initialization of mixtures of experts for human pose estimation. In: Proceedings of the international conference on image processing (ICIP 2008), San Diego, California, October 12–15, 2008, pp 2164–2167
- Punzo A, Ingrassia S (2015) Parsimonious generalized linear Gaussian cluster-weighted models. In: Morlini I, Minerva T, Vichi M (eds) Advances in statistical models for data analysis. Studies in classification, data analysis, and knowledge organization. Springer International Publishing, Switzerland, pp 201–209
- Punzo A, Ingrassia S (2016) Clustering bivariate mixed-type data via the cluster-weighted model. Comput Stat 31(3):989–1030
- Punzo A, McNicholas PD (2016) Parsimonious mixtures of multivariate contaminated normal distributions. Biom J 58(6):1506–1537
- R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria
- Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
- Scrucca L, Fop M, Murphy TB, Raftery AE (2016) mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. R J 8(1):289–317
- Thompson TJ, Smith PJ, Boyle JP (1998) Finite mixture models with concomitant information: assessing diagnostic criteria for diabetes. J Roy Stat Soc: Ser C 47(3):393–404
- Wang P, Puterman ML, Cockburn I, Le N (1996) Mixed Poisson regression models with covariate dependent rates. Biometrics 52(2):381–400
- Wang P, Puterman ML, Cockburn I (1998) Analysis of patent data: a mixed-Poisson regression-model approach. J Bus Econ Stat 16(1):27–41
- Wedel M, Kamakura WA (2012) Market segmentation: conceptual and methodological foundations. International series in quantitative marketing. Springer, US
- Young DS, Hunter DR (2010) Mixtures of regressions with predictor-dependent mixing proportions. Comput Stat Data Anal 54(10):2253–2266
- Zellner A (1962) An efficient method of estimating seemingly unrelated regression equations and tests for aggregation bias. J Am Stat Assoc 57(298):348–368