Gaussian parsimonious clustering models with covariates and a noise component

Abstract

We consider model-based clustering methods for continuous, correlated data which incorporate external information, in the form of mixed-type fixed covariates, by proposing the MoEClust suite of models. These models allow different subsets of covariates to influence the component weights and/or component densities by modelling the parameters of the mixture as functions of the covariates. A familiar range of constrained eigen-decomposition parameterisations of the component covariance matrices is also accommodated. This paper thus addresses the equivalent aims of including covariates in Gaussian parsimonious clustering models and of incorporating parsimonious covariance structures into all special cases of the Gaussian mixture of experts framework. The MoEClust models demonstrate significant improvement from both perspectives in applications to univariate and multivariate data sets. Novel extensions which include a uniform noise component for capturing outliers, and which address initialisation of the EM algorithm, model selection, and the visualisation of results, are also proposed.


References

  1. Andrews JL, McNicholas PD (2012) Model-based clustering, classification, and discriminant analysis via mixtures of multivariate \(t\)-distributions: the \(t\)EIGEN family. Stat Comput 22(5):1021–1029

  2. Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49(3):803–821

  3. Benaglia T, Chauveau D, Hunter DR, Young D (2009) mixtools: an R package for analyzing finite mixture models. J Stat Softw 32(6):1–29

  4. Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725

  5. Bishop CM (2006) Pattern recognition and machine learning. Springer, New York

  6. Böhning D, Dietz E, Schaub R, Schlattmann P, Lindsay BG (1994) The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Ann Inst Stat Math 46(2):373–388

  7. Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. Pattern Recogn 28(5):781–793

  8. Cook RD, Weisberg S (1994) An introduction to regression graphics. Wiley, New York

  9. Dang UJ, McNicholas PD (2015) Families of parsimonious finite mixtures of regression models. In: Morlini I, Minerva T, Vichi M (eds) Advances in statistical models for data analysis. Studies in classification, data analysis, and knowledge organization. Springer International Publishing, Switzerland, pp 73–84

  10. Dang UJ, Punzo A, McNicholas PD, Ingrassia S, Browne RP (2017) Multivariate response and parsimony for Gaussian cluster-weighted models. J Classif 34(1):4–34

  11. Dayton CM, Macready GB (1988) Concomitant-variable latent-class models. J Am Stat Assoc 83(401):173–178

  12. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38

  13. Fraley C, Raftery AE (2007) Bayesian regularization for normal mixture estimation and model-based clustering. J Classif 24(2):155–181

  14. García-Escudero LA, Gordaliza A, Greselin F, Ingrassia S, Mayo-Iscar A (2018) Eigenvalues and constraints in mixture modeling: geometric and computational issues. Adv Data Anal Classif 12(2):203–233

  15. Geweke J, Keane M (2007) Smoothly mixing regressions. J Econom 138(1):252–290

  16. Gormley IC, Murphy TB (2010) Clustering ranked preference data using sociodemographic covariates. In: Hess S, Daly A (eds) Choice modelling: the state-of-the-art and the state-of-practice, chapter 25. Emerald, pp 543–569

  17. Gormley IC, Murphy TB (2011) Mixture of experts modelling with social science applications. In: Mengersen K, Robert C, Titterington DM (eds) Mixtures: estimation and applications, chapter 9. Wiley, New York, pp 101–121

  18. Grün B, Leisch F (2007) Fitting finite mixtures of generalized linear regressions in R. Comput Stat Data Anal 51(11):5247–5252

  19. Grün B, Leisch F (2008) FlexMix version 2: finite mixtures with concomitant variables and varying and constant parameters. J Stat Softw 28(4):1–35

  20. Hennig C (2000) Identifiability of models for clusterwise linear regression. J Classif 17(2):273–296

  21. Hennig C, Coretto P (2008) The noise component in model-based cluster analysis. In: Preisach C, Burkhardt H, Schmidt-Thieme L, Decker R (eds) Data analysis, machine learning and applications. Studies in classification, data analysis, and knowledge organization. Springer, Berlin, pp 127–138

  22. Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218

  23. Hurn M, Justel A, Robert CP (2003) Estimating mixtures of regressions. J Comput Graph Stat 12(1):55–79

  24. Ingrassia S, Minotti SC, Vittadini G (2012) Local statistical modeling via the cluster-weighted approach with elliptical distributions. J Classif 29(3):363–401

  25. Ingrassia S, Punzo A, Vittadini G, Minotti SC (2015) The generalized linear mixed cluster-weighted model. J Classif 32(1):85–113

  26. Jacobs RA, Jordan MI, Nowlan SJ, Hinton GE (1991) Adaptive mixtures of local experts. Neural Comput 3(1):79–87

  27. Jordan MI, Jacobs RA (1994) Hierarchical mixtures of experts and the EM algorithm. Neural Comput 6(2):181–214

  28. Lamont AE, Vermunt JK, Van Horn ML (2016) Regression mixture models: does modeling the covariance between independent variables and latent classes improve the results? Multivariate Behav Res 51(1):35–52

  29. Lebret R, Iovleff S, Langrognet F, Biernacki C, Celeux G, Govaert G (2015) Rmixmod: the R package of the model-based unsupervised, supervised, and semi-supervised classification mixmod library. J Stat Softw 67(6):1–29

  30. Mahalanobis PC (1936) On the generalised distance in statistics. Proc Natl Inst Sci India 2(1):49–55

  31. Mazza A, Punzo A, Ingrassia S (2018) flexCWM: a flexible framework for cluster-weighted models. J Stat Softw pp 1–27

  32. McCullagh P, Nelder J (1983) Generalized linear models. Chapman and Hall, London

  33. McNicholas PD, Murphy TB (2008) Parsimonious Gaussian mixture models. Stat Comput 18(3):285–296

  34. McParland D, Gormley IC (2016) Model based clustering for mixed data: clustMD. Adv Data Anal Classif 10(2):155–169

  35. Murphy K, Murphy TB (2019) MoEClust: Gaussian parsimonious clustering models with covariates and a noise component. R package version 1.2.3. https://cran.r-project.org/package=MoEClust

  36. Ning H, Hu Y, Huang TS (2008) Efficient initialization of mixtures of experts for human pose estimation. In: Proceedings of the international conference on image processing (ICIP 2008), San Diego, California, October 12–15, 2008, pp 2164–2167

  37. Punzo A, Ingrassia S (2015) Parsimonious generalized linear Gaussian cluster-weighted models. In: Morlini I, Minerva T, Vichi M (eds) Advances in statistical models for data analysis. Studies in classification, data analysis, and knowledge organization. Springer International Publishing, Switzerland, pp 201–209

  38. Punzo A, Ingrassia S (2016) Clustering bivariate mixed-type data via the cluster-weighted model. Comput Stat 31(3):989–1030

  39. Punzo A, McNicholas PD (2016) Parsimonious mixtures of multivariate contaminated normal distributions. Biom J 58(6):1506–1537

  40. R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria

  41. Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

  42. Scrucca L, Fop M, Murphy TB, Raftery AE (2016) mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. R J 8(1):289–317

  43. Thompson TJ, Smith PJ, Boyle JP (1998) Finite mixture models with concomitant information: assessing diagnostic criteria for diabetes. J R Stat Soc Ser C 47(3):393–404

  44. Wang P, Puterman ML, Cockburn I, Le N (1996) Mixed Poisson regression models with covariate dependent rates. Biometrics 52(2):381–400

  45. Wang P, Puterman ML, Cockburn I (1998) Analysis of patent data: a mixed-Poisson regression-model approach. J Bus Econ Stat 16(1):27–41

  46. Wedel M, Kamakura WA (2012) Market segmentation: conceptual and methodological foundations. International series in quantitative marketing. Springer, US

  47. Young DS, Hunter DR (2010) Mixtures of regressions with predictor-dependent mixing proportions. Comput Stat Data Anal 54(10):2253–2266

  48. Zellner A (1962) An efficient method of estimating seemingly unrelated regression equations and tests for aggregation bias. J Am Stat Assoc 57(298):348–368

Acknowledgements

This work was supported by the Science Foundation Ireland funded Insight Centre for Data Analytics in University College Dublin under Grant Number SFI/12/RC/2289_P2.

Author information

Corresponding author

Correspondence to Keefe Murphy.

Appendices

Appendix A: \(\hbox {CO}_{2}\) data: code examples

Code to reproduce both the exhaustive (Listing 1) and greedy forward stepwise (Listing 2) searches for the \(\hbox {CO}_{2}\) data described in Sect. 5.1, using the MoEClust R package (Murphy and Murphy 2019), is provided below. The code in Listing 1 can be used to reproduce the results in Table 2.

Listing 1: code for the exhaustive model search (rendered as a figure in the published version)
Listing 2: code for the greedy forward stepwise search (rendered as a figure in the published version)
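As the listings themselves render only as figures in the published version, the following is a minimal sketch of how the two searches might be set up, not the paper's exact listings. It assumes the CO2data set bundled with MoEClust (with response CO2 and covariate GNP) and the MoE_clust, MoE_compare, and MoE_stepwise functions; argument values are illustrative.

```r
library(MoEClust)
data(CO2data)

# Exhaustive search: fit models with no covariates, GNP in the expert
# network, GNP in the gating network, and GNP in both, over G = 1:3.
m1 <- MoE_clust(CO2data$CO2, G=1:3, network.data=CO2data)
m2 <- MoE_clust(CO2data$CO2, G=1:3, expert= ~ GNP, network.data=CO2data)
m3 <- MoE_clust(CO2data$CO2, G=1:3, gating= ~ GNP, network.data=CO2data)
m4 <- MoE_clust(CO2data$CO2, G=1:3, gating= ~ GNP, expert= ~ GNP,
                network.data=CO2data)

# Rank the fitted models by BIC.
comp <- MoE_compare(m1, m2, m3, m4, criterion="bic")

# Greedy forward stepwise search (cf. Algorithm 2).
step <- MoE_stepwise(CO2data$CO2, CO2data)
```

In the paper's exhaustive search the best of these candidates, chosen by BIC, corresponds to the optimal model reported in Table 2.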

Appendix B: \(\hbox {CO}_{2}\) data: initialisation

The solutions for the optimal \(G=3\) expert network MoEClust model, with equal mixing proportions, equal component variances, and the explanatory variable 'GNP', fit to the \(\hbox {CO}_{2}\) data with and without the initial partition being passed through Algorithm 1 are shown in Fig. 6. A BIC value of \(-155.20\) is achieved after 18 EM iterations with our proposed initialisation strategy, compared to \(-161.06\) after 30 EM iterations without it. While the two fits differ only in the initialisation strategy employed, Table 2 shows that this model would not have been identified as optimal according to the BIC criterion had Algorithm 1 not been used. The superior solution in Fig. 6a has one cluster with a steep slope and two clusters with near-zero slopes but different intercepts.

Fig. 6
Scatter plots of GNP against \(\hbox {CO}_{2}\) emissions for \(n=28\) countries showing three linear regression components from the optimal MoEClust model, with equal variances and mixing proportions, with (a) and without (b) the initialisation strategy described in Algorithm 1 invoked
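Algorithm 1 itself is given in the main text. Purely as an illustrative sketch of the underlying idea, not the paper's exact pseudo-code, the initial partition can be refined by iteratively refitting per-component regressions and moving each observation to the component minimising its (here one-dimensional) Mahalanobis distance, i.e. its squared standardised residual. The helper name below is hypothetical.

```r
# Hypothetical sketch: refine an initial partition z of response y with
# covariate x by iterative reallocation, before starting the EM algorithm.
# Assumes every component retains enough observations to fit a regression.
reallocate <- function(y, x, z, max.iter=100) {
  for (iter in seq_len(max.iter)) {
    d <- sapply(sort(unique(z)), function(g) {
      fit <- lm(y ~ x, subset=(z == g))       # regression for component g
      res <- y - cbind(1, x) %*% coef(fit)    # residuals for ALL observations
      (res / sd(residuals(fit)))^2            # squared standardised residual
    })
    z.new <- apply(d, 1, which.min)           # nearest component per observation
    if (all(z.new == z)) break                # stop once the partition is stable
    z <- z.new
  }
  z
}
```

The refined partition is then supplied as the starting allocation for the EM algorithm, which is what produces the improved solution in Fig. 6a.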

Appendix C: AIS data: stepwise model search

For the AIS data, Table 8 gives the results of the greedy forward stepwise model selection strategy described in Algorithm 2, showing the action yielding the best improvement in terms of BIC at each step. This forward search is less computationally onerous than its equivalent in the backward direction. A 2-component EVE expert network MoE model is chosen, in which the mixing proportions are constrained to be equal and Sex enters the expert network. This same model was identified by an exhaustive search over a range of \(G\) values, the full range of GPCM covariance constraints, and every possible combination of the BMI and Sex covariates in the gating and expert networks (see Table 5). Note, however, that the remaining covariates in Table 4 are also considered for inclusion here.

Table 8 Results of the forward stepwise model selection algorithm applied to the AIS data where candidate models do not include a noise component
Table 9 Results of the forward stepwise model selection algorithm applied to the AIS data where all candidate models explicitly include a noise component

To account for outlying observations departing from the prevailing pattern of Gaussianity, a separate stepwise search is conducted, starting from a \(G=0\) noise-only model, with all candidate models having an additional noise component. Thus, a distinction is made between the model found by following the steps shown in Table 8, with \(G=2\) EVE Gaussian components, and the model found by the second stepwise search described in Table 9, with three components, of which two are EEE Gaussian and one is uniform. Ultimately, the model with the noise component identified in Table 9 is chosen, on account of its superior BIC. Aside from the noise component, it similarly includes 'Sex' in the expert network, but differs in its treatment of the gating network and in the GPCM constraints employed for the Gaussian clusters. It is a full MoE model in which the Gaussian clusters have equal volume, shape, and orientation, the expert network includes the covariate 'Sex', and both 'SSF' and 'Ht' influence the probability of belonging to the Gaussian clusters but not to the additional noise component, as per (8).
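As a sketch of how the two stepwise searches compare, assuming the ais data set bundled with MoEClust, that MoE_stepwise accepts a noise argument controlling the inclusion of the uniform noise component, and that the response columns named below are merely illustrative:

```r
library(MoEClust)
data(ais)

# Illustrative response variables (haematological measurements).
resp <- ais[, c("RCC", "WCC", "Hc", "Hg", "Fe")]

# Stepwise search with Gaussian components only (cf. Table 8).
mod1 <- MoE_stepwise(resp, ais)

# Stepwise search starting from a G=0 noise-only model, with every
# candidate model retaining a uniform noise component (cf. Table 9).
mod2 <- MoE_stepwise(resp, ais, noise=TRUE)

# Compare the two selected models; the noise-component model wins on BIC.
MoE_compare(mod1, mod2)
```

Consult the package documentation for the exact argument names before running this, as the calls above are a sketch rather than the listings used in the paper.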

About this article

Cite this article

Murphy, K., Murphy, T.B. Gaussian parsimonious clustering models with covariates and a noise component. Adv Data Anal Classif 14, 293–325 (2020). https://doi.org/10.1007/s11634-019-00373-8

Keywords

  • Model-based clustering
  • Mixtures of experts
  • EM algorithm
  • Parsimony
  • Multivariate response
  • Covariates
  • Noise component

Mathematics Subject Classification

  • 62H25
  • 62J12