Random effects clustering in multilevel modeling: choosing a proper partition

  • Claudio Conversano
  • Massimo Cannas
  • Francesco Mola
  • Emiliano Sironi
Regular Article


We present a novel criterion for estimating a latent partition of the observed groups from the output of a hierarchical model. The criterion is based on a loss function that combines the Gini income inequality ratio with the predictability index of Goodman and Kruskal, so as to achieve maximum heterogeneity of random effects across groups and maximum homogeneity of predicted probabilities within the estimated clusters. The criterion is compared with alternative approaches in a simulation study and applied in a case study on the role of hospital-level variables in the decision to perform a cesarean section.
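As a rough illustration of the two ingredients named in the abstract, the Python sketch below computes a Gini ratio and a Goodman and Kruskal index separately. It does not reproduce the paper's loss function, whose exact form is not stated here. The function names `gini_ratio` and `goodman_kruskal_tau` are hypothetical. The sketch assumes that the predictability index is the Goodman and Kruskal tau, and that the Gini ratio is applied to non-negative values such as predicted probabilities.

```python
from collections.abc import Sequence


def gini_ratio(values: Sequence[float]) -> float:
    """Gini ratio from the mean absolute difference:
    sum_i sum_j |x_i - x_j| / (2 * n^2 * mean). Assumes non-negative inputs."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    mean = sum(values) / n
    if mean == 0:
        return 0.0  # all values are zero, so there is no inequality
    total_abs_diff = sum(abs(a - b) for a in values for b in values)
    return total_abs_diff / (2 * n * n * mean)


def goodman_kruskal_tau(table: Sequence[Sequence[float]]) -> float:
    """Goodman-Kruskal tau for predicting the column variable from the
    row variable of a contingency table of counts (assumed definition)."""
    total = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Sum of squared column proportions: sum_j p_.j^2
    marginal = sum((c / total) ** 2 for c in col_totals)
    # Sum over cells of p_ij^2 / p_i. (empty rows are skipped)
    conditional = sum(
        (cell / total) ** 2 / (sum(row) / total)
        for row in table
        if sum(row) > 0
        for cell in row
    )
    if marginal == 1:
        return 0.0  # column variable is constant, nothing to predict
    return (conditional - marginal) / (1 - marginal)
```

In this sketch, a table in which the row category fully determines the column category, such as `[[5, 0], [0, 5]]`, gives a tau of 1, while a table with independent rows and columns, such as `[[5, 5], [5, 5]]`, gives 0. A vector of identical values gives a Gini ratio of 0.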


Hierarchical modelling · Model based clustering · Label switching · Bayesian nonparametric · Gini income inequality ratio · Goodman and Kruskal predictability index

Mathematics Subject Classification

62C10 62C12 62H30 62J12 62J20 



We would like to thank the Autonomous Region of Sardinia for providing the data used in Sect. 6. We also thank the editors and the two anonymous referees, whose comments helped us improve the quality of the paper in several places.


  1. Berger M, Tutz G (2018) Tree-structured clustering in fixed effects models. J Comput Graph Stat 27(2):380–392
  2. Bragg F, Cromwell DA, Edozien L (2010) Variation in rates of caesarean section among English NHS trusts after accounting for maternal and clinical risk: cross sectional study. BMJ 341:c5065
  3. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, Belmont
  4. Caceres IA, Arcaya M, Declercq E, Belanoff CM, Janakiraman V, Cohen B, Ecker J, Smith LA, Subramanian SV (2013) Hospital differences in cesarean deliveries in Massachusetts (US) 2004–2006: the case against case-mix artifact. PLoS ONE 8(3):e57817
  5. Cannas M, Conversano C, Mola F, Sironi E (2017) Variation in caesarean delivery rates across hospitals: a Bayesian semi-parametric approach. J Appl Stat 44(12):2095–2107
  6. Dagum C (1997) A new approach to the decomposition of the Gini income inequality ratio. Empir Econ 22:515–531
  7. Dahl DB (2006) Model-based clustering for expression data via a Dirichlet process mixture model. In: Do KA, Muller P, Vannucci M (eds) Bayesian inference for gene expression and proteomics. Cambridge University Press, Cambridge, pp 201–218
  8. Dahl DB (2009) Modal clustering in a class of product partition models. Bayesian Anal 4:243–264
  9. Duncan C, Jones K, Moon G (1998) Context, composition and heterogeneity: using multilevel models in health research. Soc Sci Med 46:97–117
  10. Dunson D (2008) Nonparametric Bayes applications to biostatistics (Tech. Rep.). Biostatistics Branch, National Institute of Environmental Health Sciences, U.S. National Institute of Health, USA
  11. Egidi L, Pappadá R, Pauli F, Torelli N (2018) Relabelling in Bayesian mixture models by pivotal units. Stat Comput 28(4):957–969
  12. European Perinatal Health Report (2013) The health and care of pregnant women and babies in Europe in 2010. EURO-PERISTAT Project with SCPE and EUROCAT, Bruxelles
  13. Ferguson TS (1973) A Bayesian analysis of some nonparametric problems. Ann Stat 1:209–230
  14. Fritsch A, Ickstadt K (2009) Improved criteria for clustering based on the posterior similarity matrix. Bayesian Anal 4:367–392
  15. Goodman LA, Kruskal WH (1954) Measures of association for cross classification. J Am Stat Assoc 48:732–762
  16. Grilli L, Panzera A, Rampichini C (2018) Clustering upper level units in multilevel models for ordinal data. In: Mola F, Conversano C, Vichi M (eds) Classification, (big) data analysis and statistical learning. Springer, Cham, pp 137–144
  17. Guglielmi A, Ieva F, Paganoni AM, Ruggeri F, Soriano J (2014) Semiparametric Bayesian models for clustering and classification in the presence of unbalanced in-hospital survival. J R Stat Soc C (Appl Stat) 63:25–46
  18. Heinzl F, Tutz G (2014) Clustering in linear mixed models with a group fused lasso penalty. Biom J 1:44–68
  19. Jara A, Hanson T, Quintana F, Mueller P, Rosner G (2011) DPpackage: Bayesian semi-and nonparametric modeling in R. J Stat Softw 40(5):1–30
  20. Kleinman KP, Ibrahim JG (1998) A semi-parametric Bayesian approach to generalized linear mixed models. Stat Med 17:2579–2596
  21. Kozhimannil KB, Law MR, Virnig BA (2013) Cesarean delivery rates vary among US hospitals: reducing variation may address quality and cost issues. Health Aff 32(3):527–535
  22. Lau JW, Green PJ (2007) Bayesian model-based clustering procedures. J Comput Graph Stat 16:526–558
  23. Lee Y, Roberts CL, Patterson JA, Simpson JM, Nicholl MC, Morris JM, Ford JB (2013) Unexplained variation in hospital caesarean section rates. Med J Aust 199(5):348–353
  24. MacEachern SN (2000) Dependent nonparametric processes, Technical report. Dept. of Statistics, Ohio State University, Ohio
  25. Medvedovic M, Yeung K, Bumgarner R (2004) Bayesian mixture model based clustering of replicated microarray data. Bioinformatics 20:1222–1232
  26. Meila M (2007) Comparing clusterings: an information based distance. J Multivar Anal 98:873–895
  27. Mola F, Siciliano R (1997) A fast splitting procedure for classification trees. Stat Comput 7:209–216
  28. Pauger D, Wagner H (2018) Bayesian effect fusion for categorical predictors. Bayesian Anal.
  29. Pitman J, Yor M (1997) The two-parameter Poisson Dirichlet distribution derived from a stable subordinator. Ann Probab 25:855–900
  30. Rastelli R, Friel N (2017) Optimal Bayesian estimators for latent variable cluster models. Stat Comput 28(6):1169–1186
  31. Roberts CL, Nippita TA (2015) International caesarean section rates: the rising tide. Lancet Glob Health 3(5):111–117
  32. Sturtz S, Ligges U, Gelman A (2005) R2WinBUGS: a package for running WinBUGS from R. J Stat Softw 12(3):1–16
  33. Tutz G, Oelker M (2017) Modeling clustered heterogeneity: fixed effects, random effects and mixtures. Int Stat Rev 85(2):204–227
  34. Wade S, Ghahramani Z (2018) Bayesian cluster analysis: point estimation and credible balls. Bayesian Anal 13(2):559–626

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Business and Economics, University of Cagliari, Cagliari, Italy
  2. Department of Statistical Sciences, Catholic University of Milan, Milan, Italy
