Advances in Data Analysis and Classification, Volume 13, Issue 1, pp 117–143

Finite mixture biclustering of discrete type multivariate data

  • Daniel Fernández
  • Richard Arnold
  • Shirley Pledger
  • Ivy Liu
  • Roy Costilla
Regular Article


Many methods for clustering matrices of data are based on mathematical techniques such as distance-based algorithms or matrix decomposition and eigenvalues. Because these techniques have no underlying probability model, it is generally not possible to use statistical inference or to assess the appropriateness of a model via information criteria. This article summarizes some recent model-based methodologies for matrices of binary, count, and ordinal data, which are modelled under a unified statistical framework using finite mixtures to group the rows and/or columns. The model parameter can be constructed from a linear predictor of parameters and covariates through link functions. This likelihood-based one-mode and two-mode fuzzy clustering provides maximum likelihood estimation of parameters and the option of using likelihood-based information criteria for model comparison. Additionally, a Bayesian approach is presented in which the parameters and the number of clusters are estimated simultaneously from their joint posterior distribution. Visualization tools focused on ordinal data, on the fuzziness of the clustering structures, and on analogues of various standard plots used in multivariate analysis are presented. Finally, a set of future extensions is enumerated.
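To make the idea of likelihood-based one-mode fuzzy clustering concrete, the following is a minimal sketch, not the article's own implementation. It assumes the simplest possible case: a binary matrix whose rows are grouped by a finite mixture of independent Bernoulli columns, fitted by EM, with AIC and BIC computed for model comparison. The function name `fit_row_mixture`, the parameter names `pi` and `theta`, the saturated per-column parameterisation (no linear predictor or link function), and the use of the number of rows as the BIC sample size are all assumptions made for illustration.

```python
import numpy as np


def fit_row_mixture(Y, R, n_iter=200, seed=0, eps=1e-9):
    """EM for a Bernoulli mixture that groups the rows of a binary matrix Y.

    Illustrative model (an assumption of this sketch): row i belongs to
    cluster r with probability pi[r]; given that cluster, Y[i, j] is
    Bernoulli(theta[r, j]), independently over columns j.
    """
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    pi = np.full(R, 1.0 / R)
    theta = rng.uniform(0.25, 0.75, size=(R, p))  # random start breaks symmetry

    def log_joint(pi, theta):
        # log pi[r] + log P(row i | cluster r), shape (n, R)
        log_lik = Y @ np.log(theta).T + (1 - Y) @ np.log(1 - theta).T
        return log_lik + np.log(pi)

    for _ in range(n_iter):
        # E-step: posterior (fuzzy) membership probabilities of each row
        lj = log_joint(pi, theta)
        Z = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
        # M-step: weighted maximum likelihood updates
        pi = np.clip(Z.mean(axis=0), eps, None)
        pi = pi / pi.sum()
        theta = (Z.T @ Y) / (Z.sum(axis=0)[:, None] + eps)
        theta = np.clip(theta, eps, 1 - eps)

    # Final memberships and observed-data log-likelihood at the fitted values
    lj = log_joint(pi, theta)
    row_ll = np.logaddexp.reduce(lj, axis=1, keepdims=True)
    Z = np.exp(lj - row_ll)
    loglik = float(row_ll.sum())
    k = (R - 1) + R * p  # free mixing proportions plus Bernoulli parameters
    return {
        "pi": pi,
        "theta": theta,
        "Z": Z,
        "loglik": loglik,
        "aic": -2 * loglik + 2 * k,
        "bic": -2 * loglik + k * np.log(n),  # sample size n: a choice of this sketch
    }


# Usage: simulate two row groups with opposite column profiles, compare R = 1, 2
rng = np.random.default_rng(1)
profile_a = np.array([0.9] * 4 + [0.1] * 4)
profile_b = 1 - profile_a
Y = np.vstack([
    rng.binomial(1, profile_a, size=(30, 8)),
    rng.binomial(1, profile_b, size=(30, 8)),
])
fit1 = fit_row_mixture(Y, R=1)
fit2 = fit_row_mixture(Y, R=2)
print("BIC, R=1:", round(fit1["bic"], 1), " BIC, R=2:", round(fit2["bic"], 1))
```

Each row of `Z` holds posterior membership probabilities, which is the sense in which the clustering is fuzzy; a hard allocation, if wanted, is `Z.argmax(axis=1)`. Because a probability model underlies the fit, models with different numbers of row groups can be compared through `aic` or `bic`, which distance-based methods do not offer.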


Keywords

Classification · EM algorithm · Fuzzy clustering · Mixture models · Ordinal data · RJMCMC · Visualisation tools

Mathematics Subject Classification

62F15 62F86 62H12 62H30 62H86 



This work was supported by the Marsden Fund on “Dimension reduction for mixed type multivariate data” (Award Number E2987-3648) from New Zealand Government funding, administered by the Royal Society of New Zealand.

Supplementary material

Supplementary material 1: 11634_2018_324_MOESM1_ESM.pdf (PDF, 179 KB)



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Institut de Recerca Sant Joan de Déu, Parc Sanitari Sant Joan de Déu, CIBERSAM, Sant Boi de Llobregat, Spain
  2. School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand
  3. Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
