Criteria for Mixture-Model Clustering with Side-Information

  • Edith Grall-Maës
  • Duc Tung Dao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10163)


The estimation of mixture models is a well-known approach to cluster analysis, and several criteria have been proposed to select the number of clusters. In this paper, we consider mixture models using side-information, which imposes the constraint that the data points within a group originate from the same source. In this setting, the usual criteria are not suitable. An EM (Expectation-Maximization) algorithm has previously been developed to jointly determine the model parameters and the data labelling for a given number of clusters. In this work, we adapt three usual criteria, namely the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and the normalized entropy criterion (NEC), so that they take the side-information into account. One simulated problem and two real data sets are used to show the relevance of the modified criteria and to compare them. The ability of both the EM algorithm and the criteria to select the right number of clusters while producing a good clustering depends on the amount of side-information. Given that side-information is mainly useful when the clusters overlap, the best criterion is the modified BIC.
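The idea of a group-constrained EM followed by model selection can be illustrated with a small Python/NumPy sketch for a one-dimensional Gaussian mixture. It assumes that each group draws a single label from the mixing proportions and that the points of a group are then drawn independently from that component, so the E-step produces one posterior per group rather than one per point. The criteria are evaluated in their standard forms (BIC, AIC, and NEC with the convention NEC(1) = 1) on the resulting group-level log-likelihood; the exact modified criteria defined in the paper are not reproduced here. The function names (`fit_em`, `select_k`, `group_log_terms`), the quantile-based initialisation, and the use of n = number of points in the BIC penalty are illustrative assumptions.

```python
import numpy as np


def log_normal(x, mean, var):
    """Elementwise log-density of a 1-D Gaussian N(mean, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def group_log_terms(x, groups, weights, means, variances):
    """(G, K) matrix of log pi_k + sum over i in group g of log N(x_i; mu_k, var_k).

    Assumption (hypothetical model): a group draws ONE label from the mixing
    proportions, then its points are drawn independently from that component.
    """
    per_point = log_normal(x[:, None], means[None, :], variances[None, :])  # (n, K)
    summed = np.zeros((groups.max() + 1, len(weights)))
    np.add.at(summed, groups, per_point)  # accumulate log-densities per group
    return np.log(weights)[None, :] + summed


def fit_em(x, groups, K, n_iter=200):
    """EM for a 1-D Gaussian mixture in which all points of a group share a label.

    Returns the group-level log-likelihood and the (G, K) group posteriors.
    """
    weights = np.full(K, 1.0 / K)
    means = np.quantile(x, (np.arange(K) + 0.5) / K)  # deterministic spread-out init
    variances = np.full(K, np.var(x))
    for _ in range(n_iter):
        # E-step: one posterior per group instead of one per point.
        log_terms = group_log_terms(x, groups, weights, means, variances)
        t = np.exp(log_terms - logsumexp(log_terms, axis=1)[:, None])
        # M-step: each point inherits the posterior of its group.
        t_pts = t[groups]
        nk = np.maximum(t_pts.sum(axis=0), 1e-12)
        weights = np.maximum(t.mean(axis=0), 1e-12)
        weights /= weights.sum()
        means = (t_pts * x[:, None]).sum(axis=0) / nk
        variances = (t_pts * (x[:, None] - means) ** 2).sum(axis=0) / nk
        variances = np.maximum(variances, 1e-6)  # guard against degenerate components
    log_terms = group_log_terms(x, groups, weights, means, variances)
    log_norm = logsumexp(log_terms, axis=1)
    return log_norm.sum(), np.exp(log_terms - log_norm[:, None])


def select_k(x, groups, k_max=4):
    """Standard-form BIC, AIC and NEC for K = 1..k_max (smaller is better)."""
    fits = {K: fit_em(x, groups, K) for K in range(1, k_max + 1)}
    loglik1 = fits[1][0]
    table = {}
    for K, (loglik, t) in fits.items():
        nu = 3 * K - 1  # K-1 weights + K means + K variances
        entropy = -np.sum(t * np.log(np.clip(t, 1e-300, None)))  # entropy over groups
        table[K] = {
            "BIC": -2.0 * loglik + nu * np.log(len(x)),  # assumption: n = number of points
            "AIC": -2.0 * loglik + 2.0 * nu,
            "NEC": 1.0 if K == 1 else entropy / (loglik - loglik1),  # NEC(1) = 1 by convention
        }
    return table


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    groups = np.repeat(np.arange(60), 3)  # 60 groups of 3 points each
    labels = rng.integers(0, 2, size=60)[groups]  # one true label per group
    x = np.where(labels == 0, -3.0, 3.0) + rng.normal(size=groups.size)
    for K, row in select_k(x, groups).items():
        print(K, {name: round(float(v), 2) for name, v in row.items()})
```

Running the script prints one row of criteria per candidate K for two simulated, well-separated clusters of groups. Because posteriors are computed per group, larger groups (more side-information) sharpen the posteriors, which lowers the entropy term used by NEC.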


Keywords (added by machine, not by the authors): Mixture Model, Akaike Information Criterion, Bayesian Information Criterion, Gaussian Mixture Model, Cluster Label

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. ICD - LM2S - UMR 6281 CNRS, Troyes University of Technology, Troyes, France