Hartigan’s Method for \(k\)-MLE: Mixture Modeling with Wishart Distributions and Its Application to Motion Retrieval

  • Christophe Saint-Jean
  • Frank Nielsen
Part of the Signals and Communication Technology book series (SCT)


We describe a novel algorithm, the \(k\)-Maximum Likelihood Estimator (\(k\)-MLE), for learning finite statistical mixtures of exponential families, relying on Hartigan’s \(k\)-means swap clustering method. To illustrate this versatile Hartigan \(k\)-MLE technique, we consider the exponential family of Wishart distributions and show how to learn their mixtures. First, given a set of symmetric positive definite observation matrices, we provide an iterative algorithm, guaranteed to converge to the MLE, for estimating the parameters of the underlying Wishart distribution. Second, two initialization methods for \(k\)-MLE are proposed and compared. Finally, we propose the Cauchy-Schwarz statistical divergence as a dissimilarity measure between two Wishart mixture models and sketch a general methodology for building a motion retrieval system.
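The clustering backbone described above is Hartigan’s swap heuristic: each point is tentatively moved to the cluster whose assignment most decreases the total cost, with cluster centers updated exactly rather than only at the end of a pass. The following is a minimal sketch of that swap step, using plain squared Euclidean cost as a stand-in for the per-component log-likelihood terms of \(k\)-MLE; the function name and all parameters are illustrative, not from the chapter.

```python
import numpy as np

def hartigan_swap_kmeans(X, k, n_passes=20, seed=0):
    """Hartigan-style swap clustering: for each point, compute the exact
    change in total within-cluster squared error from moving it to every
    other cluster, and accept the best strictly improving move."""
    rng = np.random.default_rng(seed)
    n = len(X)
    labels = rng.integers(0, k, size=n)
    labels[:k] = np.arange(k)          # guarantee no initially empty cluster
    for _ in range(n_passes):
        moved = False
        for i in range(n):
            src = labels[i]
            sizes = np.bincount(labels, minlength=k)
            if sizes[src] == 1:
                continue                # keep every cluster non-empty
            centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            # Exact Hartigan updates: removing x from cluster A of size nA
            # lowers the cost by nA/(nA-1) * ||x - cA||^2; adding it to B
            # of size nB raises it by nB/(nB+1) * ||x - cB||^2.
            remove_gain = sizes[src] / (sizes[src] - 1) \
                * np.sum((X[i] - centers[src]) ** 2)
            best_gain, best_dst = 0.0, src
            for dst in range(k):
                if dst == src:
                    continue
                add_cost = sizes[dst] / (sizes[dst] + 1) \
                    * np.sum((X[i] - centers[dst]) ** 2)
                if remove_gain - add_cost > best_gain:
                    best_gain, best_dst = remove_gain - add_cost, dst
            if best_dst != src:
                labels[i] = best_dst
                moved = True
        if not moved:                   # local optimum: no single swap helps
            break
    return labels
```

In the \(k\)-MLE setting, the squared Euclidean distance is replaced by the dual Bregman divergence induced by the log-normalizer of the exponential family (here, Wishart), so each swap is accepted when it increases the complete log-likelihood rather than when it lowers the \(k\)-means cost.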


Keywords: Mixture modeling · Wishart · \(k\)-MLE · Bregman divergences · Motion retrieval



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Mathématiques, Image, Applications (MIA), Université de La Rochelle, La Rochelle, France
  2. Sony Computer Science Laboratories, Inc., Shinagawa-ku, Tokyo, Japan
  3. Laboratoire d’Informatique (LIX), École Polytechnique, Palaiseau Cedex, France
