Multivariate Gaussian densities are pervasive in pattern recognition and machine learning. A central operation in most of these areas is measuring the difference between two multivariate Gaussians. Unfortunately, traditional measures based on the Kullback–Leibler (KL) divergence and the Bhattacharyya distance do not satisfy all the metric axioms required by many algorithms. In this paper we propose a modification of the KL divergence and the Bhattacharyya distance for multivariate Gaussian densities that transforms the two measures into distance metrics. Next, we show how these metric axioms impact the unfolding process of manifold learning algorithms. Finally, we illustrate the efficacy of the proposed metrics on two different manifold learning algorithms when used for motion clustering in video data. Our results show that, in this particular application, the proposed metrics lead to significant boosts in performance (at least 7%) compared to other divergence measures.
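As background to the metric-axiom issue the abstract raises, the sketch below computes the standard closed-form KL divergence and Bhattacharyya distance between two multivariate Gaussians and illustrates that the KL divergence is not symmetric (the Bhattacharyya distance is symmetric but can violate the triangle inequality). This is a minimal illustration of the well-known closed forms, not the authors' proposed metric modifications; the function names and the example Gaussians are our own.

```python
import numpy as np

def kl_gaussian(mu0, S0, mu1, S1):
    """Closed-form KL(N(mu0, S0) || N(mu1, S1)) for multivariate Gaussians."""
    d = mu0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0)
                  + diff @ S1_inv @ diff
                  - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def bhattacharyya_gaussian(mu0, S0, mu1, S1):
    """Closed-form Bhattacharyya distance between two multivariate Gaussians."""
    S = 0.5 * (S0 + S1)          # average covariance
    diff = mu1 - mu0
    mean_term = 0.125 * diff @ np.linalg.inv(S) @ diff
    cov_term = 0.5 * np.log(np.linalg.det(S)
                            / np.sqrt(np.linalg.det(S0) * np.linalg.det(S1)))
    return mean_term + cov_term

# Two 2-D Gaussians with different means and covariances
mu0, S0 = np.zeros(2), np.eye(2)
mu1, S1 = np.ones(2), 2.0 * np.eye(2)

# KL is asymmetric: the two directions disagree, so it is not a metric.
print(kl_gaussian(mu0, S0, mu1, S1))  # ~0.693
print(kl_gaussian(mu1, S1, mu0, S0))  # ~1.307

# Bhattacharyya is symmetric, but the triangle inequality can still fail.
print(bhattacharyya_gaussian(mu0, S0, mu1, S1))
```

The asymmetry shown here (and the Bhattacharyya distance's possible triangle-inequality violations) is precisely what disqualifies these measures as metrics for algorithms that assume one, which motivates the modifications proposed in the paper.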


Keywords: Video Sequence · Divergence Measure · Video Clip · Closed Form Expression · Gaussian Density



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Karim T. Abou–Moustafa, Robotics Institute, Carnegie Mellon University, Pittsburgh, U.S.A.
  • Frank P. Ferrie, Dept. of Electrical & Computer Engineering, McGill University, Montréal, Canada
