A New Distance for Data Sets in a Reproducing Kernel Hilbert Space Context

  • Alberto Muñoz
  • Gabriel Martos
  • Javier González
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8258)


In this paper we define distance functions for data sets in a reproducing kernel Hilbert space (RKHS) context. To this end, we introduce kernels for data sets that provide a metrization of the power set. The proposed distances take into account the underlying generating probability distributions. In particular, we propose kernel distances that rely on the estimation of density level sets of the underlying data distributions and that can be extended from data sets to probability measures. The performance of the proposed distances is tested on several simulated and real data sets.
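
For orientation only, the sketch below computes the standard kernel distance between two finite data sets, that is, the RKHS distance between their empirical kernel mean embeddings under a Gaussian kernel. It is not the level-set-based distance proposed in the paper. The function names (gaussian_kernel, mean_kernel, kernel_distance) and the bandwidth parameter sigma are assumptions made for illustration.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mean_kernel(A, B, sigma=1.0):
    """Average kernel value over all pairs (a, b) with a in A and b in B."""
    total = sum(gaussian_kernel(a, b, sigma) for a in A for b in B)
    return total / (len(A) * len(B))

def kernel_distance(A, B, sigma=1.0):
    """RKHS distance between the empirical mean embeddings of data sets A and B.

    Illustrative baseline only (an assumption of this sketch), not the
    paper's level-set-based construction.
    """
    sq = (mean_kernel(A, A, sigma)
          + mean_kernel(B, B, sigma)
          - 2.0 * mean_kernel(A, B, sigma))
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative values caused by rounding

if __name__ == "__main__":
    A = [(0.0, 0.0), (1.0, 0.0)]
    B = [(5.0, 5.0), (6.0, 5.0)]
    print(kernel_distance(A, A))  # identical sets
    print(kernel_distance(A, B))  # well-separated sets
```

With a characteristic kernel such as the Gaussian, this quantity is zero exactly when the two empirical distributions coincide, which is what allows a distance on data sets to reflect the distributions that generated them.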





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Alberto Muñoz (1)
  • Gabriel Martos (1)
  • Javier González (2)

  1. Department of Statistics, University Carlos III, Madrid, Spain
  2. J. Bernoulli Institute for Mathematics and Computer Science, University of Groningen, The Netherlands
