Robust Statistical Engineering by Means of Scaled Bregman Distances

  • Anna-Lena Kißlinger
  • Wolfgang Stummer
Conference paper


We show how scaled Bregman distances can be used for the goal-oriented design of new outlier- and inlier-robust statistical inference tools. These tools extend several known distance-based robustness (respectively, stability) methods at once. Numerous special cases are illustrated, including 3D computer-graphical comparison methods. For the discrete case, some universally applicable results on the asymptotics of the underlying scaled-Bregman-distance test statistics are derived as well.
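To make the central object concrete, here is a minimal sketch of a scaled Bregman distance between two probability mass functions on a finite set. The formula is not quoted from the abstract above; it is the commonly used discrete form, B_phi(P, Q | M) = sum over x of m(x) * [phi(p/m) - phi(q/m) - phi'(q/m) * (p/m - q/m)], and is an assumption here. The function names, the example vectors, and the two generators phi are all illustrative choices, and every mass is assumed strictly positive.

```python
import math

def scaled_bregman_distance(p, q, m, phi, dphi):
    """Discrete scaled Bregman distance of p from q, scaled by m.

    Assumed form: sum_x m(x) * [phi(p/m) - phi(q/m) - phi'(q/m) * (p/m - q/m)].
    All entries of p, q and m are assumed strictly positive.
    """
    total = 0.0
    for px, qx, mx in zip(p, q, m):
        s, t = px / mx, qx / mx
        total += mx * (phi(s) - phi(t) - dphi(t) * (s - t))
    return total

# Generator phi(t) = t*log(t) - t + 1 (illustrative choice).
def phi_kl(t):
    return t * math.log(t) - t + 1.0

def dphi_kl(t):
    return math.log(t)

# Generator phi(t) = (t - 1)^2 (illustrative choice).
def phi_sq(t):
    return (t - 1.0) ** 2

def dphi_sq(t):
    return 2.0 * (t - 1.0)

# Hypothetical example data: two pmfs and an arbitrary positive scaling.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
m = [0.25, 0.25, 0.5]
ones = [1.0, 1.0, 1.0]

# Reference quantities computed directly.
kl = sum(px * math.log(px / qx) for px, qx in zip(p, q))
pearson = sum((px - qx) ** 2 / qx for px, qx in zip(p, q))
squared_l2 = sum((px - qx) ** 2 for px, qx in zip(p, q))

# With phi_kl, the scaling cancels for probability vectors.
d_kl_scaled = scaled_bregman_distance(p, q, m, phi_kl, dphi_kl)
# Scaling by q itself turns the distance into a phi-divergence.
d_pearson = scaled_bregman_distance(p, q, q, phi_sq, dphi_sq)
# Scaling by the constant 1 gives the unscaled Bregman distance.
d_l2 = scaled_bregman_distance(p, q, ones, phi_sq, dphi_sq)

print(d_kl_scaled, kl)
print(d_pearson, pearson)
print(d_l2, squared_l2)
```

Each printed pair should agree up to floating-point error. The sketch illustrates why one family can cover several known distance-based methods at once: the scaling m and the generator phi are two separate design choices, and particular combinations recover the Kullback-Leibler divergence, Pearson's chi-square divergence, and the squared Euclidean distance.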


Keywords: Probability Mass Function; Leibler Divergence; Hellinger Distance; Adjustment Function; Bregman Divergence



We are indebted to Ingo Klein for inspiring suggestions and remarks. The second author would like to thank the authorities of the Indian Statistical Institute for their great hospitality. Furthermore, we are grateful to both anonymous referees for useful suggestions.



Copyright information

© Springer India 2016

Authors and Affiliations

  1. Chair of Statistics and Econometrics, University of Erlangen-Nürnberg, Nürnberg, Germany
  2. Department of Mathematics, University of Erlangen-Nürnberg, Erlangen, Germany
  3. Affiliated Faculty Member of the School of Business and Economics, University of Erlangen-Nürnberg, Nürnberg, Germany
