Statistical Distances and Their Role in Robustness

New Advances in Statistics and Data Science

Part of the book series: ICSA Book Series in Statistics (ICSABSS)

Abstract

Statistical distances, divergences, and similar quantities have a long history and play a fundamental role in statistics, machine learning, and associated scientific disciplines. However, within the statistical literature, this extensive role has too often been played out behind the scenes, with other aspects of the statistical problems being viewed as more central, more interesting, or more important. The behind-the-scenes role of statistical distances shows up in estimation, where we often use estimators based on minimizing a distance, explicitly or implicitly, but rarely study how the properties of the distance determine the properties of the estimators. Distances are also prominent in goodness-of-fit, but the usual question we ask is “how powerful is this method against a set of interesting alternatives?” not “what aspect of the distance between the hypothetical model and the alternative are we measuring?”

Our focus is on describing the statistical properties of some of the distance measures we have found to be most important and most visible. We illustrate the robust nature of Neyman’s chi-squared and the non-robust nature of Pearson’s chi-squared statistics and discuss the concept of discretization robustness.
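The contrast between the two chi-squared distances can be illustrated with a small numerical sketch. The abstract does not state the formulas, so the code below assumes the standard discrete forms: Pearson's distance divides each squared difference by the model probability, and Neyman's divides by the observed proportion. The function names, the four-cell model, and the contaminated proportions are hypothetical values chosen only for illustration.

```python
def pearson_chi2(d, m):
    """Pearson's chi-squared distance: sum of (d - m)^2 / m over cells.

    d: observed proportions, m: model probabilities (all m > 0 assumed).
    """
    return sum((di - mi) ** 2 / mi for di, mi in zip(d, m))


def neyman_chi2(d, m):
    """Neyman's chi-squared distance: sum of (d - m)^2 / d over cells.

    Assumes every observed proportion is positive; an empty cell would
    make this distance undefined.
    """
    return sum((di - mi) ** 2 / di for di, mi in zip(d, m))


# Hypothetical model with one rare cell (probability 0.01).
model = [0.50, 0.30, 0.19, 0.01]

# Hypothetical contaminated data: 10% of the mass has moved into the
# rare cell, mimicking outliers the model considers very unlikely.
d_contaminated = [0.43, 0.27, 0.19, 0.11]

print("Pearson:", pearson_chi2(d_contaminated, model))
print("Neyman: ", neyman_chi2(d_contaminated, model))

# Single-cell view: hold the observed proportion fixed and let the
# model probability of that cell shrink. The Pearson term grows without
# bound, while the Neyman term never exceeds the observed proportion.
d_out = 0.11
pearson_terms = [(d_out - m_out) ** 2 / m_out for m_out in (1e-1, 1e-2, 1e-4)]
neyman_terms = [(d_out - m_out) ** 2 / d_out for m_out in (1e-1, 1e-2, 1e-4)]
print("Pearson terms:", pearson_terms)
print("Neyman terms: ", neyman_terms)
```

Under these assumptions, the outlying cell dominates Pearson's distance because its squared difference is divided by a small model probability, whereas in Neyman's distance the same cell is divided by the larger observed proportion. The flip side, visible in the formula, is that Neyman's distance reacts strongly to cells with very few or no observations.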

Acknowledgements

The first author dedicates this paper to the memory of Professor Bruce G. Lindsay, a long-time collaborator and friend, with much respect and appreciation for his mentoring and friendship. She also acknowledges the Department of Biostatistics, School of Public Health and Health Professions, and the Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, for supporting this work. The second author acknowledges the Troup Fund, Kaleida Foundation, for supporting this work.

Author information

Corresponding author

Correspondence to Marianthi Markatou.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Markatou, M., Chen, Y., Afendras, G., Lindsay, B.G. (2017). Statistical Distances and Their Role in Robustness. In: Chen, DG., Jin, Z., Li, G., Li, Y., Liu, A., Zhao, Y. (eds) New Advances in Statistics and Data Science. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-69416-0_1
