Statistical Distances and Their Role in Robustness

Markatou, Marianthi; Chen, Yang; Afendras, Georgios; Lindsay, Bruce G.

doi:10.1007/978-3-319-69416-0_1

Part of the book series: ICSA Book Series in Statistics (ICSABSS)

Abstract

Statistical distances, divergences, and similar quantities have a long history and play a fundamental role in statistics, machine learning, and associated scientific disciplines. However, within the statistical literature, this extensive role has too often been played out behind the scenes, with other aspects of the statistical problems being viewed as more central, more interesting, or more important. The behind-the-scenes role of statistical distances shows up in estimation, where we often use estimators based on minimizing a distance, explicitly or implicitly, but rarely study how the properties of the distance determine the properties of the estimators. Distances are also prominent in goodness-of-fit, but the usual question we ask is “how powerful is this method against a set of interesting alternatives?” not “what aspect of the distance between the hypothetical model and the alternative are we measuring?”

Our focus is on describing the statistical properties of some of the distance measures we have found to be most important and most visible. We illustrate the robust nature of Neyman’s chi-squared and the non-robust nature of Pearson’s chi-squared statistics and discuss the concept of discretization robustness.
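
As a rough illustration of the contrast drawn above, the sketch below computes the two statistics in their usual discrete forms: Pearson's chi-squared as the sum of (d − m)²/m and Neyman's chi-squared as the sum of (d − m)²/d, where d denotes the data proportions and m the model probabilities. The Poisson(2) model, the truncated support, the 10% contamination level, and the outlier location are all assumptions chosen for illustration; none of them is taken from the chapter.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability mass at k with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def pearson_chi2(d, m):
    """Pearson's chi-squared distance: squared residuals divided by the model."""
    return sum((di - mi) ** 2 / mi for di, mi in zip(d, m))


def neyman_chi2(d, m):
    """Neyman's chi-squared distance: squared residuals divided by the data.

    Requires every data proportion to be strictly positive.
    """
    return sum((di - mi) ** 2 / di for di, mi in zip(d, m))


# Hypothetical setup: Poisson(2) model on the truncated support 0..30.
support = range(31)
model = [poisson_pmf(k, 2.0) for k in support]

# Hypothetical contamination: 10% of the mass moved to a single far-out cell.
eps = 0.1
outlier = 20
data = [
    (1 - eps) * mk + (eps if k == outlier else 0.0)
    for k, mk in zip(support, model)
]

pearson = pearson_chi2(data, model)
neyman = neyman_chi2(data, model)

print(f"Pearson chi-squared: {pearson:.3e}")
print(f"Neyman chi-squared:  {neyman:.3e}")
```

In this setup the outlying cell has a tiny model probability, so dividing by m lets that one cell dominate Pearson's statistic, while dividing by d keeps the same cell's contribution to Neyman's statistic no larger than its data proportion. The sketch only covers outliers; Neyman's form is undefined for cells with zero observed proportion, which this example avoids by construction.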

Acknowledgements

The first author dedicates this paper to the memory of Professor Bruce G. Lindsay, a longtime collaborator and friend, with much respect and appreciation for his mentoring and friendship. She also acknowledges the Department of Biostatistics, School of Public Health and Health Professions, and the Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, for supporting this work. The second author acknowledges the Troup Fund, Kaleida Foundation, for supporting this work.

Author information

Authors and Affiliations

Department of Biostatistics, SPHHP and Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
Marianthi Markatou & Georgios Afendras
Department of Biostatistics, University at Buffalo, Buffalo, NY, 14214, USA
Yang Chen
Department of Statistics, The Pennsylvania State University, University Park, PA, 16820, USA
Bruce G. Lindsay

Corresponding author

Correspondence to Marianthi Markatou.

Editor information

Editors and Affiliations

University of North Carolina, Chapel Hill, North Carolina, USA
Ding-Geng Chen
Columbia University, New York, New York, USA
Zhezhen Jin
University of California, Los Angeles, California, USA
Gang Li
University of Michigan-Ann Arbor, Ann Arbor, Michigan, USA
Yi Li
National Institutes of Health, Bethesda, Maryland, USA
Aiyi Liu
Georgia State University, Atlanta, Georgia, USA
Yichuan Zhao

About this chapter

Cite this chapter

Markatou, M., Chen, Y., Afendras, G., Lindsay, B.G. (2017). Statistical Distances and Their Role in Robustness. In: Chen, DG., Jin, Z., Li, G., Li, Y., Liu, A., Zhao, Y. (eds) New Advances in Statistics and Data Science. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-69416-0_1

DOI: https://doi.org/10.1007/978-3-319-69416-0_1
Published: 18 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69415-3
Online ISBN: 978-3-319-69416-0
eBook Packages: Mathematics and Statistics (R0)
