Journal of Statistical Theory and Practice, Volume 8, Issue 2, pp 141–165

Noncentralities Induced in Regression Diagnostics

  • D. R. Jensen
  • D. E. Ramirez
Article

Abstract

Anomalies persist in the use of deletion diagnostics in regression. Tests for outliers under subset deletions utilize the R-Fisher F_I statistics, each having a noncentral F-distribution with noncentrality parameter λ as a function of shifts only at deleted rows in the index set I. Numerous studies examine empirical outcomes of these diagnostics in random experiments. In contrast, studies here are probabilistic, examining distributions behind those empirical outcomes and tracking the effects of shifts at nondeleted rows. By allowing shifts at nondeleted rows in a set J, in addition to traditional shifts at deleted rows in I, F_I is shown to have a doubly noncentral F-distribution. By removing the unnecessary restriction that shifts occur only at deleted rows, these findings support constructs akin to power curves in tracking probabilities of masking or swamping as shifts evolve. In addition, “regression effects” among outliers may have unforeseen consequences. A dichotomy of shifts is discovered as projections into the “regressor” and “error” spaces of a model. Hidden shifts at nondeleted rows can obfuscate the meanings ascribed not only to traditional outlier diagnostics, but also to subset influence diagnostics corresponding one-to-one with F_I. In short, despite wide usage abetted by software support, deletion diagnostics in current vogue can no longer be recommended to achieve objectives traditionally cited. Case studies illustrate the debilitating effects of these anomalies in practice, together with conclusions misleading to prospective users.
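As a rough illustration of why a second noncentrality matters, the following sketch simulates a doubly noncentral F variate as a ratio of independent noncentral chi-square variates, each divided by its degrees of freedom. It is not the paper's F_I statistic: the function names, the degrees of freedom, and the noncentrality values are all hypothetical choices made for illustration, and the assumption that the extra noncentrality sits in the denominator is made here only to show a masking-like loss of detection probability.

```python
import random


def noncentral_chisq(df, nonc, rng):
    """Draw one noncentral chi-square variate with df degrees of freedom.

    Built as a sum of df squared unit-variance normals; the first has
    mean sqrt(nonc), so the squared means add up to nonc.
    """
    first = rng.gauss(nonc ** 0.5, 1.0) ** 2
    rest = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df - 1))
    return first + rest


def doubly_noncentral_f(df1, df2, nonc1, nonc2, rng):
    """Ratio of two independent noncentral chi-squares over their df."""
    numerator = noncentral_chisq(df1, nonc1, rng) / df1
    denominator = noncentral_chisq(df2, nonc2, rng) / df2
    return numerator / denominator


def empirical_critical_value(df1, df2, alpha, n, rng):
    """Monte Carlo upper-alpha point of the central F (both nonc = 0)."""
    draws = sorted(doubly_noncentral_f(df1, df2, 0.0, 0.0, rng) for _ in range(n))
    return draws[min(n - 1, int((1.0 - alpha) * n))]


def rejection_rate(df1, df2, nonc1, nonc2, crit, n, rng):
    """Fraction of simulated statistics exceeding the critical value."""
    hits = sum(doubly_noncentral_f(df1, df2, nonc1, nonc2, rng) > crit
               for _ in range(n))
    return hits / n


if __name__ == "__main__":
    rng = random.Random(2014)
    df1, df2 = 2, 20          # hypothetical degrees of freedom
    crit = empirical_critical_value(df1, df2, 0.05, 20000, rng)
    # Numerator noncentrality only (the singly noncentral case).
    single = rejection_rate(df1, df2, 8.0, 0.0, crit, 20000, rng)
    # Same numerator shift plus an assumed denominator noncentrality.
    double = rejection_rate(df1, df2, 8.0, 10.0, crit, 20000, rng)
    print(f"critical value ~ {crit:.2f}")
    print(f"rejection rate, numerator shift only: {single:.3f}")
    print(f"rejection rate, both noncentralities: {double:.3f}")
```

Sweeping the two noncentrality arguments over a grid and plotting the resulting rejection rates would give curves in the spirit of the power-curve constructs mentioned above, again under the illustrative assumptions stated here.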

Keywords

Subset leverages; Coleverages; Vector outliers; Regression diagnostics

AMS Subject Classification

62J05; 62J20

Copyright information

© Grace Scientific Publishing 2014

Authors and Affiliations

  1. Department of Statistics, Virginia Tech, Blacksburg, USA
  2. Department of Mathematics, University of Virginia, Charlottesville, USA
