
Computational Statistics, Volume 15, Issue 3, pp 373–390

Type S error rates for classical and Bayesian single and multiple comparison procedures

  • Andrew Gelman
  • Francis Tuerlinckx

Summary

In classical statistics, the significance of comparisons (e.g., θ1 − θ2) is calibrated using the Type 1 error rate, relying on the assumption that the true difference is zero, which makes no sense in many applications. We set up a more relevant framework in which a true comparison can be positive or negative, and, based on the data, you can state “θ1 > θ2 with confidence,” “θ2 > θ1 with confidence,” or “no claim with confidence.” We focus on the Type S (for sign) error, which occurs when you claim “θ1 > θ2 with confidence” when θ2 > θ1 (or vice versa). We compute the Type S error rates for classical and Bayesian confidence statements and find that classical Type S error rates can be extremely high (up to 50%). Bayesian confidence statements are conservative, in the sense that claims based on 95% posterior intervals have Type S error rates between 0 and 2.5%. For multiple comparison situations, the conclusions are similar.
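The framework can be illustrated with a short simulation. The following is a minimal sketch, not code from the paper: it assumes a normal model in which the true differences δ = θ1 − θ2 are drawn from N(0, τ²) and the observed difference is y ~ N(δ, σ²); the helper names and the particular τ/σ values are illustrative choices. A sign is claimed with confidence when the 95% classical interval (or 95% posterior interval, for the Bayesian procedure) excludes zero, and the Type S rate is the proportion of such claims whose sign is wrong.

```python
import numpy as np

rng = np.random.default_rng(0)

def cond_type_s(claim, est_sign, true_sign):
    """Type S error rate conditional on making a claim; NaN if no claims were made."""
    if not claim.any():
        return float("nan")
    return float(np.mean(est_sign[claim] != true_sign[claim]))

def type_s_rates(tau, sigma, n_sims=500_000):
    """Monte Carlo Type S rates for classical and Bayesian 95% intervals
    under delta ~ N(0, tau^2), y | delta ~ N(delta, sigma^2)."""
    delta = rng.normal(0.0, tau, n_sims)         # true differences theta1 - theta2
    y = delta + rng.normal(0.0, sigma, n_sims)   # observed differences

    # Classical: claim a sign when the interval y +/- 1.96*sigma excludes zero.
    claim_c = np.abs(y) > 1.96 * sigma

    # Bayesian: the posterior for delta is normal with a shrunken mean;
    # claim a sign when the 95% posterior interval excludes zero.
    shrink = tau**2 / (tau**2 + sigma**2)
    post_mean = shrink * y
    post_sd = np.sqrt(shrink) * sigma
    claim_b = np.abs(post_mean) > 1.96 * post_sd

    true_sign = np.sign(delta)
    return (cond_type_s(claim_c, np.sign(y), true_sign),
            cond_type_s(claim_b, np.sign(post_mean), true_sign))

for tau in (0.1, 0.5, 1.0, 2.0):
    err_c, err_b = type_s_rates(tau=tau, sigma=1.0)
    print(f"tau/sigma = {tau:3.1f}:  classical = {err_c:.3f}  Bayesian = {err_b:.3f}")
```

Under this setup the classical rate climbs toward 50% as τ/σ shrinks (claims become rare and their signs essentially random), while the Bayesian rate stays below 2.5%, consistent with the ranges stated in the summary above.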

Keywords

Bayesian Inference · Multiple Comparisons · Type 1 Error · Type M Error · Type S Error


Copyright information

© Physica-Verlag 2000

Authors and Affiliations

  • Andrew Gelman, Department of Statistics, Columbia University, New York, USA
  • Francis Tuerlinckx, Department of Psychology, University of Leuven, Leuven, Belgium
