Computational Statistics, Volume 15, Issue 3, pp 373–390

# Type S error rates for classical and Bayesian single and multiple comparison procedures

• Andrew Gelman
• Francis Tuerlinckx

## Summary

In classical statistics, the significance of comparisons (e.g., θ1 − θ2) is calibrated using the Type 1 error rate, relying on the assumption that the true difference is zero, which makes no sense in many applications. We set up a more relevant framework in which a true comparison can be positive or negative, and, based on the data, you can state "θ1 > θ2 with confidence," "θ2 > θ1 with confidence," or "no claim with confidence." We focus on the Type S (for sign) error, which occurs when you claim "θ1 > θ2 with confidence" when θ2 > θ1 (or vice versa). We compute the Type S error rates for classical and Bayesian confidence statements and find that classical Type S error rates can be extremely high (up to 50%). Bayesian confidence statements are conservative, in the sense that claims based on 95% posterior intervals have Type S error rates between 0 and 2.5%. For multiple comparison situations, the conclusions are similar.
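The framework in the summary can be illustrated with a small Monte Carlo sketch (this is a hypothetical illustration, not code from the paper). It assumes true differences θ = θ1 − θ2 drawn from a normal population with scale `tau`, an observed difference with sampling error `sigma`, and a classical procedure that claims the sign of θ whenever the observed difference is statistically significant at the 5% level; a Type S error is a confident claim with the wrong sign.

```python
import random

def classical_type_s_rate(tau, sigma=1.0, n_sims=200_000, seed=1):
    """Monte Carlo estimate of the classical Type S error rate.

    Hypothetical setup: true differences theta ~ N(0, tau^2), observed
    difference d ~ N(theta, sigma^2).  The classical procedure claims
    "theta > 0" or "theta < 0" whenever |d| > 1.96 * sigma.
    Returns the fraction of confident claims whose sign is wrong.
    """
    rng = random.Random(seed)
    claims = errors = 0
    for _ in range(n_sims):
        theta = rng.gauss(0.0, tau)      # true comparison, positive or negative
        d = rng.gauss(theta, sigma)      # observed comparison with noise
        if abs(d) > 1.96 * sigma:        # significant: a sign is claimed
            claims += 1
            if (d > 0) != (theta > 0):   # claimed sign disagrees with truth
                errors += 1
    return errors / claims if claims else 0.0
```

Under these assumptions the sketch reproduces the qualitative pattern the summary describes: when true differences are small relative to the sampling error (`tau` much less than `sigma`), significant results are driven mostly by noise and the Type S error rate approaches 50%; when `tau` is large relative to `sigma`, the rate falls toward zero.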

## Keywords

Bayesian inference · Multiple comparisons · Type 1 error · Type M error · Type S error
