Handling Multiplicity in Neuroimaging Through Bayesian Lenses with Multilevel Modeling
Here we address the inefficiency and over-penalization of the massively univariate approach followed by correction for multiple testing, and propose a more efficient model that pools and shares information among brain regions. Using Bayesian multilevel (BML) modeling, we control two types of error that are more relevant than the conventional false positive rate (FPR): incorrect sign (type S) and incorrect magnitude (type M). BML pursues two goals: 1) improving modeling efficiency by using a single integrative model, thereby dissolving the multiple testing issue, and 2) shifting the focus of conventional null hypothesis significance testing (NHST) from FPR control to quality control by calibrating type S errors while maintaining a reasonable level of inference efficiency. The performance and validity of this approach are demonstrated through an application at the region of interest (ROI) level, with all regions on an equal footing: unlike current approaches under NHST, small regions are not disadvantaged simply because of their physical size. In addition, compared to the massively univariate approach, BML may simultaneously achieve increased spatial specificity and inference efficiency, and promote reporting results in totality and with transparency. The benefits of BML are illustrated through performance and quality checks on an experimental dataset. The methodology also avoids the current practice of sharp and arbitrary thresholding in the p-value funnel to which the multidimensional data are reduced. The BML approach and its auxiliary tools are available as part of the AFNI suite for general use.
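To make the two error types named above concrete, the following is a minimal simulation sketch, not the BML procedure itself. It assumes a single effect estimate that is normally distributed around a known true effect with a known standard error, and it conditions on the estimate passing a conventional two-sided threshold, in the spirit of the design calculations of Gelman and Carlin (2014). The function name `type_s_m` and all of its parameters are hypothetical and chosen only for illustration.

```python
import random
from statistics import NormalDist, fmean


def type_s_m(true_effect, std_error, alpha=0.05, n_sims=100_000, seed=0):
    """Estimate power, type S rate, and type M ratio by simulation.

    Assumption (illustrative only): the estimate is drawn from
    Normal(true_effect, std_error) and is declared "significant"
    when |estimate| exceeds the two-sided critical value.
    """
    if true_effect == 0:
        raise ValueError("true_effect must be nonzero")
    rng = random.Random(seed)
    # Two-sided critical value on the scale of the estimate
    threshold = NormalDist().inv_cdf(1 - alpha / 2) * std_error

    # Keep only the replications that pass the significance filter
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, std_error)
        if abs(estimate) > threshold:
            significant.append(estimate)
    if not significant:
        raise ValueError("no significant replications; increase n_sims")

    power = len(significant) / n_sims
    # Type S: share of significant estimates with the wrong sign
    type_s = sum(e * true_effect < 0 for e in significant) / len(significant)
    # Type M: average exaggeration of magnitude among significant estimates
    type_m = fmean(abs(e) for e in significant) / abs(true_effect)
    return power, type_s, type_m


if __name__ == "__main__":
    for effect in (0.1, 3.0):
        power, type_s, type_m = type_s_m(effect, std_error=1.0)
        print(f"effect={effect}: power={power:.3f}, "
              f"type S={type_s:.3f}, type M={type_m:.2f}")
```

Under these assumptions, a small effect relative to its standard error yields significant estimates that frequently carry the wrong sign and greatly overstate the magnitude, while a large effect yields neither problem to any meaningful degree. This is the kind of behavior that motivates calibrating type S and type M errors rather than the FPR alone.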
Keywords: Null Hypothesis Significance Testing (NHST); False Positive Rate (FPR); Type S and type M errors; Regions of Interest (ROIs); General Linear Model (GLM); Linear Mixed-Effects (LME) modeling; Bayesian Multilevel (BML) modeling; Markov Chain Monte Carlo (MCMC); Stan; Priors; Leave-one-out (LOO) cross-validation
The research and writing of the paper were supported (GC, PAT, and RWC) by the NIMH and NINDS Intramural Research Programs (ZICMH002888) of the NIH/HHS, USA, and by the NIH grant R01HD079518A to TR and ER. Much of the modeling work here was inspired by Andrew Gelman’s blog. We are indebted to Paul-Christian Bürkner and the Stan development team members Ben Goodrich, Daniel Simpson, Jonah Sol Gabry, Bob Carpenter, and Michael Betancourt for their help and technical support. The simulations were performed in the R language for statistical computing, and the figures were generated with the R package ggplot2 (Wickham 2009).
- Amrhein, V., & Greenland, S. (2017). Remove, rather than redefine, statistical significance. Nature Human Behaviour, 1, 0224.
- Bates, D., Maechler, M., Bolker, B., Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1–48.
- Benjamin, D.J., Berger, J., Johannesson, M., Nosek, B.A., Wagenmakers, E.-J., Berk, R., Johnson, V.E. (2017). Redefine statistical significance. Nature Human Behaviour, 1, 0189.
- Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B, 57, 289–300.
- Chen, G., Taylor, P.A., Haller, S.P., Kircanski, K., Stoddard, J., Pine, D.S., Leibenluft, E., Brotman, M.A., Cox, R.W. (2018a). Intraclass correlation: improved modeling approaches and applications for neuroimaging. Human Brain Mapping, 39(3), 1187–1206. https://doi.org/10.1002/hbm.23909.
- Chen, G., Cox, R.W., Glen, D.R., Rajendra, J.K., Reynolds, R.C., Taylor, P.A. (2018b). A tail of two sides: Artificially doubled false positive rates in neuroimaging due to the sidedness choice with t-tests. Human Brain Mapping. In press.
- Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003.
- Cox, R.W., Chen, G., Glen, D.R., Reynolds, R.C., Taylor, P.A. (2017). FMRI clustering in AFNI: false-positive rates redux. Brain Connectivity, 7(3), 152–171.
- Cox, R.W. (2018). Equitable Thresholding and Clustering. In preparation.
- Cox, R.W., & Taylor, P.A. (2017). Stability of Spatial Smoothness and Cluster-Size Threshold Estimates in FMRI using AFNI. arXiv:1709.07471.
- Forman, S.D., Cohen, J.D., Fitzgerald, M., Eddy, W.F., Mintun, M.A., Noll, D.C. (1995). Improved assessment of significant activation in functional magnetic resonance imaging (fMRI): use of a cluster-size threshold. Magnetic Resonance in Medicine, 33, 636–647.
- Gelman, A. (2015). Statistics and the crisis of scientific replication. Significance, 12(3), 23–25.
- Gelman, A. (2016). The problems with p-values are not just with p-values. The American Statistician, Online Discussion.
- Gelman, A., & Carlin, J.B. (2014). Beyond power calculations: assessing type s (sign) and type m (magnitude) errors. Perspectives on Psychological Science, 1–11.
- Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B. (2014). Bayesian data analysis, Third edition. Boca Raton: Chapman & Hall/CRC Press.
- Gelman, A., & Hennig, C. (2017). Beyond subjective and objective in statistics. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180(4), 1–31.
- Gelman, A., Hill, J., Yajima, M. (2012). Why we (usually) don’t have to worry about multiple comparisons. Journal of Research on Educational Effectiveness, 5, 189–211.
- Gelman, A., & Loken, E. (2013). The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time. http://www.stat.columbia.edu/gelman/research/unpublished/p_hacking.pdf.
- Gelman, A., Simpson, D., Betancourt, M. (2017). The prior can generally only be understood in the context of the likelihood. arXiv:1708.07487.
- Gelman, A., & Tuerlinckx, F. (2000). Type S error rates for classical and Bayesian single and multiple comparison procedures. Computational Statistics, 15, 373–390.
- Lewandowski, D., Kurowicka, D., Joe, H. (2009). Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis, 100, 1989–2001.
- McElreath, R. (2016). Statistical Rethinking: a Bayesian course with examples in R and Stan. Boca Raton: Chapman & Hall/CRC Press.
- McShane, B.B., Gal, D., Gelman, A., Robert, C., Tackett, J.L. (2017). Abandon statistical significance. arXiv:1709.07588.
- Mejia, A., Yue, Y.R., Bolin, D., Lindgren, F., Lindquist, M.A. (2017). A Bayesian general linear modeling approach to cortical surface fMRI data analysis. arXiv:1706.00959.
- Nichols, T.E., & Holmes, A.P. (2001). Nonparametric permutation tests for functional neuroimaging: a primer with examples. Human Brain Mapping, 15(1), 1–25.
- Olszowy, W., Aston, J., Rua, C., Williams, G.B. (2017). Accurate autocorrelation modeling substantially improves fMRI reliability. arXiv:1711.09877.
- R Core Team. (2017). R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
- Saad, Z.S., Reynolds, R.C., Argall, B., Japee, S., Cox, R.W. (2004). SUMA: an interface for surface-based intra- and inter-subject analysis with AFNI. In Proceedings of the 2004 IEEE International Symposium on Biomedical Imaging (pp. 1510–1513).
- Schaefer, A., Kong, R., Gordon, E.M., Zuo, X.N., Holmes, A.J., Eickhoff, S.B., Yeo, B.T. (2017). Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity MRI. Cerebral Cortex. In press.
- Stan Development Team. (2017). Stan modeling language users guide and reference manual, Version 2.17.0. http://mc-stan.org.
- Wasserstein, R.L., & Lazar, N.A. (2016). The ASA’s statement on p-values: context, process, and purpose. The American Statistician, 70(2), 129–133.
- Vehtari, A., Gelman, A., Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432.
- Wickham, H. (2009). ggplot2: elegant graphics for data analysis. New York: Springer.
- Xiao, Y., Geng, F., Riggins, T., Chen, G., Redcay, E. (2018). Neural correlates of developing theory of mind competence in early childhood. Under review.