Brain Imaging and Behavior, Volume 13, Issue 5, pp 1281–1291

Identifying errors in Freesurfer automated skull stripping and the incremental utility of manual intervention

  • Abigail B. Waters
  • Ryan A. Mace
  • Kayle S. Sawyer
  • David A. Gansler
Original Research


Quality assurance (QA) is vital for ensuring the integrity of processed neuroimaging data for use in clinical neurosciences research. Manual QA (visual inspection) of processed brains for cortical surface reconstruction errors is resource-intensive, particularly with large datasets. Several semi-automated QA tools use quantitative detection of subjects for editing based on outlier brain regions. There were two project goals: (1) evaluate the assumption that statistical outliers are related to errors of cortical extension, and (2) examine whether error identification and correction significantly impact estimation of cortical parameters and established brain-behavior relationships. T1 MPRAGE images (N = 530) of healthy adults were obtained from the NKI-Rockland Sample and reconstructed using Freesurfer 5.3. Visual inspection of T1 images was conducted for: (1) participants (n = 110) with outlier values (z scores beyond ±3 SD) for subcortical and cortical segmentation volumes (outlier group), and (2) a random sample of remaining participants (n = 110) with segmentation values that did not meet the outlier criterion (non-outlier group). The outlier group had 21% more participants with visual inspection-identified errors than the non-outlier group, with a medium effect size (Φ = 0.22). Nevertheless, a considerable portion of images with errors of cortical extension were found in the non-outlier group (41%). Although nine brain regions significantly changed size from pre- to post-editing (with effect sizes ranging from 0.26 to 0.59), editing did not substantially change the correlations of neurocognitive tasks and brain volumes (ps > 0.05). Statistically based QA, although less resource-intensive, is not accurate enough to supplant visual inspection. We discuss practical implications of our findings to guide resource allocation decisions for image processing.
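The outlier criterion and the Φ effect size mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the subject IDs, and the volume values below are hypothetical and chosen only for illustration, and the sketch assumes that z scores are computed per region across the sample using the sample standard deviation.

```python
import math
from statistics import mean, stdev


def flag_outliers(volumes, threshold=3.0):
    """Return the IDs whose volume lies more than `threshold` SDs from the mean.

    `volumes` maps a subject ID to one region's segmentation volume.
    Assumption: z scores use the sample mean and sample SD of this region.
    """
    values = list(volumes.values())
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return set()  # no variation, so nothing can be an outlier
    return {sid for sid, v in volumes.items() if abs((v - mu) / sd) > threshold}


def phi_coefficient(a, b, c, d):
    """Phi effect size for a 2x2 table [[a, b], [c, d]].

    Example layout (an assumption): rows = outlier / non-outlier group,
    columns = error found / no error found on visual inspection.
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom


# Hypothetical volumes (mm^3) for 20 made-up subjects; not data from the study.
demo_volumes = {f"sub-{i + 1:02d}": 4000 + 10 * i for i in range(19)}
demo_volumes["sub-20"] = 8000  # one deliberately extreme value

flagged = flag_outliers(demo_volumes)
print(sorted(flagged))
```

In a workflow like the one the abstract describes, the flagged subjects would then be inspected visually, and a table of group membership against inspection outcome could be passed to `phi_coefficient`.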


Keywords: Quality assurance; Automated segmentation statistics; Reconstruction error; Freesurfer



The authors would like to acknowledge the following people and organizations for their contributions:

The editor, Dr. Andrew Saykin, and the three anonymous reviewers for their thorough review, which has strengthened the quality of this manuscript.

Douglas Greve at the MGH/HST Athinoula A. Martinos Center for Biomedical Imaging for his comments and consultation.

The NKI-Rockland Sample Initiative for providing the data used in these analyses (data collection funded through NIMH BRAINS R01MH094639-01).

The Suffolk University Psychology Department for their support of doctoral students and David Gansler’s Lab, and the contributions of undergraduate students Ms. Paige Kawai and Ms. Leah Pedersen.

Compliance with ethical standards

Conflicts of interest

Abigail B. Waters, Ryan A. Mace, Kayle S. Sawyer, and David A. Gansler declare that they have no conflicting interests. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Informed consent

All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, and the applicable revisions at the time of the investigation. Informed consent was obtained from all patients for being included in the study.

Supplementary material

ESM 1: 11682_2018_9951_MOESM1_ESM.docx (DOCX, 94 kb)



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Psychology, Suffolk University, Boston, USA
  2. Department of Anatomy & Neurobiology, Boston University School of Medicine, Boston, USA
  3. VA Boston Healthcare System, Boston, USA
  4. Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Harvard Medical School, Charlestown, USA
  5. Sawyer Scientific, LLC, Boston, USA
