Eating disorders affect upwards of 30 million people worldwide and often go undertreated and underdiagnosed. The purpose of this systematic review and meta-analysis was to evaluate the diagnostic accuracy of the Sick, Control, One, Fat and Food (SCOFF) questionnaire for DSM-5 eating disorders in the general population.
The Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines were followed. A PubMed search of peer-reviewed articles was conducted, and studies had to report validation information for the SCOFF to be included. Study quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool.
The final analysis included 25 studies. The validity of the SCOFF was high across samples, with a pooled sensitivity of 0.86 (95% CI, 0.78–0.91) and a pooled specificity of 0.83 (95% CI, 0.77–0.88). Subgroup analyses were conducted to examine the impact of methodology, study quality, and clinical characteristics on diagnostic accuracy. Studies with the highest sensitivity tended to be case-control studies of young women with anorexia nervosa (AN) and bulimia nervosa (BN). Studies that included more men, included participants diagnosed with binge eating disorder, or recruited from large community samples tended to have lower sensitivity. Few studies reported on BMI and race/ethnicity; thus, subgroups for these factors could not be examined. No studies used reference standards that assessed all DSM-5 eating disorders.
This meta-analysis of 25 validation studies demonstrates that the SCOFF is a simple and useful screening tool for young women at risk for AN and BN. However, there is not enough evidence to support using the SCOFF to screen for the range of DSM-5 eating disorders in primary care and community-based settings. Further examination of the validity of the SCOFF, or development of a new screening tool or multiple tools to screen for the range of DSM-5 eating disorders in heterogeneous populations, is warranted.
This study is registered online with PROSPERO (CRD42018089906).
Eating disorders affect upwards of 30 million people and carry significant morbidity and mortality.1 Effective screening for eating disorders is critical because these disorders are commonly underdiagnosed and undertreated.1,2,3 The 5-item SCOFF (Sick, Control, One, Fat and Food; see Fig. 1) questionnaire, developed in 1999 by Morgan and colleagues, is the most widely used screening measure for eating disorders. With the inclusion of binge eating disorder and other specified eating disorders (i.e., atypical anorexia nervosa, bulimia nervosa and binge eating disorder of low frequency or limited duration, purging disorder, and night eating syndrome) in DSM-5,4 it has become increasingly important to expand awareness of the various types of eating pathology. Of particular importance, these new categories of eating disorders had not yet been defined when the SCOFF was developed.
The changing landscape of diagnostic eating disorder categories since the publication of DSM-5 highlights the importance of ensuring screening tools are appropriate for detecting the full range of eating disorders in the general population. To date, the SCOFF has been the recommended screening tool across numerous validation studies; however, these recommendations have not been systematically assessed. The purpose of this systematic review and meta-analysis is to evaluate whether the SCOFF can appropriately screen patients in the general population for the full range of eating pathology currently represented in DSM-5. To accomplish this, the literature was reviewed for studies that report the diagnostic test characteristics of the SCOFF.
The Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) guideline was followed in preparing this systematic review.5 This study is registered online with PROSPERO (CRD42018089906) and all search strategies and methods were determined before the onset of the study.
Search Strategy and Study Selection
We conducted a systematic literature search of PubMed from database inception through March 13, 2018. The search terms were “SCOFF” and “Feeding and Eating Disorders/Diagnosis” AND (“Psychometrics” OR “Sensitivity and Specificity”). Other search terms, such as “SCOFF questionnaire,” “Eating disorders screening,” and “Feeding and Eating Disorder/Diagnosis” AND “screening,” were tried but returned overlapping or extraneous results. In addition to these searches, the word “SCOFF” was searched in all text of the Cochrane database, because this returned the most comprehensive results in that database. Two reviewers (AMK and AGM) independently screened all abstracts generated from the subject search. Studies had to be published in English or be available in English translation. To be included, an article had to contain data from which validation information for the SCOFF could be derived and had to report specific demographic information (i.e., age range or standard deviation, gender, and eating disorder diagnosis). The two independent reviewers (AMK and AGM) had perfect inter-rater agreement (κ = 1) on the exclusion of articles.
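A κ of 1 denotes perfect agreement beyond chance. The article does not state how κ was computed, so the following is only a minimal sketch assuming Cohen's kappa for two raters; the function name and the include/exclude decision lists are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items that received the same label from both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: expected overlap given each rater's own label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if p_expected == 1.0:
        # Both raters used one identical label for every item, so kappa is undefined.
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical include/exclude decisions for four abstracts.
reviewer_1 = ["include", "exclude", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude"]
print(cohens_kappa(reviewer_1, reviewer_2))  # identical decisions -> 1.0
```

Because kappa subtracts the agreement expected from each rater's label frequencies, two reviewers who mostly exclude articles can have high raw agreement yet a lower κ whenever they disagree on the few inclusions.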
Data Extraction and Quality Assessment
The reviewers (AMK and AGM) used a standardized data collection form to extract the date of publication, the country in which the study was conducted, the recruitment method, the reference standard used, the sample size, the age, gender, and race/ethnicity of the sample, the participants’ average BMI and weight category, and the percentage of the sample with an eating disorder diagnosis.
The reviewers (AMK and AGM) independently assessed the quality of all included studies using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool.6 Differences in assessment were resolved by consensus, and inter-rater agreement was very high (κ = 1). The original QUADAS-2 tool includes four domains: patient selection, index test(s), reference standard, and flow and timing. Because there was little meaningful variation in the index test (i.e., the SCOFF) questions or administration, the index test domain was dropped from ratings. Specific rating criteria for each domain are presented in Appendix 1.
Data Synthesis and Analysis
Statistical measures of test performance (true positives, false positives, true negatives, and false negatives) were extracted from individual studies; the extracted test performance data can be found in Appendix 2. When sensitivity (true positive rate) and specificity (true negative rate) were reported separately for different eating disorder diagnoses or by gender, the frequencies of each test performance measure were summed using data available in the manuscript. In two instances,7, 8 these data were not readily available or additional information was needed, and the corresponding authors of these manuscripts were contacted. The contacted authors provided data to calculate frequencies for the total sample (M. Tseng, personal communication, May 2018; S. Maguen, personal communication, November 2018). For all studies, statistical heterogeneity was estimated using the I² statistic, which estimates the percentage of total variation across studies that is attributable to between-study differences rather than to chance. To account for variability across studies, subgroup analyses were conducted. Subgroups were prespecified based on study methodology (case-control vs non-case-control design; type of reference standard used: interview vs questionnaire), study quality based on QUADAS-2 ratings, and patient characteristics (gender, age, sample type, and location). Statistical analyses were performed using Stata version 14.2 (StataCorp, College Station, TX). The meta-analytical integration of diagnostic accuracy studies (MIDAS) command was used to obtain figures and descriptive summaries and to conduct subgroup analyses.9
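The two computational steps described above (summing subgroup 2×2 counts into one table per study, then quantifying heterogeneity with I²) can be sketched as follows. This is a minimal illustration, not the MIDAS implementation: the counts are hypothetical, the names `TwoByTwo`, `combine`, `logit_sensitivity`, and `i_squared` are illustrative, and computing I² univariately on logit sensitivity with 0.5 added to each cell is an assumption rather than a description of how MIDAS derives its values. Specificity would be handled analogously using the true negative and false positive cells.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts comparing the SCOFF (index test) with the reference standard."""
    tp: int  # SCOFF positive, reference positive
    fp: int  # SCOFF positive, reference negative
    tn: int  # SCOFF negative, reference negative
    fn: int  # SCOFF negative, reference positive

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def combine(tables: Iterable[TwoByTwo]) -> TwoByTwo:
    """Sum subgroup tables (e.g., by diagnosis or gender) into one study-level table."""
    tables = list(tables)
    return TwoByTwo(
        tp=sum(t.tp for t in tables),
        fp=sum(t.fp for t in tables),
        tn=sum(t.tn for t in tables),
        fn=sum(t.fn for t in tables),
    )


def logit_sensitivity(t: TwoByTwo) -> tuple[float, float]:
    """Logit sensitivity and its approximate variance; 0.5 is added to both cells
    (an assumed continuity correction) so that zero counts do not break the log."""
    tp, fn = t.tp + 0.5, t.fn + 0.5
    return math.log(tp / fn), 1.0 / tp + 1.0 / fn


def i_squared(estimates: list[tuple[float, float]]) -> float:
    """Higgins' I^2 (in %) from per-study (effect, variance) pairs via Cochran's Q."""
    weights = [1.0 / var for _, var in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0  # no observed dispersion around the pooled estimate
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical study reporting women and men separately.
study = combine([TwoByTwo(tp=40, fp=10, tn=90, fn=5), TwoByTwo(tp=8, fp=6, tn=80, fn=6)])
print(f"sensitivity={study.sensitivity:.3f}, specificity={study.specificity:.3f}")

# Hypothetical second study, then I^2 for logit sensitivity across both.
other = TwoByTwo(tp=20, fp=30, tn=300, fn=20)
print(f"I^2 = {i_squared([logit_sensitivity(study), logit_sensitivity(other)]):.1f}%")
```

Summing cells before computing sensitivity, rather than averaging subgroup sensitivities, weights each subgroup by its number of cases, which matches the description of summing frequencies above.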
Literature Search and Study Selection
A total of 984 abstracts were identified through the included databases and three were identified through bibliographies (Fig. 2). After 47 duplicates were removed, all titles or abstracts were reviewed for relevance. Following initial review, 882 records were excluded leaving 58 full-text articles for full review. Of these 58 articles, 33 were excluded for the following reasons: no validation information available, no reference standard included, article and data not available in English, and article representing re-publication of prior data or commentary on data.
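The counts reported above are internally consistent, which can be checked by carrying each number through the stages of the flow diagram. A minimal sketch; the helper name `prisma_flow` is hypothetical, and the 940 records screened is derived from the reported counts rather than stated in the text.

```python
def prisma_flow(database_records: int, other_records: int, duplicates: int,
                excluded_at_screening: int, excluded_at_full_text: int) -> dict[str, int]:
    """Propagate record counts through the stages of a PRISMA flow diagram."""
    screened = database_records + other_records - duplicates
    full_text_assessed = screened - excluded_at_screening
    included = full_text_assessed - excluded_at_full_text
    return {"screened": screened, "full_text_assessed": full_text_assessed, "included": included}


# Counts reported in the text above.
flow = prisma_flow(database_records=984, other_records=3, duplicates=47,
                   excluded_at_screening=882, excluded_at_full_text=33)
print(flow)  # {'screened': 940, 'full_text_assessed': 58, 'included': 25}
```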
Table 1 depicts the characteristics of included studies.7, 8, 10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32 The 25 studies reviewed included a total of 11,531 individuals. Thirteen countries were represented, with 16 studies conducted in Europe, four in North America, three in Asia, and two in South America. Samples were recruited from three primary locations: medical settings (primary care clinics, specialty clinics), schools (grade schools, high schools, and universities), and the general community. Ages across studies ranged from 10 to 95 years; the majority of studies (n = 18) included primarily adult samples, and seven were conducted entirely in adolescent or young adult populations. Twelve studies included an entirely female sample, and four additional studies included samples that were at least 70% female. The percentage of females in the remaining eight studies ranged from 46.2 to 68%. Thirteen studies used an interview (SCID-I, CIDI, DSM-IV interview, EDE) as the reference standard. The remaining studies used various self-report measures, including the EDI-3, the EDE-Q, the EAT-26, the Q-EDD, and an ICD-10 symptom rating scale. Eighteen studies reported the percentage of the sample diagnosed with an eating disorder based on the criterion used. The percentage with any eating disorder diagnosis (including anorexia nervosa, bulimia nervosa, binge eating disorder, and eating disorder not otherwise specified) ranged from 1.2 to 64.3%. The ranges for each eating disorder diagnosis were as follows: anorexia nervosa, 0 to 32.1%; bulimia nervosa, 0 to 23.1%; eating disorder not otherwise specified, 0.4 to 46.3%; and binge eating disorder, 0.4 to 11.6%. Sixteen studies explicitly reported the percentage of individuals with anorexia and bulimia nervosa.
Eleven studies reported on the percent of the sample meeting criteria for eating disorder not otherwise specified and six reported on binge eating disorder. Aside from binge eating disorder, no studies explicitly examined validity for any of the other newly included eating disorders in the DSM-5.
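The wide range in the proportion of participants with any eating disorder (1.2 to 64.3%) matters for interpretation, because predictive values depend on prevalence even when sensitivity and specificity are fixed. The sketch below is illustrative only: it assumes the pooled sensitivity (0.86) and specificity (0.83) apply unchanged at both extremes, which the heterogeneity reported in this review suggests is optimistic, and `predictive_values` is a hypothetical helper.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Positive and negative predictive values from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Pooled estimates from this review, applied at the lowest and highest
# reported proportions of any eating disorder (1.2% and 64.3%).
for prevalence in (0.012, 0.643):
    ppv, npv = predictive_values(0.86, 0.83, prevalence)
    print(f"prevalence={prevalence:.1%}  PPV={ppv:.2f}  NPV={npv:.3f}")
```

Under these assumptions, roughly 6% of positive screens would be true cases at 1.2% prevalence, compared with about 90% at 64.3%, which is one reason accuracy figures from case-enriched samples do not transfer directly to community screening.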
Other demographic characteristics were infrequently reported across studies and thus are not included in Table 1. Only four studies reported any information about race or ethnicity, and these samples tended to be primarily Caucasian (57.2 to 87.9%). Studies also rarely reported BMI: in the six studies that reported average BMI, it ranged from 21.98 to 28.1 kg/m². Four of these six studies had samples with an average BMI in the normal range (i.e., 18.5 to 24.9 kg/m²).
Summary index scores for the QUADAS-2 are depicted in Table 2. Risk of bias and applicability concerns were rated as low risk/concern (“+” in the table), high risk/concern (“-”), or unknown risk/concern (“?”). Only two studies were rated as low across all risk of bias and applicability concern domains. Risk of bias was high for patient selection in five studies: four used a case-control design, and one recruited an at-risk sample rather than a random or consecutive sample. Risk of bias was high in the flow and timing domain in four studies, either because the SCOFF and reference standard were administered at different times (i.e., not sequentially) or because it was unclear whether the questionnaires were completed at the same time (as with mailed surveys). In general, risk of bias was low across studies in the reference standard domain. Applicability concerns were most prevalent in the patient selection domain, with 14 studies having high applicability concerns, often because the sample was demographically restricted (e.g., only females). Applicability concerns for the reference standard were high in five studies; in these studies, certain subgroups of patients were not given the reference standard or were excluded for unknown reasons.
Diagnostic accuracy for each study is depicted in the forest plot in Figure 3 and the summary receiver operating characteristic (SROC) curve in Figure 4. Pooled sensitivity was 0.86 (95% CI, 0.78–0.91) and pooled specificity was 0.83 (95% CI, 0.77–0.88). The area under the curve (AUC) was 0.91 (95% CI, 0.88–0.93). Heterogeneity was statistically significant for both sensitivity (I² = 97.63%; 95% CI, 97.17–98.09) and specificity (I² = 98.22%; 95% CI, 97.91–98.54). Differences in study methodology or in the clinical characteristics of the samples may produce elevated heterogeneity. Heterogeneity may also be elevated when studies use different thresholds to define cases (a threshold effect); however, the threshold effect was not significant for the SCOFF (r = −0.21; p = 0.32).
To address the significant heterogeneity found across studies, subgroup analyses were conducted to examine the impact of methodological (i.e., case-control vs non-case-control; interview vs questionnaire reference standard) and clinical characteristics (age, gender, location, and diagnosis) on diagnostic accuracy. Table 3 presents pooled sensitivity, specificity, and heterogeneity values for each subgroup. The diagnostic accuracy of the SCOFF was higher in case-control studies (p < 0.01), when an interview rather than a questionnaire was used as the reference standard (p = 0.05), and when the sample included a larger percentage of women than men (p < 0.01). Additionally, diagnostic accuracy was higher when risk of bias was high for patient selection (p < 0.01). Sensitivity and specificity were lower in studies that included individuals diagnosed with BED; however, this difference was not significant (p = 0.22). Of note, subgroup analyses did not explain the high overall heterogeneity of the included studies, as all subgroups had I² values greater than 60%.
The likelihood ratio scattergram (Fig. 5) shows the distribution of positive and negative likelihood ratios. The pooled positive likelihood ratio of 5.0 (95% CI, 3.6–6.8) suggests that the SCOFF is moderately helpful in detecting eating disorders. The negative likelihood ratio of 0.17 (95% CI, 0.11–0.27) suggests that the SCOFF is moderately helpful in ruling out the presence of an eating disorder.33, 34
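The pooled likelihood ratios follow approximately from the pooled sensitivity and specificity (LR+ = sensitivity / (1 − specificity); LR− = (1 − sensitivity) / specificity), and they update a pretest probability by way of odds. A minimal sketch: the 5% pretest probability is a hypothetical value, the helper names are illustrative, and because the reported ratios come from the meta-analytic model rather than from the rounded summary values, recomputing them from 0.86 and 0.83 (about 5.06 and 0.17) will not match the reported figures exactly.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Positive and negative likelihood ratios of a dichotomous test."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert probability to odds, apply the likelihood ratio, and convert back."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


lr_pos, lr_neg = likelihood_ratios(0.86, 0.83)
diagnostic_odds_ratio = lr_pos / lr_neg
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {diagnostic_odds_ratio:.1f}")

# Hypothetical 5% pretest probability, using the reported pooled LRs.
print(f"after positive screen: {post_test_probability(0.05, 5.0):.1%}")
print(f"after negative screen: {post_test_probability(0.05, 0.17):.1%}")
```

With these assumptions, a positive screen raises a 5% pretest probability to roughly 21%, and a negative screen lowers it to under 1%, which illustrates why ratios of this size are described as moderately helpful rather than decisive.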
We conducted a meta-analysis of 25 validation studies of the SCOFF to determine whether this screen is a valid tool for identifying eating disorders in diverse settings and populations. Our in-depth examination calls into question the effectiveness of the SCOFF for eating disorder screening in primary care and community settings, in diverse populations, and across the full range of DSM-5 eating disorder diagnoses. This finding provides critical context, because we also found, consistent with a previous meta-analysis,35 that the SCOFF is an effective tool for identifying particular eating disorders (i.e., AN and BN) in the population for which it was originally developed (i.e., young women with eating disorder symptoms).
The purpose of screening is to capture the range of pathological eating and to identify cases that might not be identified by other means. The SCOFF was originally developed, and subsequently validated several times, using case-control study designs. Case-control studies restrict samples to a specific target population (i.e., cases and matched controls) and do not capture the diversity and range of disorders in the general population. Additionally, validity data from case-control studies may artificially inflate the apparent accuracy of screening measures and lead to erroneous conclusions about their utility in the general population.6 As expected, our analyses revealed that the highest sensitivity was found in case-control studies of young women diagnosed with AN and BN. These findings are important because the higher sensitivity and specificity in case-control samples indicate that, when patients are at risk for AN and BN, the SCOFF is a highly robust screening measure. Conversely, studies with lower sensitivity primarily recruited from community samples and reported the highest rates of BED. Sensitivity was also lower in locations where rates of obesity tend to be higher (e.g., North America).
Comparing demographic variables across studies shows that, although the SCOFF has been validated numerous times since its development, it is often validated in samples highly similar to the population in which it was originally validated (i.e., young women with AN and BN). In fact, of the 25 studies reviewed, more than half used a predominantly or entirely female sample. Of the studies that did include males, only three were conducted in adults. In addition, many studies did not report important demographic and clinical characteristics, including certain eating disorder diagnoses (e.g., BED), race, and BMI. Among the studies that did report these characteristics, there was evidence that validation samples often did not reflect the racial and weight diversity seen across DSM-5 eating disorders other than AN and BN. Moreover, only six studies explicitly examined the efficacy of the SCOFF for identifying BED, and none examined its efficacy for any of the other specified eating disorders in DSM-5. Reflecting the lack of demographic variability across the 25 studies, many studies had high applicability concerns in the patient selection domain of the QUADAS-2. Given these high applicability concerns, it is difficult to draw conclusions about the appropriateness of using the SCOFF to screen for eating disorders in populations other than young women at risk for AN and BN.
Compared with a prior systematic review on this topic by Botella and colleagues,35 the present systematic review provides a more in-depth and comprehensive analysis and, most importantly, includes an assessment of the quality of included studies. Additionally, ten new validation studies had been published and were included in our analysis, for a total of 25 validation studies. As per PRISMA-DTA guidelines, this review also includes subgroup analyses. There were, however, several limitations to the current review. First, the literature search was limited to the PubMed and Cochrane Library databases. Other databases were consulted during initial searches but were not included in the final search. In general, systematic reviews should search a range of databases as part of the final search strategy so that all relevant articles are captured. The search was also limited to articles written in or translated into English. These search limitations could have resulted in missing articles that would otherwise have been included. That said, the reviewers did not encounter any article validating the SCOFF questionnaire that was inaccessible in English. This review was also limited to examining the validity of the SCOFF. A more comprehensive review of all eating disorder screening measures might have provided additional information regarding eating disorder screening; however, the SCOFF is the screening measure with the most extensive validity data and is frequently used in clinical practice. Finally, we were unable to conduct subgroup analyses for other potentially relevant clinical characteristics (e.g., BMI, race, and ethnicity) because these variables were infrequently reported in the validation studies.
The current review was conducted to address concerns about using the SCOFF as a primary care screener for DSM-5 eating disorders, including BED and other specified eating disorders. Findings revealed that the psychometric properties of the SCOFF are virtually unknown for the full range of DSM-5 eating disorder diagnoses and for diverse populations. The present review suggests that the SCOFF is a highly sensitive screening measure for young women at risk for AN and BN, but the analyses and quality assessment of studies raised concerns about the generalizability and reliability of these results for other eating disorder diagnoses. Currently, there is insufficient evidence to recommend the SCOFF for large-scale screening in primary care and diverse community settings. This review identifies the need to develop a new screening tool, or multiple tools, validated for the full range of DSM-5 eating disorder diagnoses in heterogeneous samples.
Hudson JI, et al. The prevalence and correlates of eating disorders in the National Comorbidity Survey Replication. Biol Psychiatry. 2007;61(3):348-358.
Ogg EC, et al. General practice consultation patterns preceding diagnosis of eating disorders. Int J Eat Disord. 1997;22(1):89-93.
Strother E, et al. Eating disorders in men: underdiagnosed, undertreated, and misunderstood. Eat Disord. 2012;20(5):346-355.
American Psychiatric Association. Diagnostic and statistical manual of mental disorders (DSM-5®). American Psychiatric Publishing; 2013.
McInnes MD, et al. Preferred reporting items for a systematic review and meta-analysis of diagnostic test accuracy studies: the PRISMA-DTA statement. JAMA. 2018;319(4):388-396.
Whiting PF, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529-536.
Liu C-Y, et al. Sex difference in using the SCOFF questionnaire to identify eating disorder patients at a psychiatric outpatient clinic. Compr Psychiatry. 2015;57:160-166.
Maguen S, et al. Screen for Disordered Eating: Improving the accuracy of eating disorder screening in primary care. Gen Hosp Psychiatry. 2018;50:20-25.
Dwamena B. MIDAS: Stata module for meta-analytical integration of diagnostic test accuracy studies. 2009.
Aoun A, et al. Validation of the Arabic version of the SCOFF questionnaire for the screening of eating disorders. East Mediterr Health J. 2015;21(5).
Berger U, et al. Screening of disordered eating in 12-year-old girls and boys: psychometric analysis of the German versions of SCOFF and EAT-26. Psychother Psychosom Med Psychol. 2011;61(7):311-318.
Caamaño F, et al. Validation of SCOFF questionnaire among preteenagers. BMJ. 2002.
Cotton MA, Ball C, Robinson P. Four simple questions can help screen for eating disorders. J Gen Intern Med. 2003;18(1):53-56.
Garcia FD, et al. Validation of the French version of SCOFF questionnaire for screening of eating disorders among adults. World J Biol Psychiatry. 2010;11(7):888-893.
Garcia FD, et al. Detection of eating disorders in patients: validity and reliability of the French version of the SCOFF questionnaire. Clin Nutr. 2011;30(2):178-181.
Garcia-Campayo J, et al. Validation of the Spanish version of the SCOFF questionnaire for the screening of eating disorders in primary care. J Psychosom Res. 2005;59(2):51-55.
Lähteenmäki S, et al. Validation of the Finnish version of the SCOFF questionnaire among young adults aged 20 to 35 years. BMC Psychiatry. 2009;9(1):5.
Leung SF, et al. Psychometric properties of the SCOFF questionnaire (Chinese version) for screening eating disorders in Hong Kong secondary school students: a cross-sectional study. Int J Nurs Stud. 2009;46(2):239-247.
Lichtenstein MB, Hemmingsen SD, Støving RK. Identification of eating disorder symptoms in Danish adolescents with the SCOFF Questionnaire. Nord J Psychiatry. 2017;71(5):340-347.
Luck AJ, et al. The SCOFF questionnaire and clinical interview for eating disorders in general practice: comparative study. BMJ. 2002;325(7367):755-756.
Mond JM, et al. Screening for eating disorders in primary care: EDE-Q versus SCOFF. Behav Res Ther. 2008;46(5):612-622.
Morgan JF, Reid F, Lacey JH. The SCOFF questionnaire: assessment of a new screening tool for eating disorders. BMJ. 1999;319(7223):1467-1468.
Muro-Sans P, Amador-Campos JA, Morgan JF. The SCOFF-c: Psychometric properties of the Catalan version in a Spanish adolescent sample. J Psychosom Res. 2008;64(1):81-86.
Pannocchia L, et al. A psychometric exploration of an Italian translation of the SCOFF questionnaire. Eur Eat Disord Rev. 2011;19(4):371-373.
Parker SC, Lyons J, Bonner J. Eating disorders in graduate students: exploring the SCOFF questionnaire as a simple screening tool. J Am Coll Health. 2005;54(2):103-107.
Richter F, et al. Screening disordered eating in a representative sample of the German population: Usefulness and psychometric properties of the German SCOFF questionnaire. Eat Behav. 2017;25:81-88.
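Rueda GE, et al. Validación de la encuesta SCOFF para tamizaje de trastornos de la conducta alimentaria en mujeres universitarias [Validation of the SCOFF questionnaire for screening of eating disorders in female university students]. Biomédica. 2005;25(2):196-202.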
Rueda GJ, et al. Validation of the SCOFF questionnaire for screening the eating behaviour disorders of adolescents in school. Aten Primaria. 2005;35(2):89-94.
Sanchez-Armass O, et al. Validation of the SCOFF questionnaire for screening of eating disorders among Mexican university students. Eat Weight Disord. 2017;22(1):153-160.
Siervo M, et al. Application of the SCOFF, Eating Attitude Test 26 (EAT 26) and Eating Inventory (TFEQ) Questionnaires in young women seeking diet-therapy. Eat Weight Disord. 2005;10(2):76-82.
Solmi F, et al. Validation of the SCOFF questionnaire for eating disorders in a multiethnic general population sample. Int J Eat Disord. 2015;48(3):312-316.
Wahida WMZW, Lai PSM, Hadi HA. Validity and reliability of the English version of the sick, control, one stone, fat, food (SCOFF) in Malaysia. Clin Nutr ESPEN. 2017;18:55-58.
McGee S. Simplifying likelihood ratios. J Gen Intern Med. 2002;17(8):647-650.
Wilson MC, Henderson MC, Smetana GW. Chapter 5. Evidence-Based Clinical Decision Making. In: Henderson MC, Tierney LM, Jr., Smetana GW. eds. The Patient History: An Evidence-Based Approach to Differential Diagnosis. New York: McGraw-Hill; 2012.
Botella J, et al. A meta-analysis of the diagnostic accuracy of the SCOFF. Span J Psychol. 2013;16:E92.
This project was supported in part by the VA’s Health Services Research and Development (HSR&D) Center of Innovation (COIN) Pain Research, Informatics, Multi-morbidities, and Education (PRIME) Center (CIN 13-407), West Haven, CT.
Conflict of Interest
The authors declare that they do not have a conflict of interest.
The content of this research is solely the responsibility of the authors and does not necessarily represent the official views of the VA or the Veterans Health Administration.
Cite this article
Kutz, A.M., Marsh, A.G., Gunderson, C.G. et al. Eating Disorder Screening: a Systematic Review and Meta-analysis of Diagnostic Test Characteristics of the SCOFF. J GEN INTERN MED 35, 885–893 (2020). https://doi.org/10.1007/s11606-019-05478-6
- eating disorders
- systematic review
- diagnostic test accuracy