Getting serious about test–retest reliability: a critique of retest research and some recommendations
This paper aims to focus attention on the need for rigorous and carefully designed test–retest reliability assessments for new patient-reported outcomes and to encourage retest researchers to be thoughtful, ambitious, and creative in their retest efforts.
The paper outlines key challenges that confront retest researchers, calls attention to some limitations in meeting those challenges, and describes some strategies to improve retest research.
Modest retest coefficients are often reported as acceptable, and many important decisions—such as the retest interval—appear not to be evidence-based. Retest assessments are seldom undertaken before a measure has been finalized, which rules out using retest data to select strong, reproducible items.
Strategies for improving retest research include seeking input from patients or experts regarding the stability of the construct to support decisions about the retest interval; analyzing item-level retest data to identify items to revise or discard; establishing a priori standards of acceptability for reliability coefficients; using large, heterogeneous, and representative retest samples; and collecting follow-up data to better understand consistent and inconsistent responses over time.
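As a rough illustration of two of these strategies (item-level retest analysis and an a priori acceptability standard), the sketch below computes a retest coefficient for each item and flags items that fall below a preset cutoff. It is a minimal sketch under stated assumptions, not the authors' procedure: the function names, the data, and the 0.70 cutoff are all hypothetical, and Pearson's r is used only for simplicity. An intraclass correlation or weighted kappa may be more appropriate for real item data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores.

    Raises ZeroDivisionError if either list has zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def flag_unstable_items(time1, time2, threshold=0.70):
    """Return {item: r} for items whose retest r is below the threshold.

    time1 and time2 map item names to score lists, with respondents in
    the same order at both administrations. The default threshold is an
    illustrative assumption, not a recommended standard.
    """
    coefficients = {item: pearson_r(time1[item], time2[item]) for item in time1}
    return {item: r for item, r in coefficients.items() if r < threshold}


# Hypothetical data: five respondents answering two items on two occasions.
time1 = {"item_a": [1, 2, 3, 4, 5], "item_b": [1, 2, 3, 4, 5]}
time2 = {"item_a": [1, 2, 3, 4, 5], "item_b": [3, 1, 4, 2, 5]}

flagged = flag_unstable_items(time1, time2)
print(flagged)
```

Fixing the cutoff before the data are examined is the point of the a priori standard: the flagged items become candidates for revision or removal rather than grounds for relaxing the criterion afterwards.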
Keywords: COSMIN; Instrument development; Measurement; Patient-reported outcome; Psychometrics; Reliability; Test–retest reliability