Reliability Studies and Surveys

  • Kelsey L. Wise
  • Brandon J. Kelly
  • Michael L. Knudsen
  • Jeffrey A. Macalena


Instruments that are valuable in clinical research yield similar results when used by different users, in different settings, and at different times. Reliability studies test the reproducibility of instruments by examining the relationship between the predicted distribution of measurements, the actual measurement distribution, and the resulting measurement error. Understanding the study types and common statistical measures is imperative when conducting or appraising reliability studies.

Surveys are useful tools in orthopedic research for obtaining information on the views and practices of large populations efficiently and at low cost. An intentional and organized approach to survey design and administration can help maximize the response rate, thereby decreasing noncompliance and bias and increasing the generalizability of the results.


Keywords: Reliability, Reproducibility, Intraobserver, Interobserver, Alternate form, Test-retest, Internal consistency, Kappa coefficient, Intraclass correlation coefficient, Survey



Copyright information

© ISAKOS 2019

Authors and Affiliations

  • Kelsey L. Wise (1)
  • Brandon J. Kelly (1)
  • Michael L. Knudsen (1)
  • Jeffrey A. Macalena (1), corresponding author
  1. Department of Orthopaedic Surgery, University of Minnesota, Minneapolis, USA
