Quality of Life Research, Volume 29, Issue 1, pp 213–221

A comparison of computer adaptive tests (CATs) and short forms in terms of accuracy and number of items administrated using PROMIS profile

  • Eisuke Segawa
  • Benjamin Schalet
  • David Cella



In the Patient-Reported Outcomes Measurement Information System (PROMIS), seven domains (Physical Function, Anxiety, Depression, Fatigue, Sleep Disturbance, Social Function, and Pain Interference) are packaged together as profiles. Each of these domains can also be assessed using computer adaptive tests (CATs) or short forms (SFs) of varying length (e.g., 4, 6, and 8 items). We compared CAT with each SF in terms of accuracy and the number of items administered.


PROMIS instruments are scored using item response theory (IRT) with the graded response model and reported as T scores (mean = 50, SD = 10). In each domain, we simulated 10,000 subjects from a normal distribution with a standard deviation of 10 and a mean of 60 for symptom scales or 40 for function scales. We considered a subject’s score to be accurate when the standard error (SE) was less than 3.0. We recorded the range of accurate scores (accurate range) and the number of items administered.
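The simulation design and the SE < 3.0 criterion can be illustrated with a minimal Python sketch. It is not the authors' code, and it rests on several assumptions: the item parameters in HYPOTHETICAL_FORM are invented for illustration (they are not PROMIS calibrations), the SE is approximated from graded response model test information evaluated at the simulated true score rather than from scored item responses, and the function names are hypothetical.

```python
import math
import random

def grm_item_information(theta, a, thresholds):
    """Fisher information of one graded response model item at theta."""
    # Cumulative probabilities P(X >= k), padded with 1 and 0 at the ends.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]  # probability of category k
        # Derivative of the category probability with respect to theta.
        dp = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if p > 1e-12:
            info += dp * dp / p
    return info

def t_score_se(t_score, items):
    """Approximate SE on the T metric from test information (assumption)."""
    theta = (t_score - 50.0) / 10.0  # T = 50 + 10 * theta
    info = sum(grm_item_information(theta, a, bs) for a, bs in items)
    return math.inf if info <= 0.0 else 10.0 / math.sqrt(info)

def simulate_accuracy(items, mean=60.0, sd=10.0, n=10_000,
                      se_cutoff=3.0, seed=0):
    """Draw n true T scores and flag those measured with SE < se_cutoff."""
    rng = random.Random(seed)
    scores = [rng.gauss(mean, sd) for _ in range(n)]
    accurate = [t for t in scores if t_score_se(t, items) < se_cutoff]
    if not accurate:
        return None, 0.0
    # Lowest and highest simulated score that met the criterion.
    return (min(accurate), max(accurate)), len(accurate) / n

# Hypothetical 6-item symptom form: (discrimination, thresholds) per item.
HYPOTHETICAL_FORM = [(3.0, [0.0, 0.7, 1.4, 2.1])] * 6

if __name__ == "__main__":
    accurate_range, fraction = simulate_accuracy(HYPOTHETICAL_FORM)
    print("accurate range (T scores):", accurate_range)
    print("fraction of subjects with SE < 3.0:", fraction)
```

Passing mean=40.0 would mimic the function-scale setting. A fuller replication would generate item responses, score them, and for CAT select items adaptively, none of which this sketch attempts.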


The average number of items administered in CAT was 4.7 across all domains. In each domain, the accurate range was wider for CAT than for any SF. CAT was notably better at extending the accurate range into very poor health for Fatigue, Physical Function, and Pain Interference. Most SFs provided a reasonably wide accurate range.


Relative to SFs, CATs provided the widest accurate range, with slightly more items than SF4 and fewer than SF6 and SF8. Most SFs, especially longer ones, provided a reasonably wide accurate range.


Keywords: Computer adaptive testing (CAT), Short form, PROMIS, Item response theory



This study was funded by the National Institutes of Health (U2CCA186878, Recipient David Cella).

Compliance with ethical standards

Conflict of interest

Dr. Cella is an unpaid board member of the PROMIS Health Organization (PHO). He declares no other conflict of interest. Eisuke Segawa declares that he has no conflict of interest. Benjamin David Schalet declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. SK Data, Chicago, USA
  2. Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, USA
