Symptoms account for over half of all outpatient visits1 and are associated with substantial impairment in health-related quality of life, work-related disability, and increased healthcare costs.1,2,3 Further, symptoms that are unexplained, multiple, or persistent lead to dissatisfaction for both patients and clinicians.4, 5 Nonetheless, symptoms have been underemphasized in research and clinician training, leading to suboptimal recognition and management in patient care.6

The SPADE pentad (sleep problems, pain, anxiety, depression, and low energy/fatigue) is especially important for several reasons. First, the five SPADE symptoms are the most prevalent, chronic, disabling, and undertreated symptoms in both the general population7, 8 and clinical practice.2, 8,9,10 Second, they cause additive impairment and adversely affect one another's treatment response.2, 11,12,13 Third, the SPADE symptoms are ubiquitous across most medical and mental disorders. Fourth, these symptoms commonly cluster,3, 14,15,16,17,18,19,20,21,22,23 so clinically unbundling the SPADE cluster is both difficult and perhaps counterproductive.

Interest is building in incorporating patient-reported outcome measures (PROs) into clinical practice,12, 24,25,26 based on the untested assumption that providing this information to clinicians and patients will improve outcomes. Moreover, a number of PRO initiatives have taken place in specialty clinics that focus on a narrower range of diseases and outcomes. In contrast, the primary care clinician is responsible for managing all or most of a patient's acute and chronic conditions and is therefore particularly challenged27 in deciding how many and which PROs to administer. The PROMIS (Patient-Reported Outcomes Measurement Information System) measures are an extensively tested set of public domain PROs, and SPADE symptoms constitute 5 of the 7 domains assessed by the PROMIS 29-, 43-, and 57-item profiles (www.healthmeasures.net). The objective of this randomized clinical trial was to assess whether providing PROMIS symptom scores to primary care clinicians improves patients' symptom outcomes.

METHODS

Study Design and Participants

In this prospective, two-arm randomized clinical trial, patients were recruited from March 2015 through April 2016 from urban academic primary care clinics in which both faculty and residents provide care. Upon checking in for their clinic visit, patients were asked to complete a five-item symptom screener, adapted from the MD Anderson Symptom Inventory,28 that rated the severity of each SPADE symptom on a 0 to 10 scale. Patients were eligible if they were ≥ 18 years old and English-speaking, received care from a participating primary care clinician, and reported a severity score ≥ 4 for at least one SPADE symptom. The study was approved by Indiana University's institutional review board.
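The eligibility rule above is simple enough to state as code. The sketch below is a minimal illustration, not the study's screening software; the `ScreenedPatient` record, its field names, and the `is_eligible` helper are hypothetical, and screener items are assumed to be integer ratings on the 0 to 10 scale.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the five SPADE screener items (0-10 severity each).
SPADE = ("sleep", "pain", "anxiety", "depression", "fatigue")


@dataclass
class ScreenedPatient:
    """Hypothetical record of one patient's check-in screening data."""
    age: int
    english_speaking: bool
    participating_clinician: bool
    screener: dict = field(default_factory=dict)  # symptom -> 0-10 rating


def is_eligible(p: ScreenedPatient, cutoff: int = 4) -> bool:
    """Apply the stated inclusion criteria: age >= 18, English-speaking,
    seen by a participating clinician, and any SPADE rating >= cutoff."""
    has_symptom = any(p.screener.get(s, 0) >= cutoff for s in SPADE)
    return (p.age >= 18 and p.english_speaking
            and p.participating_clinician and has_symptom)


# Usage: one qualifying symptom (pain = 4) is enough.
print(is_eligible(ScreenedPatient(52, True, True, {"pain": 4, "sleep": 2})))  # True
```

Missing screener items are treated as 0 here, which is an assumption of the sketch rather than a rule stated in the text.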

Randomization

After providing informed consent and completing the PROMIS measures on a touch-screen tablet, participants were allocated to the feedback or control group using computer-generated permuted blocks whose size (2 or 4) varied randomly. Randomization was performed at the patient level, so that individual clinicians could have patients in both arms, in order to control for clinician factors likely to influence symptom evaluation and management.
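Permuted-block randomization with randomly varying block sizes can be sketched as follows. This is an illustrative sketch, not the trial's actual allocation program: the `block_randomize` function, the arm labels, and the seed are hypothetical. Each block contains equal numbers of feedback and control assignments in shuffled order, and each new block's size is drawn at random from 2 and 4, which keeps the arms nearly balanced while making the next assignment harder to predict.

```python
import random


def block_randomize(n, block_sizes=(2, 4), arms=("feedback", "control"), seed=None):
    """Permuted-block allocation with randomly chosen block sizes.

    Each block holds an equal number of assignments to every arm, shuffled;
    the block size is drawn at random from block_sizes for each new block.
    """
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n:
        size = rng.choice(block_sizes)
        block = list(arms) * (size // len(arms))  # balanced block
        rng.shuffle(block)                          # random order within block
        schedule.extend(block)
    return schedule[:n]  # the final block may be truncated


schedule = block_randomize(300, seed=2015)
print(schedule[:6])
print(schedule.count("feedback"), schedule.count("control"))
```

Because every complete block is balanced, the running difference between arms can never exceed half the largest block size (here, 2 patients).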

Interventions

For patients randomized to the feedback arm, the clinician received a printed bar graph of the patient's PROMIS symptom scores just before the encounter (Fig. 1). The graph showed the PROMIS numeric score for each of the five SPADE symptoms; elevated scores (T-scores ≥ 55) were highlighted with threshold lines, and symptom bars crossing the threshold were colored red.29 Patients randomized to the control group completed the same study measures as the feedback group, but their scores were not provided to their clinician.

Fig. 1 Visual display of PROMIS symptom scores provided to clinicians in the feedback group.

Outcomes

The PROMIS-29 profile30 includes 4-item scales for 7 domains; 5 of these domains (sleep disturbance, pain interference, anxiety, depression, and fatigue) were used for this study because they correspond to the SPADE symptoms. Each 4-item scale yields a raw score ranging from 4 to 20, which can be converted to a T-score using the PROMIS conversion tables. A T-score of 50 on each PROMIS symptom scale represents the general population norm (i.e., mean), each 10-point deviation represents one standard deviation (SD) from that norm, and higher scores indicate greater symptom severity. A cut point of ≥ 55 was used to represent a clinically elevated symptom score because it is 0.5 SD worse than the population mean, which is traditionally considered a moderate effect size.31
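Because PROMIS T-scores are standardized to a population mean of 50 and SD of 10, converting a score to SD units and applying the ≥ 55 cut point is simple arithmetic. The sketch below assumes T-scores have already been obtained from the official PROMIS conversion tables (not reproduced here); the function names and the example patient's scores are hypothetical.

```python
T_MEAN, T_SD = 50.0, 10.0   # PROMIS reference-population norm
ELEVATED = 55.0             # cut point: 0.5 SD worse than the norm


def sd_units(t_score: float) -> float:
    """Distance from the population mean in SD units (positive = worse
    for the symptom domains used here, where higher T means more symptoms)."""
    return (t_score - T_MEAN) / T_SD


def elevated_symptoms(t_scores: dict) -> list:
    """Symptoms whose T-score meets the clinically elevated threshold."""
    return [name for name, t in t_scores.items() if t >= ELEVATED]


# Hypothetical T-scores for one patient (not study data).
patient = {"sleep": 52.4, "pain": 61.0, "anxiety": 55.0,
           "depression": 48.0, "fatigue": 63.5}
print(sd_units(55.0))               # 0.5
print(elevated_symptoms(patient))   # ['pain', 'anxiety', 'fatigue']
```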

The enrollment visit note in the electronic health record (EHR) was reviewed to assess clinical documentation of SPADE symptoms and of SPADE-specific diagnostic and treatment actions. Coding criteria were adapted from previous chart review studies of symptoms,10, 32 and study team members were trained in their use. Every clinic note was independently coded by two study team members who were blinded to study group, and coding disagreements were arbitrated by a study investigator (KK).

Three months after the enrollment visit, participants completed a follow-up survey by mail or online, according to their preference. Non-respondents were contacted up to five times to complete the survey by telephone. In addition to completing the PROMIS symptom scales, participants were asked to recall whether they had discussed any of the SPADE symptoms with their clinician during the enrollment visit (and, if not, why not) and whether they had received treatment for any of the symptoms. They were also asked whether they currently desired treatment, or a change in treatment, for any of the SPADE symptoms. Satisfaction with the care of their symptoms was rated from 1 (excellent) to 5 (poor).33

Statistical Analysis

The trial was powered to detect a small-to-moderate effect size of 0.35 (3.5 T-score points on the individual PROMIS scales and approximately 2.8 points on the composite score). This required 131 patients per study group (α = 0.05, β = 0.20, i.e., 80% power), or 146 per group after allowing for 10% attrition by 3 months.
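The sample-size arithmetic can be checked with the usual normal-approximation formula for comparing two means, n = 2(z(1 - α/2) + z(1 - β))² / d² per group. This sketch assumes a two-sided test; the text does not state which formula or software was used. The normal approximation gives 129 per group, slightly fewer than the reported 131 (t-based calculations typically add a patient or two), while the attrition step reproduces the reported 146. The stated effect sizes also imply an assumed SD of 10 for the individual scales (3.5 / 0.35) and roughly 8 for the composite (2.8 / 0.35). The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per arm for a two-sided comparison
    of two means: n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / effect_size ** 2)


def inflate_for_attrition(n: int, attrition: float) -> int:
    """Number to enroll so that about n remain after the expected dropout."""
    return ceil(n / (1 - attrition))


print(n_per_group(0.35))                  # 129 (normal approximation)
print(inflate_for_attrition(131, 0.10))   # 146
```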

The primary hypothesis was that the change in composite PROMIS T-score from baseline to 3 months would be greater in the feedback group than in the control group. Multiple imputation was used to impute PROMIS scores for participants who did not complete the 3-month assessment. Secondary analyses examined complete cases, within-group changes, and changes in each of the five individual symptom scores. All analyses were intention-to-treat (participants analyzed as randomized).
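The text does not describe the imputation model or how many imputed datasets were generated. Whatever those choices, estimates from multiply imputed data are conventionally combined with Rubin's rules, sketched below: the pooled estimate is the mean across imputations, and its variance adds the average within-imputation variance to an inflated between-imputation variance. The `pool_rubin` helper and the example numbers are hypothetical, not trial results.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors
    from m imputed datasets using Rubin's rules."""
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputed datasets")
    q_bar = mean(estimates)            # pooled point estimate
    w = mean(variances)                # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (m - 1 denominator)
    total = w + (1 + 1 / m) * b        # total variance
    return q_bar, sqrt(total)


# Hypothetical between-group differences in composite T-score change
# from five imputed datasets, each with a standard error of 0.8.
diffs = [1.0, 1.2, 0.8, 1.1, 0.9]
est, se = pool_rubin(diffs, [0.8 ** 2] * 5)
print(f"pooled difference {est:.2f}, SE {se:.3f}")
```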

All-subsets multivariable regression analysis was used to explore whether patient and practice factors (age, sex, race, education, number of comorbid medical conditions, and primary care discipline [internal medicine vs. family medicine]) predicted symptom improvement, adjusting for study arm and baseline symptom severity.
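All-subsets regression fits every combination of candidate predictors while always retaining the adjustment covariates. The sketch below uses ordinary least squares via NumPy and ranks models by AIC. The selection criterion, the `all_subsets_ols` helper, and the simulated data are assumptions for illustration, since the text does not say how candidate models were compared.

```python
from itertools import combinations

import numpy as np


def all_subsets_ols(y, forced, candidates):
    """Fit OLS for every subset of `candidates`, always including an
    intercept and the `forced` covariates; return (AIC, subset) pairs,
    best (lowest AIC) first."""
    n = len(y)
    names = list(candidates)
    ranked = []
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            cols = [np.ones(n), *forced.values(), *(candidates[c] for c in subset)]
            X = np.column_stack(cols)
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            rss = float(np.sum((y - X @ beta) ** 2))
            aic = n * np.log(rss / n) + 2 * X.shape[1]  # Gaussian AIC up to a constant
            ranked.append((aic, subset))
    return sorted(ranked, key=lambda r: r[0])


# Simulated (not trial) data: improvement depends on sex, not on age.
rng = np.random.default_rng(0)
n = 300
arm = rng.integers(0, 2, n).astype(float)
baseline = rng.normal(58, 6, n)
female = rng.integers(0, 2, n).astype(float)
age = rng.normal(49, 15, n)
improvement = 2.0 * female + 0.3 * (baseline - 58) + rng.normal(0, 3, n)

ranked = all_subsets_ols(improvement,
                         forced={"arm": arm, "baseline": baseline},
                         candidates={"female": female, "age": age})
print(ranked[0][1])
```

With six candidate predictors, as in the trial, the same loop would fit 2⁶ = 64 models.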

RESULTS

Study Participants

Of 419 patients screened in the clinic, 374 (89%) screened positive for at least 1 of the 5 SPADE symptoms (Fig. 2). Symptom screening scores did not differ significantly among the 30 eligible patients who declined, the 44 who were interested but unable to complete enrollment, and the 300 who enrolled in the trial. A total of 75 primary care clinicians (22 staff physicians, 2 nurse practitioners, and 51 residents) had patients enrolled in the study; of these, 61 received feedback on at least 1 patient.

Fig. 2 CONSORT diagram showing participant flow.

The feedback and control groups were similar at baseline (Table 1). The mean age of the sample was 49.4 years; 72% of participants were women, and similar proportions were white (45.0%) and African American (49.3%). The mean composite PROMIS T-score was 58.3. Participants typically had multiple SPADE symptoms; the proportions with 0, 1, 2, 3, 4, and 5 clinically significant symptoms (T-score ≥ 55) were 5, 11, 13, 18, 21, and 31%, respectively.
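The reported distribution of threshold-level symptom counts can be summarized with a little arithmetic. The sketch below treats the rounded percentages as given (they sum to 99%, not 100%, because of rounding), so the derived values are approximate: about 83% of participants had two or more elevated symptoms, and the mean was roughly 3.3 symptoms per participant.

```python
# Reported percent of participants with 0, 1, 2, 3, 4, and 5 symptoms
# at T-score >= 55 (rounded values from the text).
pct_by_count = [5, 11, 13, 18, 21, 31]

total_pct = sum(pct_by_count)                 # 99 because of rounding
two_or_more = sum(pct_by_count[2:])           # percent with >= 2 symptoms
mean_count = sum(k * p for k, p in enumerate(pct_by_count)) / total_pct

print(total_pct, two_or_more, round(mean_count, 1))  # 99 83 3.3
```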

Table 1 Baseline Characteristics of Patients Enrolled in SPADE Trial

Symptom Outcomes

Follow-up data were collected from 256 (85.3%) of the 300 participants. Compared with participants who had follow-up data, the 44 participants without follow-up data were younger (41.6 vs. 50.7 years, P < 0.001) but otherwise similar with regard to recruitment site, sex, race, education, and baseline PROMIS composite T-score.
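The participant-flow figures reported in this section can be cross-checked with simple arithmetic. The sketch below uses only counts given in the text; it also shows that the subgroup mean ages for participants with and without follow-up, weighted by group size, reproduce the overall baseline mean of 49.4 years.

```python
screened, screen_positive = 419, 374
declined, unable_to_enroll, enrolled = 30, 44, 300
followed_up = 256
lost = enrolled - followed_up

# Every screen-positive patient is accounted for.
assert declined + unable_to_enroll + enrolled == screen_positive

screen_positive_rate = screen_positive / screened    # about 89%
participation_rate = enrolled / screen_positive      # about 80%
follow_up_rate = followed_up / enrolled              # about 85.3%
attrition_rate = lost / enrolled                     # about 14.7%

# Overall mean age as the size-weighted average of the two subgroup means.
mean_age = (followed_up * 50.7 + lost * 41.6) / enrolled

print(f"{screen_positive_rate:.0%} {participation_rate:.0%} "
      f"{follow_up_rate:.1%} {attrition_rate:.1%} {mean_age:.1f}")
# 89% 80% 85.3% 14.7% 49.4
```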

As shown in Table 2, participants showed significant, small-to-moderate within-group improvements in each individual symptom T-score and in the composite T-score, with effect sizes in imputed analyses ranging from 0.17 to 0.52. Although the feedback group improved slightly more than the control group (decrease in PROMIS composite T-score of 3.48 vs. 2.38 points), the between-group difference of 1.1 points (effect size = 0.16) was not significant (P = 0.17). Likewise, between-group differences were not significant for any of the five individual symptom T-scores. Results of complete case analyses were similar.

Table 2 Symptom Outcomes at 3 Months in Imputed (n = 300) and Complete (n = 256) Cases

Multivariable analysis showed that independent predictors of improvement in the SPADE composite T-score at 3 months were female sex (1.7-point greater improvement, P = 0.036), black race (2.5-point greater improvement, P < 0.001), having fewer than 2 comorbid medical conditions (2.5-point greater improvement, P = 0.001), and having a family medicine provider (1.9-point greater improvement, P = 0.013). Age and education were not significant predictors.

Symptoms were more likely to persist than to resolve (online Appendix, eTable 1). Among the 256 patients with 3-month follow-up data, threshold-level baseline symptoms persisted at 3 months in 78% (157/201) of cases for pain, 76% (139/182) for anxiety, 70% (105/149) for depression, 65% (101/156) for fatigue, and 56% (86/154) for sleep problems; thus, fewer than one third (254/842) of threshold-level symptoms resolved. Among patients without a given symptom at baseline, the 3-month incidence was 5% for pain, 7% for anxiety and sleep problems, and 9% for depression and fatigue.
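The persistence figures can be recomputed from the reported counts. Note that the pooled denominator (842) counts patient-symptom pairs rather than patients, which is why it exceeds the 256 patients with follow-up data. The sketch below uses only the counts given in the text.

```python
# (persisted at 3 months, threshold-level at baseline) per symptom,
# among patients with follow-up data (counts from the text).
counts = {
    "pain": (157, 201),
    "anxiety": (139, 182),
    "depression": (105, 149),
    "fatigue": (101, 156),
    "sleep": (86, 154),
}

persistence = {s: p / b for s, (p, b) in counts.items()}
for symptom, rate in persistence.items():
    print(f"{symptom:<11}{rate:.0%}")

# Pooled over symptoms: the unit is a patient-symptom pair, not a patient.
total_at_baseline = sum(b for _, b in counts.values())
total_resolved = sum(b - p for p, b in counts.values())
print(total_resolved, total_at_baseline, f"{total_resolved / total_at_baseline:.1%}")
# 254 842 30.2%
```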

Symptom Documentation and Symptom-Specific Clinician Actions

Baseline visit notes were available for review for 292 patients, of whom 26 (9%) were new patients and 266 (91%) had previously been seen by the primary care clinician. In the feedback group, PROMIS scores were explicitly mentioned in only 1 of 147 notes. Patients with threshold-level PROMIS T-scores (i.e., ≥ 55) were more likely to have the corresponding SPADE symptom documented in the medical record (Fig. 3). However, even for threshold-level symptoms, documentation varied substantially by symptom type, ranging from 81% for pain to 16% for fatigue. Overall, threshold-level non-pain SPADE symptoms were documented less than 50% of the time. Documentation rates did not differ between the feedback and control groups.

Fig. 3 Symptom documentation in the clinic visit note.

SPADE symptom-specific clinician actions are summarized in eTable 2 (online Appendix). Because patients often had multiple SPADE symptoms, the table reports actions directed at any SPADE symptom. The most common clinician actions were prescribing medication (65.7% of study participants), another type of treatment such as education (35.3%), and specialty referral (28.1%). With the exception of one category (diagnostic tests other than laboratory tests or imaging), clinician actions did not differ between the feedback and control groups. Medication prescriptions and referrals (but not other clinician actions) increased with symptom burden.

Patient-Reported Discussion and Treatment of SPADE Symptoms

At 3-month follow-up, patients reported whether they had discussed symptoms and received treatment at the baseline clinic visit (online Appendix, eTable 3). The level of clinician action (not discussed vs. discussed but not treated vs. treated) increased with symptom severity, whether measured as the mean symptom T-score or as a threshold-level symptom (T-score ≥ 55). There were no differences, however, between feedback and control group patients. The proportion of threshold-level symptoms not discussed was lowest for pain (12%), intermediate for sleep and fatigue (22% each), and highest for depression (35%) and anxiety (36%). The level of patient-reported clinician action was not associated with patient demographics, medical comorbidity, specialty (internal medicine vs. family medicine), or overall satisfaction with symptom care.

Reasons for not discussing a symptom were provided by 140 patients, some of whom gave more than one reason. The most common perceived reasons were having more pressing medical issues to discuss (n = 68; 49%) and not needing (n = 66; 47%) or not wanting (n = 40; 29%) treatment, followed by the doctor not bringing up the symptom (n = 30; 21%), the patient (n = 22; 16%) or doctor (n = 13; 9%) not feeling comfortable discussing the symptom, and the doctor seeming too busy (n = 10; 7%).

Table 3 shows the proportion of patients who still desired treatment for symptoms at 3-month follow-up, which ranged from 23% for depression to 40% for pain. Patients who still desired treatment had more severe symptoms at 3 months, as measured by either the mean symptom T-score or a threshold-level (T-score ≥ 55) symptom; less improvement in their symptom from baseline to 3-month follow-up; lower satisfaction with their overall symptom care; and greater medical comorbidity (the latter not shown in the table). Desire for treatment did not differ between the feedback and control groups and was not associated with patient demographics or primary care specialty.

Table 3 Patient-Reported Desire for SPADE Treatment at 3-Month Follow-up (n = 256)

Treatment Satisfaction

Overall satisfaction with symptom care was rated as excellent by 18% of participants, very good by 24%, good by 32%, fair by 19%, and poor by 8%. Satisfaction did not differ between study groups. However, participants who still desired treatment for their symptoms at 3 months were less likely to rate their satisfaction as excellent or very good (Table 3).

DISCUSSION

Our trial has several important implications for the real-world implementation of symptom measures in clinical practice. First, simple feedback of PROMIS symptom scores to primary care clinicians did not significantly improve symptom outcomes at 3-month follow-up. A minimal clinically important change in PROMIS T-scores is generally in the 2- to 4-point range,34,35,36 which encompasses the within-group changes in both study arms (3.48 and 2.38 points) but not the between-group difference (1.1 points) in our trial. Second, SPADE symptoms other than pain were infrequently documented in the clinician's note. Third, a substantial proportion of patients reported persistent symptoms at follow-up for which they desired treatment.
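The comparison against the minimal clinically important difference can be made explicit. The sketch below hard-codes the 2- to 4-point range cited above and the change scores reported in the Results; the `classify_change` helper is hypothetical, and a single fixed range is a simplification, since MCID estimates vary by PROMIS domain and estimation method.

```python
MCID_LOW, MCID_HIGH = 2.0, 4.0  # T-score points, range cited in the text


def classify_change(delta: float) -> str:
    """Place a T-score change relative to the 2-4 point MCID range."""
    if delta < MCID_LOW:
        return "below"
    if delta <= MCID_HIGH:
        return "within"
    return "above"


feedback_change, control_change = 3.48, 2.38   # mean composite decreases
between_group = round(feedback_change - control_change, 2)

for label, delta in [("feedback", feedback_change),
                     ("control", control_change),
                     ("between-group", between_group)]:
    print(f"{label:<14}{delta:5.2f}  {classify_change(delta)}")
```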

Our finding that feedback alone was insufficient to improve symptom outcomes is consistent with multiple trials showing that providing additional information to primary care clinicians in a busy setting with many competing demands, without also providing additional time or resources, is relatively ineffective.37 This phenomenon has been best demonstrated for depression,38, 39 and several studies have shown that simply providing pain or anxiety scores to clinicians does not change outcomes.40,41,42 To our knowledge, the effect of feedback regarding fatigue or sleep problems has not been previously studied. Studies suggesting that feedback of symptom scores may be beneficial have largely demonstrated improved processes of care (e.g., documentation of symptoms, discussions with patients, treatment actions) rather than improved symptom outcomes; where outcomes have improved, this has occurred predominantly in specialty settings (e.g., cancer centers, palliative care) with additional clinical team members and extra patient contacts.37, 43,44,45,46,47,48,49 The movement to implement PROs in clinical practice and electronic health records24, 50, 51 may have limited impact unless simultaneous consideration is given to the systems support necessary to facilitate clinical actions, monitor outcomes, and adjust treatment.39 However, the lack of systems support may not be the only explanation for our findings. It is also possible that the type or number of symptoms chosen made clinical action or symptom improvement more challenging, that the method of feedback was suboptimal, or that PRO feedback was not well suited to (or necessary in) the primary care setting in which the intervention was implemented.

Most patients had more than one threshold-level SPADE symptom. Multiple symptoms were also the norm in a trial of 250 primary care patients with chronic pain, in which the proportions with 0, 1, 2, 3, 4, and 5 SPADE symptoms were 10, 20, 16, 23, 12, and 20%, respectively.14 Admittedly, selection bias might have played some role, because eligibility for our study required that patients screen positive for at least one symptom. Still, only 11% (45) of the 419 patients screened for our trial screened negative for all symptoms, suggesting that study participants were not a highly selected sample. Also, other studies have shown that patients reporting one symptom typically have other symptoms as well.6

Despite the prevalence of symptoms, documentation of threshold-level symptoms (i.e., T-score ≥ 55) in the visit note was only 20–41% for the four non-pain SPADE symptoms, suggesting substantial limitations in using EHR data from unstructured clinical notes for secondary purposes such as symptom research or quality improvement. Under-documentation may reflect the time constraints and competing demands of primary care, as well as the lack of incentives for evaluating and managing symptoms. Also, patients frequently reported that symptoms were not discussed because there were more pressing issues or because they did not want treatment. Finally, PROs may detect symptoms (including less bothersome ones) more frequently than patients report them spontaneously.1

The decision about which symptoms warrant treatment must weigh symptom severity, the availability of evidence-based therapies, patient and provider prioritization of symptoms, and treatment preferences. Optimal treatment for the SPADE symptoms, particularly when chronic, typically includes non-pharmacological therapies (e.g., cognitive-behavioral therapy, exercise, mindfulness-based treatments) rather than medications alone.6 However, several obstacles hinder broader implementation of these treatments, including an insufficient number of healthcare professionals trained to deliver them, reimbursement barriers, and the difficulty of motivating patients to engage in them. Moreover, even if such treatments had been provided, the 3-month follow-up in our trial may have been too short for patients to receive non-pharmacological therapy of sufficient intensity and duration to achieve optimal symptomatic improvement.

Symptoms present at a threshold level at baseline persisted in roughly half to three-quarters of patients at 3-month follow-up, and patients frequently still desired treatment. This suggests that considering symptom severity and persistence together with patient expectations5, 52, 53 might be one approach to balancing the risk of overtreatment against patient-centered treatment of common symptoms. Other factors influencing management might include whether the symptom is secondary to another medical condition or treatment, the presence of competing health concerns, the relative weight of clinical judgment vs. PRO scores in determining clinician actions, and the option of watchful waiting to distinguish persistent from self-limited symptoms.47 Shared decision-making between clinician and patient is central to navigating these factors.54

The study has several strengths with respect to generalizability. First, patients were relatively evenly distributed between the two principal disciplines providing primary care for adults: general internal medicine and family medicine. Second, the patient sample had a good distribution of age, race, and medical comorbidity. Third, the participation rate among eligible patients was reasonably high (300 of 374; 80%), minimizing refusal as a major source of selection bias.

Several study limitations should be noted. First, 3-month follow-up data could not be obtained for 14.7% (44/300) of the study participants. However, multiple imputation using the full sample of 300 participants and analysis of the 256 complete cases produced similar results. Second, secondary outcomes assessed by patient report at 3 months or by chart review are susceptible to recall bias and rater bias, respectively. Rater bias, however, was reduced by rater training, explicit coding criteria, independent review of all notes by two raters, and rater blinding to study group. Third, 61 clinicians received feedback on one or more of the 151 patients in the feedback arm (about 2.5 patients per clinician on average), so most clinicians received feedback on only a few patients in the trial. Receiving symptom feedback on more patients over a longer period might lead to greater attention to SPADE symptoms. Fourth, the trial was conducted in academic clinics staffed by both faculty and residents who were providing care to an underserved population, and the findings should be replicated more broadly.

Diagnostic testing and procedures are unnecessary for most patients with SPADE symptoms; instead, the history and physical examination, coupled with communication strategies, are more effective for symptom evaluation and management.6 Realigning incentives to enable more patient-centered approaches has the potential to improve symptom outcomes at lower cost. Making information from PROs readily actionable through sufficient training, time, and resources may be critical to the effective use of PROs by practicing clinicians.55 At the same time, determining which PROs clinicians and patients value, the optimal frequency of assessment and of providing results, and the settings in which PROs can improve symptom outcomes are all important steps before widespread PRO implementation.