Background

Chronic obstructive pulmonary disease (COPD) is one of the leading causes of death worldwide, and affects 9-10% of adults aged 40 years or older [1, 2]. It is a slowly progressive lung disease, characterized by chronic airway inflammation and not fully reversible airflow obstruction [3]. Cigarette smoking is the main cause of COPD in developed countries, and patients typically present with shortness of breath, chronic cough and/or excessive sputum production. In addition to a higher risk of mortality and morbidity, COPD is associated with a substantial economic burden of illness on the health care system [4–6].

The management of COPD is largely symptomatic, so patient-reported outcomes (PROs) that evaluate health-related quality of life (HRQL) are important for evaluating the treatment and management of COPD. As COPD progresses, poor symptom control and exacerbations can lead to limitations in functioning and impaired HRQL [7]. Disease-specific measures can provide insight into specific aspects of HRQL. Generic HRQL measures have the advantage of allowing comparisons across different patient populations, but they may be less sensitive to changes in HRQL than disease-specific measures [8, 9].

Two recently developed generic HRQL measures, the Patient Reported Outcomes Measurement Information System (PROMIS) and the EQ-5D-5L, have potential for broad use in evaluating COPD outcomes. PROMIS is a health measurement system designed for a wide variety of patient populations that utilizes banks of items belonging to specific domains of health [10]. The PROMIS item banks were derived using item response theory (IRT) and developed through a rigorous process of literature review, focus groups across multiple diseases and sites, cognitive assessments, and expert consultation. In addition, fixed-length short forms, including the PROMIS-43, were developed as an alternative to computer-adaptive testing (CAT). These short forms, while not providing measurement estimates as precise as those from CAT, cover core dimensions of health from the PROMIS. The other measure, the EQ-5D-5L, expanded the EQ-5D from 3 levels to 5 levels in order to potentially improve upon the properties of the standard 3-level EQ-5D by enhancing sensitivity and reducing ceiling effects [11].

As few studies have examined these recently developed measures in COPD, the aims of this study were: (1) to examine their psychometric properties in patients with COPD, and (2) to identify dimensions of HRQL that differ and do not differ by lung function.

Methods

Study design and subject recruitment

We conducted a secondary data analysis on COPD patients who participated in a multi-center cross-sectional study (NHLBI COPD Outcomes-based Network for Clinical Effectiveness & Research Translation [CONCERT], https://www.kpchr.org/concert/). The CONCERT investigators developed a COPD Data Warehouse (CDW), containing comprehensive information on more than 220,000 patients with some indication of a chronic respiratory condition between 2006 and 2010. Seven U.S. clinical centers were involved in patient recruitment: Kaiser Permanente Northwest Region, VA Puget Sound Health Care System, University of Chicago, University of Illinois Hospital and Health Sciences System, University of Washington, Baystate Medical Center, and University of North Carolina at Chapel Hill. Institutional Review Board approval was received at each individual site and at the Data Coordinating Center. A sample of patients from the CDW was recruited for in-person evaluations, and a total of 1,206 patients (response rate 36%) participated and completed the evaluations, including spirometry, the six-minute walk test (6MWT), and extensive patient-reported outcomes (dyspnea scores, quality of life, etc.) (see Additional file 1 for details on patient sampling and recruitment). Patients were excluded if they could not perform a spirometry test or could not participate due to cognitive impairment, frailty, acute illness, receipt of hospice care, residence in a long-term care facility, or issues related to geography, administration or communication.

For the present study, patients fulfilling all of the following criteria were included: (1) a diagnosis of COPD, defined based on the Global Initiative for Chronic Obstructive Lung Disease (GOLD) spirometric criteria, i.e., a post-bronchodilator FEV1/FVC (forced expiratory volume in 1 second/forced vital capacity) ratio <0.7 [3]; (2) age ≥40 years (as COPD mainly affects people over the age of 40 and the disease develops over decades of exposure to inhaled particulates [12]); (3) FEV1 percent predicted available to indicate COPD disease severity; and (4) completion of both the EQ-5D-5L and PROMIS questionnaires.

Measures

Clinical and HRQL measures were administered to every patient. Post-bronchodilator spirometry was utilized to assess the extent of airflow limitation. Severity of COPD was graded using percent predicted FEV1: GOLD 1 (mild) – FEV1 ≥ 80% predicted, GOLD 2 (moderate) – 50% ≤ FEV1 < 80% predicted, GOLD 3 (severe) – 30% ≤ FEV1 < 50% predicted, and GOLD 4 (very severe) – FEV1 < 30% predicted [3].
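The grading rule above can be written as a short function. The following Python sketch is for illustration only and is not the study's code (the analyses were run in SAS); the function name `gold_grade` is hypothetical, and the sketch assumes the FEV1/FVC ratio <0.7 criterion has already been confirmed.

```python
def gold_grade(fev1_pct_predicted: float) -> int:
    """Return the GOLD grade (1-4) for a post-bronchodilator FEV1 % predicted.

    Assumes the patient already meets the FEV1/FVC < 0.7 criterion for COPD.
    """
    if fev1_pct_predicted < 0:
        raise ValueError("FEV1 percent predicted cannot be negative")
    if fev1_pct_predicted >= 80:
        return 1  # mild
    if fev1_pct_predicted >= 50:
        return 2  # moderate: 50% <= FEV1 < 80%
    if fev1_pct_predicted >= 30:
        return 3  # severe: 30% <= FEV1 < 50%
    return 4      # very severe: FEV1 < 30%
```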

The six-minute walk test (6MWT) was administered to assess patients’ functional capacity. The 6MWT measures the distance (in meters) that an individual is able to walk on a flat and hard floor over six minutes, referred to as the six-minute walk distance (6MWD) [13]. Dyspnea scales were used to quantify the degree of shortness of breath. These included the modified Medical Research Council (mMRC) dyspnea scale, the Borg scale, and the Functional Assessment of Chronic Illness Therapy (FACIT)-Dyspnea 10-item short form. On each scale, a higher score indicates more severe symptoms. The widely used mMRC dyspnea scale is easy to administer: patients indicate their dyspnea level on a 5-point scale (0-4) by selecting the physical tasks that provoke shortness of breath [14]. The Borg scale is another method of rating breathlessness, administered both at rest and during the 6MWT (measuring exertional dyspnea), on a scale of 0 to 10 [15]. The FACIT-Dyspnea short form comprises 10 items that describe the experience of shortness of breath when doing a range of common tasks in the past week, with each item scored on a 4-point rating scale (from 0 [no shortness of breath] to 3 [severe shortness of breath]) [16, 17]. FACIT-Dyspnea scores are generated using the T-score metric, where summary scores from responses to items are transformed into a scale with a mean of 50 and a standard deviation (SD) of 10 for patients with self-reported COPD [18].

EQ-5D is a widely used preference-based measure that includes a descriptive health self-classifier and a visual analog scale (EQ-VAS) [19]. The classifier contains five dimensions of health: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. The original EQ-5D, which has 3 levels (EQ-5D-3L), was recently expanded to 5 levels (EQ-5D-5L) for each single-item dimension of health. An index-based summary score for the EQ-5D-5L can be generated using a recently published crosswalk between the EQ-5D-5L descriptive system and the EQ-5D-3L value sets [20]. This algorithm provides index-based scores ranging from -0.109 to 1.0 in the U.S. population, with lower values signifying worse health [21]. The EQ-VAS asks patients to rate their own health on a scale ranging from 0 (worst imaginable health) to 100 (best imaginable health) [19].

The PROMIS-43 short form, a multi-dimensional 43-item generic measure of health, is intended for use across a variety of conditions. It includes seven domains: physical function, anxiety, depression, fatigue, sleep disturbance, pain interference, and satisfaction with participation in social roles, as well as a pain intensity item scored from 0 to 10. There are 6 items in each of the seven domains, with responses ranging from 1 to 5. The raw score for each domain is calculated by summing the item scores while adjusting for missing item responses, and a domain score can be estimated if at least 4 of the 6 items in that domain were answered. Raw scores are transformed using the T-score metric based on the item response theory calibrations, in which scores have a mean of 50 and an SD of 10 for the general population in the U.S. [22, 23]. T-scores can be estimated using the scoring tables listed in the PROMIS manuals [24]. A higher PROMIS T-score implies more of the concept being measured; for instance, a higher PROMIS score on physical function indicates better functioning, whereas a higher score on depression indicates more severe depressive symptoms.
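To illustrate the raw-score step, the sketch below sums the item responses for one 6-item domain and prorates the sum when items are missing. Proration (scaling the sum of answered items up to the full number of items) is an assumption made here for illustration; the text states only that raw scores are adjusted for missing responses. The function name `prorated_raw_score` is hypothetical, and the raw-to-T-score conversion is deliberately omitted because it depends on the scoring tables in the PROMIS manuals [24].

```python
from typing import Optional, Sequence


def prorated_raw_score(
    responses: Sequence[Optional[int]], min_answered: int = 4
) -> Optional[float]:
    """Return a prorated raw domain score, or None if too few items were answered.

    `responses` holds one entry per item (1-5), with None for a missing item.
    Proration is an illustrative assumption; the result is left unrounded.
    """
    answered = [r for r in responses if r is not None]
    if any(not 1 <= r <= 5 for r in answered):
        raise ValueError("item responses must be between 1 and 5")
    if len(answered) < min_answered:
        return None  # fewer than 4 of 6 items answered: score not estimated
    # Scale the sum of answered items up to the full number of items.
    return sum(answered) * len(responses) / len(answered)
```

For example, a respondent who answers 5 on four items and skips two would receive a prorated raw score of 30 under this assumption.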

Statistical analysis

The psychometric properties of EQ-5D-5L and PROMIS-43 were examined in terms of reliability, convergent validity, and discriminative ability. Internal consistency reliability of the PROMIS domains was evaluated using Cronbach’s alpha. The distribution of responses to the EQ-5D-5L was described, and Chi-square or Fisher’s exact tests were performed to assess the ability of EQ-5D dimensions to discern among COPD patients with different grades of airflow limitation.
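As an illustration of the internal consistency calculation, the following Python sketch computes Cronbach’s alpha from a respondents-by-items matrix using the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). It is a minimal sketch, not the study's SAS code; the function name `cronbach_alpha` is hypothetical and complete responses (no missing items) are assumed.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Assumes complete data, at least two items, and at least two respondents.
    """
    k = len(item_scores[0])
    if k < 2 or any(len(row) != k for row in item_scores):
        raise ValueError("each respondent needs the same number (>= 2) of items")
    columns = list(zip(*item_scores))                # one tuple per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)
```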

Convergent construct validity was evaluated by assessing the strength of associations among the PROMIS domains, EQ-5D measures (summary and dimension scores), and other clinical or functional status measures using Spearman’s rank correlation coefficients (rs). Correlation coefficients were interpreted as follows: rs < 0.10 (absent), rs = 0.10-0.29 (weak), rs = 0.30-0.49 (moderate), and rs ≥ 0.50 (strong) [25]. We hypothesized that strong correlations would be observed between related domains of health on the EQ-5D-5L and PROMIS, i.e., mobility with physical function; usual activities with social roles; pain/discomfort with pain interference; and anxiety/depression with anxiety and depression.
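The correlation step and its interpretation can be sketched as follows. This is an illustrative pure-Python version (the study used SAS): Spearman’s rs is computed as the Pearson correlation of average ranks, and the labels follow the thresholds given above. Applying the thresholds to the absolute value of rs, and treating values between 0.29 and 0.30 as weak, are assumptions of this sketch; the function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rs(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (var_x * var_y) ** 0.5


def correlation_strength(rs: float) -> str:
    """Label |rs| as absent, weak, moderate, or strong."""
    magnitude = abs(rs)
    if magnitude < 0.10:
        return "absent"
    if magnitude < 0.30:
        return "weak"
    if magnitude < 0.50:
        return "moderate"
    return "strong"
```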

One-way analysis of variance (ANOVA) was used to compare the EQ-5D, EQ-VAS, PROMIS domains, and other measures among patients with different GOLD grades. ANOVA F-statistics comparing mean scores across GOLD grades were used to assess the relative discriminative ability of the EQ summary scores (index score, VAS) and the PROMIS-43 subscales. Relative statistical efficiency (RE) was defined as the ratio of the ANOVA F-statistic for a given measure to that for the FACIT-Dyspnea score, which served as the reference measure [26–28]. Although ANOVA is robust to moderate deviations from normality [29], non-parametric Kruskal-Wallis tests were also performed. All statistical analyses were carried out using the Statistical Analysis System (SAS), version 9.2 (SAS Institute, Cary, NC). All p-values were reported, and for purposes of interpretation, p-values <0.01 were considered statistically significant.
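The RE calculation can be illustrated with a small sketch: compute the one-way ANOVA F-statistic for each measure across the GOLD groups, then divide by the F-statistic of the reference measure. The Python code below is a minimal illustration under the definition given above, not the SAS procedure used in the study; the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F-statistic: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    k, n = len(groups), len(all_values)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = 0.0
    for g in groups:
        group_mean = mean(g)
        ss_within += sum((v - group_mean) ** 2 for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def relative_efficiency(f_measure: float, f_reference: float) -> float:
    """RE of a measure relative to the reference (here, FACIT-Dyspnea)."""
    return f_measure / f_reference
```

Under this definition, an RE below 1 means the measure discriminates among GOLD grades less efficiently than the reference measure, and an RE above 1 means it discriminates more efficiently.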

Results

A total of 670 patients were included in our analysis. The mean age was 68.5 (SD 10.4) years; 58% were men and 78% self-identified as Caucasian (Table 1). Most patients were current (n = 193, 28.8%) or past (n = 412, 61.5%) smokers, with an average of 44.2 (SD 31.2) pack-years. The study subjects were divided into 4 groups based on the severity of airflow limitation: mild (GOLD 1, n = 102), moderate (GOLD 2, n = 353), severe (GOLD 3, n = 165) and very severe (GOLD 4, n = 50). Compared with those with mild to moderate disease, patients in GOLD 3 and 4 (with more severe disease) were significantly younger, were more likely to be African-American, had a lower education level, lower household income, and a heavier smoking history, and were less likely to have coronary heart disease (p < 0.05) (Table 1).

Table 1 Patient demographic and clinical characteristics (Total N = 670)

All levels of the EQ-5D-5L were utilized by the overall cohort, but only a relatively small proportion of patients reported severe or extreme problems on each dimension (range 1.4-8.7%) (Table 2). More than 50% of patients reported no problems with self-care and anxiety/depression (in all COPD grades) and usual activities (GOLD 1 only). More severe COPD was associated with significantly more problems with mobility, self-care and usual activities (p < 0.01). Approximately 30% of all COPD patients reported at least moderate pain/discomfort and 15% reported at least moderate anxiety/depression, but differences across COPD grades were not significant (p = 0.26 and p = 0.15, respectively). The response of ‘11111’ (no problems in any dimension) was reported by a diminishing proportion of respondents as grades became more severe: 19.6% (GOLD 1), 18.4% (GOLD 2), 11.5% (GOLD 3), and 4% (GOLD 4). Overall, the mean (SD) score was 0.79 (0.15) for the EQ-5D index, and 70.6 (19.6) for the EQ-VAS (Table 3). When stratified by GOLD grade, EQ-5D index-based mean scores ranged from 0.81 (GOLD 1) to 0.74 (GOLD 4) (p-value = 0.004), and EQ-VAS mean scores ranged from 76.6 (GOLD 1) to 61.1 (GOLD 4) (p-value < 0.001). Patients with more severe COPD had lower mean EQ-5D index scores and EQ-VAS scores, although the index-based score did not discriminate between the milder grades of COPD.

Table 2 EQ-5D-5L profile of patients by GOLD grade
Table 3 Patient clinical and HRQL measurements

Regarding PROMIS-43, physical function had an overall mean score of 40.6 (SD 7.6), and for the rest of the domains, the mean scores ranged from 48.1 (social roles) to 53.9 (pain interference) (Table 3). The mean pain intensity score was 3.49 (SD 2.67) on a scale of 0 to 10. The PROMIS physical function, depression, fatigue, and social roles domains had p-values <0.05, but only physical function and social roles demonstrated a statistically significant decline that was monotonically associated with decreasing lung function (Table 3). No differences in mean scores by GOLD grade were observed for the PROMIS domains of anxiety, sleep disturbance, pain interference and pain intensity.

About sixty patients refused or did not complete the 6MWT and/or dyspnea severity assessment due to health issues (e.g., wheelchair or walker dependent, body pain, discomfort while doing the test). All the dyspnea measures and 6MWD demonstrated discriminative ability when mean scores were compared among subgroups with different levels of COPD severity (Table 3). The FACIT-Dyspnea provided the highest relative efficiency (RE) to discriminate among subgroups of COPD severity, followed by PROMIS physical function, Borg dyspnea during 6MWT, EQ-VAS, mMRC dyspnea, 6MWD, and EQ-5D index.

Cronbach’s alpha for PROMIS domains ranged from 0.89 to 0.95, demonstrating acceptable internal consistency reliability [30]. Strong correlations between related domains on the EQ-5D-5L and PROMIS-43 were observed as expected (Table 4). EQ-5D usual activities (EQ-UA) showed strong correlations with the PROMIS physical function (P-PF) (rs = -0.65), fatigue (P-F) (rs = 0.54), and social roles (P-SR) (rs = -0.55), and moderate correlations with the rest of the PROMIS domains. EQ-5D pain/discomfort (EQ-PD) was strongly correlated with PROMIS pain interference (P-PI) and pain intensity (P-P) (rs = 0.67 and 0.63, respectively), and the EQ-5D anxiety/depression (EQ-AD) with PROMIS anxiety (P-A) and depression (P-D) (rs = 0.60 and 0.59, respectively). EQ-5D mobility (EQ-MO) was moderately to strongly related to four domains of the PROMIS (physical function [P-PF], fatigue [P-F], social roles [P-SR], and pain interference [P-PI]) and the pain intensity item (P-P). The EQ-5D dimension of self-care (EQ-SC) was moderately correlated with P-PF and P-SR.

Table 4 Correlations between domains of EQ-5D-5L and PROMIS-43 (all with p <0.001)

In examining the relationship between clinical measures and the EQ-5D index, EQ-VAS, and PROMIS subscales, only the FACIT-Dyspnea and the PROMIS physical function (P-PF) showed at least moderate correlations with % of predicted FEV1 (rs = -0.36 and 0.32, respectively) (Table 5). EQ-5D index scores and VAS scores were both moderately to strongly correlated with PROMIS domains, dyspnea scales, and the 6MWD, but the magnitude of these correlations was smaller with EQ-VAS scores than with EQ-5D index scores. All subscales of PROMIS-43 had moderate to strong correlations with at least one of the dyspnea scores. Among all the HRQL measures, the PROMIS physical function (P-PF), fatigue (P-F), social roles (P-SR), EQ-5D index and EQ-VAS, in general, had stronger correlations with symptom severity. All measures were at least (or nearly) moderately correlated with 6MWD, except for PROMIS anxiety (P-A), depression (P-D) and sleep disturbance (P-SD) (absolute rs < 0.3).

Table 5 Correlations between clinical and HRQL measures

Discussion

Our study results provide evidence to support the validity of two recently developed generic measures of HRQL, EQ-5D-5L and PROMIS-43, in COPD. The convergent construct validity of the two measures was supported by the moderate to strong correlations between related domains, and between the domain and summary scores of the generic measures and the dyspnea measures. EQ-5D-5L index, EQ-VAS, and two domains of PROMIS (physical function and social roles) had higher RE ratios among the HRQL measures, suggesting that these scores provide greater statistical power (discriminative ability) to capture differences in HRQL in relation to disease severity as measured by lung function.

Level of dyspnea is a strong predictor of health status [31–33]. Both EQ-5D and PROMIS had moderate associations with at least one measure of dyspnea, with the correlations varying across the PROMIS-43 subscales. Our results concur with previous reports that spirometric parameters (% of predicted FEV1), unlike severity of breathlessness, do not correlate well with HRQL [31, 32, 34, 35]. While lung function testing with spirometry serves as an important clinical tool to measure the degree of airflow limitation, a number of studies have demonstrated that it provides an incomplete assessment of the health burden to the patient, which can include physical and psychosocial functioning. This observation is consistent with the new COPD assessment tool recently proposed by GOLD, which recommends evaluation of COPD based not only on lung function, but also on the assessment of symptoms and exacerbation risk [3]. This also reinforces the importance of evaluating patient-reported outcomes along with clinical measures (e.g., lung function tests) when gauging the effect of health interventions.

COPD severity has been shown to influence the degree of physical disability, impairing the ability to perform activities of daily living, and contributing to poor HRQL [36]. Patients with COPD had relatively worse self-rated HRQL in multiple PROMIS domains as compared with individuals without COPD or any condition [37]. The negative impact of COPD is more pronounced on the physical aspect of health than on the mental component [31]. Consistent with the study by Gonzalez-Moro and colleagues [36], our findings suggest that, in general, physical functioning tends to be affected in all grades of COPD patients while mental health is impaired only in patients at more severe stages. The mean scores of PROMIS domains indicated that physical function, among all the domains measured in the PROMIS, was the aspect of health status most affected by COPD; physical function was considerably impaired even in patients with mild COPD, as the mean domain score of PROMIS physical function in patients with GOLD grade 1 was less than 50 (the mean score of the general population). The mean domain scores of PROMIS anxiety and depression were higher than 50 only in patients with very severe COPD (i.e. GOLD 4).

Evidence on the properties of the EQ-5D-5L is only beginning to emerge. The first paper was a multi-country study by Janssen et al. in 2012 that compared the measurement properties of the 5-level and 3-level EQ-5D, including 342 patients with respiratory disease (COPD or asthma) as one of the eight patient groups with chronic conditions [38]. The 5-level EQ-5D descriptive system (EQ-5D-5L) reduced ceiling effects of the 3-level EQ-5D (EQ-5D-3L) and improved the discriminatory power and convergent validity. In our study, broad use of 4 of the 5 levels of the EQ-5D-5L suggests that it could provide higher discriminative power than the standard EQ-5D-3L in COPD, although the most severe category appears to be rarely utilized. A previous study showed that the EQ-5D-3L index score (both UK and US) failed to differ across COPD severity stages [35]. In our study, the mean EQ-5D-5L index score decreased significantly as COPD severity worsened, particularly in the advanced stages of disease (Table 3), which may suggest that the EQ-5D-5L has better discriminatory power than the EQ-5D-3L to distinguish COPD patients of different severity. Similar to studies of the EQ-5D-3L in COPD, self-care was the dimension least affected by COPD [39, 40]. In accordance with a study by Punekar et al. [40], about 80% of COPD patients reported no problems in self-care. As the severity of COPD increased, COPD patients reported more problems with mobility, self-care, and usual activities. However, pain/discomfort and anxiety/depression tended not to differ by disease severity using the EQ-5D-5L or the PROMIS. Our study also suggested that EQ-5D-5L index scores were less able to discriminate among patients with milder disease, i.e. GOLD grades 1 and 2. This is consistent with a previous study by Antonelli-Incalzi et al., who observed that health status dramatically declined when predicted FEV1 was 49% or less (upper limit of GOLD grade 3) [41].
Alternatively, the lack of discrimination between grade 1 and 2 may also suggest that the EQ-5D-5L descriptive system does not entirely address some of the limitations of the three-level EQ-5D [39], assuming there is a meaningful difference in self-reported health based on GOLD grades 1 and 2. Unlike the EQ-5D index score which is derived based on the five dimensions using population preference weights, the EQ-VAS provides a direct rating of health from the patient’s point of view. Consistent with previous reports [35], EQ-VAS had a more monotonic relationship with disease severity and better ability to discriminate according to disease severity compared to EQ-5D index.

Among the PROMIS subscales, physical functioning was most strongly associated with disease grades and measures of breathing difficulty and functioning. Only physical function (P-PF), depression (P-D), fatigue (P-F), and social roles (P-SR) varied significantly across COPD grades, and the magnitude of the differences in the PROMIS scores of depression and fatigue across GOLD grades was smaller than half of their SD, a commonly used cutoff for interpreting important differences in HRQL scores [42]. The anxiety, sleep, and pain domains of PROMIS, although moderately related to other HRQL measurements and dyspnea scores (mainly FACIT-Dyspnea), did not vary by COPD GOLD grade. The lack of correlation between pain, anxiety, and sleep disturbance and the degree of COPD severity does not diminish the importance of these HRQL parameters in COPD patients. In fact, it has been reported that 35%, 37% and 51% of advanced COPD patients suffer from sleep disturbance, pain and anxiety, respectively, arguably among the most prevalent symptoms associated with advanced COPD [43]. Despite the inability of these domains to discriminate among patients with different levels of airflow limitation, the domains show convergent validity, which suggests that they may capture patient-reported outcomes other than those associated with spirometry. In addition, the observation that physical or physiological measures (dyspnea scores; mobility, self-care and usual activities in EQ-5D-5L; physical function in PROMIS) deteriorate more with increasing COPD severity than psychosocial measures (anxiety/depression in EQ-5D-5L; anxiety, depression and social roles in PROMIS) suggests the possibility of adaptation and coping mechanisms developed in COPD patients as the disease progresses, which is often observed with chronic illnesses and disabling conditions [44].

EQ-5D and PROMIS, both generic measures of HRQL, are distinctive in several ways. While EQ-5D index and VAS scores both provide summary scores for evaluating general health status as a whole, PROMIS describes different aspects of health status individually using domain scores. The domains of anxiety, depression, and pain are apparently covered by both measures, but it is arguable whether fatigue (PROMIS) overlaps with pain/discomfort (EQ-5D), and the extent of overlap between physical functioning, fatigue, sleep disturbance, or social roles (PROMIS) and mobility, self-care, or usual activities (EQ-5D) is likewise uncertain. EQ-5D index-based scores are generated from societal preferences for health and can be applied to economic evaluations. Although PROMIS-43 does not include global items and was not designed as a preference-based measure like the EQ-5D, at least one scoring function is available to convert selected PROMIS domain scores into a single index value by mapping onto the EQ-5D [45]. Compared to PROMIS, EQ-5D is presumably briefer to administer as it contains 6 items (including VAS) rather than 43, but the PROMIS-43 contains more items in each domain, thereby providing the potential for a higher level of precision and sensitivity than EQ-5D. Alternatively, even briefer short-form versions of the PROMIS are available.

This study has several limitations. Since patients did not complete the EQ-5D-3L, we could not directly determine whether the EQ-5D-5L improves upon the properties of the EQ-5D-3L in COPD. In addition, longitudinal data are needed to examine and compare the responsiveness of the measures to detect meaningful change following interventions. Lastly, in our study, patients with more severe COPD (GOLD 3 and 4) were younger than those with milder disease, which was contrary to our expectation but may be due to the eligibility criteria for study participation or possibly a survivor effect. The representativeness of patients included in this analysis could also be restricted by the relatively low response rate (36%) for participating in the in-person evaluations. Age is a known factor that could confound the association between HRQL and disease severity [7]. To assess this potential confounding, we also conducted an analysis of covariance (ANCOVA) to control for age when comparing the responses in EQ-5D, PROMIS domain scores, dyspnea measures, and 6MWD among patients with different GOLD grades (data not shown). After controlling for age, the results (F-statistics and significance levels) were similar to those in Table 3, except that the ability of 6MWD and PROMIS sleep disturbance (P-SD) to distinguish COPD patients of different severity was improved.

Conclusions

In summary, our study provides evidence to support the validity of EQ-5D-5L and PROMIS-43 to assess HRQL in patients with COPD. The measures indicated that patient-reported physical function and social activities decline with decreasing lung function as graded by GOLD, whereas pain, mental health, sleep and fatigue do not. Future research using a longitudinal design will help to further understand the strengths and limitations of these measures in assessing outcomes in COPD patients.