Introduction

The quantification of patients’ health status (their symptoms, function, and health-related quality of life) is becoming increasingly important as an endpoint in clinical trials [1, 2], as a means of monitoring patients over time [3], and even as a marker of health care quality [4–6]. Measuring patients’ health status is most commonly accomplished by direct solicitation from patients themselves, using either generic or disease-specific health status questionnaires [7]. Whereas generic health status measures seek to quantify the health-related quality of life limitations of patients’ ‘overall health’, including all combinations of diseases and ailments, disease-specific measures focus upon a specific disease process. Accordingly, an advantage of disease-specific questionnaires is that they can be both more sensitive to clinical change and more interpretable than generic health status questionnaires. Because disease-specific instruments tend to be more actionable to clinicians, they can be useful in monitoring patients’ clinical status over time [8]. An unresolved challenge in the use of disease-specific measures, however, is whether they perform well when patients have other conditions that may also impact their symptoms, function, and quality of life in ways that are similar to the specific disease of interest. While this is not a problem with generic health status measures, if a co-occurring disease manifests similar effects to a primary disease of interest, and if one of the conditions worsens while the other remains stable or improves, then it is unknown whether the disease-specific measure will accurately measure the status of the disease it is designed to quantify. Such confounding could undermine the validity, reliability, and responsiveness originally established for the disease-specific measure. One setting in which this type of confounding may occur is heart failure with co-occurring anemia.

Heart failure (HF) affects over 5 million Americans and is responsible for enormous health care costs and resource utilization [9]. While several disease-specific measures have been created to quantify HF patients’ health status [10–13], the psychometric properties of these instruments have all been demonstrated in patients whose predominant medical condition was heart failure. Recently, however, anemia has emerged as an important co-occurring condition that is prognostically important in HF [14–21]. Importantly, the symptoms of anemia may mimic those of HF. Specifically, chronic anemia manifests itself in patients as “fatigue, loss of stamina, breathlessness, and tachycardia, particularly with exertion” [22], coupled with the development of “gradual fatigue and a reduced exercise tolerance” [23]. The symptoms of anemia [24] are therefore remarkably similar to those of HF, for which breathlessness, reduced exercise tolerance, and fatigue are hallmarks of the disease [22, 23]. Because of the overlap in symptomatic manifestations of both diseases, it is not clear whether heart failure-specific health status assessments, such as the Kansas City Cardiomyopathy Questionnaire (KCCQ), which captures patients’ perspectives of their disease rather than physiologic or anatomic characteristics (e.g. their hematocrit or left ventricular function), would perform similarly in patients with and without anemia. For example, if a patient’s heart failure symptoms improve from an increased use of diuretics, would there be an improvement in their KCCQ scores if their anemia did not change? Consequently, it is unknown whether the psychometric properties of a disease-specific HF instrument would be similar in anemic and nonanemic HF patients, given the potential for both conditions to induce similar symptoms in patients.
The purpose of this analysis was to provide empirical evidence supporting the reliability and validity (including responsiveness) of the KCCQ in anemic HF patients. We sought to compare the psychometric properties, including reliability (internal consistency and test–retest) and validity (including responsiveness to clinical change), in anemic and nonanemic HF patients.

Methods

Patient population

The STudy of AneMia IN A Heart Failure Population (STAMINA-HFP) was a prospective registry of 1,090 HF patients who were initially enrolled in the outpatient clinics of 58 U.S. cardiology centers. The purpose of STAMINA-HFP was to estimate the incidence and prevalence of anemia in patients with HF and to describe the association of anemia with HF progression. As such, the protocol mandated an initial clinical history and physical exam (including assignment of the New York Heart Association [NYHA] classification), a measurement of patients’ hemoglobin, and an assessment of their health status, including the KCCQ. Enrolled patients were contacted by telephone within 1–2 days after their initial enrollment and every 3 months for a period of 1 year to assess their health status. In addition, data from outpatient visits during the year of observation were abstracted at each visit. To be eligible for participation, patients had to be >18 years of age, diagnosed with a history of symptomatic HF and willing to provide informed consent. Patients who had a prior organ transplant, were planning a major surgery, or were participating in another interventional trial were excluded. Data capture was completed in the STAMINA-HFP registry in July 2004. Patients participating in this study (n = 811) had to have complete baseline health status assessments and a measure of hemoglobin. They were slightly more likely to be white (75% vs. 68%, P = 0.03) and to not smoke (90% vs. 86%, P = 0.03) but had similar distributions of NYHA class, KCCQ scores, and hemoglobin as compared with the entire cohort of 1,090 patients. For the longitudinal assessments of test–retest reliability and responsiveness, a 3-month health status assessment was also required from these patients (n = 698). 
At each follow-up interview, patients were also asked a global assessment of change question, which asked them to describe ‘how [they] think [their] condition has changed compared to the start of the study.’ Responses were on a 7-point categorical response scale ranging from ‘markedly improved’ to ‘markedly worse.’ The initial 3-month assessment was used for this study because it was the closest assessment to the time of their anemia evaluation and because, of all the follow-up assessments, it had the shortest recall period over which to assess patients’ perspectives of clinical change.

The KCCQ is a 23-item, disease-specific measure that quantifies four clinically relevant domains of patients’ health status (physical limitations; symptoms [frequency, severity, and change over time]; heart failure-specific quality of life; and perceived social limitations due to heart failure), together with a self-efficacy domain (a measure of patients’ knowledge of how to best manage their disease). The four health status scales can be combined into a single, overall summary score [10]. Scores range from 0 to 100, where higher scores indicate better functioning, fewer symptoms, and better disease-specific quality of life. The KCCQ’s validity and reproducibility have been previously supported, and recent studies have demonstrated that both cross-sectional variations [25, 26] and changes [27] in KCCQ scores are prognostic of subsequent mortality and HF hospitalizations. A mean difference between groups of patients, or an intra-individual change over time, of ≥5 points is considered clinically significant. This threshold was established in two studies: a prospective study of 476 outpatients in which a mean change of five points in the KCCQ overall summary scale was observed in those patients who experienced a small, but clinically significant, change in their heart failure [28], and a prospective study of 1,358 HF patients in which a 5-point change was associated with an 11% change in the multivariable-adjusted hazard ratio of hospitalization and cardiovascular death [27].

Classification of anemia

Patients were classified as being anemic or not based upon their hemoglobin at the time of study enrollment. Thus, the patients’ hemoglobin value at baseline determined their anemia status throughout the remainder of the study. The World Health Organization (WHO) definition of anemia was used (hemoglobin <13 g/dl for men, <12 g/dl for women).
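
As an illustration only, the classification rule described above can be sketched in Python. The function name, the sex labels, and the example values are hypothetical; they are not taken from the study's analysis code, which was written in SAS.

```python
# Hemoglobin thresholds (g/dl) as stated in the text for the WHO definition.
WHO_ANEMIA_THRESHOLD_G_DL = {"male": 13.0, "female": 12.0}


def is_anemic(hemoglobin_g_dl: float, sex: str) -> bool:
    """Return True if baseline hemoglobin falls below the sex-specific threshold.

    `sex` must be "male" or "female" (hypothetical labels for this sketch).
    """
    if sex not in WHO_ANEMIA_THRESHOLD_G_DL:
        raise ValueError(f"unknown sex label: {sex!r}")
    return hemoglobin_g_dl < WHO_ANEMIA_THRESHOLD_G_DL[sex]


# Illustrative values only (not study data).
example_man = is_anemic(12.5, "male")      # below the 13 g/dl threshold
example_woman = is_anemic(12.5, "female")  # not below the 12 g/dl threshold
```

Because the classification uses only the enrollment hemoglobin, a patient's anemia status in such a sketch would be computed once at baseline and carried through all follow-up analyses.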

Statistical analysis

The patient population was divided into those with and without anemia, and baseline demographic and clinical characteristics were compared with chi-square tests for categorical variables and t tests for continuous variables.

To evaluate the construct validity of the KCCQ in anemic and nonanemic patients, comparisons across NYHA classifications from the baseline interviews were conducted. A two-way ANOVA with KCCQ overall summary score as the dependent variable and baseline anemia classification, NYHA, and an anemia-by-NYHA interaction term as independent variables was constructed to establish whether the association of NYHA and KCCQ was different in patients with and without anemia.

Both internal consistency and test–retest reliability estimates were compared among anemic and nonanemic patients. Internal consistency reliability was estimated using Cronbach’s alpha. This value was calculated independently for patients with and without anemia. Test–retest reliability was assessed using a t test to compare the 3-month change in KCCQ scores among patients who responded on the global change in health question that their condition was stable over the previous 3 months. Three months was selected as the interval of analysis to maximize the number of patients included in the analyses, to leverage the existing data in which a 3-month interval was selected, and to replicate the time frame of the original validation studies evaluating the KCCQ [10].
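
For readers less familiar with Cronbach's alpha, the following Python sketch shows the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score), where k is the number of items. It is a generic illustration rather than the SAS code used in this analysis; the function name and the input layout (one list of item scores per respondent, with no missing values) are assumptions made for the example.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a complete respondents-by-items score matrix.

    item_scores: list of respondents, each a list of k item scores.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != k for row in item_scores):
        raise ValueError("every respondent must have the same number of items")

    # Variance of each item across respondents (columns of the matrix).
    item_variance_sum = sum(variance(column) for column in zip(*item_scores))
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in item_scores])

    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Made-up responses for illustration only (not study data).
toy_alpha = cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
```

In an analysis like the one described here, the function would simply be applied twice, once to the responses of anemic patients and once to those of nonanemic patients.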

To compare the responsiveness of the KCCQ in patients with and without anemia, the association of patients’ mean change in KCCQ scores with their responses to the global health question was assessed using linear regression. In these linear regression analyses, the mean change in KCCQ scores was used as the dependent variable and baseline anemia status, global assessment of change, and an anemia-by-global assessment of change interaction term were independent variables. As a secondary analysis, patients’ perspectives of their 3-month change in status were categorized into improved (mildly–markedly), stable (no change), and worsened (mildly–markedly), and the mean change in KCCQ scores was compared for those with and without anemia by these patient-perceived categories of change in health status. All analyses were conducted in SAS version 9.1 (SAS Institute Inc., Cary, NC, USA). All statistical tests were two-sided and a P value <0.05 was used as the threshold to declare statistical significance.
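
The logic of the interaction model can be sketched in Python under two assumptions that are not stated in the text: that the 7-point global assessment is coded numerically (here 1 = ‘markedly improved’ through 7 = ‘markedly worse’) and that it enters the model as a linear term. Under those assumptions, a model with anemia, global change, and their interaction yields the same point estimates as fitting a separate least-squares line in each anemia group, and the interaction coefficient equals the difference between the two slopes. The function names and data below are hypothetical, and the sketch gives point estimates only; testing the interaction would also require its standard error.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def slopes_by_anemia(records):
    """Return (anemic slope, nonanemic slope, difference between them).

    records: iterable of (anemic, global_change, kccq_change) tuples, where
    global_change is the assumed numeric coding of the 7-point response.
    """
    groups = {True: ([], []), False: ([], [])}
    for anemic, global_change, kccq_change in records:
        xs, ys = groups[bool(anemic)]
        xs.append(global_change)
        ys.append(kccq_change)
    slope_anemic = ols_slope(*groups[True])
    slope_nonanemic = ols_slope(*groups[False])
    # The difference corresponds to the anemia-by-global-change interaction.
    return slope_anemic, slope_nonanemic, slope_anemic - slope_nonanemic


# Made-up data for illustration only (not study data).
example = [
    (True, 1, 6.0), (True, 2, 3.0), (True, 3, 0.0),
    (False, 1, 4.0), (False, 2, 2.0), (False, 3, 0.0),
]
slope_a, slope_n, interaction = slopes_by_anemia(example)
```

With this coding, a negative slope means that KCCQ scores fall as patients report progressively worse global change.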

Results

Patient characteristics

Overall, 811 patients completed the baseline clinical assessments, including the KCCQ interview, and were used to assess baseline construct validity and internal consistency reliability. Only 4% of the population had KCCQ scores of 100, suggesting a minimal ceiling effect of the instrument in this population. Among the patients who completed the baseline assessments, 268 (33%) met the WHO criteria for anemia. Baseline characteristics of the anemic and nonanemic patients are provided in Table 1. Compared to nonanemic patients, anemic patients were significantly older (mean ± SD = 67 ± 13 vs. 63 ± 14 years, P < 0.001) and more likely to be non-Caucasian (30% vs. 23%, P = 0.03), to have an ischemic HF etiology (46% vs. 38%, P = 0.04), diabetes (47% vs. 34%, P < 0.001), chronic renal insufficiency (33% vs. 15%, P < 0.001), and worse NYHA class (44% vs. 32% class III or IV, P < 0.001), and to be treated with diuretics (90% vs. 84%, P = 0.04). Anemic patients also had a lower glomerular filtration rate (GFR) (50.8 ± 24.6 vs. 63.9 ± 23.1 ml/min/m2, P < 0.001), higher serum creatinine (1.7 ± 1.5 vs. 1.3 ± 1.2 mg/dl, P < 0.001), and lower KCCQ scores (61 ± 23 vs. 65 ± 23, P = 0.009).

Table 1 Baseline clinical characteristics of the population

For the longitudinal analyses of test–retest reliability and responsiveness, 698 (86.1%) patients provided both baseline and 3-month assessments. No statistically significant differences in any of the patient characteristics listed in Table 1 were observed between those who did and did not participate in the 3-month interviews. Notably, no difference in anemia classification (33.0% among participants vs. 33.6% in those without follow-up, P = 0.89) or mean hemoglobin (13.2 ± 1.8 vs. 13.2 ± 2.1, P = 0.94) was observed between those with and without follow-up.

Construct validity

Figure 1 illustrates the relationship between the KCCQ overall summary score and NYHA class in anemic and nonanemic patients. Mean KCCQ scores tracked strongly with NYHA class (mean summary score ± SE for the entire population = 80.3 ± 1.6, 68.0 ± 1.0, 50.9 ± 1.2, and 45.6 ± 5.4 for NYHA classes I–IV respectively, P < 0.0001). While the mean KCCQ scores for patients in NYHA class IV differed substantially between anemic and nonanemic patients, this was likely due to the small numbers of patients in these subgroups, resulting in large standard deviations of scores (KCCQ scores in anemic [n = 12] and nonanemic [n = 7] patients = 39.5 ± 20 vs. 56.2 ± 28). Overall, the relationship between KCCQ and NYHA did not differ significantly between anemic and nonanemic patients (P = 0.38 for interaction).

Fig. 1 KCCQ overall summary score by NYHA class in anemic and nonanemic patients

Reliability

The internal consistency reliability (Cronbach’s alpha) of the KCCQ overall summary score for the entire population was 0.93. In patients who were anemic and nonanemic, the Cronbach alphas were 0.92 and 0.93, respectively.

The mean (±SE) KCCQ overall summary score change in patients who reported no change over 3 months in their clinical condition (n = 257) was −1.2 ± 0.7. Among anemic patients (n = 73) the mean change was −2.8 ± 1.4, and for nonanemic patients (n = 184) the mean change was −0.5 ± 0.8. These mean 3-month changes were similar between anemic and nonanemic patients (P = 0.14). Importantly, the change between and within both groups was less than five points over time and, thus, not considered clinically significant. The intraclass correlation coefficient for the KCCQ overall summary score was 0.885 in the overall cohort and did not differ among those with and without anemia (0.865 for anemic patients and 0.892 for nonanemic patients [P = 0.11]).
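
A minimal Python sketch of how a mean change and its standard error might be computed for self-reported stable patients is given below. The 5-point threshold is the clinically significant difference described in the Methods; the function name and the scores are hypothetical and are not study data.

```python
from math import sqrt
from statistics import mean, stdev

# Clinically significant KCCQ difference, as stated in the text (points).
CLINICALLY_SIGNIFICANT_CHANGE = 5.0


def mean_change_with_se(baseline, follow_up):
    """Return (mean, standard error) of paired follow-up minus baseline scores."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    # Standard error of the mean = sample SD / sqrt(n).
    return mean(changes), stdev(changes) / sqrt(len(changes))


# Made-up scores for four stable patients (illustration only).
change, se = mean_change_with_se([60, 70, 80, 50], [58, 71, 79, 48])
below_threshold = abs(change) < CLINICALLY_SIGNIFICANT_CHANGE
```

A mean change whose magnitude stays below the threshold in patients who report being stable is what one would expect of a reproducible instrument.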

Responsiveness

To compare the sensitivity of the KCCQ to patients’ perceptions of change, the mean change in KCCQ scores across each category of change was compared in anemic and nonanemic patients. Figure 2 describes the means and 95% confidence intervals of change in KCCQ scores across crude categories of improved (mild to marked), no change, and worsening (mild to marked) of their overall clinical status during the first 3 months of the study. No significant differences by anemia status were detected within any of the three categories of reported health change. In regression analysis, mean 3-month KCCQ change was found to be linearly related to reported health change, with a slope of −2.7 points (95% CI = [−3.6, −1.9]) per 1 step down on the Likert-scale response. Nonlinearity was tested using cubic splines and was found to be nonsignificant (P = 0.68). The association was consistent in both anemic and nonanemic patients (slope estimates = −3.4 [95% CI = (−5.0, −1.8)] vs. −2.5 [95% CI = (−3.4, −1.6)], P = 0.33).

Fig. 2 Three-month KCCQ change by reported health change and anemia status

Discussion

We report empirical evidence to support the reliability and validity (including responsiveness) of the KCCQ in HF patients with anemia. We found similar associations between the KCCQ overall summary score and NYHA classification, similar internal consistency and test–retest reliability in stable patients, and similar responsiveness of the KCCQ to patients’ perceptions of 3-month clinical change. In light of the potential overlap in symptoms between anemia and HF, we tested the hypothesis that the psychometric properties of the KCCQ might differ between HF patients with and without anemia and found no evidence of such a difference. Thus, the KCCQ could be a valid, reliable, and responsive outcome measure for clinical trials of anemia treatment in HF.

To date, there has been a paucity of literature examining the attribution of symptoms to one of a spectrum of potentially co-occurring diseases in patient-reported health status assessments. A common recommendation for the design of clinical trials is to include both disease-specific and generic measures of health status. These recommendations are predicated upon the desire to capture the impact of treatments on a specific disease of interest, as well as the overall impact of treatment on patients’ health outside the condition of interest. The latter is intended to capture side effects or other unanticipated complications of therapy.

However, since patients are often only aware of the symptoms that they experience, and not the underlying pathophysiology that is responsible for those symptoms, it is unclear whether a distinct pathophysiology (e.g. anemia) could confound the psychometric properties of a disease-specific health status measure quantifying a disease with similar clinical manifestations (e.g. HF). While a review of the literature and an independent investigation by Rijken et al. demonstrated that co-occurring diseases are associated, as expected, with worse scores on general health status measures, they did not examine the influence of co-occurring diseases on disease-specific instruments [29]. To examine the relative performance of generic and disease-specific instruments, Ren and colleagues demonstrated, in both cross-sectional [30] and longitudinal [31] analyses, that disease-specific measures outperformed generic measures in terms of their content and discriminative validity among veterans with multiple comorbidities. However, these latter studies did not focus upon the ability of a disease-specific measure to be a psychometrically valid representation of the disease of interest when another co-occurring illness may manifest similar symptoms.

One research situation in which one disease has similar symptoms to another is depression and heart failure. Numerous studies have documented that depressed patients have worse health status than nondepressed patients [32–35], yet both depression and heart failure can manifest symptoms of fatigue. Interestingly, Rumsfeld et al. were able to demonstrate that depression was associated with worse KCCQ scores over time, regardless of baseline scores, and that among nondepressed patients, poorer health status was associated with a greater likelihood of becoming depressed [36]. These studies did not address whether the KCCQ was equally valid in both depressed and nondepressed patients, but did show important interrelationships between diseases that can have similar clinical manifestations.

Several potential limitations of this study should be considered in interpreting our findings. First, validation is always difficult to establish given the absence of a criterion standard for a patient-centered outcome such as health status. We selected the NYHA classification because it is a commonly used metric with which to quantify HF patients’ health status. Yet NYHA assessments are known to be less reproducible, with poorer interobserver agreement, than data solicited directly from patients themselves [37]. A second concern is that our assessments of change in HF came from a global patient-reported estimate of change. At best this is a crude assessment, and much more detailed examinations of clinical change in the KCCQ have been conducted [28]. Although the quantitative assessments of change in KCCQ scores may be different than in previous studies, there is no a priori reason to suspect that these patient-reported assessments of global change would differ by patients’ anemia status. Therefore, finding similar associations between 3-month change in KCCQ scores and patients’ global assessments of change in those with and without anemia supports the conclusion that the responsiveness of the KCCQ to clinical change is similar in both populations of patients. In addition, concluding that no differences exist in the performance characteristics of the KCCQ in anemic and nonanemic patients could be incorrect due to a type II error. However, the point estimates in all of our analyses were extremely similar between groups, and the analyses included more than 800 patients, a sample more than six times as large as that of the original study documenting the reliability and validity (including responsiveness) of the KCCQ [10]. Finally, 12% of patients were lost to follow-up and we cannot exclude the introduction of a potential bias from failing to include these patients.
Supporting the generalizability of our study, however, is the similar degree of association between KCCQ scores and NYHA classification, the similar estimates of internal consistency and test–retest reliability, and similar responsiveness observed in this population as seen in other studies testing the psychometric properties of the KCCQ [10, 28].

In summary, we provide empirical evidence to support the validity, reliability, and responsiveness of a disease-specific measure of heart failure patients’ health status, the KCCQ, regardless of the presence of anemia. As such, we believe that patient-reported health status outcomes are a valid, reliable, and sensitive means to quantify the impact of anemia treatment on HF-specific outcomes. Given that the improvement in patients’ health status is a primary goal of treatment, being able to accurately capture HF patients’ health status in trials of anemia therapy is important. The KCCQ should be a valid and sensitive measure for accomplishing these goals.