Data sources
We analyzed a unique longitudinal national dataset initially created to study predictors of healthcare-associated infections in elderly Medicare recipients during a hospitalization in an ICU [15, 16]. For that study, 31 hospitals were recruited from among those belonging to the Centers for Disease Control and Prevention’s (CDC) National Nosocomial Infections Surveillance system that had conducted device-associated infection surveillance in 2002. We obtained Medicare claims data for all elderly patients in the participating hospitals during the months in which surveillance was conducted. This study was approved by the Institutional Review Boards at the Columbia University Medical Center and the RAND Corporation.
From this sample we defined “index ICU stays” as those that occurred during a period of CDC infection surveillance in 2002. An individual patient could contribute more than one index ICU stay to the sample as long as the additional ICU stay was in a month in which surveillance occurred. We merged these index ICU data with individual Medicare inpatient, outpatient and denominator data for the years 2001 through 2007, yielding a minimum of one year of data prior to, and up to five years of data following, the index ICU stay. All analyses that follow are conditional on being discharged alive from the index stay.
Measures
Exposure
For each index ICU stay, we obtained sepsis and pneumonia status from the Medicare claims data and healthcare-associated infection status from the CDC data. We defined exposure variables based on infection status during the index hospitalization: 1) sepsis, 2) pneumonia, 3) central line-associated bloodstream infection (CLABSI), and 4) ventilator-associated pneumonia (VAP). A fifth group, with none of these infections, served as the control.
The definitions of sepsis and pneumonia were based on the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes associated with the index ICU stay. The ICD-9-CM codes used for sepsis were 038 (septicemia), 995.91 (sepsis), 995.92 (severe sepsis), and 785.52 (septic shock); those used for pneumonia were 482.0-482.2 and 482.4-482.9 (pneumonia cases with a bacterial diagnosis code) [17]. These codes have been used in previous research. The 038 codes for septicemia have been validated with a specificity of 99% and a positive predictive value of 89%, and the 482 codes for pneumonia with a specificity of 99% and a positive predictive value of 85% [3, 7, 18].
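The code-based definitions above can be illustrated with a minimal Python sketch. It assumes that diagnosis codes arrive as strings, with or without the decimal point, and that prefix matching is an acceptable way to capture subcodes; neither assumption comes from the study's own processing. The function names are hypothetical, and the septic shock prefix is written as 78552, the ICD-9-CM code for septic shock.

```python
# Prefixes written without the decimal point (assumed storage format,
# e.g. "99591" for 995.91). 482.3 is deliberately absent from the list.
SEPSIS_PREFIXES = ("038", "99591", "99592", "78552")
PNEUMONIA_PREFIXES = tuple("482" + digit for digit in "012456789")


def normalize(code):
    """Strip the decimal point and surrounding whitespace from a code."""
    return code.replace(".", "").strip().upper()


def has_code(codes, prefixes):
    """True if any diagnosis code starts with one of the given prefixes."""
    return any(normalize(code).startswith(prefixes) for code in codes)


def classify_stay(codes):
    """Flag sepsis and pneumonia for one index stay (hypothetical helper)."""
    return {
        "sepsis": has_code(codes, SEPSIS_PREFIXES),
        "pneumonia": has_code(codes, PNEUMONIA_PREFIXES),
    }
```

For example, `classify_stay(["038.11", "428.0"])` flags sepsis only, while a stay coded only with 482.3 is flagged for neither condition.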
Patients with either of the two device-associated infections of interest (CLABSI and VAP) were identified by the hospitals’ infection preventionists and reported into CDC’s system. All infection preventionists used the same direct surveillance protocols developed by the CDC and these protocols included both clinical and laboratory data [19]. In the infection groups, if a person was identified with a CLABSI, they were not included in the sepsis group, and similarly, if they were identified with a VAP they were not included in the pneumonia group. However, a patient could be identified in both the sepsis and pneumonia cohorts, or in both the VAP and CLABSI cohorts.
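The group-assignment rules in this paragraph can be sketched as follows. This is an illustrative reading of the rules, not the study's code: the function name and the boolean inputs are hypothetical, and combinations the text does not address (for example, sepsis together with VAP) are simply assigned to both groups here as an assumption.

```python
def exposure_groups(sepsis, pneumonia, clabsi, vap):
    """Return the set of exposure groups for one index stay.

    Rules taken from the text: CLABSI removes a patient from the sepsis
    group, and VAP removes a patient from the pneumonia group. Overlap
    between sepsis and pneumonia, or between CLABSI and VAP, is allowed.
    """
    groups = set()
    if clabsi:
        groups.add("CLABSI")
    if vap:
        groups.add("VAP")
    if sepsis and not clabsi:
        groups.add("sepsis")
    if pneumonia and not vap:
        groups.add("pneumonia")
    # Stays with none of the four infections form the control group.
    return groups or {"control"}
```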
Outcomes
The principal outcome measures were survival and healthcare utilization. We used the date of death and the date of discharge from the index admission to define the length of survival. We used place and date of service to define healthcare utilization categories, including inpatient admissions, outpatient visits, emergency department admissions, long-term care admissions, and home healthcare visits. The broad outpatient visits category included office, outpatient hospital, ambulatory surgical center, federally qualified health center, state or local public health clinic, rural health clinic, and community mental health center visits. For each utilization category we generated annual counts (365-day period) for each of the five years following the index hospitalization discharge date.
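A minimal sketch of the annual counting step is shown below. It assumes that service dates for one utilization category are available as `datetime.date` objects and that days 1 through 365 after discharge fall in year 1; the boundary convention and the function name are assumptions made for illustration.

```python
from datetime import date


def annual_counts(discharge, service_dates, years=5):
    """Count services in consecutive 365-day periods after discharge."""
    counts = [0] * years
    for service_date in service_dates:
        days_after = (service_date - discharge).days
        if days_after <= 0:
            continue  # on or before the discharge date: not follow-up
        period = (days_after - 1) // 365  # 0-based index of the period
        if period < years:
            counts[period] += 1
    return counts
```

Applying this once per utilization category and per patient yields the five annual counts used as outcomes.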
Covariates
For each patient, both inpatient and outpatient Medicare data were used to generate measures of health status based on 30 aggregated condition codes and 184 hierarchical condition codes using the DxCG software [20–22]. These controls were calculated from healthcare experiences during the year prior to the index admission. Our method used diagnostic information associated with prior hospitalizations, outpatient, and ambulatory services to characterize health status by considering multiple coexisting medical conditions and creating more aggregated groupings. The hierarchies served to 1) improve clinical validity (e.g., a person with a more severe manifestation of diabetes should not be characterized in the same way as a person with a less serious form); 2) overcome some of the limitations found in Medicare data due to coding practices (e.g., the proliferation of recorded diagnoses for the purpose of maximizing reimbursement); and 3) improve the precision of the risk adjustment [23, 24]. We also included patient demographics (age, gender, race) and Medicaid status (i.e., dual eligibility).
Statistical analysis
Summary statistics were computed for each subgroup. Pearson’s chi-squared tests or Fisher’s exact tests were used to compare categorical variables across subgroups. Continuous variables were compared using t-tests or non-parametric tests.
Mortality models
Patient survival outcomes across infection groups were examined first by calculating Kaplan-Meier survival functions and using log-rank tests to detect differences among cohorts. We then estimated multivariable Cox proportional hazards models to control for the effects of health status (aggregated condition code and/or hierarchical condition code measures) and other covariates. We tested the proportional hazards assumption by estimating alternative specifications that admitted interactions between analysis time and variables of interest. Finally, we estimated multivariable parametric hazard models with continuous-time frailty to account for unobserved heterogeneity in health status among subjects [25, 26]. We considered alternative distributional assumptions for the parametric duration dependence (exponential, gamma, Gompertz, log-logistic, log-normal, and Weibull) and selected Weibull because of its model fit and because it best matched the patterns of duration dependence identified by the Kaplan-Meier estimates. We considered gamma and inverse Gaussian distributions to model unobserved frailty, and selected the gamma distribution based on model performance.
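Two of the building blocks named above can be illustrated with a short Python sketch: the Kaplan-Meier product-limit estimate, and the population survival function implied by a Weibull baseline hazard with gamma frailty. The function names are hypothetical, and the Weibull parametrization (cumulative hazard `lam * t**p`, frailty with mean 1 and variance `theta`) is an assumption, since the text does not state the one used. The sketch omits covariates and model fitting.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct event time.

    events[i] is 1 if subject i died at durations[i], 0 if censored.
    Subjects censored at time t are counted as at risk at t.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def weibull_gamma_survival(t, lam, p, theta):
    """Population survival for a Weibull baseline with gamma frailty.

    Assumed parametrization: H(t) = lam * t**p, frailty variance theta > 0.
    """
    cumulative_hazard = lam * t ** p
    return (1 + theta * cumulative_hazard) ** (-1 / theta)
```

As `theta` approaches zero, the second function approaches the ordinary Weibull survival function `exp(-lam * t**p)`, which is the sense in which frailty adds unobserved heterogeneity on top of the baseline.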
Utilization models
The data were organized so that the unit of analysis was the person-year. We fit multivariable Poisson and negative binomial models for each healthcare utilization category. In every case, the likelihood ratio test of over-dispersion showed that the dispersion parameter (alpha) was significantly different from zero, so we rejected the Poisson models in favor of the negative binomial models.
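The logic of the over-dispersion test can be sketched in Python for the simplest possible case. This sketch makes several simplifying assumptions that do not reflect the study's models: it has no covariates (intercept only), it uses the NB2 parametrization with variance `mu + alpha * mu**2`, and it finds alpha by a coarse grid search rather than a proper optimizer. Because alpha = 0 lies on the boundary of the parameter space, the p-value uses the usual 50:50 mixture of a point mass at zero and a chi-squared distribution with one degree of freedom. All function names are hypothetical.

```python
import math


def poisson_loglik(counts, mu):
    """Poisson log-likelihood with a common mean mu."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)


def negbin_loglik(counts, mu, alpha):
    """NB2 log-likelihood with mean mu and dispersion alpha > 0."""
    r = 1.0 / alpha
    return sum(
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu))
        for y in counts
    )


def overdispersion_lr_test(counts):
    """Return (alpha_hat, LR statistic, p-value) for H0: alpha = 0."""
    mu = sum(counts) / len(counts)  # MLE of the mean in both models
    alphas = [10 ** (k / 20) for k in range(-80, 41)]  # 1e-4 .. 100
    alpha_hat = max(alphas, key=lambda a: negbin_loglik(counts, mu, a))
    lr = 2 * (negbin_loglik(counts, mu, alpha_hat) - poisson_loglik(counts, mu))
    lr = max(0.0, lr)
    # Boundary test: half the chi-squared(1) tail probability.
    p_value = 0.5 * math.erfc(math.sqrt(lr / 2)) if lr > 0 else 1.0
    return alpha_hat, lr, p_value
```

A small p-value favors the negative binomial model, which is the pattern the text reports for every utilization category.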
Each subject contributed person-years until death or for up to five years of follow-up. Subjects who died were at risk for healthcare utilization only for the part of the year in which they were alive. We defined an exposure variable to adjust these observations for the amount of time at risk during the year of death. Variation in time at risk is typically handled by including ln(exposure) as a variable in the model and restricting its coefficient to one, an approach commonly referred to as fitting exposure time as an offset. We refer to this specification as Model 1, which estimates the full relationship between infection and utilization but treats censoring (time at risk) as random.
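The person-year layout and the offset can be sketched as follows. The sketch assumes that follow-up is measured in days from the index discharge to death or the end of observation, and that exposure is expressed as the fraction of a 365-day period at risk; both the units and the function names are assumptions made for illustration.

```python
import math


def person_years(followup_days, max_years=5):
    """Split follow-up into person-year rows with exposure and offset."""
    rows = []
    for year in range(1, max_years + 1):
        start = (year - 1) * 365
        if followup_days <= start:
            break  # subject died (or was censored) before this year began
        exposure = min(followup_days - start, 365) / 365
        rows.append({"year": year, "exposure": exposure,
                     "offset": math.log(exposure)})
    return rows


def expected_count(exposure, linear_predictor):
    """Mean count with ln(exposure) entering with a coefficient of one."""
    return math.exp(math.log(exposure) + linear_predictor)
```

Because the coefficient on ln(exposure) is fixed at one, the expected count scales in proportion to time at risk: a subject who survives half of a year is expected to generate half the annual utilization of an otherwise identical subject who survives the full year.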
Because the censoring (exposure) was a function of death, and since death is undoubtedly correlated with healthcare utilization, the censoring is unlikely to be random. To determine how much of the resource utilization was related to impending death during the follow-up period, we added a polynomial term to account for changes in utilization as a function of “time-until-death” and identify this specification as Model 2 (see Additional file 1 for full discussion). In Model 2, alternative specifications of the “time-until-death” function included step, linear, quadratic, and third-order forms. We also allowed for alternative “time-until-death” periods of 6, 12, or 24 months prior to death. The Model 2 specifications, therefore, separately identify an underlying pattern of utilization following discharge and an additive component based on the time until death following discharge. Huber-White sandwich estimators were used to calculate all standard errors [27, 28].
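One way the “time-until-death” regressors might be constructed is sketched below. The exact construction is described in Additional file 1 and is not reproduced here, so everything in this sketch is an assumption: time until death is taken in months from some reference point in the person-year, the terms are set to zero outside the chosen window, and the function and variable names are hypothetical.

```python
def time_until_death_terms(months_to_death, window=12, form="quadratic"):
    """Build step and polynomial terms for one person-year.

    months_to_death is None when no death is observed during follow-up.
    window is the assumed look-back period in months (6, 12, or 24).
    """
    order = {"step": 0, "linear": 1, "quadratic": 2, "cubic": 3}[form]
    in_window = months_to_death is not None and months_to_death <= window
    # Indicator for being within the window before death (the step form).
    terms = {"near_death": 1 if in_window else 0}
    # Polynomial terms up to the requested order, zero outside the window.
    for k in range(1, order + 1):
        terms[f"ttd_{k}"] = months_to_death ** k if in_window else 0
    return terms
```

Adding such terms to the utilization model separates the component of utilization tied to the approach of death from the underlying post-discharge pattern.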