INTRODUCTION

Sickle cell trait (SCT) is increasingly being studied as a risk factor for diseases disproportionately affecting African Americans.1 Research into sickle cell disease (SCD) is also increasing due to poor outcomes in this understudied condition.2 Most sickle cell research uses hemoglobin electrophoresis or genetic data to identify patients. However, such information is not always collected or available in clinical care records, which limits the sample sizes attainable in large databases that contain useful outcome data. The utility of International Classification of Diseases (ICD) codes for determining sickle cell status has not been examined in detail. We sought to determine sickle cell prevalence and the validity of sickle cell ICD codes in African American adults in a large multi-hospital healthcare system.

METHODS

We reviewed all adult African American patients with a hemoglobin electrophoresis result in the patient data registry of Partners Healthcare, Boston, Massachusetts. Hemoglobin electrophoresis reports were used as the “gold standard” for the diagnosis of SCT and SCD. ICD codes entered any time after January 1, 2005, were used as the index test. All analyses were conducted using Stata 14.2 (StataCorp).

SCT ICD codes used were 282.5 or D57.3. SCD ICD codes used were 282.4x (1, 2), 282.6x (0–4, 8, 9), 289.52, 517.3, D57.0x (0–2), D57.1, D57.20, D57.21x (1, 2, 9), D57.40, D57.41x (1, 2, 9), D57.80, and D57.81x (1, 2, 9). The ICD code algorithms evaluated required (i) at least one, (ii) at least two, or (iii) at least five of the same or different codes. Anemia (average hemoglobin < 12 g/dL) and low average urine specific gravity (USG < 1.015), both of which may occur in sickle cell,3 were subsequently added to the algorithms. We determined SCT and SCD prevalence, and the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the curve (AUC) for each algorithm. This study was approved by the Institutional Review Board at Partners Healthcare, Boston, Massachusetts, and the need for informed consent was waived.
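The code-count algorithms and the accuracy measures described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Stata analysis used in the study: the function names, the toy patient records, and the non-sickle-cell codes in them are hypothetical. The AUC here is taken as (sensitivity + specificity) / 2, which is what the trapezoidal ROC area reduces to for a single binary test; the text does not state how the reported AUCs were computed, so treat that as an assumption.

```python
import math

# SCT codes as listed in the Methods; the SCD code list would be handled the same way.
SCT_CODES = {"282.5", "D57.3"}


def flag_by_code_count(patient_codes, code_set, min_count):
    """True if at least `min_count` recorded codes (same or different) are in `code_set`."""
    hits = sum(1 for code in patient_codes if code in code_set)
    return hits >= min_count


def _ratio(numerator, denominator):
    """Divide safely, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(test_flags, truth_flags):
    """2x2 accuracy measures of a binary test against a gold standard."""
    pairs = list(zip(test_flags, truth_flags))
    tp = sum(1 for t, g in pairs if t and g)
    fp = sum(1 for t, g in pairs if t and not g)
    fn = sum(1 for t, g in pairs if not t and g)
    tn = sum(1 for t, g in pairs if not t and not g)
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        # Assumption: single-cut-point ROC area for a binary test.
        "auc": (sensitivity + specificity) / 2,
    }


# Hypothetical toy records: (ICD codes on file, SCT on electrophoresis).
toy_patients = [
    (["282.5", "401.9"], True),
    (["D57.3", "D57.3"], True),
    (["250.00"], True),
    (["282.5"], False),
    (["401.9"], False),
    ([], False),
]

truth = [has_sct for _, has_sct in toy_patients]
for min_count in (1, 2):
    flags = [flag_by_code_count(codes, SCT_CODES, min_count) for codes, _ in toy_patients]
    print(min_count, diagnostic_metrics(flags, truth))
```

Raising the required code count trades sensitivity for specificity, which is why the three thresholds are evaluated separately.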

RESULTS

We identified 10,877 African American patients who had undergone a hemoglobin electrophoresis (Table 1). The prevalence of SCT and SCD was 12% and 2%, respectively. Algorithm performance is shown in Table 2. In SCT, using at least one ICD code had the highest AUC (0.784). Adding anemia and low USG improved the SCT AUC (0.866) but diminished the PPV. In SCD, at least two ICD codes achieved an AUC of 0.956. The addition of anemia and low USG further increased the SCD AUC to 0.977.

Table 1. Characteristics of All Patients as of January 1, 2005, Based on Hemoglobin Electrophoresis
Table 2. Sensitivity, Specificity, Predictive Values, and Areas Under the Curve (AUC) of ICD Code–Based Algorithms for the Determination of Sickle Cell Trait and Sickle Cell Disease Status

DISCUSSION

Our findings suggest that ICD codes can adequately adjudicate sickle cell status for observational studies.

SCT rarely has overt clinical manifestations1; therefore, physicians may not code for sickle cell trait unless prompted. Hence, the SCT PPV reached 78% despite the low prevalence in our cohort. Although adding complications increased the SCT AUC from 0.784 to 0.866, the PPV fell to 45.5% due to an increase in false positives. Not having an SCT code remained highly predictive (95.8%) of SCT absence. In contrast, SCD ICD code algorithms achieved high AUCs, although the PPV was diminished (27%) due to low prevalence. The absence of an SCD ICD code essentially excluded SCD. Adding complications improved the SCD AUC; however, the PPV was only moderate (52.2%).
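The dependence of PPV on prevalence follows from Bayes' theorem and can be illustrated with a short Python sketch. The sensitivity and specificity below are hypothetical round numbers, not values from Table 2; only the two prevalences (12% for SCT and 2% for SCD) come from this cohort.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics, held fixed while prevalence changes.
ASSUMED_SENSITIVITY = 0.95
ASSUMED_SPECIFICITY = 0.95

for label, prevalence in (("SCT-like prevalence", 0.12), ("SCD-like prevalence", 0.02)):
    ppv, npv = predictive_values(ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY, prevalence)
    print(f"{label}: PPV={ppv:.3f}, NPV={npv:.3f}")
```

With the test characteristics held constant, the PPV falls sharply as prevalence drops while the NPV stays high, which matches the pattern described above: a high AUC for SCD can coexist with a modest PPV.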

The influence of hemoglobin electrophoresis on ICD coding by physicians is unclear; therefore, these results need to be confirmed in a validation cohort where the gold standard is not clinically indicated. Using hemoglobin electrophoresis may have falsely increased the prevalence of sickle cell in our cohort (which was higher than the national average4, 5) and biased this population towards anemia.

In conclusion, sickle cell ICD codes are valuable tools for identifying SCT and SCD patients for much-needed large epidemiological studies. Because the PPVs are moderate, the use of sickle cell ICD codes in epidemiological studies should be accompanied by a sensitivity analysis restricted to sickle cell cases confirmed by the available gold standard, to verify the direction of observed estimates. Given the risk of misclassification associated with moderate PPVs, we would not recommend using sickle cell ICD codes to create prediction models based on sickle cell status. Rather, with the sensitivity analysis caveat, sickle cell ICD codes would be best suited for investigating clinical associations in retrospective observational data, which can subsequently be evaluated prospectively.