Validity of International Classification of Diseases Codes for Sickle Cell Trait and Sickle Cell Disease


Sickle cell trait (SCT) is increasingly being studied as a risk factor for diseases disproportionately affecting African Americans.1 Research into sickle cell disease (SCD) is also increasing due to poor outcomes in this understudied condition.2 Most sickle cell research uses hemoglobin electrophoresis or genetic data to identify patients. However, such information is not always collected or available in clinical care records, limiting adequate sample sizes in large databases with useful outcomes. The utility of International Classification of Diseases (ICD) codes to determine sickle cell status has not been examined in detail. We sought to determine sickle cell prevalence and the validity of sickle cell ICD codes in African American adults in a large multi-hospital healthcare system.


We reviewed all adult African American patients with a hemoglobin electrophoresis in the patients’ data registry of Partners Healthcare, Boston, Massachusetts. Hemoglobin electrophoresis reports were used as the “gold standard” for the diagnosis of SCT and SCD. ICD codes entered at any time after January 1, 2005, served as the test under evaluation. All analyses were conducted using Stata 14.2 (StataCorp.).

SCT ICD codes used were 282.5 or D57.3. SCD ICD codes used were 282.4x (1, 2), 282.6x (0–4, 8, 9), 289.52, 517.3, D57.0x (0–2), D57.1, D57.20, D57.21x (1, 2, 9), D57.40, D57.41x (1, 2, 9), D57.80, and D57.81x (1, 2, 9). The ICD code algorithms evaluated required (i) at least one, (ii) at least two, or (iii) at least five of the same or different codes. Anemia (average hemoglobin < 12 g/dL) and low average urine specific gravity (USG < 1.015), both of which may occur in sickle cell,3 were subsequently added. We determined SCT and SCD prevalence, and the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and the area under the curve (AUC) for each algorithm. This study was approved by the Institutional Review Board at Partners Healthcare, Boston, Massachusetts, and the need for informed consent was waived.
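The count-based algorithms and the accuracy measures described above can be sketched in a few lines of Python. This is only an illustration, not the authors' Stata code: the function names, the `Metrics` container, and the toy cohort are hypothetical, and only the two SCT codes are taken from the text. The AUC line uses the standard result that a single binary rule has an ROC area equal to the mean of sensitivity and specificity; the text does not state how the published AUCs were computed.

```python
from dataclasses import dataclass

# SCT codes as listed in the text (ICD-9 282.5, ICD-10 D57.3).
SCT_CODES = {"282.5", "D57.3"}


def meets_algorithm(code_records, code_set, min_count):
    """Return True if at least `min_count` recorded codes fall in `code_set`.

    Repeats count separately, matching "the same or different codes".
    """
    return sum(1 for code in code_records if code in code_set) >= min_count


@dataclass
class Metrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    auc: float


def diagnostic_metrics(truth, flagged):
    """Compare algorithm flags against the gold standard (electrophoresis)."""
    pairs = list(zip(truth, flagged))
    tp = sum(1 for t, f in pairs if t and f)
    fp = sum(1 for t, f in pairs if not t and f)
    fn = sum(1 for t, f in pairs if t and not f)
    tn = sum(1 for t, f in pairs if not t and not f)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return Metrics(
        sensitivity=sensitivity,
        specificity=specificity,
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
        # For one binary rule the ROC curve has a single point, so its
        # area is the mean of sensitivity and specificity.
        auc=(sensitivity + specificity) / 2,
    )


# Hypothetical toy cohort: (recorded codes, SCT on electrophoresis).
TOY_COHORT = [
    (["282.5", "D57.3"], True),
    (["282.5"], True),
    ([], True),
    (["D57.3"], False),
    (["401.9"], False),  # unrelated code, ignored by the rule
    ([], False),
    ([], False),
    ([], False),
]

truth = [has_sct for _, has_sct in TOY_COHORT]
for min_count in (1, 2):
    flagged = [meets_algorithm(codes, SCT_CODES, min_count) for codes, _ in TOY_COHORT]
    print(min_count, diagnostic_metrics(truth, flagged))
```

In this toy cohort, raising the threshold from one code to two trades sensitivity for specificity and PPV. The sketch does not guard against empty cells (for example, a threshold that flags no one), which a real analysis would need to handle.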


We identified 10,877 African American patients who had undergone a hemoglobin electrophoresis (Table 1). The prevalence of SCT and SCD was 12% and 2%, respectively. Results are shown in Table 2. In SCT, using at least one ICD code had the highest AUC (0.784). Adding anemia and low USG improved the SCT AUC (0.866) but diminished the PPV. In SCD, at least two ICD codes achieved an AUC of 0.956. The addition of anemia and low USG further increased SCD AUC to 0.977 (Table 2).

Table 1. Characteristics of All Patients as of January 1, 2005, Based on Hemoglobin Electrophoresis
Table 2. Sensitivity, Specificity, Predictive Values, and Areas Under the Curve (AUC) of ICD Code–Based Algorithms for the Determination of Sickle Cell Trait and Sickle Cell Disease Status


Our findings suggest that ICD codes can adequately adjudicate sickle cell status for observational studies.

SCT rarely has overt clinical manifestations1; therefore, physicians may not code for sickle cell trait unless prompted. Consequently, an SCT code that is present tends to be accurate: SCT PPV reached 78% despite the low prevalence in our cohort. Although adding anemia and low USG increased SCT AUC from 0.784 to 0.866, the PPV fell to 45.5% due to an increase in false positives. The absence of an SCT code remained highly predictive of SCT absence (NPV 95.8%). In contrast, SCD ICD code algorithms achieved high AUCs, although the PPV was diminished (27%) due to low prevalence. The absence of an SCD ICD code essentially excluded SCD. Adding anemia and low USG improved SCD AUC; however, the PPV remained moderate (52.2%).
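The dependence of PPV on prevalence follows directly from Bayes' theorem. The worked example below uses illustrative values only (sensitivity and specificity of 0.95 are assumptions, not figures from Table 2) to show how the same test yields a much lower PPV at a prevalence of 2% than at 12%, the two prevalences reported for SCD and SCT in this cohort. Here Se is sensitivity, Sp is specificity, and p is prevalence.

```latex
\[
\mathrm{PPV} = \frac{\mathrm{Se}\cdot p}{\mathrm{Se}\cdot p + (1-\mathrm{Sp})(1-p)}
\]
\[
p = 0.02:\quad
\mathrm{PPV} = \frac{0.95 \times 0.02}{0.95 \times 0.02 + 0.05 \times 0.98}
             = \frac{0.019}{0.068} \approx 0.28
\]
\[
p = 0.12:\quad
\mathrm{PPV} = \frac{0.95 \times 0.12}{0.95 \times 0.12 + 0.05 \times 0.88}
             = \frac{0.114}{0.158} \approx 0.72
\]
```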

The influence of hemoglobin electrophoresis on ICD coding by physicians is unclear; therefore, these results need to be confirmed in a validation cohort where the gold standard is not clinically indicated. Using hemoglobin electrophoresis may have falsely increased the prevalence of sickle cell in our cohort (which was higher than the national average4, 5) and biased this population towards anemia.

In conclusion, sickle cell ICD codes are valuable tools for identifying SCT and SCD patients for much-needed large epidemiological studies. Because the PPVs are moderate, the use of sickle cell ICD codes in epidemiological studies should be accompanied by a sensitivity analysis restricted to sickle cell cases confirmed by the available gold standard, to verify the direction of observed estimates. Given the risk of misclassification associated with moderate PPVs, we would not recommend using sickle cell ICD codes to create prediction models based on sickle cell status. Rather, with the sensitivity analysis caveat, sickle cell ICD codes would be best suited for investigating clinical associations in retrospective observational data, which can subsequently be evaluated prospectively.
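One way to picture the recommended sensitivity analysis is to estimate an association twice: once with exposure defined by ICD code alone, and once with the exposed group restricted to electrophoresis-confirmed cases, then check that both estimates point the same way. The Python sketch below is a minimal illustration under assumptions: the record layout, the use of a crude risk ratio, and all counts are hypothetical and are not drawn from the study.

```python
def make_rows(n, icd_flag, confirmed, outcome):
    """Build n identical hypothetical patient records."""
    return [{"icd": icd_flag, "confirmed": confirmed, "outcome": outcome}] * n


def risk(rows):
    """Proportion of rows in which the outcome occurred."""
    return sum(1 for r in rows if r["outcome"]) / len(rows)


def risk_ratio(exposed, unexposed):
    """Crude risk ratio comparing exposed with unexposed patients."""
    return risk(exposed) / risk(unexposed)


# Hypothetical cohort; `confirmed` is None when no electrophoresis exists.
cohort = (
    make_rows(3, True, True, True)
    + make_rows(3, True, True, False)
    + make_rows(1, True, None, True)
    + make_rows(3, True, None, False)
    + make_rows(2, False, None, True)
    + make_rows(8, False, None, False)
)

unexposed = [r for r in cohort if not r["icd"]]

# Primary analysis: exposure defined by ICD code alone.
icd_exposed = [r for r in cohort if r["icd"]]
rr_primary = risk_ratio(icd_exposed, unexposed)

# Sensitivity analysis: keep only gold-standard-confirmed cases as exposed.
confirmed_exposed = [r for r in icd_exposed if r["confirmed"] is True]
rr_sensitivity = risk_ratio(confirmed_exposed, unexposed)

# The two estimates agree in direction if both lie on the same side of 1.
same_direction = (rr_primary > 1) == (rr_sensitivity > 1)
print(rr_primary, rr_sensitivity, same_direction)
```

A real analysis would use adjusted models and confidence intervals instead of a crude ratio; the point here is only the comparison of direction between the two exposure definitions.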


  1. Naik RP, Haywood C. Sickle cell trait diagnosis: clinical and social implications. Hematol Am Soc Hematol Educ Progr. 2015;2015(1):160-167.

  2. Rees DC, Williams TN, Gladwin MT. Sickle-cell disease. Lancet (London, England). 2010;376(9757):2018-2031.

  3. Nath KA, Hebbel RP. Sickle cell disease: renal manifestations and mechanisms. Nat Rev Nephrol. 2015;11(3):161-171.

  4. Centers for Disease Control and Prevention. Incidence of Sickle Cell Trait — United States, 2010. Morbidity and Mortality Weekly Report. Published 2014. Accessed June 2, 2017.

  5. Centers for Disease Control and Prevention. Data and Statistics | Sickle Cell Disease | NCBDDD | CDC. Sickle Cell Disease Homepage. Published 2016. Accessed June 8, 2017.


K.O.O. is supported by the Ben J. Lipps Research Fellowship Award of the American Society of Nephrology. A.S.A. is supported by the American Heart Association Career Development Award 18CDA34110131. S.U.N. is supported by the National Center for Research Program Winter 2015 Fellow-to-Faculty Transition Award 15FTF25980003 from the American Heart Association and by the KL2/Catalyst Medical Research Investigator Training award TR001100 (an appointed KL2 award) from Harvard Catalyst, the Harvard Clinical and Translational Science Center (National Center for Research Resources, and the National Center for Advancing Translational Sciences, National Institutes of Health). S.K. is supported by National Institutes of Health award K23 DK 106479.

Author information



Corresponding author

Correspondence to Kabir O. Olaniran MD, MPH.

Conflict of Interest

The authors have no conflicts of interest to declare.

Cite this article

Olaniran, K.O., Seethapathy, H., Zhao, S.H. et al. Validity of International Classification of Diseases Codes for Sickle Cell Trait and Sickle Cell Disease. J GEN INTERN MED 35, 1323–1324 (2020).


  • sickle cell trait
  • sickle cell disease
  • ICD
  • validity