Abstract
Purpose
To externally validate four previously developed severity scores (i.e., CALL, CHOSEN, HA2T2 and ANDC) in patients with COVID-19 hospitalised in a tertiary care centre in Switzerland.
Methods
This observational analysis included adult patients with a real-time reverse-transcription polymerase chain reaction or rapid-antigen test confirmed severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2) infection hospitalised consecutively at the Cantonal Hospital Aarau from February to December 2020. The primary endpoint was all-cause in-hospital mortality. The secondary endpoint was disease progression, defined as needing invasive ventilation, ICU admission or death.
Results
Of 399 patients (mean age 66.6 years ± 13.4 SD, 68% males), we had complete data for calculating the CALL, CHOSEN, HA2T2 and ANDC scores in 297, 380, 151 and 124 cases, respectively. Odds ratios for all four scores showed significant associations with mortality. The discriminative power of the HA2T2 score was higher than that of the CALL, CHOSEN and ANDC scores [area under the curve (AUC) 0.78 vs. 0.65, 0.69 and 0.66, respectively]. Negative predictive values (NPV) for mortality were high, particularly for the CALL score (≥ 6 points: 100%, ≥ 9 points: 95%). For disease progression, discriminative power was lower, with the CHOSEN score showing the best performance (AUC 0.66).
Conclusion
In this external validation study, the four analysed scores performed worse than in their original cohorts regarding prediction of mortality and disease progression. However, all scores were significantly associated with mortality, and the NPV of the CALL and CHOSEN scores in particular allowed reliable identification of patients at low risk, who may be suitable for outpatient management.
Background
The coronavirus disease 2019 (COVID-19) pandemic, with its overwhelming resource use, has been a major challenge for clinicians and health care institutions worldwide. Identifying patients at high risk of disease progression may help to allocate resources more efficiently. Since presentation and course of the infection can vary considerably (including asymptomatic cases), no single trait is sufficient to appropriately categorise patients [1,2,3,4,5,6,7,8,9]. Thus, several scores have been developed to improve identification of patients at high risk of progression or death from COVID-19. Among these scores, the CALL, CHOSEN, HA2T2 and ANDC scores have generated much interest [10,11,12,13].
The CALL score (Comorbidity, Age, Lactate dehydrogenase (LDH) and Lymphocyte count) showed great discriminatory potential for disease progression with an area under the curve (AUC) of 0.91 (95%-CI 0.86–0.94) in its derivation cohort [10]. Disease progression was defined as respiratory rate ≥ 30 breaths per minute (bpm), peripheral oxygen saturation (SpO2) ≤ 93%, arterial partial oxygen pressure (PaO2)/fraction of inspired oxygen (FiO2) ≤ 300 mmHg, mechanical ventilation or worsening of lung computed tomography (CT) findings [10]. The CHOSEN score used age, FiO2 and albumin to predict progression defined as requiring supplemental oxygen, admission to the intensive care unit (ICU) or death [11]. The authors reported a good discriminative capacity for their score with an AUC of 0.89 (95%-CI 0.87–0.91) in their derivation and 0.87 (95%-CI 0.81–0.93) in their validation cohort [11]. The HA2T2 score was used to predict all-cause in-hospital mortality in COVID-19 patients based on need for supplemental oxygen, age and troponin [12]. It showed good discriminative power in both its derivation (AUC 0.83, 95%-CI 0.79–0.88) and its validation cohort (AUC 0.78, 95%-CI 0.72–0.84) [12]. The ANDC score, based on age, neutrophil-to-lymphocyte ratio (NLR), d-dimer and C-reactive protein (CRP), predicted all-cause in-hospital mortality with an excellent AUC of 0.92 (95%-CI 0.84–0.97) in its derivation and 0.98 (95%-CI 0.95–1.00) in its validation cohort [13].
So far, only the CALL score has undergone external validation, with the score performing markedly worse than in the original cohort (AUC 0.62 vs. 0.91) [14]. Thus, before widespread implementation, independent external validation of all these scores is mandatory. Herein, we validated four severity scores (i.e., the CALL, CHOSEN, HA2T2 and ANDC scores) in patients with COVID-19 hospitalised in a tertiary care centre in Switzerland.
Methods
Study design and participants
This retrospective observational analysis included all consecutive adult patients (≥ 18 years) with a confirmed severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2) infection who required hospitalisation for at least 24 h at the Medical University Clinic of the Cantonal Hospital Aarau (Switzerland) between February 26, 2020 and April 30, 2020 (first wave) and between October 1, 2020 and December 31, 2020 (second wave). In this tertiary care centre with 130 medical ward beds, indications for in-hospital treatment of COVID-19 were respiratory distress with need for oxygen supplementation, high fever or relevant clinical deterioration. This study was approved by the local ethics committee (EKZN, 2020-01306).
A detailed description of the study methodology has been reported previously [6, 15]. A confirmed SARS-CoV-2 infection was defined as a combination of typical clinical symptoms (e.g., respiratory symptoms with or without fever, and/or pulmonary infiltrates and/or anosmia/dysgeusia) and a positive real-time reverse-transcription polymerase chain reaction (RT-PCR) test, obtained from nasopharyngeal swabs or lower respiratory tract samples, according to guidance by the World Health Organization (WHO) [16, 17]. Data for the second wave also included patients with positive rapid-antigen tests. However, due to the lower positive predictive value of these tests, we excluded asymptomatic patients unless their rapid-antigen results were confirmed by a positive RT-PCR test. We further excluded patients from the analysis if they did not provide general informed consent or if they had not yet been discharged when data collection was closed (January 20, 2021). This study adheres to the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) statement for reporting of prediction models.
Data collection
All analysed data were collected as part of the clinical routine during the hospitalisation (from admission to discharge or death). We performed chart reviews and automatic export from electronic health records (EHR), including vital signs and clinical characteristics upon admission as well as sociodemographic factors, comorbidities based on pre-existing diagnoses and home medication. COVID-19-specific inpatient medication was assessed until hospital discharge or death and exported from the EHR. Experimental treatment was offered to all suitable patients according to ongoing clinical trials and WHO guidelines [16,17,18]. During the second wave, this also included the application of high-dose glucocorticoids [19]. The age-adjusted Charlson comorbidity index (ACCI) [20] and the Clinical Frailty Scale score (CFS) [21] were calculated for all patients as part of the clinical routine or through chart review. Laboratory values were available according to clinical routine and derived from the first blood draw obtained within 7 days from admission.
Definition of endpoints
All-cause in-hospital mortality was defined as the primary endpoint. The secondary endpoint, disease progression, had different definitions in the original studies. For easier comparability between the scores, we defined disease progression as needing invasive ventilation, ICU admission or death in our own analysis. Originally, the CALL score defined progression as respiratory rate ≥ 30 bpm, SpO2 ≤ 93%, PaO2/FiO2 ≤ 300 mmHg, requiring mechanical ventilation or worsening of lung CT findings. CT findings were not available for our analysis and thus not considered. The definition of progression for the CHOSEN score was requirement of supplemental oxygen, admission to the ICU or death. Validation results were based on these original definitions.
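The two endpoints used in our own analysis can be written down as simple rules. The following is a minimal sketch only; the record layout and field names (Stay, invasive_ventilation, icu_admission, died_in_hospital) are hypothetical and do not reflect the structure of the study database.

```python
from dataclasses import dataclass


@dataclass
class Stay:
    # Hypothetical field names; the study data schema is not published.
    invasive_ventilation: bool
    icu_admission: bool
    died_in_hospital: bool


def primary_endpoint(stay: Stay) -> bool:
    """Primary endpoint: all-cause in-hospital mortality."""
    return stay.died_in_hospital


def disease_progression(stay: Stay) -> bool:
    """Secondary endpoint: invasive ventilation, ICU admission or death."""
    return (
        stay.invasive_ventilation
        or stay.icu_admission
        or stay.died_in_hospital
    )


# Three invented hospital stays for illustration.
stays = [
    Stay(invasive_ventilation=False, icu_admission=False, died_in_hospital=False),
    Stay(invasive_ventilation=False, icu_admission=True, died_in_hospital=False),
    Stay(invasive_ventilation=True, icu_admission=True, died_in_hospital=True),
]
progressed = [disease_progression(s) for s in stays]
died = [primary_endpoint(s) for s in stays]
```

Because death is part of the composite, every patient who reaches the primary endpoint also reaches the secondary endpoint.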
Statistical analysis
Discrete variables are expressed as frequency (percentage) and continuous variables as medians with interquartile ranges (IQR, for skewed data) or means with standard deviations (SD, for normally distributed data). We used the Wilcoxon rank-sum test to compare continuous variables and Pearson's chi-squared test to compare categorical or binary variables. Odds ratios (OR) were calculated with corresponding 95% confidence intervals (CI) as measures of association, and the unadjusted area under the receiver operating characteristic curve (AUC) was calculated as a measure of discrimination. We assessed calibration for mortality numerically by tabulating the observed risks against those reported in the original studies; these were not available for the CALL and CHOSEN scores. We considered a two-sided p-value of < 0.05 significant. Statistical analysis was performed as a complete-case analysis based on the original regression coefficients using Stata 15.1 (StataCorp, College Station, TX, USA).
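As an illustration of the discrimination measure, the AUC can be read as the probability that a randomly chosen patient who died has a higher score than a randomly chosen survivor, with ties counted as one half. The sketch below computes it this way in Python on invented toy data; it is not the Stata routine used for the analysis, and the function name auc and the toy values are assumptions for illustration.

```python
def auc(scores, outcomes):
    """Unadjusted AUC via pairwise comparison (Mann-Whitney formulation).

    scores   -- risk score per patient (higher = higher predicted risk)
    outcomes -- 1 if the patient reached the endpoint, else 0
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # event patient ranked higher
            elif e == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


# Invented toy data: six patients, three deaths.
toy_scores = [2, 3, 3, 5, 6, 7]
toy_died = [0, 0, 1, 0, 1, 1]
toy_auc = auc(toy_scores, toy_died)
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of patients with and without the endpoint.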
Results
Figure 1 provides an overview of the study flow and Table 1 shows overall patient demographics, comorbidities, laboratory values and vital signs on admission as well as stratified according to the individual score cohorts. In total, 399 patients hospitalised with a confirmed SARS-CoV-2 infection were included in this analysis (mean age 66.6 years ± 13.4 SD, 68% male). Complete data sets to allow for the calculation of the CALL and CHOSEN score were available in 297 and 380 patients, respectively. Fewer patients had all values necessary to calculate the HA2T2 (n = 151) and ANDC score (n = 124). There were several noticeable differences between the score cohorts, for example, transfer rates from other hospitals (range from 14.5% for ANDC to 28.5% for HA2T2), supplemental oxygen (29.8% for CALL to 45.7% for HA2T2), obesity (30.8% for CHOSEN to 41.7% for ANDC) and ICU admission (19.5% for CHOSEN to 46.4% for HA2T2). However, overall comorbidity and frailty were similar.
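The differing cohort sizes follow from the complete-case approach: a patient enters a score cohort only if every variable required by that score is available. A minimal sketch of this selection is given below, using the score components named in the Background; the variable names and the three toy patients are hypothetical.

```python
# Variables required per score, as described in the Background section.
# Key names are hypothetical labels, not the study's actual column names.
REQUIRED = {
    "CALL": ("comorbidity", "age", "ldh", "lymphocytes"),
    "CHOSEN": ("age", "fio2", "albumin"),
    "HA2T2": ("supplemental_oxygen", "age", "troponin"),
    "ANDC": ("age", "nlr", "d_dimer", "crp"),
}


def has_complete_data(patient, score):
    """True if no variable required by the score is missing (None)."""
    return all(patient.get(var) is not None for var in REQUIRED[score])


def cohort_sizes(patients):
    """Number of complete cases per score."""
    return {
        score: sum(has_complete_data(p, score) for p in patients)
        for score in REQUIRED
    }


base = {
    "comorbidity": True, "age": 70, "ldh": 300, "lymphocytes": 0.9,
    "fio2": 0.21, "albumin": 35, "supplemental_oxygen": False,
    "troponin": 12, "nlr": 4.0, "d_dimer": 600, "crp": 80,
}
patients = [
    dict(base),                                   # everything measured
    dict(base, troponin=None, d_dimer=None),      # no troponin or d-dimer
    dict(base, albumin=None),                     # no albumin
]
sizes = cohort_sizes(patients)
complete_for_all = sum(
    all(has_complete_data(p, s) for s in REQUIRED) for p in patients
)
```

In this toy example, only the first patient would belong to the subgroup with full data sets for all four scores.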
Table 2 shows the discriminative power of each score for mortality and disease progression (defined as requiring invasive ventilation, ICU admission or death for all scores for easier comparability). For mortality, the HA2T2 score performed best (AUC 0.78, 95%-CI 0.70–0.85). For progression, overall discriminative capacity was lower, with the CHOSEN score performing slightly better than the others (AUC 0.66, 95%-CI 0.60–0.72). All scores were associated with mortality.
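To illustrate how an odds ratio with a 95% confidence interval expresses the association between a score and mortality, the sketch below uses a 2 × 2 table for a dichotomised score and the log-based (Woolf) interval. This is an assumption made for illustration: the text does not specify how the ORs were estimated in Stata, and the counts are invented.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval.

    a -- high score, died        b -- high score, survived
    c -- low score, died         d -- low score, survived
    z -- normal quantile (1.96 for a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts, not study data.
toy_or, toy_lower, toy_upper = odds_ratio_ci(a=20, b=30, c=10, d=90)
```

An interval that excludes 1 corresponds to a significant association at the two-sided 5% level.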
Sensitivity and specificity as well as positive and negative predictive value for each proposed cut-off are summarised in Table 3 and visualised in Fig. 2. The negative predictive value of the CALL score was highest (≥ 6 points: 100%, 95%-CI 75.3–100), while the highest positive predictive value was found for the HA2T2 score (≥ 3 points: 58.6%, 95%-CI 38.9–76.5).
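The four measures reported per cut-off derive from the cross-tabulation of score category against outcome. A minimal sketch follows, assuming that a score at or above the cut-off counts as test-positive; the function name and the toy data are hypothetical.

```python
def accuracy_at_cutoff(scores, outcomes, cutoff):
    """Sensitivity, specificity, PPV and NPV for 'score >= cutoff'."""
    tp = fp = tn = fn = 0
    for score, outcome in zip(scores, outcomes):
        test_positive = score >= cutoff
        if test_positive and outcome == 1:
            tp += 1
        elif test_positive and outcome == 0:
            fp += 1
        elif not test_positive and outcome == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        # Undefined when a row or column of the table is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Invented toy data: eight patients, three deaths.
toy_scores = [4, 5, 5, 6, 7, 8, 9, 10]
toy_died = [0, 0, 0, 0, 1, 0, 1, 1]
result = accuracy_at_cutoff(toy_scores, toy_died, cutoff=6)
```

In the toy data, no patient below the cut-off died, so the NPV is 100% although only three patients fall below the cut-off; with so few test-negative patients the confidence interval around such an estimate is wide.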
The direct comparison with the original outcomes can be found in Table 4. Only the HA2T2 score performed similarly with an AUC of 0.78 (95%-CI 0.72–0.84) in the original validation cohort and an AUC of 0.78 (95%-CI 0.70–0.85) in our sample. The discriminative power for all other scores was markedly worse in comparison with their respective original cohorts. These results persisted when performed in the cohort with full data sets for all scores (n = 67, data not shown).
The calibration assessment for mortality for the HA2T2 and ANDC scores can be found in the additional files 1 and 2 (Tables S1 and S2). Overall, calibration was poor, with the ANDC score performing slightly better (overprediction up to 18 percentage points) than the HA2T2 score (underprediction up to 30 percentage points). Calibration for the CALL and CHOSEN scores could not be assessed due to a lack of published data.
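The numerical calibration assessment amounts to tabulating, per score stratum, the observed mortality against the risk reported in the original publication. The sketch below shows the idea on invented numbers; the reported risks are placeholders and not the values published for the HA2T2 or ANDC scores.

```python
from collections import defaultdict


def calibration_table(scores, outcomes, reported_risk):
    """Rows of (score, n, observed risk, reported risk, difference in pp).

    A positive difference means the original study overpredicts mortality
    in this sample; a negative difference means it underpredicts.
    """
    n = defaultdict(int)
    deaths = defaultdict(int)
    for score, outcome in zip(scores, outcomes):
        n[score] += 1
        deaths[score] += outcome

    rows = []
    for score in sorted(n):
        observed = deaths[score] / n[score]
        expected = reported_risk[score]
        difference_pp = round((expected - observed) * 100, 1)
        rows.append((score, n[score], observed, expected, difference_pp))
    return rows


# Invented toy data and placeholder reported risks.
toy_scores = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
toy_died = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
placeholder_risk = {0: 0.10, 1: 0.40, 2: 0.90}
rows = calibration_table(toy_scores, toy_died, placeholder_risk)
```

Here every stratum shows a negative difference, i.e. the placeholder risks underpredict the observed mortality in the toy sample.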
Discussion
In this validation study, four currently available scores to predict mortality and disease progression in COVID-19 patients performed markedly worse in patients hospitalised at a Swiss tertiary care centre than in their original cohorts. The HA2T2 score showed the best discrimination for mortality (AUC 0.78, 95%-CI 0.70–0.85) and was the only score with results similar to those of its original validation cohort.
Some loss of predictive ability can be explained by the differences between our study population and the original derivation cohorts. This is most apparent when comparing age, which has been recognised as an important risk factor for worse outcomes [22] and is included in all four scores. Mean age ranged from 44 to 65 years for the CALL, CHOSEN, HA2T2 and ANDC scores in the original publications whereas the mean age in our population was 67 years. However, even when comparing the scores among the 67 patients who had all parameters required for all scores, the HA2T2 score showed the best discriminative power (data not shown). Apart from the small sample size, further limitations in this comparison arise from the fact that the study populations were also different in their origins. The CALL and ANDC scores were based on Chinese patients while the CHOSEN and the HA2T2 score were derived in US American patients. Interestingly, the other currently available external validations of the CALL score in Italian and Turkish patients resulted in AUCs that were very similar to our own (original AUC for disease progression 0.91 vs. Italian AUC 0.62, Turkish AUC 0.59, our AUC 0.61) [14, 23]. Hence, it seems that compatibility and comparability of these scores for different populations cannot be assumed.
Further difficulties are rooted in the novelty of COVID-19. Much is still unknown about the disease, including which factors best predict progression or mortality. This is reflected in the very different factors included in the scores. Still, these more recent approaches are already an improvement on initial scores, which included up to 12 different items, making them difficult to use in a clinical setting [24]. In a busy environment such as the emergency department, ease of use is crucial. The scores discussed here all use no more than four variables that are relatively readily available in middle- to high-income countries. There also exists a simplified version of the CHOSEN score that does not rely on laboratory values but also did not perform as well in the original cohort [11].
All scores were significantly associated with mortality and their respective discriminative capacities were moderate to good, but calibration was poor due to considerable population differences. Furthermore, the negative predictive value of the CALL score was particularly high and could thus help identify patients at low risk. The CHOSEN score, whose explicit aim was to differentiate between patients who needed hospitalisation and those who could be sent home safely, also had a high negative predictive value and, in addition, showed a relatively balanced relation between sensitivity and specificity, making it a potentially valuable tool for risk stratification. Since we did not include outpatients in our study, our results are likely to underestimate the true value of the CHOSEN score.
Limitations
There are certain limitations to our study. First, our findings are limited to hospitalised patients in a single centre in Switzerland, limiting generalisability. In addition, baseline parameters of our population were markedly different from the original study populations including ethnicity and important predictors such as age. Unfortunately, regression coefficients could not be updated based on the available data. Similarly, we could not calculate calibration for the CALL and CHOSEN score. Internal validity is also limited due to the retrospective design, which meant that a considerable proportion of patients had to be excluded from certain score cohorts because the required data were missing. Additional validation analyses should be conducted in larger data sets. Furthermore, troponin and d-dimer values (required for the HA2T2 and ANDC scores, respectively) were usually available for sicker patients who reached the primary and secondary endpoints more often, which not only limited study population sizes but also comparability between scores. Finally, we had to exclude four patients due to missing outcome data, thus increasing the risk for selection bias.
Conclusions
In our independent validation, the four analysed scores performed worse than in their original cohorts regarding prediction of mortality and disease progression. However, all scores were significantly associated with mortality. While the HA2T2 score identified high-risk patients, the negative predictive values of the CALL and CHOSEN scores allowed reliable identification of patients at low risk, who may be suitable for outpatient management.
Data availability
The datasets used during the current study are available from the corresponding author on reasonable request.
References
Guan WJ, Ni ZY, Hu Y, Liang WH, Ou CQ, He JX, Liu L, Shan H, Lei CL, Hui DSC, et al. Clinical characteristics of coronavirus disease 2019 in China. N Engl J Med. 2020;382:1708–20.
Richardson S, Hirsch JS, Narasimhan M, Crawford JM, McGinn T, Davidson KW, The Northwell C-RC, Barnaby DP, Becker LB, Chelico JD, et al. Presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with COVID-19 in the New York City Area. JAMA. 2020;323:2052–9.
Haase N, Plovsing R, Christensen S, Poulsen LM, Brochner AC, Rasmussen BS, Helleberg M, Jensen JUS, Andersen LPK, Siegel H, et al. Characteristics, interventions, and longer term outcomes of COVID-19 ICU patients in Denmark-A nationwide, observational study. Acta Anaesthesiol Scand. 2021;65:68–75.
Allameh SF, Nemati S, Ghalehtaki R, Mohammadnejad E, Aghili SM, Khajavirad N, Beigmohammadi MT, Salehi M, Mirfazaelian H, Edalatifard M, et al. Clinical characteristics and outcomes of 905 COVID-19 patients admitted to imam khomeini hospital complex in the capital city of Tehran, Iran. Arch Iran Med. 2020;23:766–75.
Grasselli G, Zangrillo A, Zanella A, Antonelli M, Cabrini L, Castelli A, Cereda D, Coluccello A, Foti G, Fumagalli R, et al. Baseline characteristics and outcomes of 1591 patients infected with SARS-CoV-2 admitted to ICUs of the Lombardy Region, Italy. JAMA. 2020;323:1574–81.
Gregoriano C, Koch D, Haubitz S, Conen A, Fux CA, Mueller B, Bernasconi L, Hammerer-Lercher A, Oberle M, Burgermeister S, et al. Characteristics, predictors and outcomes among 99 patients hospitalised with COVID-19 in a tertiary care centre in Switzerland: an observational analysis. Swiss Med Wkly. 2020;150:w20316.
Moon SS, Lee K, Park J, Yun S, Lee YS, Lee DS. Clinical characteristics and mortality predictors of COVID-19 patients hospitalized at nationally-designated treatment hospitals. J Korean Med Sci. 2020;35:e328.
Thompson JV, Meghani NJ, Powell BM, Newell I, Craven R, Skilton G, Bagg LJ, Yaqoob I, Dixon MJ, Evans EJ, et al. Patient characteristics and predictors of mortality in 470 adults admitted to a district general hospital in England with Covid-19. Epidemiol Infect. 2020;148:e285.
Golozar A, Lai LY, Sena AG, Vizcaya D, Schilling LM, Huser V, Nyberg F, Duvall SL, Morales DR, Alshammari TM, et al. Baseline phenotype and 30-day outcomes of people tested for COVID-19: an international network cohort including > 3.32 million people tested with real-time PCR and > 219,000 tested positive for SARS-CoV-2 in South Korea, Spain and the United States. medRxiv. 2020. https://doi.org/10.1101/2020.10.25.20218875
Ji D, Zhang D, Xu J, Chen Z, Yang T, Zhao P, Chen G, Cheng G, Wang Y, Bi J, et al. Prediction for progression risk in patients with COVID-19 pneumonia: the CALL score. Clin Infect Dis. 2020;71:1393–9.
Levine DM, Lipsitz SR, Co Z, Song W, Dykes PC, Samal L. Derivation of a clinical risk score to predict 14-day occurrence of hypoxia, ICU admission, and death among patients with coronavirus disease 2019. J Gen Intern Med. 2021;36:730–7.
Manocha KK, Kirzner J, Ying X, Yeo I, Peltzer B, Ang B, Li HA, Lerman BB, Safford MM, Goyal P, et al. Troponin and other biomarker levels and outcomes among patients hospitalized with COVID-19: derivation and validation of the HA2T2 COVID-19 mortality risk score. J Am Heart Assoc. 2021;10:e018477.
Weng Z, Chen Q, Li S, Li H, Zhang Q, Lu S, Wu L, Xiong L, Mi B, Liu D, et al. ANDC: an early warning score to predict mortality risk for patients with Coronavirus Disease 2019. J Transl Med. 2020;18:328.
Grifoni E, Valoriani A, Cei F, Vannucchi V, Moroni F, Pelagatti L, Tarquini R, Landini G, Masotti L. The CALL score for predicting outcomes in patients with COVID-19. Clin Infect Dis. 2021;72:182–3.
Wolfisberg S, Gregoriano C, Struja T, Kutz A, Koch D, Bernasconi L, Hammerer-Lercher A, Mohr C, Haubitz S, Conen A, et al. Comparison of characteristics, predictors and outcomes between the first and second COVID-19 waves in a tertiary care centre in Switzerland: an observational analysis. Swiss Med Wkly. 2021;151:20569.
Clinical management of severe acute respiratory infection when novel coronavirus (nCoV) infection is suspected: interim guidance. 2020. https://apps.who.int/iris/handle/10665/330854. Accessed 1 July 2021.
Clinical management of COVID-19: interim guidance. 2020. https://apps.who.int/iris/handle/10665/332196. Accessed 1 July 2021.
W. H. O. Solidarity Trial Consortium, Pan H, Peto R, Henao-Restrepo AM, Preziosi MP, Sathiyamoorthy V, Abdool Karim Q, Alejandria MM, Hernandez Garcia C, Kieny MP, et al. Repurposed antiviral drugs for Covid-19—Interim WHO solidarity trial results. N Engl J Med. 2021;384:497–511.
Corticosteroids for COVID-19, Living Guidance 2. 2020. https://www.who.int/publications/i/item/WHO-2019-nCoV-Corticosteroids-2020.1. Accessed 1 July 2021.
Charlson M, Szatrowski TP, Peterson J, Gold J. Validation of a combined comorbidity index. J Clin Epidemiol. 1994;47:1245–51.
Juma S, Taabazuing MM, Montero-Odasso M. Clinical frailty scale in an acute medicine unit: a simple tool that predicts length of stay. Can Geriatr J. 2016;19:34–9.
Yadaw AS, Li YC, Bose S, Iyengar R, Bunyavanich S, Pandey G. Clinical features of COVID-19 mortality: development and validation of a clinical prediction model. Lancet Digit Health. 2020;2:e516–25.
Erturk Sengel B, Tukenmez Tigen E, Ilgin C, et al. Application of CALL score for prediction of progression risk in patients with COVID-19 at university hospital in Turkey. Int J Clin Pract. 2021;75:e14642.
Galloway JB, Norton S, Barker RD, Brookes A, Carey I, Clarke BD, Jina R, Reid C, Russell MD, Sneep R, et al. A clinical risk score to identify patients with COVID-19 at high risk of critical care admission or death: an observational cohort study. J Infect. 2020;81:282–8.
Acknowledgements
We thank all participating patients, their families and all healthcare workers at the Cantonal Hospital Aarau for their help and dedication to reduce the burden of the ongoing pandemic.
Funding
This study was funded by the Research Council KSA (Kantonsspital Aarau). The funding agency had no bearing on the study design, data collection and analysis or writing of the manuscript.
Author information
Contributions
PS and SW conceived of the study and its design. SW performed the statistical analysis and wrote the first draft of the paper. SW, CG, DK, LB, AH, CH, CM and SH collected and compiled the data. CG, PS, AK, CF, TS and BM critically revised the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Ethics approval and consent to participate
This study was approved by the local ethics committee (EKZN, 2020-01306).
Consent for publication
Not applicable.
About this article
Cite this article
Wolfisberg, S., Gregoriano, C., Struja, T. et al. Call, chosen, HA2T2, ANDC: validation of four severity scores in COVID-19 patients. Infection 50, 651–659 (2022). https://doi.org/10.1007/s15010-021-01728-0
DOI: https://doi.org/10.1007/s15010-021-01728-0