Introduction

Primary aldosteronism (PA) is the most common form of secondary hypertension, with an estimated prevalence between 5 and 20% depending on the severity of hypertension [1,2,3,4]. PA leads to morbidity and mortality through the effects of both hypertension and aldosteronism itself on critical organs [5,6,7,8]. Therefore, the ultimate goal of treatment is resolution of both. Bilateral adrenal hyperplasia is treated medically, while patients with a unilateral aldosterone-producing adenoma (APA) are preferably treated by unilateral adrenalectomy [9,10,11,12].

Cure of aldosteronism is reported in the majority of patients after adrenalectomy for APA [13,14,15]. However, resolution of hypertension, also called cure of hypertension (i.e., a normotensive patient without antihypertensive medications), is far from certain. In the past, resolution rates were estimated at around 50% [13, 14, 16]. More recently, Williams et al. [15] reported a less optimistic resolution rate of 37% in a large, international and well-executed study. Our own study group also recently published blood pressure-related outcomes after surgery for PA and found an even lower resolution rate of 27–30% [17, 18]. This stresses the importance of adequate patient counseling and expectation management before surgery. To do this, clinicians need a user-friendly and reliable prediction model.

In 2008, Zarnegar et al. [19] proposed the Aldosteronoma Resolution Score (ARS) as a practical prediction model for resolution of hypertension. The model is very easy to use because it includes only four dichotomous preoperative patient/disease characteristics associated with a high probability of resolution of hypertension: taking ≤ 2 antihypertensive medications (AHTN) (2 points), body mass index (BMI) ≤ 25 kg/m2 (1 point), duration of hypertension ≤ 6 years (1 point) and female sex (1 point). Based on the combined score, three likelihood levels for resolution of hypertension were identified: low (0–1), medium (2–3) and high (4–5), with corresponding likelihoods of resolution of 28%, 46% and 75%, respectively. The area under the curve (AUC) was 0.913 [19].
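As an illustration of the scoring rule described above, the following minimal Python sketch sums the four dichotomous items and maps the total to the three likelihood levels. The class and function names (`Patient`, `aldosteronoma_resolution_score`, `ars_category`) are hypothetical and chosen for this example only; they are not taken from the original publication.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Preoperative characteristics used by the ARS (field names are illustrative)."""
    n_antihypertensives: int    # number of different antihypertensive medications
    bmi: float                  # body mass index in kg/m2
    hypertension_years: float   # duration of hypertension in years
    female: bool


def aldosteronoma_resolution_score(p: Patient) -> int:
    """Sum the four dichotomous ARS items; the result ranges from 0 to 5."""
    score = 0
    if p.n_antihypertensives <= 2:
        score += 2  # the only item worth 2 points
    if p.bmi <= 25:
        score += 1
    if p.hypertension_years <= 6:
        score += 1
    if p.female:
        score += 1
    return score


def ars_category(score: int) -> str:
    """Map a score to the likelihood level: low (0-1), medium (2-3), high (4-5)."""
    if not 0 <= score <= 5:
        raise ValueError("ARS must be between 0 and 5")
    if score <= 1:
        return "low"
    if score <= 3:
        return "medium"
    return "high"
```

For example, a woman with a BMI of 24 kg/m2 who has been hypertensive for 5 years and takes 2 antihypertensive medications scores 2 + 1 + 1 + 1 = 5 points and falls into the high likelihood level.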

Previous validation studies of the ARS showed contradictory results and were frequently performed in small, single-country or single-center study populations. In addition, these studies often included patients treated over several decades because of the low incidence of disease. Furthermore, the ARS was developed over a decade ago and, because of improvements in diagnostic modalities and guidelines, patient care has made substantial progress since then. This underscores the need to evaluate the clinical applicability and usefulness of the ARS in the current clinical APA population, especially because the performance of a prediction model may change over time [20,21,22]. In addition, because the prediction model was developed in the USA, the ARS is likely to have lower predictive value outside of the USA, which calls its worldwide generalizability into question. Therefore, we aimed to be the first to validate the ARS in current clinical practice and to extend this validation geographically in a worldwide cohort of patients who underwent adrenalectomy between 2010 and 2016.

Methods

Patients and data collection

We performed a retrospective cohort study across 16 medical centers in the USA, Europe (EU), Canada (CA) and Australia (AU) (Fig. 1). Derivation of this cohort has been described before [17]. In brief, all patients who underwent unilateral adrenalectomy for APA between 2010 and 2016 were included. Because we aimed to make our study representative of current real-life clinical practice, no strict inclusion or exclusion criteria were used regarding screening, case confirmation or subtype testing. Laterality of disease was based on computed tomography (CT), magnetic resonance imaging (MRI) and/or adrenal venous sampling (AVS). In general, biochemical evidence for PA was based on an elevated aldosterone-to-renin ratio (ARR). Patients with missing preoperative or follow-up data regarding systolic blood pressure (SBP), diastolic blood pressure (DBP) or the corresponding number of AHTN were excluded (Fig. 1). Institutional review board approval was obtained in all participating centers.

Fig. 1 Flowchart of included patients

Definitions and outcomes

Resolution of hypertension was defined as a postoperative normotensive patient (i.e., SBP < 140 mmHg and DBP < 90 mmHg) without antihypertensive medications. Office blood pressure measurements were performed during outpatient visits. Number of AHTN was defined as the number of different antihypertensive medications used. The defined daily dose (DDD) was calculated with the World Health Organization Anatomical Therapeutic Chemical/DDD Index 2017 (see https://www.whocc.no/atc_ddd_index/). When medication was discontinued for laboratory measurements, for example prior to measurement of the ARR, the number of AHTN, DDD and corresponding blood pressure before discontinuation were used. Biochemical data were classified as elevated/suppressed when values were above/below the local reference range. Hypokalemia was defined as either a potassium level below the local reference range or the use of potassium supplementation. The predictive accuracy of the ARS was reported as the proportion of patients with resolution within every ARS subgroup. Geographic validation was performed after division of the cohort into four geographic regions: USA, EU, CA and AU [20,21,22]. The goal was to assess resolution of hypertension at the follow-up visit closest to 6 months after adrenalectomy.
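The outcome definition above can be expressed as a simple predicate. The sketch below is illustrative only; the function and parameter names are hypothetical, and it assumes a single office blood pressure reading per patient at follow-up.

```python
def hypertension_resolved(sbp_mmhg: float, dbp_mmhg: float, n_antihypertensives: int) -> bool:
    """Return True when the patient is normotensive without antihypertensive medication.

    Normotensive is taken as SBP < 140 mmHg and DBP < 90 mmHg, as in the definition above.
    """
    normotensive = sbp_mmhg < 140 and dbp_mmhg < 90
    return normotensive and n_antihypertensives == 0
```

Under this definition, a patient with a blood pressure of 128/82 mmHg who still takes one antihypertensive medication does not count as resolved, nor does a patient off all medication with an SBP of exactly 140 mmHg.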

Statistical analysis

The chi-square test and Fisher’s exact test were used to analyze group differences for categorical variables. For comparisons of continuous variables between multiple groups, one-way ANOVA was used for normally distributed data and the Kruskal–Wallis test for non-normally distributed data. A p value of < 0.05 was considered statistically significant. Several variables used as predictors in the ARS had missing values. To be able to calculate the ARS in all patients, these variables were imputed using multiple imputation with 20 imputed datasets [23]. The duration of hypertension and BMI were missing in 16% and 8% of patients, respectively. Sex and number of AHTN were known in all patients (Table 1). The primary endpoint of this study (i.e., resolution of hypertension) was known in all patients. Pooled negative predictive values (NPV), positive predictive values (PPV) and AUCs of the ARS for resolution were calculated. Statistical analysis was performed using SPSS version 25.0 (IBM Corp, New York, USA), and figures were constructed using GraphPad Prism version 7.02 (GraphPad Software Inc, California, USA).
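The analysis itself was run in SPSS, and the text does not state how estimates were pooled across the imputed datasets. Purely as an illustration, the sketch below computes a rank-based AUC (the probability that a randomly chosen patient with resolution has a higher ARS than one without, counting ties as one half) and pools it by averaging the per-dataset point estimates, which is an assumption here. The function names `auc` and `pooled_auc` are hypothetical.

```python
from statistics import mean


def auc(scores, outcomes):
    """Rank-based AUC: P(score of a resolved patient > score of an unresolved patient).

    Ties between a resolved and an unresolved patient count as 0.5.
    """
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pooled_auc(imputed_scores, outcomes):
    """Average the AUC over imputed datasets (assumed pooling rule for the point estimate).

    imputed_scores holds one list of ARS values per imputed dataset; the outcome
    is shared because resolution of hypertension was known in all patients.
    """
    return mean(auc(scores, outcomes) for scores in imputed_scores)
```

Only the ARS values differ between imputed datasets in this sketch, because the imputed predictors (duration of hypertension and BMI) feed into the score while the outcome is fully observed. Pooling of confidence intervals is not covered here.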

Table 1 Baseline characteristics of the complete cohort and stratified by region

Results

Baseline characteristics are shown in Table 1. Five hundred fourteen patients underwent adrenalectomy, of whom 435 (85%) were eligible for analysis. Two hundred forty-eight (57%), 106 (24%), 42 (10%) and 39 (9%) patients were included from the USA, EU, CA and AU, respectively. Patients within the USA had a mean BMI of 30.4 ± 6.7 kg/m2, which was significantly higher than that of patients from the EU, CA or AU. The other predictors used within the ARS were comparable between the regions. Furthermore, CT and AVS were performed in 88% and 64% of patients, respectively, and the use of these modalities was comparable between the regions. A confirmatory test was performed more frequently within the EU (56%) and AU (46%) than within the USA (27%) and CA (17%). In 64% of patients, follow-up was performed approximately 6 months after surgery (range 3–9 months).

Resolution of hypertension was achieved in 118 (27%) patients within the total cohort and in 54 (22%), 32 (30%), 17 (40%) and 15 (38%) patients within the USA, EU, CA and AU, respectively (p = 0.015). No differences in resolution rates were found between the centers within each of the four regions (Fig. 2). Resolution of hypertension was achieved in 31% of patients with and 28% of patients without preoperative AVS (p = 0.524). No significant differences were seen between patients with and without a confirmatory test (p = 0.232). The rates of resolution of hypertension were comparable between the four follow-up periods (p = 0.442) (Fig. 3) and between patients with < 1 month and 3–9 months of follow-up (p = 0.400). Postoperative potassium and aldosterone were measured in 95% and 64% of patients, showing hypokalemia and hyperaldosteronism in 12% and 4%, respectively. Biochemical outcomes stratified per region are presented in Supplement 1.

Fig. 2 Rates of resolution of hypertension stratified by region and medical center

Fig. 3 Rates of resolution of hypertension stratified by duration of follow-up

Validation of the ARS in current clinical practice

There were no significant differences in the dichotomous ARS variables between the geographic regions (Table 2). ARS 0, 1, 2, 3, 4 and 5 were observed in 25%, 19%, 20%, 20%, 10% and 6% of patients, respectively (Table 3). These scores were comparable between the four regions (p = 0.484). Within the complete cohort, the proportion of patients with resolution of hypertension was 7% for ARS 0 and 84% for ARS 5. This corresponded to an NPV of 93% for ARS 0 and a PPV of 84% for ARS 5. The corresponding AUC was 0.751 (95% CI 0.699–0.802). When using the likelihood levels proposed by Zarnegar et al. [19], ARS 0–1 (low), ARS 2–3 (medium) and ARS 4–5 (high) showed predictive accuracies of 11%, 33% and 59%, respectively. The corresponding AUC for this categorical ARS was 0.718 (95% CI 0.664–0.772). Geographic validation within the USA showed an NPV of 96% for ARS 0 and a PPV of 79% for ARS 5, with an AUC of 0.782 (95% CI 0.714–0.851). In the EU, an NPV of 88%, a PPV of 75% and an AUC of 0.681 (95% CI 0.571–0.792) were observed. For CA, the NPV, PPV and AUC were 90%, 100% and 0.811 (95% CI 0.678–0.943), respectively; for AU, they were 60%, 67% and 0.667 (95% CI 0.483–0.851), respectively.
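The relation between the per-score resolution proportions and the reported NPV and PPV can be made explicit: the NPV for ARS 0 is the proportion of patients with ARS 0 who did not achieve resolution (1 − 7% = 93%), and the PPV for ARS 5 is the proportion of patients with ARS 5 who did (84%). The sketch below shows this calculation; the function names are hypothetical, and any input data used with it here would be toy data rather than the study cohort.

```python
from collections import Counter


def resolution_rate_by_score(scores, resolved):
    """Proportion of patients with resolution of hypertension within each ARS value."""
    totals = Counter(scores)
    hits = Counter(s for s, r in zip(scores, resolved) if r)
    return {s: hits[s] / totals[s] for s in sorted(totals)}


def npv_for_score(rates, score):
    """NPV when the given score is read as 'no resolution predicted'."""
    return 1.0 - rates[score]


def ppv_for_score(rates, score):
    """PPV when the given score is read as 'resolution predicted'."""
    return rates[score]
```

Reading only the extreme scores as firm predictions is what makes the NPV and PPV look favorable; the intermediate scores, which cover a large share of patients, are summarized by the AUC instead.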

Table 2 Dichotomous variables used for the ARS stratified by region
Table 3 Geographic validation of the aldosteronoma resolution score

Discussion

This study validated the ARS within a worldwide cohort of patients that is representative of current clinical practice. Validation of the ARS within the complete cohort showed a moderate to good AUC of 0.751. Furthermore, the AUC was 0.782 within the current US APA population. Although this prognostic accuracy was lower than in the original data presented by Zarnegar et al. (AUC 0.913), it can still be considered moderate to good prognostic performance [19]. Further geographic validation of the ARS displayed a comparable prognostic value within CA (AUC 0.811), but lower prognostic performance within the EU (AUC 0.681) and AU (AUC 0.667), potentially indicating limited generalizability of the ARS outside the North American population.

The ARS, as introduced in 2008 by Zarnegar et al., is a user-friendly model to predict the likelihood of resolution of hypertension after adrenalectomy for PA [19]. Because the ARS was developed in the USA within a single-center cohort of 100 patients over a decade ago, it is essential to confirm that the model also predicts well in, and thus is generalizable to, APA patients who were treated in other institutions or under different clinical settings and diagnostic protocols [20,21,22]. This underscores the need to evaluate the clinical applicability and usefulness of the ARS in the current clinical APA population, especially because the performance of a prediction model may change over time [20,21,22]. In the past, validation of the prediction model by others showed contradictory results; however, these studies were single-center or single-country studies and frequently had small sample sizes [24,25,26]. Therefore, we chose to validate the ARS within our large, worldwide cohort of patients, which at this time is the best available population to truly evaluate generalizability in current real-life clinical practice. The results showed a lower, but still moderate to good, predictive performance of the ARS within the USA (AUC 0.782) compared to the development dataset (AUC 0.913) [19]. This is to be expected, because prediction models tend to show optimistic results within the development dataset: all development techniques are prone to produce “overfitted” models, especially when small datasets (with limited numbers of outcomes) are used [20,21,22]. In line with this, performance is often poorer in validation studies because of differences in case mix and domains. Because our study contained almost 250 patients from seven different medical centers in the USA, we believe this study shows good generalizability of the ARS within the USA. Furthermore, the results also showed a decent performance within CA (AUC 0.811).

Therefore, these results indicate that the ARS could be an easy-to-use tool for clinicians in North America during patient counseling. Nevertheless, the results showed a lower predictive performance of the ARS within the EU (AUC 0.681) and AU (AUC 0.667), demonstrating the potentially limited transportability of the model to other countries or continents. Although this is potentially due to differences in case mix and baseline characteristics, our results surprisingly showed no clear differences between the four regions in the four predictors used for the ARS or in the individual ARSs. For instance, although patients from the USA had a significantly higher BMI than patients from the other regions, this did not result in a lower proportion of patients with BMI ≤ 25 kg/m2 or in more patients with a low ARS (Table 2).

We observed resolution of hypertension in 27% of patients, which is lower than the 42%, 50% and 52% presented in reviews or meta-analyses and the 37% presented within another worldwide study by Williams et al. [13,14,15,16]. Most likely, this difference is multifactorial. For instance, these earlier studies included patients treated over several decades, and therefore the lower rate of resolution in our cohort could be influenced by the worldwide increase in obesity and background (non-PA-related) hypertension over the years [27,28,29]. Furthermore, because we meant our results to be representative of current clinical practice, the preoperative workup, including screening, case confirmation and subtype testing, was not as stringent as in other studies. Potentially, this led to less favorable outcomes compared to studies including only patients who, for instance, underwent AVS and thus represent more selected study populations. Although our results showed no difference in resolution rates between patients with and without preoperative AVS, we cannot conclude that AVS does not improve outcomes, because our cohort and study design are subject to confounding by indication. Further blood pressure-related outcomes and the potential benefits of surgery for patients without resolution of hypertension (i.e., reduction in blood pressure and antihypertensive medications) within this cohort have been described in detail before [17, 18].

When comparing rates of resolution between the four regions, the results showed a significantly lower resolution rate within the USA (22%) compared to the EU (30%), CA (40%) and AU (38%). Besides the significantly higher mean BMI within the USA, another potential influence on the lower resolution rate could be the difference in preoperative workup. While CT and AVS were performed equally often within the four regions, a confirmatory test was performed in only 27% of patients within the USA, which is lower than in the EU and AU. Furthermore, because of geographic distances within the USA, the period of follow-up was frequently shorter. Although we found no significant differences in resolution rates between patients who did or did not undergo confirmatory testing or between the different follow-up periods, we cannot exclude that these factors influenced the outcomes.

Similar to most studies regarding PA, the need for a retrospective design, due to the low prevalence of PA, is one of the weaknesses of our study. This made it impossible to use standardized measurement procedures for clinical outcomes such as blood pressure measurements. Although the duration of follow-up had no significant influence on resolution of hypertension rates, the short period of follow-up in a substantial number of patients could also be a potential weakness of this study. Also, the limited number of participating medical centers from CA and AU, resulting in relatively wide confidence intervals of the AUC, should be taken into account.

As presented in earlier studies, the distribution of resolution rates might differ across countries or continents, which was also the case in our study [13,14,15,16]. In line with this, predictors for a certain outcome might differ between geographic populations, and the effect or magnitude of predictors might change over time. Although dichotomous variables, as used within the ARS, simplify the use of prediction models in daily clinical practice, much of the information within the data is lost. This was best illustrated by the significantly higher mean BMI within the USA, compared to the other three geographic regions, which did not lead to fewer patients with BMI ≤ 25 kg/m2 or to a lower ARS. Moreover, the cutoffs for dichotomized variables are often driven by the data, hampering the generalizability of prediction models [20,21,22]. Therefore, in future studies, a prediction model should ideally include continuous instead of dichotomous variables. Moreover, in a world of advancing technology and easy access to electronic devices and web-based applications, a prediction model containing continuous variables could be user-friendly as well.

Conclusion

The ARS is a user-friendly prediction model for clinicians during patient counseling, with moderate to good predictive performance within current clinical practice. The model showed the highest predictive performance within North America but potentially performs less well in the EU and AU, indicating potentially limited generalizability outside of the North American APA population.