Background

Hepatocellular carcinoma (HCC) is the fifth most common cancer and the second most common cause of cancer-related death worldwide [1]. Current HCC staging systems, such as the Barcelona Clinic Liver Cancer (BCLC) staging system, indicate that hepatectomy is a potentially curative treatment for patients with early-stage HCC [2]. However, postoperative recurrence is common, with 5-year rates reaching 70% [3, 4], suggesting that patients within the same early stage can have diverse postoperative prognoses. Thus, the current staging systems still need improvement, for example by incorporating new risk factors for better stratification of postoperative outcome. Traditional staging systems consist mainly of pathological factors, such as tumor size and vascular invasion, while the extensive information in preoperative computed tomography (CT) or magnetic resonance imaging (MRI) that reflects intrinsic tissue characteristics and heterogeneity [5,6,7,8] remains untapped. Recently, various imaging features have been reported to be associated with the pathological features and prognosis of tumors, and to be complementary to current staging systems, in cancers such as rectal cancer and bladder cancer [9, 10]. New prognostic factors, such as those derived from CT and MRI images, are therefore urgently needed to identify patients at high risk of postoperative recurrence and death, which could help to select patients who are more likely to benefit from surgery.

Radiomics, an emerging and promising field, hypothesizes that medical images, including CT and MRI, can provide crucial information about tumors [11]. By converting medical images into high-dimensional, mineable, quantitative features via high-throughput data extraction, radiomics provides an unprecedented opportunity to improve decision support in oncology noninvasively and at low cost. Imaging examinations are already routinely conducted for cancer patients, including those with HCC [12]. Compared with developing new molecular biomarkers, radiomics may therefore not require additional physical or molecular tests and thus would not increase the economic burden on patients. In addition, previous studies have demonstrated that quantitative radiomics features are associated with clinical prognosis and underlying genomic patterns across a range of cancer types, such as non-small cell lung cancer [13] and advanced nasopharyngeal carcinoma [14].

In HCC, contrast-enhanced computed tomography (CECT) is widely used for diagnosis because of its high specificity and sensitivity [12]. It has also been reported that the characteristics of tumor CT images are associated with gene expression profiles, pathological features, and prognosis of HCC [11, 15,16,17]. Image features can be divided into semantic features and agnostic features. Semantic features are commonly used in the radiology lexicon to describe regions of interest and include internal arteries, hypodense halos and so on, while agnostic features, such as texture features, attempt to capture lesion heterogeneity through quantitative descriptors [11, 14, 18, 19]. Previous studies favored the clinical application of semantic features, as they are easy to acquire. Recently, growing attention has been paid to the potential clinical application of agnostic features. For instance, Fu et al. investigated the prognostic significance of CT image texture features for advanced HCC patients receiving transarterial chemoembolization (TACE) [15]. Another study suggested that texture analysis was promising for stratifying HCC patients to determine their suitability for liver resection vs. TACE [11]. Furthermore, texture analysis has been reported to have potential for predicting postoperative hepatic insufficiency and assessing fibrosis [20]. However, the prognostic significance of radiomics features has rarely been investigated in HCC patients undergoing hepatectomy.

In this study, we aimed to develop a rad-score derived from the preoperative CECT of patients with solitary HCC, on the assumption that such a rad-score may help to identify patients at high risk of postoperative recurrence and death and improve clinical decision making for these patients.

Methods

Patient selection and data collection

Patient recruitment, as well as the inclusion and exclusion criteria, are presented in Additional file 1: Figure S1. A total of 319 patients were enrolled and randomly divided into a training cohort (n = 212) and a validation cohort (n = 107). The pathological diagnoses of all cases were reviewed and confirmed independently by two expert pathologists.

Baseline clinicopathological data were derived from medical records. Tumor differentiation was graded by the Edmondson grading system [21]. The postoperative follow-up and treatment strategies followed a uniform guideline, as we previously described [22, 23], and are listed in Additional file 2. Ethical approval was obtained from the institutional review board of Zhongshan Hospital, and the informed consent requirement was waived. Time to recurrence (TTR) was defined as the interval between surgery and recurrence, or the last observation for surviving patients without recurrence. Overall survival (OS) was defined as the interval between surgery and death, or the last observation for surviving patients. The data were censored at the last follow-up for living patients.
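The endpoint definitions above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the study: the function names, the date-based inputs and the days-to-months conversion factor are all assumptions introduced here, and the handling of patients who died without documented recurrence (censored at the last observation) is likewise an assumption, since the text does not specify it.

```python
from datetime import date
from typing import Optional, Tuple

# Assumption: average month length used to convert days into months.
DAYS_PER_MONTH = 30.4375


def months_between(start: date, end: date) -> float:
    """Interval between two dates, expressed in months."""
    return (end - start).days / DAYS_PER_MONTH


def time_to_recurrence(surgery: date,
                       recurrence: Optional[date],
                       last_observation: date) -> Tuple[float, int]:
    """Return (TTR in months, event flag); flag 1 = recurrence, 0 = censored."""
    if recurrence is not None:
        return months_between(surgery, recurrence), 1
    # No recurrence observed: censor at the last observation.
    return months_between(surgery, last_observation), 0


def overall_survival(surgery: date,
                     death: Optional[date],
                     last_observation: date) -> Tuple[float, int]:
    """Return (OS in months, event flag); flag 1 = death, 0 = censored."""
    if death is not None:
        return months_between(surgery, death), 1
    # Patient alive at last follow-up: censor at that date.
    return months_between(surgery, last_observation), 0
```

Each patient is thus reduced to a (time, event) pair, which is the input format expected by Kaplan–Meier and Cox analyses.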

Quantitative imaging characteristics

CT protocols and details of texture features are described in Additional file 2. Arterial phase CECT data were retrieved from the institutional archive in DICOM format and loaded onto a personal laptop for texture analysis. A total of 110 candidate radiomics features were generated from each image using an in-house algorithm implemented in Matlab 2016a (MathWorks, Natick, MA, USA). For texture analysis, a region of interest (ROI) was first delineated around the tumor outline on the slice with the largest cross-sectional area. Details of texture feature extraction are presented in Additional file 3: Figure S2.

Inter-observer and intra-observer reproducibility of radiomics feature extraction

Sixty images were randomly chosen to evaluate the inter-observer reproducibility of the radiomics features. All of these images were reviewed by two radiologists with 10 years (reader 1) and 5 years (reader 2) of experience in abdominal CT interpretation. To assess intra-observer reproducibility, reader 1 generated the texture features twice within a 1-week period, following the same procedure.

A two-way random-effects, single-measure, absolute-agreement intraclass correlation coefficient (ICC) was used to assess agreement between the features generated by reader 1 (first extraction) and those generated by reader 2, as well as between the two extractions by reader 1. ICC values below 0.40 were considered to indicate poor reliability, values between 0.40 and 0.59 fair, values between 0.60 and 0.74 good, and values between 0.75 and 1.00 excellent. The ICC is a descriptive statistic that can be used when quantitative measurements are made on units that are organized into groups; it describes how strongly units in the same group resemble each other. It has previously been reported as a reliable method to evaluate the reproducibility of data [24, 25] and has been used in radiomics research [26].
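For readers who want to see how such a coefficient is obtained, the following Python sketch computes a two-way random-effects, absolute-agreement, single-measure ICC, often written ICC(2,1), from the usual two-way ANOVA mean squares. It is an illustration under stated assumptions rather than the software used in this study: the function names and the toy ratings in the usage note are hypothetical, and the labeling function treats values from 0.40 up to (but not including) 0.60 as fair.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): rows are images (subjects), columns are readers (raters)."""
    n = len(ratings)         # number of images
    k = len(ratings[0])      # number of readers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement: systematic reader offsets (ms_cols) lower the ICC.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def reliability_label(icc: float) -> str:
    """Map an ICC value to the categories used in the text."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"
```

As a usage note, two readers who produce identical values for every image, for example `icc_2_1([[1, 1], [2, 2], [3, 3]])`, reach the maximum ICC, whereas a constant offset between readers reduces the coefficient because absolute agreement, not mere consistency, is being measured.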

Feature selection and rad-score building

According to Harrell’s guideline, the number of events should exceed the number of included covariates by at least 10 times in a multivariate analysis. Therefore, in our study, the least absolute shrinkage and selection operator (lasso) method combined with logistic regression [27] was used to select the most useful features in the training cohort. This method minimizes the negative log-likelihood subject to the sum of the absolute values of the parameters being bounded by a constant:

$$ \widehat{\beta} = \underset{\beta}{\operatorname{arg\,min}}\ \ell\left(\beta\right) \quad \text{subject to} \quad \sum_{j} \left|\beta_{j}\right| \le s $$

where \( \widehat{\beta} \) is the vector of estimated parameters, \( \ell(\beta) \) is the negative log-likelihood of the logistic regression model, and \( s > 0 \) is a constant.

Because of the absolute-value constraint, the lasso shrinks coefficients and sets some of them exactly to zero [28]; it can therefore be used for feature reduction and selection. In this study, the standardized constraint parameter s was set to 0.00013868, and the lasso selected 6 nonzero coefficients (\( \widehat{\beta} \)). The logistic regression model was then obtained, with its outcome being the hazard rate at the fifth year after operation for each individual. R software and the “glmnet” package (R Foundation for Statistical Computing, Vienna, Austria, URL: http://www.R-project.org, 2016) were used for the lasso-logistic regression analysis.
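To illustrate how the lasso sets coefficients exactly to zero, the sketch below fits an L1-penalized logistic regression by proximal gradient descent in plain Python. Several points are assumptions made for illustration and do not reproduce the glmnet analysis of this study: the penalized (Lagrangian) form with a penalty weight `lam` is used in place of the equivalent constrained form with bound s, the solver is a simple iterative soft-thresholding scheme rather than glmnet's coordinate descent, the intercept is left unpenalized, and the function names and the toy data set are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lasso_logistic(X: Sequence[Sequence[float]], y: Sequence[int], lam: float,
                   step: float = 0.5, n_iter: int = 2000
                   ) -> Tuple[float, List[float]]:
    """Minimize mean negative log-likelihood + lam * sum(|beta_j|)."""
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals (predicted probability minus observed label).
        resid = [sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad_intercept = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n
                for j in range(p)]
        # Gradient step, then soft-thresholding (the L1 proximal step).
        intercept -= step * grad_intercept
        beta = [soft_threshold(b - step * g, step * lam)
                for b, g in zip(beta, grad)]
    return intercept, beta


def toy_data() -> Tuple[List[List[float]], List[int]]:
    """Hypothetical data: feature 0 is informative, feature 1 is pure noise."""
    X, y = [], []
    for x0, outcomes in ((1.0, (1, 1, 0)), (-1.0, (0, 0, 1))):
        for x1 in (1.0, -1.0):
            for outcome in outcomes:
                X.append([x0, x1])
                y.append(outcome)
    return X, y
```

On the toy data, a small penalty keeps the informative feature and removes the uninformative one, while a sufficiently large penalty removes both, which is the behavior that makes the lasso usable for feature selection.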

Statistical analysis

Statistical analyses were performed using SPSS software (20.0; SPSS, Inc., Chicago, IL, USA) and R software (R Foundation for Statistical Computing, Vienna, Austria) with the “rms” package. Continuous variables were compared using the Mann-Whitney U test, while categorical variables were compared using the Chi-squared or Fisher’s exact test. X-tile software (Yale University, New Haven, CT, USA) was used to determine the optimal cut-off value of the rad-score; X-tile is a graphical method that illustrates the presence of substantial tumor subpopulations and shows the robustness of the relationship between a biomarker and outcome by constructing a two-dimensional projection of every possible subpopulation [29, 30]. Survival curves were estimated using the Kaplan–Meier method and compared with the log-rank test. Cox proportional hazards regression was applied for univariate and multivariate analyses. The “rms” package was used to build the nomogram models, which were evaluated using Harrell’s concordance index (C-index) and calibration curves [31]. Details of the nomogram models are listed in Additional file 2. A two-sided p < 0.05 was considered statistically significant.
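As an illustration of what the C-index measures, the following Python sketch computes Harrell's concordance index for right-censored data: among all usable patient pairs, it is the fraction in which the patient with the higher predicted risk experiences the event earlier. This is a simplified sketch, not the “rms” implementation used in the study; the function name is hypothetical, and pairs with tied follow-up times are simply skipped, which is an assumption made for brevity.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Concordance between predicted risk and observed event order.

    events[i] is 1 if the event was observed for patient i, 0 if censored.
    A higher risk score is expected to go with a shorter time to event.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(n):
            # Pair is usable when patient i had the event before time j.
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs: C-index is undefined")
    return concordant / usable
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking, which gives a scale for reading the C-indices reported in Table 4.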

Results

Clinical characteristics of the patients

No significant differences in clinicopathological features were observed between the two cohorts (Table 1). All patients had solitary HCC and underwent R0 resection. The mean follow-up time in the training and validation cohorts was 52.7 ± 21.6 months and 54.5 ± 22.1 months, respectively. Overall survival rates at 1, 3, and 5 years after operation were 87, 76 and 69% in the training cohort and 88, 75 and 72% in the validation cohort, respectively.

Table 1 Clinicopathological features of HCC patients in training and validation cohorts

Results of inter-observer and intra-observer reproducibility of radiomics feature extraction

Satisfactory inter- and intra-observer reproducibility of the texture feature extraction was achieved. The reproducibility of radiomics feature extraction was good between the two readers (ICC range: 0.71–0.95) or between reader 1’s first and second-extracted features (ICC range: 0.83–0.99). These results suggested that our radiomics feature values were highly reproducible.

Development of the rad-score and its association with clinicopathological features

Six features were selected from the 110 texture features by lasso-logistic regression on the basis of the 212 patients in the training cohort (Additional file 4: Figure S3). The rad-score formula built from these features is presented in Additional file 2; all coefficients in the equation come from the lasso-logistic regression. As determined by X-tile software, the optimal cut-off for the rad-score was 4.32 (rad-score range: training cohort, 1.70–22.3; validation cohort, 2.1–29.2). Accordingly, patients were divided into high (> 4.32) and low (≤ 4.32) rad-score groups.
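The grouping rule stated above can be written as a short Python sketch. Only the cut-off value and the direction of the inequalities are taken from the text; the function and constant names are hypothetical, and the rad-score itself is assumed to have been computed beforehand from the formula in Additional file 2, which is not reproduced here.

```python
from typing import Dict, Iterable, List

# Cut-off reported in the text (determined with X-tile in the training cohort).
RAD_SCORE_CUTOFF = 4.32


def rad_score_group(rad_score: float, cutoff: float = RAD_SCORE_CUTOFF) -> str:
    """Return 'high' for scores above the cut-off, otherwise 'low'."""
    # A score exactly equal to the cut-off falls in the low group.
    return "high" if rad_score > cutoff else "low"


def split_by_rad_score(rad_scores: Iterable[float]) -> Dict[str, List[float]]:
    """Partition a collection of rad-scores into the two groups."""
    groups: Dict[str, List[float]] = {"low": [], "high": []}
    for score in rad_scores:
        groups[rad_score_group(score)].append(score)
    return groups
```

Note that a patient whose score equals the cut-off exactly is assigned to the low group, consistent with the "≤ 4.32" definition.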

The association between the rad-score and clinicopathological features was further assessed in the training cohort (Additional file 5: Table S1). A low rad-score was associated with a high preoperative alpha-fetoprotein (AFP) level (p < 0.001), larger tumor size (p < 0.001), presence of vascular invasion (p = 0.009), advanced TNM stage (p = 0.015) and advanced BCLC stage (p = 0.020), suggesting that a low rad-score may indicate tumor aggressiveness.

Low rad-score correlated with poor survival in solitary HCC patients

In the training cohort, a low rad-score was significantly associated with shorter TTR (median TTR [95% confidence interval (CI)] for low [n = 49] versus high rad-score [n = 163]: 38 [28.2–47.1] versus 53 [48.0–58.4] months; p = 0.005, Fig. 1a). In the validation cohort, the difference in recurrence between the two groups did not reach significance (p = 0.054, Fig. 1b), suggesting that the rad-score was slightly over-fitted to the training cohort. As for OS, a low rad-score was significantly correlated with shorter postoperative survival in both the training cohort (median OS [95% CI] for low [n = 49] versus high rad-score [n = 163]: 54.9 [45.4–64.5] versus 70.5 [66.6–74.5] months; p = 0.003, Fig. 1c) and the validation cohort (median OS [95% CI] for low [n = 37] versus high rad-score [n = 70]: 50.9 [38.5–63.3] versus 82.2 [75.6–88.8] months; p = 0.003, Fig. 1d).
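The survival curves and median times compared here rest on the Kaplan–Meier product-limit estimator, which the following Python sketch illustrates. It is a minimal illustration, not the SPSS or R routines used in the study: the function names are hypothetical, confidence intervals and the log-rank test are omitted, and the median is taken as the first event time at which the estimated survival drops to 0.5 or below.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the event was observed, 0 if the patient was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        at_risk -= n_leaving
    return curve


def median_survival(curve: Sequence[Tuple[float, float]]) -> Optional[float]:
    """First time at which survival is <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

Applying `kaplan_meier` separately to the low and high rad-score groups and reading off `median_survival` for each yields the kind of group-wise medians reported above; when fewer than half of a group experience the event, the median is not reached and the function returns `None`.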

Fig. 1

Prognostic significance of rad-score for solitary HCC patients. a Training cohort: median TTR [95% CI] for low rad-score [n = 49] versus high rad-score [n = 163]: 38 [28.2–47.1] versus 53 [48.0–58.4] months; p = 0.005. b Validation cohort: p = 0.054. c Training cohort: median OS [95% CI] for low rad-score [n = 49] versus high rad-score [n = 163]: 54.9 [45.4–64.5] versus 70.5 [66.6–74.5] months; p = 0.003. d Validation cohort: median OS [95% CI] for low rad-score [n = 37] versus high rad-score [n = 70]: 50.9 [38.5–63.3] versus 82.2 [75.6–88.8] months; p = 0.003

Multivariate analyses suggested that rad-score was an independent prognostic factor of recurrence in the training cohort (Hazard ratio (HR): 2.472, 95%CI: 1.339–4.564, p = 0.004, Table 2). As for OS, the rad-score (HR: 1.558, 95%CI: 1.022–2.375, p = 0.039, Table 3) was also identified as an independent prognostic factor in the training cohort. Similar results were observed in the validation cohort (recurrence: HR: 1.890, 95%CI: 1.04–3.436, p = 0.036, Table 2; survival: HR: 3.236, 95%CI: 1.416–7.407, p = 0.005, Table 3).

Table 2 Uni- and multivariate analyses of predictors of postoperative recurrence in training and validation cohorts
Table 3 Uni- and multivariate analyses of predictors of postoperative survival in training and validation cohorts

All these results demonstrated that rad-score was an independent prognostic factor of postoperative recurrence and survival for solitary HCC patients. Patients with low rad-score have a higher recurrence rate and poorer survival.

The performance of rad-score based prognostic nomograms

Based on the results of the multivariate analysis, a rad-score based nomogram predicting postoperative recurrence of solitary HCC patients was established (Fig. 2a). In the nomogram model, each factor was ascribed a weighted point value reflecting its contribution to the risk of recurrence or death. For example, a low rad-score was ascribed 20 points (on a scale of 0–100 points) in the nomogram for postoperative survival. Patients with a higher total score had a worse prognosis, namely a higher risk of recurrence or death. The C-index was used to evaluate the predictive accuracy (discrimination) of the rad-score based nomograms; in the training cohort it was 0.639 (95% CI: 0.577–0.701, Table 4) for the nomogram of recurrence and 0.714 (95% CI: 0.635–0.793, Table 4) for the nomogram of survival. In the validation cohort, the C-index was 0.587 (95% CI: 0.479–0.695, Table 4) for the nomogram of recurrence and 0.710 (95% CI: 0.602–0.818, Table 4) for the nomogram of survival. In addition, 50-sample bootstrapped calibration plots showed good predictive accuracy of the nomogram for the 3-year (Fig. 2b, c) and 5-year (Fig. 2d, e) recurrence rates in the training and validation cohorts.

Fig. 2

Development of rad-score based nomograms and calibration curves of the rad-score based nomogram for recurrence in both training and validation cohorts. a The prognostic nomogram for recurrence. b Calibration curves for 3 years TTR in the training cohort. c Calibration curves for 3 years TTR in the validation cohort. d Calibration curves for 5 years TTR in the training cohort. e Calibration curves for 5 years TTR in the validation cohort

Table 4 C-indices of rad-score based nomograms, clinicopathological nomograms and traditional staging systems

Similarly, a rad-score based nomogram predicting postoperative survival of solitary HCC patients was developed (Fig. 3a). Good predictive accuracy for the 3-year (Fig. 3b, c) and 5-year (Fig. 3d, e) survival rates was also observed in both the training and validation cohorts.

Fig. 3

Development of rad-score based nomograms and calibration curves of the rad-score based nomogram for OS in both training and validation cohorts. a The prognostic nomogram for postoperative survival. b Calibration curves for 3 years OS in the training cohort. c Calibration curves for 3 years OS in the validation cohort. d Calibration curves for 5 years OS in the training cohort. e Calibration curves for 5 years OS in the validation cohort

Indeed, the Hosmer-Lemeshow test showed no significant difference between the predictive calibration curve and the ideal curve for postoperative recurrence and survival prediction in both the training and validation datasets. These results indicated that the two nomograms could predict postoperative recurrence and survival effectively.

Comparison between the rad-score based nomograms and traditional staging systems

Several traditional staging systems have been proposed for patients with HCC, including the 7th edition of the American Joint Committee on Cancer (AJCC) TNM staging criteria, the BCLC staging system [32], the Japan Integrated Staging (JIS) score [33] and the Hong Kong Liver Cancer (HKLC) staging system [34]. In the training cohort, the C-indices of these staging systems for predicting postoperative survival were 0.575 (95% CI: 0.515–0.635) for the AJCC staging system, 0.574 (95% CI: 0.511–0.637) for the BCLC staging system, 0.601 (95% CI: 0.533–0.669) for the JIS staging system and 0.628 (95% CI: 0.548–0.708) for the HKLC staging system (Table 4). Compared with the C-indices of our new nomogram including the rad-score, the C-indices of these staging systems were significantly lower in both the training and validation cohorts. As for recurrence, the C-indices of the four staging systems were 0.552 (95% CI: 0.513–0.581) for the AJCC TNM staging system, 0.547 (95% CI: 0.506–0.588) for the BCLC staging system, 0.554 (95% CI: 0.508–0.600) for the JIS staging system and 0.575 (95% CI: 0.529–0.631) for the HKLC staging system, all significantly lower than the C-index of our nomogram including the rad-score in both the training and validation cohorts (Table 4). These results suggested that our rad-score based nomograms had better discrimination performance than the traditional staging systems for solitary HCC patients.

Assessment of incremental value of rad-score

To investigate the incremental value of the rad-score in individual prediction of postoperative recurrence and survival, we compared the discrimination performance of clinicopathological nomograms and rad-score based nomograms. The clinicopathological nomograms were established from independent clinicopathological risk factors, with a C-index of 0.633 (95% CI: 0.571–0.695) for recurrence and 0.554 (95% CI: 0.485–0.623) for postoperative survival in the training cohort. The discrimination performance of the nomogram improved when the rad-score was integrated (recurrence: C-index, 0.639, 95% CI: 0.577–0.701; survival: C-index, 0.714, 95% CI: 0.635–0.793) and was significantly higher than that of the clinicopathological nomogram in the training cohort (Table 4). In the validation cohort, similar results were observed for postoperative survival: the C-index of the clinicopathological nomogram was 0.642 (95% CI: 0.532–0.752), and it improved to 0.710 (95% CI: 0.602–0.818) after the rad-score was incorporated into the nomogram (Table 4). These results suggested that the rad-score was a good complement to clinicopathological factors in individual prediction of postoperative recurrence and survival.

A similar analysis was performed for the traditional staging systems. An improvement in the prediction of postoperative recurrence and survival was observed after combining the rad-score with the TNM staging system and the BCLC staging system (Table 4). Hence, the rad-score is complementary to the TNM and BCLC staging systems, demonstrating its valuable prognostic role.

Discussion

In this study, a multi-CT-texture feature based rad-score was proposed, which successfully stratified patients into groups with significant differences in TTR and OS, and may be complementary to traditional staging systems.

Radiomics, a promising field of oncological research, assumes that image features can predict the prognosis of patients because they are associated with the biological characteristics of tumors [11, 35]. Previous studies have supported this hypothesis [17, 36]. For instance, Banerjee et al. proposed an imaging signature of venous invasion, consisting of three semantic features (internal arteries, hypodense halo, and tumor-liver difference), that was closely associated with early recurrence and poor survival in HCC [37]. Similarly, the rad-score identified in our study was closely associated with pathological factors of HCC, such as larger tumor size and vascular invasion, and could be predictive of recurrence and survival.

Previously, several staging systems have been proposed for HCC patients, including TNM, BCLC, and HKLC [38]. Our rad-score based nomograms yielded a better discriminative ability than these traditional staging systems for solitary HCC patients. In addition, our results suggested that the rad-score could complement the TNM and BCLC staging systems in prognostic stratification as the C-index value increased when the rad-score was added to them. This incremental ability indicated the clinical importance of our finding for solitary HCC patients.

In our study, a lasso-logistic regression model was used to select the texture features for the rad-score, as features obtained from the lasso are generally accurate and the regression coefficients of most features are shrunk toward zero during model fitting, which limits overfitting [39], makes the model easier to interpret and allows the most valuable features to be identified [40]. Indeed, this method has been widely used in similar studies [14, 19].

Of note, the C-index values were relatively low for the traditional staging systems, which may be attributed to the study design. Only solitary HCC patients were included in our study. According to the traditional staging systems, these patients belong to the early or intermediate stages and are appropriate candidates for surgery. Although they share the same or a similar stage, a great deal of heterogeneity exists among them and their postoperative prognoses are diverse. Thus, traditional staging systems could not accurately predict recurrence and survival for these patients. The proposed rad-score also had a relatively low C-index, but this does not negate its clinical significance, as it stratified these patients into groups with different prognoses and improved the prognostic performance of the traditional staging systems when added to them.

The current study had several limitations. First, the data were derived from only one hepatobiliary center. Second, only solitary HCC patients were included, which may limit the generalizability of the conclusions. Third, this was a retrospective study. Therefore, further prospective multicenter analyses including HCC patients at various tumor stages are needed to validate the prognostic significance of this rad-score.

Conclusions

In summary, a rad-score derived from CT texture features was proposed in this study, which was an independent prognostic factor for tumor recurrence and survival of solitary HCC patients. In addition, this image score was complementary to the current staging systems of HCC patients. Finally, prognostic nomograms combining this score and clinicopathological features were proposed, which outperformed traditional staging systems and provided a convenient way to predict prognosis for solitary HCC patients, and may influence decision-making on the possible benefit of surgery.