Introduction

Currently, [18F]fluorodeoxyglucose positron emission tomography/computed tomography ([18F]FDG PET/CT) is widely used in the management of colorectal cancer to detect metastases [1]. In recent decades, many studies focused on quantitative assessment of [18F]FDG PET and the correlation with clinical outcome. Unfortunately, there is no consensus regarding optimal segmentation methods or quantitative indices to express metabolic characteristics of a tumour lesion. Standard uptake values (SUV) and volume-based indices are most extensively investigated. In patients with colorectal cancer, these measures are demonstrated to be prognostic on pre-treatment [18F]FDG PET in neoadjuvant [2, 3] and metastatic settings [4, 5]. However, the corrected tumour activity or metabolically active tumour volume (MATV) are only some of the PET characteristics which can be calculated from PET images. Other structural and textural imaging features might have additional value and can more accurately represent tumour biology. Indices, such as sphericity and compactness describe the shape of a tumour lesion. Heterogeneity can be expressed using entropy, which describes the sum of probability of a voxel grey level within the tumour volume of interest (VOI). Another accepted heterogeneity index is the areas under the curve of the cumulative SUV-volume histogram (AUC-CSH). [6,7,8,9,10]. The interest in tumour heterogeneity is growing, as advances in targeted medicine and knowledge about colorectal cancer biology is increasing. During the course of disease progression, heterogeneity in somatic mutations occur. Heterogeneous tumours grow more aggressively and negatively influence treatment response and patient survival [11, 12]. These genetic alterations influence tumour glucose consumption detected with [18F]FDG PET [13]. Using the entire scope of radiomics indices, intralesional tumour heterogeneity in metabolism of [18F]FDG can be quantified and differences between lesions can be evaluated. In locally advanced disease, these measures for heterogeneity correlate with recurrence [14] and survival [15]. However, the clinical meaning of these structural and textural indices and added value to the conventional PET units remain unclear for patients with metastatic colorectal cancer (mCRC).

In this study, we retrospectively evaluate the baseline metabolic tumour fingerprint using a comprehensive radiomics panel on baseline [18F]FDG PET/CT in relation with clinical outcome for patients with mCRC undergoing palliative systemic therapy. We hypothesized that highly metabolically active and heterogeneous tumour lesions will respond poorly to systemic treatment and have a poor progression-free survival (PFS) and overall survival (OS).

Methods

Population

Patient records were evaluated for inclusion if patients had participated in one of seven prospective clinical trials open in the VU University Medical Center in the period from January 2012 until May 2017 (NCT01792934, NCT01998152, NCT02135510, NCT01896856, NCT02117466 and NCT01691391). These studies included patients undergoing first- (capecitabine combined with oxaliplatin with or without bevacizumab) or third-line (cetuximab monotherapy) standard systemic treatment. All patients gave written informed consent to participate in one of the aforementioned studies. The medical ethics commission of the VU University Medical Center approved the retrospective study protocol. Patients with mCRC were included if [18F]FDG PET/CT had been performed prior to the start of palliative systemic treatment, with a maximal interval between PET and treatment of 2 months. Patients did not receive any (local) anti-cancer treatment between baseline [18F]FDG PET and the start of the evaluated systemic treatment.

[18F]FDG PET/CT

[18F]FDG-PET/CT scans were performed and reconstructed according to the EANM guidelines using EARL-accredited PET scanners [16]. Briefly, patients fasted 6 h prior the tracer injection (target serum glucose ≤7 mmol/l). A static whole-body (skull to mid-thigh) PET scan was started 60 min (± 5 min) after injection of [18F]FDG (3–4 Mbq/kg), with a scanning time of 2 min per bed position. A low-dose CT (120 kVp, 50 mAs) was acquired prior to the PET scan. All PET data were normalized and corrected for scatter and random events, attenuation and decay.

Tumour delineation and quantification

PET VOIs were semiautomatically delineated using a threshold of 50% of the SUVpeak, with correction for local background (SUV ≤4) [17]. All visually identifiable tumour lesions were delineated. Lesions were analysed if SUVpeak was higher than background, defined as two times SUVmean of the blood pool (VOI of five voxels in five consecutive planes in the ascending aortic arch) [18].

From each VOI, 10 radiomics indices were calculated. SUV was defined as the activity in a tumour VOI normalised for injected dose and lean body mass. We evaluated three commonly used first-order SUV indices; SUVmax (defined by the voxel with the highest activity within VOI), SUVmean (mean activity in the tumour VOI) and SUVpeak (SUVmean determined in a 12-mm diameter sphere that was automatically positioned in the VOI to acquire the highest value). The MATV in cm3 was determined with a threshold of 50% of the SUVpeak (with background correction ≤ SUV 4). Total lesion glycolysis (TLG) was defined as the SUVmean times MATV.

Five textural and structural radiomics indices were evaluated. Entropy expresses heterogeneity in tracer uptake within the tumour VOI on a voxel basis. Entropy consists of the sum of the probability of a certain voxel value. The formula for \( entropy\ (Shannon)=-{\sum}_{l=1}^k\left[p(l)\right]\mathit{\log}2\left[p(l)\right] \), l is the number of grey levels in the VOI and ranges from 1 to k [19]. The probability of a certain range of grey-level values can be evaluated in steps based on the maximal value k in 64 bins, or in fixed SUV bins (0.25 g/ml) for every VOI (entropy FXD). AUC-CSH is another measure for heterogeneity; it comprises the AUC of the histogram of the % of total tumour volume above % threshold of SUVmax, calculated with the Riemann sum using the trapezoidal rule [20]. This results in a low AUC-CSH for heterogeneous lesions. Thus, homogeneous tumours would have higher entropy and AUC-CSH compared to heterogeneous tumours.

Sphericity is a measure to describe the sphere-like shape of the VOI. \( Sphericity=\frac{{\left(36\uppi {V}^2\right)}^{1/3}}{\mathrm{A}} \). V is defined as volume and A as surface area of the VOI. A is defined as the sum of length times width of every plan in the VOI. Much like sphericity, compactness describes the deviation of the VOI from a perfect sphere. \( Compactness=\frac{\mathrm{V}}{\uppi^{1/2}{\mathrm{A}}^{3/2}} \) [21]. Thus, spherical tumours would have higher sphericity and compactness compared to aspherical tumours.

The correlation between clinical outcome measures and all PET features were evaluated for lesions with a metabolic volume ≥ 4.2 mL, as these lesions are less affected by partial volume effects [22]. For the analysis at a patient level, the mean of all metastases was calculated for all PET features. To assess total tumour bulk per patient, the sum of MATV and TLG of all lesions was evaluated (independent of volume).

Clinical outcome

In this study, four clinical outcome measures were evaluated: anatomical change on CT per lesion, treatment benefit, PFS and OS. Treatment benefit was defined as stable disease or response versus progressive disease (PD) on first-evaluation CT scan (2–3 months) according to RECIST v1.1. Briefly, RECIST v1.1 response evaluation entails evaluation of maximally 5 lesions [≤2 per organ, lesion diameter ≥ 10 mm (long axis) or ≥ 15 mm (short axis) for lymph nodes]. PD is defined as ≥20% increase and non-PD as <20% increase of the sum of diameters. Additionally, all quantified tumour lesions (above background) were measured on the baseline and first-evaluation CT, with the exception of non-measurable lesions (e.g. bone lesions or pleural carcinomatosis). PFS and OS were defined as the period starting from the date of the first evaluated treatment cycle to the date of PD or death, respectively. Follow-up was continued until the first of August 2017.

Statistical analysis

All statistical analyses were performed using IBM SPSS version 22, with the exception of the cluster analysis, which was performed using R version 3.2.3. Benefit and survival analysis were performed separately for each treatment line, as first-line treatment is expected to lead to better response rates and longer survival compared to third-line treatment. The normality of PET features was evaluated using histograms. Correlations between PET features and change on CT were investigated using a Pearson’s correlation for normally distributed data, i.e. SUVmax, SUVpeak, SUVmean, AUC-CSH, entropy, entropy FXD, compactness and sphericity. Spearman’s rho was used in skewed data, i.e. MATV and TLG. For linear correlations, explained variance was defined as the square of the correlation coefficient. Radiomics features that demonstrated a significant univariate linear correlation with change on CT were evaluated using linear mixed-effects models to correct for clustering within a patient (skewed data was log transformed). A random intercept with one fixed factor (non-random slope) was used, with restricted maximum likelihood and unstructured covariance type. To compare differences in PET features for a patient with and without treatment benefit, RAS/BRAF mutations and sidedness of primary tumour, independent t tests were used for normally distributed values, i.e. SUVmax, SUVpeak, SUVmean, AUC-CSH, entropy, entropy FXD, compactness and sphericity. Mann-Whitney U tests were used in skewed data, i.e. MATV and TLG. Additionally, using a receiver operator characteristic (ROC) curve, the area under the ROC was calculated to give insight into the sensitivity and specificity of PET features.

For the survival analysis, patients without progression and patients that are still alive were censored at August 1st 2017. Univariate survival analysis was done using Cox regression to evaluate potential correlation between baseline PET features and either OS or PFS. To calculate a meaningful hazard ratios (HRs), continuous variables were binned in 10% percentiles. Multivariate analyses was performed using Cox regression with the "Enter" method. After continuous correlation of the PET features, data were dichotomized based on the 50th percentile and evaluated with Kaplan–Meier curves (log rank).

The 10 radiomics features were clustered using the partitioning around medoids and hierarchical clustering method. For both methods, the number of clusters was selected by means of consensus clustering, which selects the most stable clustering. As a way of technical validation, found clusters were visualized in a principal component plot.

Results

Patient characteristics

Of the 250 patients included in aforementioned studies, 104 were eligible for this analysis. One of these patients was lost in follow-up and four had no evaluable lesions on FDG PET. Thus, 99 patients were included in the final analysis (Fig. 1). Fifty-two were treated in first-line setting (capecitabine combined with oxaliplatin with or without bevacizumab); the remaining 47 were treated with third-line cetuximab monotherapy. Patient characteristics are described in Table 1.

Fig. 1
figure 1

Flow chart of patient inclusion

Table 1 Patient characteristics

In these 99 patients, 584 lesions were quantified on [18F]FDG PET. Of these lesions, 354 had an MATV of ≥4.2 mL and were included in the analyses. On baseline CT, lesions were smaller (32 versus 41 mm, p < 0.001) and tumour shrinkage at first CT evaluation was greater in the first-line group than in the third-line group (mean −21.9% versus −5.3%, p = 0.004). On [18F]FDG PET, tumour lesions of patients in the first-line group had lower SUVmax [mean 6.8 (SD 3.1) versus 7.8 (SD 2.9), p = 0.004], SUVpeak [mean 5.4 (SD 2.4) versus 6.4 (SD 2.3), p < 0.001], SUVmean [mean 4.4 (SD 1.8) versus 4.9 (1.7), p = 0.013], MATV [median 9.8 (110.4) versus 16.6 (SD 110), p < 0.001], TLG [median 39.3 (SD 672.3) versus 83.3 (SD 589), p < 0.001] compared to tumour lesions of patients in the third-line group. Compactness [mean 0.04 (SD 0.01) versus 0.03 (SD 0.013), p < 0.001]), sphericity [mean 0.86 (SD 0.16) versus 0.70 (SD 0.2), p < 0.001) and AUC-CSH [mean 0.74 (SD 0.05) versus 0.69 (SD 0.13), p < 0.001] were higher in the first-line group. Entropy [mean 5.3 (SD 0.21) versus 5.3 (SD 0.31) p = 0.4] and entropy FXD [mean 3.6 (SD 0.68) versus 3.6 (SD 0.91) p = 0.9] was not different between the first-line and third-line groups.

First-line treatment group

Analysis on a lesion level

In the first-line treatment group, 70% of 136 lesions were measurable on CT. There was a positive but weak correlation with percentage change on CT and MATV (p < 0.001, ρ = 0.36), TLG (p = 0.002, ρ = 0.31), compactness and sphericity (p = 0.009, ρ = −0.27 for both). Correction for clustering of lesions within patients MATV (p = 0.007, estimate log MATV 16.1, 95% CI 4.5–27.6) and TLG (p = 0.01, estimate log TLG 13.2, 95% CI 3.3–23.2) remained correlated with change on CT; compactness (p = 0.23) and sphericity (p = 0.23) did not.

SUVmax (p = 0.4), SUVpeak (p = 0.6), SUVmean (p = 0.45), MATV (p = 0.38), TLG (p = 0.56) and entropy (p = 0.51) were not different between different organ sites of metastases. Yet, compactness (p = 0.03), sphericity (p = 0.03) and AUC-CSH (p = 0.04) were significantly different (supplementary data 1).

Analysis on a patient level

Patients without treatment benefit had a significantly higher mean entropy compared to patients with benefit (5.38 versus 5.27, p = 0.04 respectively, Table 2). The ROC curve for mean entropy demonstrates that it is a fairly good predictor for treatment benefit with an area under the ROC of 0.74 (95% CI 0.52–0.97, supplemental figure 1).

Table 2 Radiomics versus clinical outcome in first-line treatment

PFS was positively correlated with mean AUC-CSH (p = 0.02, HR 0.86, 95% CI 0.76–0.97) and negatively correlated with mean MATV (p = 0.01, HR 1.15, 95% CI 1.03–1.28), sum MATV (p = 0.02, HR 1.14, 95% CI 1.02–1.29), mean TLG (p = 0.02, HR 1.16, 95% CI 1.03–1.30) and sum TLG (p = 0.05, HR 1.12, 95% CI 1.00–1.26, Table 2). With multivariate analyses, corrected for performance status, sidedness and RAS or BRAF mutation status, none of the radiomics features correlated with PFS.

Similar to PFS, OS was positively correlated with mean AUC-CSH (p < 0.01, HR 0.77, 95% CI 0.66–0.89) and negatively correlated with mean MATV (p < 0.01, HR 1.22, 95% CI 1.07–1.40), sum MATV (p = 0.01, HR 1.19, 95% CI 1.04–1.36), mean TLG (p = 0.01, HR 1.22, 95% CI 1.06–1.41) and sum TLG (p = 0.02, HR 1.18, 95% CI 1.03–1.35, Table 2). Dichotomization based on the 50th percentile showed a significantly shorter OS for patients with a low AUC-CSH (median 14.1 versus 27.9 months, p = 0.001), high sum MATV (median 16.1 versus 25.3 months, p = 0.036) and high sum TLG (median 15.6 versus 28.8 months, p = 0.027, Fig. 2a). With multivariate analyses, corrected for performance status, number of metastases, sidedness and RAS or BRAF mutation status, AUC-CSH (p = 0.016, HR 0.64, 95% CI 0.45–0.92) and sum MATV (p = 0.048, HR 2.63, 95% CI 1.01–6.87) correlated with OS. Mean and sum TLG and mean MATV did not (p = 0.34, p = 0.41 and p = 0.25, respectively).

Fig. 2
figure 2

a Differences in survival for patients undergoing first-line treatment based on dichotomized data using the 50th percentile of mean AUC-CSH, sum MATV and sum TLG. b Differences in survival for patients undergoing third-line treatment based on dichotomized data mean SUVmax, sum MATV and TLG

Patients with right-sided CRC had significantly higher mean MATV (median 12.5 versus 17.8, p = 0.049), sum MATV (median 24.3 versus 49.4, p = 0.043) and sum TLG (median 106.3 versus 382.7, p = 0.031). There was no significant relation between radiomics features and BRAF and RAS mutation status (supplemental table 1A).

Third-line treatment group

Analysis on a lesion level

In the third-line treatment group, 82% of 218 lesions were measurable on CT. Heterogeneity expressed as AUC-CSH was positively but weakly correlated with percentage change on CT (ρ = 0.21, p = 0.005, Fig. 3). Yet, after correction for clustering within patients, AUC-CSH was not correlated with change on CT (p = 0.35).

Fig. 3
figure 3

AUC-CSH and response on CT for two patents undergoing third-line cetuximab monotherapy

MATV (p = 41), TLG (p = 0.20), compactness (p = 0.22), sphericity (p = 0.22) and AUC (p = 0.44) were not different between different organ sites of metastases. Yet, SUVmax (p < 0.01), SUVpeak (p < 0.01), SUVmean, (p < 0.01) and entropy (p < 0.01) were significantly different (supplementary data 1).

Analysis on a patient level

There were no significant differences in radiomics features for patients with and without treatment benefit (Table 3).

Table 3 Radiomics versus clinical outcome in third-line treatment

PFS was correlated with mean MATV (p = 0.02, HR 1.27, 95% CI 1.05–1.54), sum MATV (p = 0.01, HR 1.35, 95% CI 1.09–1.68), mean TLG (p = 0.01, HR 1.29, 95% CI 1.06–1.56) and sum TLG (p = 0.01, HR 1.27, 95% CI 1.06–1.53, Table 3). With multivariate analyses corrected for performance status, number of metastases, sidedness and RAS or BRAF mutation status, mean MATV (p = 0.03, HR 1.35, 95% CI 1.02–1.78), sum MATV (p = 0.01, HR 1.43, 95% CI 1.08–1.91), mean TLG (p = 0.016, HR 1.45 95% CI 1.07–1.97) and sum TLG (p = 0.03, HR 1.35, 95% CI 1.02–1.79) remained correlated with PFS.

OS was negatively correlated with mean MATV (p < 0.01, HR 1.68, 95% CI 1.20–2.37), sum MATV (p < 0.01, HR 2.04, 95% CI 1.36–3.07), mean TLG (p < 0.01, HR 1.54, 95% CI 1.15–2.05) and sum TLG (p < 0.01, HR 1.80, 95% CI 1.24–2.61). Additionally, OS was negatively correlated with mean SUVmax (p = 0.03, HR 1.19, 95% CI 1.01–1.41) and SUVpeak (p = 0.04, HR 1.21, 95% CI 1.01–1.45, Table 3). With multivariate analyses, mean MATV (p < 0.01, HR 2.41, 95% CI 1.38–4.25), sum MATV (p < 0.01, HR 2.47, 95% CI 1.45–4.18), mean TLG (p < 0.01, HR 1.70, 95% CI 1.15–2.51) and sum TLG (p < 0.01, HR 1.72, 95% CI 1.16–2.54) remained correlated with OS. Mean SUVmax (p = 0.42) and mean SUVpeak (p = 0.25) did not correlate to OS in multivariate analysis.

Dichotomized data based on the 50th percentile of mean MATV (p = 0.04, 16.1 versus 9.1 months), sum MATV (p = 0.001, median 16.3 versus 7.0 months), mean TLG (p = 0.033, 16.1 versus 9.3 months) and TLG (p < 0.001, 16.3 versus 6.4 months) resulted in a significantly different OS between groups (Fig. 2b).

There were no significant differences in the radiomics indices for BRAF or RAS mutated tumours. Patients with right-sided disease had less spherical (compactness p = 0.03, mean 0.024 versus 0.033; sphericity p = 0.02, mean 0.56 versus 0.71) and less heterogeneous disease (entropy FXD p = 0.029, mean 4.12 versus 3.62) compared to left-sided disease (supplementary Table 1B).

Cluster analysis

First-line treatment group

To evaluate potential complementary predictive and prognostic value, 10 radiomics features were combined in a cluster analysis. Three cluster groups were identified (Fig. 4a); concordance with an alternative cluster analysis was 78%. A consensus-clustering graph demonstrates repeatability of clustering within different subsets of our data set (supplemental figure 2A) and a principal component analysis demonstrates the separation in the 3 groups, based on a summary of the 10 PET features (supplemental figure 2B).

Fig. 4
figure 4

a A heatmap of the cluster analysis results of 10 PET characteristics per lesion in the first-line treatment group demonstrates 3 cluster groups. b Here, the heatmap of the three cluster groups for the third-line treatment group is illustrated

There were no significant differences between the three cluster groups and percentage change on first CT evaluation (p = 0.65), PFS (p = 0.16) and OS (p = 0.37).

Third-line treatment group

With the two cluster methods, concordance was 86.1%; the consensus clustering graph and a principal component analysis are shown in supplemental figure 4A and 4B.

As in first-line, the percentage change on first CT evaluation and PFS were not significantly different between the three cluster groups (p = 0.22 and p = 0.09). However, OS was different, with significantly poorer survival for group 2 versus group 1 (p = 0.03, HR 5.03, 95% CI 1.17–21.7, Fig. 5).

Fig. 5
figure 5

OS for clusters 1, 2 and 3 in the third-line treatment group

Association between PET features

Almost all PET features were associated with each other (supplemental table 2). All SUV-based measurements (r2 95–97%) and compactness and sphericity (r2 99.4%) were highly correlated. However, neither compactness nor sphericity were correlated with entropy and entropy FXD. Furthermore, there was no association between AUC-CSH and SUVmean or entropy FXD.

Discussion

In this study, we demonstrated a relation between total tumour volume, shape and heterogeneity of tracer uptake on pre-treatment [18F]FDG PET and clinical outcome. TLG, MATV, compactness and sphericity correlated with anatomical change in the first-line group and AUC-CSH in the third-line group. Yet, with correction for clustering, only TLG and MATV remained correlated with anatomical change. Mean entropy correlated with treatment benefit. For both treatment lines, higher tumour bulk (mean and sum MATV and TLG) was negatively correlated to PFS and OS. Thus, tumour heterogeneity and tumour bulk influences survival despite palliative systemic treatment.

First-order SUV features are most frequently used in clinical care and in studies. In our study, pre-treatment SUVmax and SUVpeak correlated with OS in the third-line group; however, this was not significant in the multivariate analysis. SUVmax, SUVpeak and SUVmean did not correlate with any other clinical outcome measures. We demonstrate that radiomics, which incorporates biological features such as tumour volume, shape and heterogeneity, correlates better with response and survival data. For future studies, it is important to include these measures when associating PET data with clinical outcome.

Our data indicates that MATV and TLG are promising prognostic features, as these features correlate with PFS and OS in uni- and multivariate regression for patients starting first- and third-line treatment. In literature, a correlation between pre-treatment TLG and OS has been described for both curative [3, 23, 24] and palliative [25, 26] regimens. In line with the literature, our data suggests that a highly metabolically active tumour bulk before start of palliative systemic treatment is a poor prognostic factor, both for patients starting first-line treatment and heavily pre-treated patients. The correlation between tumour load and survival is known for ovarian cancer [27], and is the main rational for current studies investigating debulking of patients with metastasized colorectal cancer as palliative treatment, such as the ORCHESTRA study (NCT01792934). This study evaluates if reduction of tumour load improves OS of patients with mCRC.

Chromosomal tumour instability is a known hallmark of tumour aggressiveness and associated impaired survival. mCRC is heterogeneous in its genetic alterations [11, 28] and palliative systemic treatment increases heterogeneity over time. It has been reported that these genetic alterations influence tumour uptake on [18F]FDG PET/CT [13, 29]. With the radiomics features, such as entropy and AUC-CSH, heterogeneity in voxel uptake within lesions can be assessed [21]. Entropy evaluates the sum of the probability of a certain voxel value in the tumour VOI [19]. The AUC-CSH is calculated based on the histogram of the voxel values in the tumour VOI [20]. Homogeneous [18F]FDG uptake would result in a high entropy and AUC-CSH. AUC-CSH was significantly higher for first-line lesions compared to third-line lesions, indicating that third-line lesions are more heterogeneous. Additionally, AUC-CSH was positively correlated with change on CT. In line with our data, it has been reported that heterogeneous [18F]FDG uptake in colorectal cancer is correlated with poor clinical outcome, such as recurrence and survival [14, 15, 30].

Entropy and entropy FXD were the only two radiomics that were not different between first- and third-line treatment. In contrast to previous data, entropy was higher in patients whom did not respond to first-line treatment and was higher in lesions originating from cancer in the right hemicolon in the third-line cohort. Biologically, it is not logical that homogeneous tumours respond less. However, this result might be due to the semiautomatic delineation. The cut-off for lesion delineation was set at 50% of the SUVpeak, such that tumours with an intense focal uptake will have a higher cut-off and this could result in less heterogeneity.

Sidedness of the primary tumour is a surrogate prognostic biomarker, and lesions originating from right-sided primary tumours harbour genetic alterations associated with resistance to anti-EGFR therapy [31, 32]. Indeed, using the [18F]FDG PET data, we found poorer metabolic features for patients with right-sided disease, such as a higher tumour bulk in patients in the first-line group and less spherical disease in patients in the third-line group. Another potentially meaningful radiomics feature is the shape of a lesion. Aspherical tumour growth has shown to be a poor prognostic marker for breast and lung cancer [33, 34]. In this study, compactness and sphericity were evaluated. However, as both values have a near-perfect concordance, evaluating both would be redundant. In the first-line treatment group, aspherical lesions grow faster on CT.

Using a cluster analysis, we evaluated if certain clusters of PET features give complementary value in characterizing particularly indolent or aggressively growing lesions. The three clusters of lesions did not have differences in anatomical changes on CT. Per patient there were no differences in treatment benefit and PFS. We identified one cluster group with significantly longer survival after third-line treatment. However, the cluster groups had no additional predictive value compared to the individual units. An explanation for the low complementary value of the 10 PET units could be the high correlation among these PET units [35].

This study is limited by the number of included patients. Therefore, we only selected and evaluated 10 out of hundreds of radiomics features, based on previous studies and potential clinical interest [2,3,4,5,6,7,8,9,10,11,12, 14, 15]. Moreover, it is of utmost importance to explore the robustness of these features, assess redundancy and study their dependence on image quality and reconstruction settings as well as image processing steps. It is of interest to mention that a recent initiative, i.e. the Imaging Biomarker Standardisation Initiative (IBSI), is a first important step towards standardization of radiomics features [36].

In conclusion, these data demonstrates that baseline tumour heterogeneity, asphericity and high tumour volume on [18F]FDG PET is correlated with impaired benefit and survival despite palliative systemic treatment. Future PET imaging research should not only focus on first-order SUV measures, but also evaluate radiomics that incorporate tumour volume and heterogeneity.