Background

18F-fluoro-2-deoxy-2-d-glucose (18F-FDG) positron emission tomography (PET) is a valuable imaging tool for the diagnosis, staging and assessment of the responses of various malignancies to therapy [1,2,3,4,5]. Reports indicate that 18F-FDG PET is a more effective biomarker of treatment responses than morphological information because it measures glucose metabolism [6,7,8,9]. The standardised uptake value (SUV) has served as a semi-quantitative metric of 18F-FDG uptake by tumours.

The SUV must be highly reproducible and repeatable to serve as a reliable biomarker in multicentre trials using various PET scanners, because SUV variability among scanners can contribute to uncertainty in results. The Quantitative Imaging Biomarker Alliance (QIBA), which was organised by the Radiological Society of North America (RSNA), recommended definitions for reproducibility and repeatability [10, 11]. In short, reproducibility refers to the consistency of values derived from repeated tests of one individual by different operators, different scanners, software, or at different sites and times. Repeatability refers to the consistency of values derived from repeated tests of one individual by a single operator using the same scanner and software. That is, reproducibility refers to inter-scanner variability, whereas repeatability refers to intra-scanner variability.

Recent developments in hardware and software for PET technologies have resulted in wide variations in basic performance. The SUV varies considerably owing to biological and technical factors, including the model of the PET scanner, the acquisition protocol, and the reconstruction algorithm and parameters [12,13,14,15]. For example, ordered subset expectation maximization (OSEM) reconstruction with a point spread function (PSF) algorithm improves spatial resolution but causes edge artifacts that lead to overestimation [16, 17]. Therefore, multicentre studies require a standard imaging protocol and SUV harmonization to reduce inter- and intra-scanner variability in the SUV.

Several organizations have published guidelines to standardise 18F-FDG PET imaging protocols and harmonize the SUV [14, 18,19,20]. In terms of SUV harmonization, the European Association of Nuclear Medicine (EANM) and the Japanese Society of Nuclear Medicine (JSNM) have proposed a specified range of recovery coefficients (RC) as a function of sphere size [21]. “Harmonization” is defined herein as smoothing a PET image with an additional Gaussian filter (GF) so that the RC of each sphere of the body phantom falls within the specified range. This harmonization strategy minimises variations in SUV measurements during scan acquisition and processing [22,23,24] and reduces reconstruction-dependent variability in the PET Response Criteria in Solid Tumours (PERCIST) classification [25].

The usefulness of commercially available software for harmonization was first determined using EQ.PET software [23]. However, EQ.PET requires a costly, software-specific workstation provided by Siemens Healthineers. The recently released commercial GI-PET software (AZE VirtualPlace Hayabusa, Tokyo, Japan) adjusts the RC of PET images to a reference RC using an additional GF. This vendor-neutral quantitative software can be installed on general-purpose personal computers.

We designed the Japanese SUV Harmonization Technology (J-Hart) study based on this background to determine the applicability of vendor-neutral software to multicentre PET SUV harmonization. We also assessed the effects of SUV harmonization on SUV reproducibility and repeatability.

Methods

Phantom

We used a NEMA 2001 International Electrotechnical Commission (IEC) body phantom (Data Spectrum Corp., Durham, NC, USA), consisting of six spheres (Model PET/IEC–BODY/P) of 10, 13, 17, 22, 28 and 37 mm in diameter, with a wall thickness of 1 mm in a quasi-cylindrical cavity (280 × 210 × 180 mm). All spheres and the background were filled with solutions containing 10.6 and 2.65 kBq/mL (at the midpoint of 30-min acquisitions), respectively, of 18F-FDG to obtain a sphere-to-background ratio of 4.

PET/CT scanners

Data were acquired at nine Japanese institutions using 12 popular PET systems (Table 1): Discovery ST Elite Performance (DSTEP), Discovery ST Elite (DSTE), Discovery 600 Motion (D600), Discovery 690 (D690) and Discovery IQ (DIQ) scanners (General Electric Medical Systems, Milwaukee, WI, USA); Biograph 64 True Point (BioTP), Biograph mCT 3ring (Bio3R) and Biograph mCT flow 4ring (Bio4R) scanners (Siemens Healthineers, Erlangen, Germany); GEMINI-TF16 (GTF) and Gemini GXL (GXL) scanners (Philips Medical Systems, Cleveland, OH, USA); and Aquiduo (Aquiduo) and Celesteion PCA-9000A (Celesteion) scanners (Toshiba Medical Systems, Otawara, Japan). The collaborating institutions comprised five university hospitals, three cancer centres and one research institution that normally conduct many oncological PET studies. The dose calibrator and PET scanner of each evaluated PET/CT system were cross-calibrated before data acquisition from the phantom according to the guidelines [19].

Table 1 Clinical parameters of PET scanners

Region of interest

The SUVmax of the six hot spheres was determined from circular regions of interest (ROIs) placed on the centre slice of images of each sphere. The diameter of each ROI was equal to that of each hot sphere (10, 13, 17, 22, 28 and 37 mm). Twelve circular ROIs with a diameter of 10 mm were placed on the background region at the centre slice of the PET image; the average SUV was calculated from twelve ROIs as the SUVmean of the background. The acceptance criterion for the SUVmean of the background region was 1.00 ± 0.05. Table 1 shows the average SUVmean of the background region after cross-calibration.
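The background acceptance check described above can be sketched in a few lines. This is a minimal illustration only: the function names and the example ROI values are hypothetical and are not measured data from the study.

```python
from statistics import mean

def background_suv_mean(roi_means):
    """Average the per-ROI mean SUVs (twelve 10-mm background ROIs in the text)."""
    return mean(roi_means)

def passes_cross_calibration(roi_means, target=1.00, tolerance=0.05):
    """True if the background SUVmean lies within target +/- tolerance."""
    return abs(background_suv_mean(roi_means) - target) <= tolerance

# Hypothetical per-ROI mean SUVs for illustration only (not study data).
example_rois = [0.98, 1.01, 1.03, 0.99, 1.00, 1.02,
                0.97, 1.01, 1.00, 0.99, 1.02, 1.00]
print(round(background_suv_mean(example_rois), 3),
      passes_cross_calibration(example_rois))
```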

Digital reference object

The QIBA suggested that computational imaging models or phantom data might play an important role in reproducibility assessment [26] and thus developed an 18F-FDG PET/CT digital reference object (DRO), which is a synthetic test object in Digital Imaging and Communications in Medicine (DICOM) format [27]. We obtained the DRO from the QIBA [28]. The mathematically developed DRO of the NEMA body phantom is an ideal object with a uniform background with a transaxial diameter of 20 cm that simulates a human abdominal cross section with an SUV of 1.00. Six spheres with diameters of 10, 13, 17, 22, 28 and 37 mm contained an 18F-FDG solution with an SUV of 4.00. The SUV of a central cylinder with a diameter of 5 cm was 0.00. A three-dimensional (3-D) GF was applied to the DRO to simulate a PET image with relatively low spatial resolution. The RC of the DRO was included in the JSNM reference range for a GF with a full-width at half maximum (FWHM) value of 10–13 mm, and the RC was highest at a 10-mm FWHM of GF (Fig. 1). We adopted DRO10mm as the reference SUV (SUVref) for harmonization.
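The effect of the 3-D GF on an ideal sphere can be illustrated with a closed-form sketch. It assumes a continuous, isolated sphere in an infinite uniform background, so voxel size, the cold central cylinder and neighbouring spheres are all ignored. It is therefore not the DRO computation used in this study, and its values need not match Fig. 1. The function names are hypothetical.

```python
import math

# FWHM = 2 * sqrt(2 * ln 2) * sigma for a Gaussian kernel.
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def fwhm_to_sigma(fwhm_mm):
    """Convert the FWHM of a Gaussian filter to its standard deviation."""
    return fwhm_mm * FWHM_TO_SIGMA

def smoothed_sphere_centre_suv(diameter_mm, fwhm_mm,
                               sphere_suv=4.0, background_suv=1.0):
    """Centre value of a uniform sphere after isotropic 3-D Gaussian smoothing.

    Idealised model (an assumption of this sketch): the centre value equals
    the background plus the contrast times the fraction of the 3-D Gaussian
    kernel that lies inside the sphere radius.
    """
    if fwhm_mm <= 0:
        return sphere_suv  # no smoothing applied
    x = (diameter_mm / 2.0) / fwhm_to_sigma(fwhm_mm)
    fraction = (math.erf(x / math.sqrt(2.0))
                - math.sqrt(2.0 / math.pi) * x * math.exp(-0.5 * x * x))
    return background_suv + (sphere_suv - background_suv) * fraction

for d in (10, 13, 17, 22, 28, 37):
    print(d, round(smoothed_sphere_centre_suv(d, fwhm_mm=10.0), 2))
```

Under this idealisation, the centre value falls as the FWHM increases and as the sphere diameter decreases, which is the behaviour that the reference RC curve captures.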

Fig. 1 Development of reference recovery coefficient from digital reference object and three-dimensional Gaussian filter

Data acquisition and processing

Emission data were acquired from the phantom over a period of 30 min in list mode at one bed position. A PET image from the full 30-min acquisition was reconstructed to determine the parameters for harmonization and to evaluate reproducibility. Fifteen contiguous 2-min frames were also reconstructed to evaluate repeatability. The reconstruction parameters were individually determined according to the clinical setting at each institution (Table 1). All reconstructed PET images were output in DICOM format.

Harmonization

We used GI-PET software on a personal computer for harmonization processing. Stand-alone GI-PET software can quantify PET images and it is typically used to adjust spatial resolution to harmonize PET images using a 3-D GF.

The harmonization procedure with GI-PET was as follows. First, reconstructed PET images from the 30-min acquisitions were loaded into the GI-PET software in DICOM format. The original voxels were converted into isotropic voxels of equal size in the x, y and z directions using the bilinear method because the 3-D GF of GI-PET requires isotropic data. A 3-D GF with FWHM values of 1–15 mm in 1-mm increments was applied to the PET images, and the SUVmax of the spheres was then plotted as a function of sphere diameter. The FWHM of the GF that provided an SUVmax within the reference range proposed by the JSNM [21] was determined. The lower and upper limits of the reference range of the SUVmax for the 10-, 13-, 17-, 22-, 28- and 37-mm hot spheres were 1.19–2.00, 1.52–3.04, 2.58–3.71, 3.25–4.09, 3.56–4.21 and 3.82–4.17, respectively. If only one FWHM provided an SUVmax within the reference range, then that FWHM was taken as the optimal GF parameter for harmonization. If several FWHM values provided an SUVmax within the reference range, we compared their root mean square error (RMSE) relative to DRO10mm. The RMSE is the square root of the mean squared difference in the SUVmax of the six spheres between the PET image and DRO10mm and is calculated as:

$$ \mathrm{RMSE}=\sqrt{\frac{1}{6}\sum \limits_{i=10,13,17,22,28,37\ \mathrm{mm}}{\left({\mathrm{SUV}}_{\max\ \mathrm{in}\ \mathrm{PET}\ \mathrm{scanner},i}-{\mathrm{SUV}}_{\max\ \mathrm{in}\ \mathrm{DRO}10\mathrm{mm},i}\right)}^2} $$

Finally, we determined the optimal FWHM as that providing the smallest RMSE (Fig. 2). The RMSE is a simple mathematical measure that has been used extensively in nuclear medicine and molecular imaging for several years. Combining the RMSE with a DRO is reportedly useful to create appropriate reconstruction conditions for other types of nuclear medicine images [29].
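The selection rule described above (keep the FWHM values whose SUVmax lies within the JSNM range for all six spheres, then choose the one with the smallest RMSE against DRO10mm) can be sketched as follows. The reference limits are those listed in the text. The function names, the DRO10mm values and the per-FWHM SUVmax values are hypothetical placeholders for illustration, not data from the study or output of GI-PET.

```python
import math

# Lower and upper SUVmax limits for the 10-, 13-, 17-, 22-, 28- and 37-mm
# spheres, as listed in the text (JSNM reference range).
JSNM_RANGE = ((1.19, 2.00), (1.52, 3.04), (2.58, 3.71),
              (3.25, 4.09), (3.56, 4.21), (3.82, 4.17))

def within_reference(suvmax):
    """True if every sphere's SUVmax lies inside its reference limits."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(suvmax, JSNM_RANGE))

def rmse(suvmax, reference):
    """Root mean square error over the spheres, as in the equation above."""
    squared = [(a - b) ** 2 for a, b in zip(suvmax, reference)]
    return math.sqrt(sum(squared) / len(squared))

def select_fwhm(suvmax_by_fwhm, reference):
    """Return the in-range FWHM with the smallest RMSE, or None if none qualify."""
    candidates = [f for f, v in suvmax_by_fwhm.items() if within_reference(v)]
    if not candidates:
        return None
    return min(candidates, key=lambda f: rmse(suvmax_by_fwhm[f], reference))

# Hypothetical values for illustration only (not study data).
dro10mm = [1.87, 2.49, 3.22, 3.76, 3.96, 4.00]
scanner = {
    4: [2.30, 3.10, 3.80, 4.10, 4.15, 4.10],  # 10-mm sphere above the range
    5: [1.95, 2.80, 3.50, 3.95, 4.05, 4.05],
    6: [1.80, 2.55, 3.30, 3.80, 4.00, 4.02],
    7: [1.60, 2.30, 3.05, 3.65, 3.92, 4.00],
    8: [1.40, 2.00, 2.50, 3.40, 3.80, 3.95],  # 17-mm sphere below the range
}
print(select_fwhm(scanner, dro10mm))
```

Returning `None` when no FWHM qualifies mirrors the case in which smoothing alone cannot bring the SUVmax into the reference range.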

Fig. 2 Flow chart used to determine the optimal FWHM of additional Gaussian filter for harmonization

Reproducibility and repeatability

To evaluate the reproducibility of quantitation, we calculated the coefficient of variation (CV) of the SUVmax for each sphere diameter across the 30-min images acquired by the 12 PET scanners and defined this CV across scanners as CVrepro. We then compared CVrepro, expressed as the mean ± standard deviation (SD), between pre- and post-harmonization images. The CV is the ratio of the SD to the mean, calculated as:

$$ \mathrm{CV}=\frac{\mathrm{SD}}{\mathrm{mean}}\times 100\ \left(\%\right) $$

To evaluate the repeatability of quantitation, we calculated the CV of the SUVmax across the 15 contiguous 2-min frames for each PET scanner and each sphere size and defined this CV across frames as CVrepeat. We then compared CVrepeat, expressed as the mean ± SD, between pre- and post-harmonization images.
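Both CVrepro and CVrepeat apply the same CV formula and differ only in what the rows of the input represent (scanners or frames). A minimal sketch follows. It assumes the sample SD (n − 1 denominator), which the text does not specify, and the function names are hypothetical.

```python
from statistics import mean, stdev

def cv_percent(values):
    """CV = SD / mean x 100 (%); the sample SD is an assumption of this sketch."""
    return stdev(values) / mean(values) * 100.0

def cv_per_sphere(rows):
    """Compute one CV per sphere (column).

    rows: one list of six SUVmax values per scanner (for CVrepro) or per
    2-min frame of a single scanner (for CVrepeat).
    """
    return [cv_percent(column) for column in zip(*rows)]

# Toy input with two rows and two spheres (not study data).
print([round(cv, 1) for cv in cv_per_sphere([[1.0, 4.0], [3.0, 4.0]])])
```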

Statistical analyses

Pre- and post-harmonization values of CVrepro (six pairs) and CVrepeat (54 pairs) were compared using Wilcoxon signed-rank tests in JMP® 13 (SAS Institute Inc., Cary, NC, USA). P values < 0.05 were considered statistically significant.

Results

Harmonization

The SUVmax fell within the JSNM reference range in three PET scanners (DSTE, DSTEP and GXL) without adding a GF (Table 2). Thus, adjustment was not required for harmonization. The FWHM values of the additional GF that resulted in an SUVmax within the JSNM reference range were 5–7, 5–9, 8–10, 9, 6–10, 8–9, 6–9, 6–9 and 5–7 mm for the Aquiduo, Bio3R, Bio4R, BioTP, Celesteion, DIQ, D600, D690 and GTF, respectively. The RMSE compared with DRO10mm was the smallest at FWHM values of 5, 5, 8, 9, 6, 8, 6, 6 and 5 mm, respectively (Table 2). The SUVmax of these nine scanners fell within the JSNM reference range only after harmonization. Figure 3 shows the pre- and post-harmonization RC of 12 PET scanners.

Table 2 Full-width at half maximum for SUV harmonization of PET scanners
Fig. 3 Recovery coefficients of 12 PET scanners obtained from 30-min PET image. Comparison of (a) pre- and (b) post-harmonization RC

Reproducibility

Figure 4 shows the pre- and post-harmonization CVrepro of the 30-min PET images among 12 PET scanners. The pre- and post-harmonization CVrepro (mean ± SD) of the six spheres were 9.45% ± 4.69% (range, 3.83–15.3%) and 6.05% ± 3.61% (range, 2.30–10.7%), respectively. Harmonization significantly improved the reproducibility of the PET SUVmax (P = 0.0055).

Fig. 4 Effects of harmonization on reproducibility. Pre- and post-harmonization comparisons of coefficients of variation of SUVmax for 12 PET scanners (CVrepro)

Repeatability

Figure 5 shows the CVrepeat of images across 15 frames acquired from each PET scanner. The post-harmonization CVrepeat of the nine PET scanners that required an additional GF for harmonization (Aquiduo, Bio3R, Bio4R, BioTP, Celesteion, DIQ, D600, D690 and GTF) was lower than the pre-harmonization value (Table 3). The pre- and post-harmonization CVrepeat (mean ± SD) of the nine scanners were 6.59% ± 1.29% (range, 5.00–8.98%) and 4.88% ± 1.64% (range, 2.65–6.72%), respectively. Harmonization significantly improved the repeatability of the PET SUVmax (P < 0.0001).

Fig. 5 Effects of harmonization on repeatability. Pre- and post-harmonization comparisons of coefficients of variation of the SUVmax for 15 frames (CVrepeat). Asterisks indicate the three PET scanners (DSTE, DSTEP and GXL) that did not require an additional Gaussian filter for harmonization

Table 3 Pre- and post-harmonization coefficients of variation across 15 frames of 12 PET scanners

Discussion

The J-Hart study examined whether a vendor-neutral software could harmonize SUV across 12 PET scanners from various manufacturers. We obtained an appropriate FWHM of GF for harmonization by comparing the SUVmax of 12 PET scanners at nine institutions with both the JSNM reference range and the DRO10mm. Harmonization significantly improved the reproducibility and repeatability of the SUVmax.

Our harmonization process was based on the simple down-smoothing method proposed by Boellaard et al. [30]. This strategy has been widely applied to SUV harmonization [15, 31] and provides a pair of PET images with relatively high and low resolution. As with EQ.PET software, both high-resolution images for lesion detection and harmonized images with an SUV that is comparable across different scanners can be obtained [23]. The EANM provides a harmonization strategy applied to the RC [19], whereas the JSNM recommends harmonization applied to the SUVmax, which can also reveal errors such as those in cross-calibration. Therefore, SUVmax data are presented in the graphs of the present study.

Quak et al. validated a proprietary software tool (EQ.PET) to harmonize SUV across three PET systems independently of the applied reconstruction algorithm [23]. They reported that all RC of SUVmax fell within the EANM range after applying the EQ.PET filter. The present study confirmed that vendor-neutral GI-PET software could harmonize the SUVmax across 12 PET scanners: all SUVmax fell within the JSNM reference range on GI-PET after adding a GF. Therefore, we consider that GI-PET and EQ.PET are equally useful for harmonization. The post-harmonization SUVmax of various PET scanners became comparable based on the same standard, which is relevant to multicentre and follow-up studies that use various PET scanners. EQ.PET is proprietary software that can only be installed on its own dedicated workstation with a PET system. In contrast, GI-PET is vendor-neutral software that can be applied at any institution using only a personal computer, regardless of the PET system. Although the present study used the JSNM reference range of SUVmax, the EANM reference range can also be used in vendor-neutral software.

Harmonization of the SUV improved the CVrepro of SUVmax across 12 scanners in the present study. Takahashi et al. reported a maximum difference of 45.7% in the SUVmax with a 27-mm sphere across five different PET scanners [32]. Such a substantial variation is a problem for reproducibility. Quak et al. evaluated harmonization across three PET scanners using EQ.PET for 1380 FDG-avid tumours [23]. They reported that the mean ratio of the SUVmax with and without PSF for the 1380 tumour lesions was 1.46 (95%CI 0.86–2.06) before and 1.02 (95%CI 0.88–1.16) after harmonization. Because the upper limit of the CI is the mean + 1.96 × SD, the SD before harmonization was 0.31, yielding a CV of 21%, and the SD after harmonization was 0.07, yielding a CV of 7%. The present study found that GI-PET improved the range of CVrepro across 12 PET scanners from 3.83–15.3% to 2.30–10.7%, suggesting that inter-scanner variability was improved equally well by GI-PET and EQ.PET. Harmonization is thought to reduce variation arising from differences in acquisition protocols (2D, 3D, TOF, etc.), machine specificities (such as type of crystal) and other elements (clock synchronisation, activity preparation and injection). However, their study included 517 patients with tumours who were examined using three PET scanners, whereas we examined a NEMA IEC body phantom using 12 PET scanners. The difference may also be affected by the biological features of tumours and patients. Patients should ideally be clinically assessed using the same scanner. However, patients often undergo PET evaluations with different scanners within the same institution. Multicentre studies usually involve different scanners, so therapeutic effects may be assessed using various scanners. In addition, patients are often transferred to other hospitals for advanced specialist care.
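The SD and CV derived above from the values reported by Quak et al. can be reproduced with the following arithmetic sketch. It takes the reported interval to be the mean ± 1.96 × SD, as the text does, and the function names are hypothetical.

```python
def sd_from_upper_limit(mean_value, upper_limit, z=1.96):
    """SD implied by an interval whose upper limit is mean + z * SD."""
    return (upper_limit - mean_value) / z

def cv_percent(sd, mean_value):
    """CV = SD / mean x 100 (%)."""
    return sd / mean_value * 100.0

# Mean ratios and upper limits quoted in the text from Quak et al. [23].
for label, m, upper in (("before", 1.46, 2.06), ("after", 1.02, 1.16)):
    sd = sd_from_upper_limit(m, upper)
    print(label, round(sd, 2), round(cv_percent(sd, m)))
# prints: before 0.31 21
#         after 0.07 7
```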

Harmonization of the SUV also improved the CVrepeat of SUVmax across 15 frames in each scanner. Velasquez et al. evaluated SUV repeatability in a study of 18F-FDG PET and found that the intra-subject CV for SUVmax was 10.7% [33]. Doot et al. evaluated the influence of the FWHM of a GF on the repeatability of phantom data and found that increasing the smoothing parameter decreased the standard deviation (SD) of the RC [34]. Kelly et al. evaluated the effects of harmonization on quantitative variation in the SUVmax relative to the reconstruction protocol. They found that the CV of the SUVmax across 15 repeat scans pre-harmonization was 2.81%, 3.25% and 4.69% for OSEM, OSEM+PSF and OSEM+PSF+time of flight (TOF), respectively, whereas the values after harmonization were 2.28%, 2.00% and 2.58%, respectively [31]. Harmonization is therefore considered to improve repeatability regardless of the reconstruction protocol.

The CVrepeat of all scanners and spheres was < 10% except for the 10-mm sphere on the DSTE. With respect to repeatability, Lodge et al. reported that the within-subject CV of a tumour SUV was 10% when acquired with careful attention to the protocol [35]. Data acquisition with the DSTE for clinical examinations requires 3 min per bed position, whereas our phantom study acquired data for 2 min per frame in the evaluation of repeatability. This might be why the CVrepeat of the 10-mm sphere on the DSTE was > 10%.

The SUVmax of all PET scanners in the present study was above the lower limit of the reference SUV range. Lasnon et al. examined the RC using Biograph TrueV and found that the RC for a 10-mm sphere on OSEM images was slightly below the lower limit of the EANM proposed range [15]. Harmonization using a GF is possible when the RC is above the lower limit of the reference range, but it cannot proceed when the RC is below this limit because additional smoothing only lowers the RC. Since some older PET and PET/CT scanners with low spatial resolution might yield an SUVmax below the reference range, the reconstruction settings (including reconstruction methods, matrix, pixel size, iterations, subsets and post-filter) should be carefully considered to obtain an RC within the reference range.

Several limitations are associated with the present study. The basic performance of PET/CT has improved in recent years as the technology has progressed. Therefore, the reference range of the SUV might change with future advancements in PET/CT. We only assessed the SUVmax, although other metrics, such as metabolic tumour volume and total lesion glycolysis, are also useful [36]. The reproducibility and repeatability of new metrics that might be reliable biomarkers should be evaluated. Differences in the voxel size depending on the reconstruction protocol require consideration. GI-PET automatically converts the original voxel size to an isotropic voxel size, which might result in the SUV differing from the value obtained on a proprietary workstation. Vendor-neutral software for SUV harmonization should be able to apply a 3-D GF to original (non-isotropic voxel) data. The present study was conducted using only a phantom. Therefore, the feasibility of vendor-neutral software in clinical practice requires validation by further clinical studies.

Conclusions

We harmonized quantitative values among 12 PET scanners using commercial vendor-neutral software. This harmonization strategy, based on the simple down-smoothing method, improved the reproducibility and repeatability of the SUV determined from a hot lesion. Therefore, this software might enable direct comparison of SUV across different scanners and facilitate multicentre oncology PET studies.