Introduction

Myocardial perfusion imaging (MPI) using single-photon emission computed tomography (SPECT) is widely used for the non-invasive diagnosis of obstructive coronary artery disease (CAD) [1]. In general, SPECT studies are graded by visual assessment of relative tracer uptake. This interpretation is subjective and takes into account factors such as the pre-test likelihood of disease, image quality, and potential (attenuation) artifacts. Expert reading with comprehensive evaluation of these factors requires prolonged training, yet it ultimately remains subjective. To assist clinical decision-making, commercially available software packages have been developed that analyze MPI images against normal databases [2,3,4]. Data on the actual clinical diagnostic performance of automated SPECT MPI analysis are, however, scarce [5,6,7]. Therefore, current guidelines recommend that automated analysis be used only as an adjunct to visual analysis [8]. Notably, these recommendations are based primarily on relatively outdated SPECT MPI technology. That technology lacked the attenuation correction (AC) that is currently available and was not validated against an appropriate reference standard [5,6,7,9]. The reference standard has predominantly been invasive coronary angiography (ICA). However, only fractional flow reserve (FFR)-guided treatment has been shown to improve event-free survival, and the frequent discrepancy between FFR and angiographic stenosis severity is increasingly acknowledged [10,11,12]. It therefore remains unclear how automated scoring compares with expert visual grading, and what impact CT-based AC has. The current sub-analysis of the PACIFIC trial aims to compare expert core laboratory reading of SPECT MPI with automated software grading, with and without CT-based AC. Diagnostic accuracy for the detection of obstructive CAD is assessed using FFR as the reference standard.

Methods

Patient population

The study population comprised 206 patients from the PACIFIC study (Comparison of Coronary CT Angiography, Myocardial Perfusion SPECT, PET, and Hybrid Imaging for Diagnosis of Ischemic Heart Disease: Prospective Cohort Study Using Fractional Flow Reserve to Determine Functional Severity of Coronary Stenoses; NCT01521468) [13]. All of these patients underwent ECG-gated SPECT/CT and invasive coronary angiography (ICA) with routine FFR interrogation. Enrolled patients had suspected stable CAD, an intermediate pre-test likelihood, and normal left ventricular function. Exclusion criteria were a documented history of CAD, signs of prior myocardial infarction, contraindication to adenosine, atrial fibrillation, glomerular filtration rate < 45 mL∙min−1, and pregnancy.

Image acquisition and reconstruction

Patients were instructed to refrain from products containing caffeine or xanthine for 24 h prior to the scans and to fast for at least 4 h. A 2-day stress-rest 99mTc-tetrofosmin protocol was performed in all patients. During continuous infusion of adenosine (140 μg∙kg−1∙min−1), a weight-adjusted dose of 370 to 550 MBq 99mTc-tetrofosmin was injected. The adenosine infusion was terminated 3 min after tracer injection. After a delay of 60 min, ECG-gated stress SPECT images were acquired. Rest SPECT imaging was performed on the same day as ICA.

Images were acquired on a dual-head hybrid SPECT/CT scanner (Symbia T2, Siemens Medical Solutions, Erlangen, Germany). Emission data were acquired using a parallel-hole, low-energy, high-resolution collimator with a 20% symmetric energy window centred at 140 keV. The two detector heads were positioned at 90° to each other. The camera heads performed a 180° rotation with a total of 64 rotational steps of 40 s per projection. ECG-gating used R-wave detection with 8 frames per cardiac cycle. Images were reconstructed in both static and gated mode.

The embedded two-slice CT component had the following characteristics: slice width 5.0 mm; pitch 1.5; 130 kV; 17 mA; rotation time 0.8 s. SPECT acquisition was immediately followed by a low-dose CT scan for attenuation correction. This scan was acquired during normal breathing without ECG-gating (130 kV, 20 mAs; computed tomography dose index 2.2 mGy; dose length product 40 mGy∙cm). CT images were reconstructed with a 128 × 128 matrix and a slice thickness of 5 mm.

Visual analysis

Visual interpretation was performed by a core laboratory. A highly experienced observer (SRU, > 30 years of experience in nuclear cardiology) was blinded to other imaging and angiographic findings. The observer did have limited clinical information (sex, age, body mass index, type of chest pain, and presence of left bundle branch block), because these factors directly affect scan interpretation. MPI images were interpreted using a 17-segment model [14]. Each segment was scored on a 5-point scale (0, normal; 1, mildly decreased; 2, moderately decreased; 3, severely decreased; 4, absent segmental uptake). Summed rest scores (SRS), summed stress scores (SSS), and summed difference scores (SDS) were calculated from the segmental scores, with SSS ≥ 4 and SDS ≥ 2 considered abnormal [15, 16]. To maximize recognition of imaging artifacts, the expert reader could take additional information into account. This included raw projections, ECG-gated LV function, and both non-corrected (NC) and AC reconstructions. Each study was finally classified as normal or abnormal on a per-patient basis.
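The scoring arithmetic above can be sketched as follows. This is a minimal illustration under stated assumptions. The function name `summed_scores` is hypothetical, and segment scores are assumed to be supplied in the same 17-segment order for stress and rest. SDS is computed as SSS minus SRS, which is the conventional definition. The two threshold flags mirror the cut-offs stated above. They do not reproduce the reader's final per-patient call, which also drew on other image information.

```python
from typing import Dict, Sequence

N_SEGMENTS = 17  # 17-segment model

def summed_scores(stress: Sequence[int], rest: Sequence[int]) -> Dict[str, object]:
    """Summed scores from per-segment scores (0 = normal ... 4 = absent uptake).

    Hypothetical helper for illustration; not part of any vendor software.
    """
    for name, scores in (("stress", stress), ("rest", rest)):
        if len(scores) != N_SEGMENTS:
            raise ValueError(f"{name}: expected {N_SEGMENTS} segment scores")
        if any(s not in range(5) for s in scores):
            raise ValueError(f"{name}: segment scores must be 0-4")
    sss = sum(stress)  # summed stress score
    srs = sum(rest)    # summed rest score
    sds = sss - srs    # summed difference score: stress minus rest
    return {
        "SSS": sss,
        "SRS": srs,
        "SDS": sds,
        "SSS_abnormal": sss >= 4,  # cut-off used in this study
        "SDS_abnormal": sds >= 2,  # cut-off used in this study
    }

# Example: three moderately decreased segments at stress, one mildly decreased at rest
stress = [0] * 17
stress[0] = stress[1] = stress[6] = 2
rest = [0] * 17
rest[0] = 1
print(summed_scores(stress, rest))
```

In this example, the largely reversible defect yields SSS = 6, SRS = 1 and SDS = 5, so both summed scores exceed their thresholds.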

Automated analysis

Perfusion parameters were derived in a fully automated fashion using commercially available software (Cedars-Sinai Quantitative Perfusion SPECT [QPS]) [2, 17, 18]. Each scoring parameter was derived from images with and without AC, and together the parameters reflect both the extent and the severity of myocardial hypoperfusion. They comprise the aforementioned summed scores based on average defect severity per segment (SSS and SDS). They also include the pixel-wise total perfusion deficit (TPD) during stress (S-TPD) and the ischemic TPD (I-TPD), which is defined as the difference between stress and rest TPD [18]. SSS ≥ 4, SDS ≥ 2, S-TPD ≥ 5%, and I-TPD ≥ 3% were considered abnormal [15, 16, 19, 20].
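The sketch below makes the extent-and-severity idea behind TPD concrete by computing a deliberately simplified deficit measure on a flattened polar map. It is a hypothetical illustration, not the QPS implementation. The 2.5-SD cut-off, the fractional-shortfall weighting, and the name `simple_deficit` are our assumptions. The sketch also shows how I-TPD follows as stress minus rest TPD, and how the thresholds stated above would be applied.

```python
from typing import Sequence

def simple_deficit(counts: Sequence[float], normal_mean: Sequence[float],
                   normal_sd: Sequence[float], z_cut: float = 2.5) -> float:
    """Hypothetical pixel-wise deficit in % of the polar map (not the QPS algorithm).

    A pixel contributes only if it falls more than z_cut SDs below the normal
    mean (extent); its contribution is its fractional shortfall (severity).
    """
    if not counts or not (len(counts) == len(normal_mean) == len(normal_sd)):
        raise ValueError("inputs must be non-empty and share one polar-map grid")
    deficit = 0.0
    for c, m, sd in zip(counts, normal_mean, normal_sd):
        if c < m - z_cut * sd:          # abnormal pixel
            deficit += (m - c) / m      # weighted by how far counts fall short
    return 100.0 * deficit / len(counts)

# Toy 100-pixel polar map with a uniform (hypothetical) normal database
mean_map, sd_map = [1.0] * 100, [0.1] * 100
stress_map = [0.5] * 10 + [1.0] * 90   # 10 pixels at half of normal counts
rest_map = [0.5] * 2 + [1.0] * 98      # defect largely resolves at rest

s_tpd = simple_deficit(stress_map, mean_map, sd_map)
r_tpd = simple_deficit(rest_map, mean_map, sd_map)
i_tpd = s_tpd - r_tpd                  # ischemic deficit = stress minus rest
print(s_tpd, i_tpd, s_tpd >= 5, i_tpd >= 3)  # thresholds used in this study
```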

We also explored whether the diagnostic performance of automated quantitative scoring could be improved. For this purpose, the study population was divided consecutively into two equal subgroups, and the optimization comprised two components. First, an institutional normal database was created using data from the first subgroup, the derivation cohort (n = 103). This database was built from SPECT images of patients who had both normal angiographic findings and normal myocardial perfusion on [15O]H2O positron emission tomography (PET). PET was additionally performed in the context of the PACIFIC trial [13]. In this way, only SPECT images from patients without CAD were selected. The institutional database was then generated within the commercially available software package. Second, optimal thresholds for each grading parameter, with and without AC, were derived in the derivation cohort. Automated scoring was subsequently performed in the validation cohort (n = 103) using the new normal database and the optimized thresholds for abnormality.
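The two optimization components can be outlined in code. This is a sketch under assumptions. The actual database was generated within the commercial software, whose internal procedure is not described here. The dictionary keys, the function names, and the per-pixel mean/SD summary are all hypothetical. Separate male and female databases are built, as the polar maps in Fig. 2 suggest. In practice, one such database would be built per reconstruction (NC and AC).

```python
from statistics import mean, stdev

def consecutive_split(patients: list) -> tuple:
    """Split an ordered patient list into a first (derivation) and second (validation) half."""
    half = len(patients) // 2
    return patients[:half], patients[half:]

def build_normal_database(derivation: list) -> dict:
    """Per-sex, per-pixel mean and SD of normalised polar maps from confirmed normals.

    Each patient is a dict with hypothetical keys 'sex', 'normal_angio',
    'normal_pet' and 'polar_map' (list of normalised pixel counts).
    """
    normals = [p for p in derivation if p["normal_angio"] and p["normal_pet"]]
    db = {}
    for sex in {p["sex"] for p in normals}:
        maps = [p["polar_map"] for p in normals if p["sex"] == sex]
        if len(maps) < 2:
            continue  # a standard deviation needs at least two studies
        pixels = list(zip(*maps))  # regroup values pixel by pixel
        db[sex] = {"mean": [mean(px) for px in pixels],
                   "sd": [stdev(px) for px in pixels]}
    return db

derivation, validation = consecutive_split(list(range(206)))
print(len(derivation), len(validation))
```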

Invasive coronary angiography and fractional flow reserve

ICA was performed according to a standard protocol, with at least two orthogonal projections per evaluated coronary artery segment. To induce epicardial coronary vasodilation, 0.2 mL of nitroglycerin was administered intracoronarily before contrast injection. All major coronary arteries were routinely interrogated by FFR regardless of stenosis severity, except for occluded lesions or subtotal lesions of more than 90%. FFR was measured using a 0.014-inch sensor-tipped guidewire (Volcano Corporation, Rancho Cordova, CA, USA). The guidewire was introduced through a 5- or 6-F guiding catheter, calibrated, and advanced into the coronary artery. Maximal coronary hyperaemia was induced by intracoronary (150 μg) or intravenous (140 μg∙kg−1∙min−1) adenosine. FFR was calculated as the ratio of mean distal intracoronary pressure to mean arterial pressure during hyperaemia. A lesion was considered hemodynamically significant if FFR was ≤ 0.80. When FFR was unavailable, a lesion was considered significant if stenosis severity on quantitative coronary angiography (QCA) exceeded 90%. A stenosis was not considered functionally relevant if FFR was > 0.80 or, in the absence of FFR, if QCA stenosis severity was < 30%. Secondary analyses used QCA stenosis severity as the reference, with ≥ 70% stenosis considered obstructive. All angiograms and FFR tracings were interpreted by two experienced interventional cardiologists blinded to the SPECT results.
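The per-vessel reference rules above can be expressed as a small decision function. This is a minimal sketch, and the function names are hypothetical. QCA stenosis is assumed to be given in percent. The per-patient rule (significant CAD if any vessel is significant) is our assumption about how vessel results were aggregated. The stated rules do not cover stenoses of 30–90% without FFR, so the sketch returns None for them.

```python
from typing import Iterable, Optional, Tuple

def vessel_significant(ffr: Optional[float], qca_pct: Optional[float]) -> Optional[bool]:
    """Apply the FFR-based reference rules to one vessel (hypothetical helper)."""
    if ffr is not None:
        return ffr <= 0.80          # FFR takes precedence when measured
    if qca_pct is None:
        return None
    if qca_pct > 90:
        return True                 # occluded or subtotal lesion without FFR
    if qca_pct < 30:
        return False                # minimal stenosis without FFR
    return None                     # 30-90% without FFR: not covered by the rules

def patient_has_cad(vessels: Iterable[Tuple[Optional[float], Optional[float]]]) -> Optional[bool]:
    """Assumed per-patient rule: CAD if any vessel is significant."""
    results = [vessel_significant(ffr, qca) for ffr, qca in vessels]
    if any(r is True for r in results):
        return True
    if any(r is None for r in results):
        return None
    return False

print(patient_has_cad([(0.91, 40.0), (None, 100.0)]))  # occluded vessel -> True
```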

Statistical analysis

Continuous variables were expressed as mean ± standard deviation, and categorical variables as percentages. The overall agreement, defined as the proportion of cases in which the tests agreed, was compared between automated and visual reads. Receiver operating characteristic (ROC) curves were constructed to evaluate the ability of automated and visual scoring to predict significant CAD. Optimal thresholds were derived from these ROC curves using the Youden index. The McNemar test was used to compare the binary diagnostic performance of paired assessments. For all analyses, p values < 0.05 were considered statistically significant. Data were analyzed using SPSS Statistics version 20 (IBM Corporation, Armonk, NY, USA) and MedCalc version 10.3.0.0 (MedCalc Software, Mariakerke, Belgium).
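The sketch below illustrates the McNemar comparison. It computes both the continuity-corrected chi-square and the exact binomial version from the discordant pairs only. The text does not state which variant SPSS or MedCalc applied, so both are shown, and the function name is hypothetical. When comparing sensitivities, the discordant counts would be taken among FFR-positive patients only. When comparing specificities, they would be taken among FFR-negative patients only.

```python
from math import comb, erfc, sqrt

def mcnemar(b: int, c: int) -> dict:
    """McNemar test from discordant pair counts.

    b: patients classified correctly by reading A but not by reading B
    c: patients classified correctly by reading B but not by reading A
    """
    n = b + c
    if n == 0:
        return {"chi2": 0.0, "p_chi2": 1.0, "p_exact": 1.0}
    # Chi-square with continuity correction, 1 degree of freedom
    chi2 = max(abs(b - c) - 1, 0) ** 2 / n
    p_chi2 = erfc(sqrt(chi2 / 2))  # upper tail of a chi-square(1) distribution
    # Exact two-sided binomial test of b versus c under p = 0.5
    k = min(b, c)
    p_exact = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
    return {"chi2": chi2, "p_chi2": p_chi2, "p_exact": p_exact}

print(mcnemar(0, 10))
```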

Results

Baseline characteristics of the study population are listed in Table 1. In brief, mean age was 58.2 ± 8.7 years, 64% were male, and 92 (45%) patients had significant CAD as defined by ICA with FFR ≤ 0.80. FFR was measured in 548 vessels. It was not measured in 61 vessels because of complete occlusions (n = 24) or subtotal occlusions (n = 34), both deemed hemodynamically significant. The remaining 3 vessels had severe coronary tortuosity with no stenosis ≥ 30% and were considered not hemodynamically significant. Mean radiation dose for SPECT was 4.89 ± 0.71 mSv without and 6.01 ± 0.71 mSv with the low-dose CT for attenuation correction. Visual expert analysis classified 59 (29%) SPECT studies as abnormal. A case example is shown in Fig. 1.

Table 1 Patient baseline characteristics
Fig. 1

Representative SPECT images with and without AC, and invasive coronary angiography images, of an 80-year-old male with typical angina. The left panel shows stress (upper row) and rest (lower row) images without AC. Only subtle perfusion reversibility can be observed in the anterior territory, whereas a fixed defect might be identified visually in the inferior territory. Automated grading yielded rather low scores, all below the abnormality thresholds except for SDS and I-TPD. SPECT images with AC in the center panel display a slightly different perfusion pattern, with more pronounced reversibility in the anterolateral segments and normalized perfusion in the inferior wall. Automated grading now clearly indicates ischemia in the anterior region only, rather than possible anterior and inferior ischemia. Angiographic images (right panel) show a subtotally occluded diagonal branch and a non-significant RCA stenosis, confirming the SPECT findings. AC = attenuation correction; FFR = fractional flow reserve; I-TPD = ischemic total perfusion deficit; NC = non-corrected; RCA = right coronary artery; SDS = summed difference score; SRS = summed rest score; SSS = summed stress score; R-TPD = rest total perfusion deficit; S-TPD = stress total perfusion deficit

Diagnostic performance of visual and standard automated assessment

Table 2 shows the sensitivity, specificity, and diagnostic accuracy of expert visual reading and of multiple automated measurements for the detection of hemodynamically significant CAD. Automated parameters were scored using the normal databases incorporated in the commercially available software and accepted thresholds of abnormality. Among all automated scores, only SDS with and without AC (86.5% and 80.0%, respectively; p < 0.001 for both) had a significantly higher sensitivity than expert reading (56.5%). Sensitivity of the other parameters, including SSS, S-TPD, and I-TPD, did not differ significantly from visual analysis, although a trend in favor of automated analysis was visible. In contrast, specificity of expert reading (93.9%) was significantly higher than that of every automated score. Automated assessment had significantly lower diagnostic accuracy than visual reading (77.2%). The exceptions were SSS AC (69.8%, p = 0.063) and S-TPD AC (71.2%, p = 0.134), which did not differ significantly.
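A short calculation shows how these metrics relate to the underlying 2 × 2 counts. The counts below are our back-calculation from the expert-reading figures: 92 patients with and 114 without FFR-defined CAD, sensitivity 56.5%, and specificity 93.9%. The text does not report these counts, and the helper name is hypothetical.

```python
def diagnostic_performance(tp: int, fn: int, tn: int, fp: int) -> tuple:
    """Sensitivity, specificity and accuracy from a 2 x 2 table."""
    sens = tp / (tp + fn)                   # diseased patients correctly called abnormal
    spec = tn / (tn + fp)                   # non-diseased patients correctly called normal
    acc = (tp + tn) / (tp + fn + tn + fp)   # all correct calls
    return sens, spec, acc

# Back-calculated (assumed) counts for expert visual reading, n = 206
tp, fn, tn, fp = 52, 40, 107, 7
sens, spec, acc = diagnostic_performance(tp, fn, tn, fp)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}, accuracy {acc:.1%}")
# prints "sensitivity 56.5%, specificity 93.9%, accuracy 77.2%"
```

The implied 52 + 7 = 59 abnormal expert reads match the 59 (29%) abnormal studies reported above, which supports the back-calculation.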

Table 2 Diagnostic performance of expert visual analysis and automated analysis using standard software for the detection of coronary artery disease (n = 206)

Optimizing automated assessment

After the patients were divided into derivation and validation cohorts (n = 103 each), 51 normal SPECT studies (30 from female patients) were used to develop a new institutional database. As listed in Table 1, baseline characteristics did not differ between the derivation and validation cohorts. Figure 2 shows the average polar maps of the vendor-supplied normal database alongside those of the newly generated institutional normal database. The cases selected for the new normal database had normal FFR and normal PET perfusion. With the original database, these cases had SSS = 0 in 7 (14%) and an abnormal SSS ≥ 4 in 15 (29%). With the institutional database, SSS values decreased, with SSS = 0 in 34 (67%) cases and SSS ≥ 4 in 1 (2%). Optimal thresholds were determined from the derivation cohort using the new normal databases. For images without AC, they were SSS ≥ 3, SDS ≥ 2, S-TPD ≥ 5% and I-TPD ≥ 2% (AUC 0.83, 0.84, 0.87, and 0.79, respectively). For images with AC, they were SSS ≥ 2, SDS ≥ 2, S-TPD ≥ 2% and I-TPD ≥ 1% (AUC 0.82, 0.83, 0.85, and 0.76, respectively; Fig. 3).
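The sketch below shows threshold selection by the Youden index, as used for the derivation cohort. The scores and labels in the example are invented toy data, not study data, and `youden_threshold` is a hypothetical helper. A score at or above the threshold is treated as abnormal, which matches the "≥" convention of the thresholds above. Ties in J are resolved here in favour of the lower threshold, which is an arbitrary choice.

```python
from typing import Sequence, Tuple

def youden_threshold(scores: Sequence[float], labels: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both diseased and non-diseased cases")
    best = None
    for t in sorted(set(scores)):  # every observed score is a candidate cut-off
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if not y and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if best is None or j > best[1]:  # strict '>' keeps the lowest tied threshold
            best = (t, j)
    return best

# Toy data: integer scores, True = FFR-defined CAD
scores = [0, 1, 2, 3, 4, 5, 6, 7]
labels = [False, False, False, True, False, True, True, True]
print(youden_threshold(scores, labels))  # (3, 0.75)
```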

Fig. 2

Average polar maps for male (a–d) and female (e–h) patients from the newly derived institutional normal database (left column) and from the vendor-supplied normal database (right column). Polar maps a, b, e, and f are created from non-attenuation-corrected images, whereas polar maps c, d, g, and h are derived from attenuation-corrected images

Fig. 3

Receiver operating characteristic curves for predicting significant coronary artery disease, defined by an FFR ≤ 0.80, in the derivation cohort. Curves are shown for the NC (left panel) and AC (right panel) automated parameters (SSS, SDS, S-TPD, and I-TPD), scored using the new normal databases. Each curve plots sensitivity against the false positive rate (1 − specificity) across increasing threshold values. Areas under the curves and 95% confidence intervals were calculated for each parameter. Threshold values with the highest Youden index for each curve are marked with open dots. Abbreviations as in Fig. 1

Diagnostic performance of optimized automated assessment

Table 3 presents the diagnostic performance of expert visual reading and automated assessment in the validation cohort using the institutional database and optimized thresholds. Sensitivities for NC images were consistently low and did not differ significantly from visual reads. Automated scoring of AC images provided higher sensitivities, although the difference was significant only for S-TPD and I-TPD (p = 0.001 and p = 0.008, respectively). In contrast, the specificity of visual reading remained significantly higher than that of all automated AC scores, whereas NC scores did not differ significantly from it. Consequently, the diagnostic accuracy of each automated assessment did not differ significantly from that of expert visual analysis. The highest accuracies were found for SSS AC (72.5%), SDS AC (72.0%), and S-TPD AC (73.5%), comparable to expert analysis (73.8%).

Table 3 Diagnostic performance of expert visual analysis and automated analysis using optimized software with a new normal database and thresholds, for the detection of coronary artery disease in the validation cohort (n = 103)

Diagnostic performance using angiographic stenosis severity as a reference

Using QCA instead of FFR as the reference, the performance of the expert reader improved, mainly owing to a higher sensitivity (online Tables 1 and 2). For automated analysis, sensitivity generally increased as well, whereas specificity generally decreased, so diagnostic accuracy changed heterogeneously. Nevertheless, with the standard software, diagnostic accuracy was significantly lower for all automated scoring variables (online Table 1). After software optimization, NC parameters had diagnostic accuracy comparable to visual reading, except for I-TPD, which was significantly lower. Among the AC parameters, SSS and SDS showed comparable accuracy, whereas S-TPD and I-TPD had significantly lower accuracy (online Table 2). AUCs of the automated parameters did not differ significantly, although I-TPD, with and without AC, tended to have numerically smaller AUCs (online Figs. 1 and 2).

Discussion

At present, standard clinical practice for the evaluation of SPECT images is visual assessment, which depends on the skills of the reader and is rather subjective. The present study demonstrates that automated analysis generally has a lower diagnostic accuracy than visual analysis, predominantly because of a lower specificity. After the introduction of an institutional normal database and optimized thresholds, however, the diagnostic accuracy of automated analysis increased and no longer differed significantly from expert visual reading. A novel feature of this prospective comparison of expert visual and automated analysis is that every patient underwent ICA with routine FFR measurements as the reference standard.

Visual versus automated assessment of SPECT perfusion

An accurate assessment of the extent and severity of hypoperfused myocardium with SPECT is important for diagnostic and prognostic purposes [17, 19, 21], but it remains subjective when determined visually. Software tools for automated quantification have therefore been developed. These tools have been shown to be more reproducible than visual assessment, even when the latter is performed by highly experienced readers [9, 22]. Arsanjani et al. showed that the diagnostic performance of automated analysis did not differ significantly from that of visual assessment [7]. Although these results were promising, a substantial portion of that study population was presumed free of CAD on the basis of a low risk profile, without confirmation by ICA. Several other diagnostic studies did include ICA but used visually estimated stenosis severity as the reference standard. This is a limitation because frequent disagreement between angiographic stenosis severity and functional severity is increasingly recognized [11]. As a sub-analysis of the PACIFIC trial, the present study avoids these limitations, as all subjects underwent ICA with routine FFR measurements [13].

With standard software and thresholds, automated analysis in general had a lower diagnostic accuracy than visual expert reading, except for SSS and S-TPD with AC. Furthermore, automated analysis showed a higher sensitivity, but a lower specificity, than visual reading. The two parameters that matched visual reading, SSS and S-TPD with AC, are both derived from stress images alone. This suggests that, when automated analysis is used for diagnostic purposes, AC is warranted and stress-only protocols are sufficient. Nevertheless, these results apply only to this particular population of patients with normal left ventricular function and no prior history of CAD or myocardial infarction. In addition, the slightly higher sensitivity of automated analysis, at the cost of a lower specificity, could be favorable. This would hold if missing obstructive CAD because of a false-negative SPECT is considered more harmful than performing additional "unnecessary" invasive angiograms. Notably, using FFR instead of the more traditional QCA as the reference changed the results to some extent (online Table 1). Visual reading performed better with QCA as the reference. One explanation might be that the reader's experience and prior feedback were based on QCA rather than FFR (i.e., the reader's "internal normal database").

Normal database

Automated quantification of myocardial perfusion analyzes the tracer distribution within a patient and compares it with a database of normal perfusion scans. This normal database is usually based on perfusion images from the USA, where most software packages are developed. Given possible differences in patient habitus and imaging protocols (including tracer doses and acquisition settings), such a database may not be optimal for other regions [23]. Nakajima et al. showed a significant diagnostic improvement with a region-specific normal database in Japan [24]. In contrast, a similar study in a French population did not show a clear benefit over the vendor-supplied normal database [25]. The present study demonstrates that an institutional normal database can be created from only a limited number of normal perfusion studies. Fig. 2 reveals only minor differences in the average polar maps between the institutional and vendor-supplied databases. Nonetheless, the normalcy rate improved from 71% to 98%. In addition, most diagnostic accuracies of automated analysis in the validation cohort increased directly after implementation of the institutional database (online Fig. 2). The most pronounced differences in average segmental counts appear to be located in the anterior and basal inferior regions on NC images. These differences probably result from attenuation by surrounding soft tissue, such as the breast and abdomen. Of interest, the normal tracer distribution on AC images appears very similar between sexes and between the vendor and institutional databases, in both average segmental count maps and standard deviation maps. This suggests that normal databases for AC images might be readily exchangeable between institutions [26].
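This sketch assumes that "normalcy rate" here denotes the proportion of the 51 normal-database studies with SSS < 4. Under that assumption, the two figures follow directly from the counts reported in the Results: 15 studies had SSS ≥ 4 with the vendor database and 1 study with the institutional database. The helper name is hypothetical.

```python
def normalcy_rate(n_normals: int, n_abnormal: int) -> float:
    """Fraction of known-normal studies that automated scoring classifies as normal."""
    return (n_normals - n_abnormal) / n_normals

vendor = normalcy_rate(51, 15)         # SSS >= 4 in 15 of 51 with the vendor database
institutional = normalcy_rate(51, 1)   # SSS >= 4 in 1 of 51 with the institutional database
print(f"{vendor:.0%} -> {institutional:.0%}")  # prints "71% -> 98%"
```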
A limitation of commonly used normal databases is that they typically comprise SPECT images of patients with a low pre-test likelihood of CAD who did not undergo confirmatory ICA. Although the resulting differences might be small, the current database is unique in that normal perfusion was confirmed by ICA and [15O]H2O PET imaging in prospectively enrolled patients.

Optimization of automated analysis

The main purpose of generating an institutional normal database was to improve the performance of automated analysis. Based on the derivation cohort, optimal thresholds were slightly lower than the traditional cut-off values, particularly for AC images. With the newly set thresholds, diagnostic accuracy was higher for AC images, supporting the benefit of attenuation correction when automated analysis is used. However, it remains unclear whether this consistently higher accuracy with AC justifies the additional radiation burden for patients (approximately 1.1 mSv in this study) and the added costs and time for imaging laboratories. Furthermore, in the derivation cohort, AC images did not show higher AUCs (Fig. 3). AUCs provide a more comprehensive, and likely more adequate, evaluation of diagnostic performance than dichotomized accuracy. In the validation cohort, implementing the institutional normal database and optimized thresholds brought automated analysis to a diagnostic accuracy that did not differ significantly from expert visual grading. This held for all investigated parameters (SSS, SDS, S-TPD, and I-TPD). With QCA instead of FFR as the reference, the diagnostic accuracies of most optimized automated parameters were also not significantly different from visual reading (online Table 2). In general, however, the numerical differences were somewhat more pronounced than in the comparisons against FFR. This might be partly because the institutional normal database was created using FFR rather than QCA.
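The contrast between AUC and dichotomized accuracy can be illustrated with the Mann-Whitney formulation of the AUC. In this formulation, the AUC is the probability that a randomly chosen diseased patient scores higher than a randomly chosen non-diseased patient, with ties counted as one half. Accuracy, by contrast, is evaluated at a single cut-off. The toy data and function names are hypothetical and are not taken from the study.

```python
from typing import Sequence

def auc_mann_whitney(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Threshold-free AUC: share of diseased/non-diseased pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need both diseased and non-diseased cases")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count one half
    return wins / (len(pos) * len(neg))

def accuracy_at(scores: Sequence[float], labels: Sequence[bool], threshold: float) -> float:
    """Dichotomized accuracy with 'score >= threshold' called abnormal."""
    correct = sum((s >= threshold) == y for s, y in zip(scores, labels))
    return correct / len(scores)

scores = [0, 1, 2, 3, 4, 5, 6, 7]
labels = [False, False, False, True, False, True, True, True]
print(auc_mann_whitney(scores, labels))  # 0.9375, independent of any cut-off
print(accuracy_at(scores, labels, 3))    # 0.875 at a cut-off of 3
print(accuracy_at(scores, labels, 6))    # 0.75 at a cut-off of 6
```

The same scores yield a single AUC but different accuracies depending on where the cut-off is placed.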

Recently, several developments have been implemented clinically to improve the diagnostic performance of conventional SPECT imaging. For instance, ECG-gated acquisitions provide additional diagnostic and prognostic information, such as end-diastolic volume, ejection fraction, and transient ischemic dilatation [27,28,29]. These functional parameters were not included in the present automated analysis. They may, however, further improve automated SPECT interpretation, for example through machine learning [30].

Limitations

Some limitations of this study should be noted. First, the study population comprised 206 patients divided into two groups of 103. This may suffice for the comparisons performed but may limit the statistical power to detect significant differences. For the same reason, the newly derived normal database is relatively small. Slomka et al. [31] recommended 20–40 images to create a reliable normal database, but Tragardh et al. [32] demonstrated improved accuracy with increasing database size, up to 100 images. Furthermore, the analyses were performed with one specific scanning protocol and study population, and were compared with a single expert visual reader. The current findings may therefore not be transferable to other institutions with different imaging protocols, to other patient subgroups, or to other visual readers. Finally, diagnostic accuracy depends on the prevalence of disease.

Conclusion

Visual analysis of SPECT imaging slightly outperforms automated analysis with standard software in the detection of FFR-defined significant CAD. After optimization with an institutional normal database and thresholds, however, the diagnostic accuracy of automated analysis was comparable to that of expert visual analysis, without requiring extensive reading experience. Automated assessment therefore has the potential to simplify the SPECT diagnostic process, particularly in conjunction with CT-based AC.