Introduction

Ultrasonography, CT and MRI are non-invasive imaging methods that are commonly used for the evaluation of parotid tumours. However, the accuracy of these conventional imaging methods is limited because benign and malignant parotid tumours overlap in appearance. Some malignancies that contain a large amount of serous and mucoid contents are well defined with a homogeneous appearance and resemble benign lesions. In addition, haemorrhage and calcification in benign tumours may result in a heterogeneous appearance that resembles a malignancy [1,2,3,4,5]. Although ultrasound-guided fine-needle aspiration cytology (FNAC) is considered the gold standard for preoperative diagnosis [6], it is an invasive method and, as a general rule, non-invasive methods are preferred when the results are similar [7].

Sonoelastography is an innovative diagnostic imaging tool that assesses tissue stiffness [8]. Since malignant tissues are generally stiffer than benign tissues, sonoelastography has been used in many organs, such as the breast, thyroid and prostate, to differentiate between malignant and benign lesions [9,10,11,12,13,14]. Recently, numerous studies have been published on the role of sonoelastography in differentiating between malignant and benign parotid lesions. However, the results differ widely, with sensitivity ranging from 40% to 100% and specificity ranging from 26% to 97% [7, 8, 15,16,17,18,19,20]. Therefore, this study aimed to assess the performance of sonoelastography for differential diagnosis between malignant and benign parotid lesions using a meta-analysis.

Materials and methods

Literature search

The study complied with the PRISMA recommendations [21, 22]. A literature search of English medical databases including PubMed, Embase and Medline (Embase.com), Web of Science, Cochrane Library and Ovid was performed to identify all studies evaluating differential diagnosis between malignant and benign parotid lesions. The search strategies are shown in Table 1. Duplicate articles were excluded manually. Relevant unpublished data were also considered, but no suitable studies were identified for inclusion. The search was performed independently by two researchers. The literature search was updated to 30 October 2017 and no start date limit was applied.

Table 1 Search strategy of each database

Inclusion and exclusion criteria

All the articles were assessed independently by two researchers. The inclusion criteria for the studies were as follows: (1) The study was approved by an ethics committee or institutional review board. (2) The diagnostic performance of sonoelastography for the differential diagnosis between malignant and benign parotid lesions was evaluated in the study. (3) Postoperative pathology and/or fine-needle aspiration cytology (and/or histology) results were used as the reference standard in the study. (4) Complete reported data were available to calculate the true positive (TP), false positive (FP), false negative (FN) and true negative (TN) cases. The exclusion criteria for the studies were as follows: (1) Reviews, case reports, letters, conference reports, editorial comments and articles that were not published in English were excluded. (2) For studies with insufficient data, the corresponding authors were contacted by e-mail and asked to provide the missing data; a study was excluded if the author did not reply within 15 days. (3) When two or more studies were performed by the same department, the study that was older or that had the smaller number of patients was excluded. All disagreements were resolved by consensus.

Data extraction

Two investigators extracted the data independently. All relevant data were extracted, including first author, country where the study was performed, year of publication, patient age, proportion of male and female patients, number of patients, number of lesions, reference standard, type of lesions, ultrasound system, sonoelastography index, cut-off value and number of TPs, FPs, FNs and TNs. The cut-off value was defined according to the Youden method if it was not clearly provided by the author. Disagreements were resolved by consensus.
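As a rough illustration of the Youden method mentioned above, the sketch below selects the cut-off that maximises J = sensitivity + specificity − 1 over all candidate thresholds. The stiffness readings and labels are invented for illustration, and the convention that a value at or above the cut-off is called malignant is an assumption; the included studies may have applied the method differently.

```python
def youden_cutoff(values, is_malignant):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    Assumption: a lesion is called malignant when value >= cutoff.
    On ties, the lowest cut-off with the maximal J is kept.
    """
    n_pos = sum(is_malignant)
    n_neg = len(is_malignant) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one malignant and one benign lesion")
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, m in zip(values, is_malignant) if m and v >= cutoff)
        tn = sum(1 for v, m in zip(values, is_malignant) if not m and v < cutoff)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical stiffness readings (arbitrary units), not data from any study.
stiffness = [1.2, 1.8, 2.1, 2.4, 3.0, 3.3, 3.9, 4.5]
malignant = [False, False, False, True, False, True, True, True]
cutoff, j = youden_cutoff(stiffness, malignant)
print(cutoff, j)
```

The TP, FP, FN and TN counts at the chosen cut-off are then the four numbers entered into the meta-analysis for that study.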

Quality assessment

The methodological qualities of primary studies were assessed with the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) criteria [23]. The defined questions were answered as yes, no or unclear, and ultimately, a maximum score of 14 was used to estimate the quality of each article. Two researchers completed all the items and disagreements were resolved by consensus.
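The scoring step can be sketched as a simple tally. The rule used here, one point per "yes" answer and none for "no" or "unclear", is an assumption consistent with a maximum score of 14; the text does not state the exact scoring rule, and the example answers are hypothetical.

```python
QUADAS_ITEMS = 14  # number of questions, matching the maximum score of 14
ALLOWED = {"yes", "no", "unclear"}


def quadas_score(answers):
    """Count 'yes' answers; 'no' and 'unclear' score zero (assumed rule)."""
    if len(answers) != QUADAS_ITEMS:
        raise ValueError(f"expected {QUADAS_ITEMS} answers, got {len(answers)}")
    invalid = set(answers) - ALLOWED
    if invalid:
        raise ValueError(f"unexpected answers: {sorted(invalid)}")
    return sum(1 for a in answers if a == "yes")


# Hypothetical study: 12 items adequate, one unclear, one not met.
example = ["yes"] * 12 + ["unclear", "no"]
score = quadas_score(example)
print(score)
```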

Data analysis

The statistical software Meta-Disc (Version 1.4, Unit of Clinical Biostatistics team of the Ramón y Cajal Hospital), STATA (Version 12.0, Stata Corporation) and SPSS Statistics (Version 17.0, SPSS Inc.) were used in this study. The Spearman correlation coefficient was used to analyse the threshold effect. Heterogeneity was evaluated with the Cochran Q statistic and the I² test. A random effects model was used when the p value for heterogeneity was less than 0.05 or I² was at least 50%; otherwise, a fixed effects model was used. The pooled sensitivity, specificity, diagnostic odds ratio (DOR), area under the curve (AUC) and Q* index were calculated using Meta-Disc. Potential sources of heterogeneity were explored with a meta-regression analysis. Deeks’ funnel plot was generated in STATA to analyse potential publication bias, with p < 0.05 indicating potential publication bias. Interobserver agreement in screening articles and applying the QUADAS criteria was analysed with Cohen’s κ using SPSS software.
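To make the heterogeneity rule concrete, the following sketch computes the Cochran Q statistic and I² for the log diagnostic odds ratio and applies the I² branch of the model choice. It is a plain-Python approximation under stated assumptions, not a reproduction of Meta-Disc: inverse-variance weights, a 0.5 continuity correction for studies with a zero cell, and the DerSimonian-Laird estimate of between-study variance are assumed, and the 2×2 counts are invented. The p value of Q is omitted because the Python standard library has no chi-square distribution.

```python
import math


def log_dor(tp, fp, fn, tn):
    """Return (ln DOR, variance); add 0.5 to every cell if any cell is zero."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    y = math.log((tp * tn) / (fp * fn))
    var = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return y, var


def heterogeneity(studies):
    """studies: list of (tp, fp, fn, tn). Returns Q, I2 (%) and pooled DORs."""
    effects = [log_dor(*s) for s in studies]
    y = [e for e, _ in effects]
    v = [var for _, var in effects]
    w = [1 / vi for vi in v]                      # fixed effects weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird between-study variance (assumed estimator).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (vi + tau2) for vi in v]          # random effects weights
    random_ = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return {
        "q": q,
        "df": df,
        "i2": i2,
        "tau2": tau2,
        "dor_fixed": math.exp(fixed),
        "dor_random": math.exp(random_),
        "use_random": i2 >= 50,  # only the I2 part of the rule in the text
    }


# Hypothetical 2x2 counts (tp, fp, fn, tn), not taken from the included studies.
studies = [(18, 4, 2, 30), (9, 15, 6, 25), (12, 3, 8, 40), (5, 10, 5, 12)]
result = heterogeneity(studies)
print(result)
```

When I² is zero the between-study variance estimate is zero as well, so the two pooled estimates coincide; they diverge as heterogeneity grows.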

Results

Literature search and characteristics of included studies

After the literature search, ten relevant studies published from 2012 to 2017, including 711 patients with 725 parotid lesions, were included in the meta-analysis [6,7,8, 15,16,17,18,19,20, 24] (Fig. 1). The main characteristics of the included studies are summarised in Table 2. The two observers disagreed only at the step in which records were excluded by title and abstract, and interobserver agreement at this step was nevertheless excellent (κ = 0.86; 95% CI 0.72–0.99). Ultimately, all the disputed articles were retained at this step. There was no disagreement in the other steps of screening (κ = 1).
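For readers unfamiliar with the agreement statistic reported above, Cohen's κ compares the observed agreement between two raters with the agreement expected by chance. The sketch below is a minimal plain-Python version; the screening decisions are invented for illustration and do not reproduce the κ values in the text, which were obtained with SPSS.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed proportion of items on which the raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    if p_e == 1:
        return 1.0  # both raters used a single identical category
    return (p_o - p_e) / (1 - p_e)


# Hypothetical title/abstract screening decisions for 20 records.
observer_1 = ["in"] * 8 + ["out"] * 12
observer_2 = ["in"] * 7 + ["out"] + ["in"] + ["out"] * 11
kappa = cohen_kappa(observer_1, observer_2)
print(round(kappa, 3))
```

Complete agreement between the observers gives κ = 1, as reported for the later screening steps.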

Fig. 1

Flow diagram of study selection. n = number of studies

Table 2 Main characteristics of included studies

Quality assessment

The quality assessment of each study is shown in Table 3. Most of the items were adequate, resulting in high QUADAS scores. However, in all the studies it was unclear whether the pathologist was blinded to the sonoelastography results. In one study, only pleomorphic adenomas were identified in the benign group [16]. In one study, it was unclear whether the radiologist was blinded to the pathology [20], and in another study, the ultrasound examiners were aware of the histological properties of the respective lesions [8]. The interobserver agreement was good (κ = 0.77; 95% CI 0.60–0.93).

Table 3 Quality assessment of the included studies using the “QUADAS” questionnaire

Diagnostic accuracy for differential diagnosis between malignant and benign parotid lesions

No threshold effect was identified, with a Spearman correlation coefficient of 0.389 (p = 0.266). For differential diagnosis between malignant and benign parotid lesions, sonoelastography showed a pooled sensitivity of 0.67 (95% CI 0.59–0.74), a pooled specificity of 0.64 (95% CI 0.60–0.68) and a pooled DOR of 8.00 (95% CI 2.96–21.63) (Fig. 2). The summary receiver operating characteristic (SROC) curve indicated an overall moderate degree of accuracy, with an AUC of 0.77 (Q* = 0.71) (Fig. 3).
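A threshold effect is commonly checked by correlating the logit of sensitivity with the logit of (1 − specificity) across studies; a strong positive correlation suggests that studies differ mainly in the cut-off they applied. The sketch below computes that Spearman coefficient in plain Python. The 2×2 counts are invented, the 0.5 continuity correction for zero cells is an assumption, and no p value is computed.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def average_ranks(xs):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def threshold_spearman(studies):
    """Spearman rho between logit(sensitivity) and logit(1 - specificity)."""
    logit_tpr, logit_fpr = [], []
    for tp, fp, fn, tn in studies:
        if 0 in (tp, fp, fn, tn):  # keep the logits finite (assumed correction)
            tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
        logit_tpr.append(logit(tp / (tp + fn)))
        logit_fpr.append(logit(fp / (fp + tn)))
    return pearson(average_ranks(logit_tpr), average_ranks(logit_fpr))


# Hypothetical 2x2 counts (tp, fp, fn, tn), not taken from the included studies.
studies = [(18, 4, 2, 30), (9, 15, 6, 25), (12, 3, 8, 40), (5, 10, 5, 12)]
rho = threshold_spearman(studies)
print(round(rho, 3))
```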

Fig. 2

Forest plots of the pooled sensitivity (a) and specificity (b) of sonoelastography for differentiating between malignant and benign parotid lesions

Fig. 3

Summary receiver operating characteristic (SROC) curve on sonoelastography for differentiating between malignant and benign parotid lesions. The middle curve is the SROC curve. The upper and lower curves show the 95% confidence intervals

Heterogeneity results

The Cochran Q test and the I² test revealed significant heterogeneity, with p < 0.001 and I² = 77.2%. To further explore the sources of heterogeneity, a meta-regression analysis was performed that evaluated imaging mechanisms (group 1, strain elastography (SE); group 2, shear wave elastography (SWE)), shear wave elastography techniques (group 1, supersonic shear imaging (SSI) with a SuperSonic Imagine Aixplorer; group 2, acoustic radiation force impulse imaging (ARFI) with a Siemens S2000), assessment methods (group 1, qualitative; group 2, quantitative or semiquantitative) and QUADAS scores. The results indicated that the heterogeneity was not attributable to the imaging mechanism (p = 0.119), shear wave elastography technique (p = 0.473) or QUADAS score (p = 0.462). However, the assessment method was a significant factor affecting study heterogeneity (p = 0.035). Compared with qualitative assessment methods, quantitative and semiquantitative methods performed better (Table 4).
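A subgroup comparison of this kind can be sketched by pooling the 2×2 counts within each group. The example below uses one simple approach, summing counts so that pooled sensitivity is ΣTP / Σ(TP + FN) and pooled specificity is ΣTN / Σ(TN + FP). The counts and group labels are invented, and the meta-regression itself is not reproduced here.

```python
from collections import defaultdict


def pooled_by_group(studies):
    """studies: iterable of (group, tp, fp, fn, tn).

    Returns {group: {"sensitivity": ..., "specificity": ...}} from summed counts.
    """
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for group, tp, fp, fn, tn in studies:
        for i, count in enumerate((tp, fp, fn, tn)):
            totals[group][i] += count
    pooled = {}
    for group, (tp, fp, fn, tn) in totals.items():
        pooled[group] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return pooled


# Hypothetical counts, not the data behind Table 4.
studies = [
    ("qualitative", 10, 20, 10, 40),
    ("qualitative", 5, 10, 5, 20),
    ("quantitative", 16, 5, 4, 45),
    ("quantitative", 8, 5, 2, 45),
]
pooled = pooled_by_group(studies)
for group, stats in pooled.items():
    print(group, round(stats["sensitivity"], 2), round(stats["specificity"], 2))
```

Summing counts gives larger studies more influence, which is the intended weighting in this simple scheme but ignores between-study variation.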

Table 4 Results of the meta-regression and subgroup analysis for differential diagnosis between malignant and benign parotid lesions

Evaluation of publication bias

Publication bias was explored with a Deeks’ funnel plot and no significant publication bias was detected in this meta-analysis (p = 0.143) (Fig. 4).
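The regression behind a Deeks' funnel plot can be outlined as follows: the log diagnostic odds ratio of each study is regressed on 1/√ESS, weighted by the effective sample size ESS = 4·n₁·n₂/(n₁ + n₂), where n₁ and n₂ are the numbers of diseased and non-diseased lesions, and asymmetry is judged from the slope. The sketch below is a plain-Python approximation with invented counts, not the STATA routine used in the study; it returns the slope and its t statistic but no p value, since the standard library has no t distribution.

```python
import math


def log_dor(tp, fp, fn, tn):
    """ln(DOR); 0.5 is added to every cell when any cell is zero (assumed)."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn))


def effective_sample_size(tp, fp, fn, tn):
    """ESS = 4 * n1 * n2 / (n1 + n2); n1 = diseased, n2 = non-diseased."""
    n1, n2 = tp + fn, fp + tn
    return 4 * n1 * n2 / (n1 + n2)


def deeks_regression(studies):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS), weights = ESS.

    Returns (slope, standard error, t statistic with k - 2 degrees of freedom).
    """
    k = len(studies)
    if k < 3:
        raise ValueError("need at least three studies")
    w = [effective_sample_size(*s) for s in studies]
    x = [1 / math.sqrt(wi) for wi in w]
    y = [log_dor(*s) for s in studies]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("all studies have the same effective sample size")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(wi * (yi - intercept - slope * xi) ** 2
              for wi, xi, yi in zip(w, x, y))
    se = math.sqrt(rss / (k - 2) / sxx)
    t = slope / se if se > 0 else float("nan")
    return slope, se, t


# Hypothetical 2x2 counts (tp, fp, fn, tn), not taken from the included studies.
studies = [(18, 4, 2, 30), (9, 15, 6, 25), (12, 3, 8, 40),
           (5, 10, 5, 12), (30, 12, 10, 90)]
slope, se, t = deeks_regression(studies)
print(round(slope, 3), round(se, 3), round(t, 3))
```

A slope whose t statistic is far from zero would point to small-study effects; the p value reported in the text corresponds to this slope test.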

Fig. 4

Funnel plot for evaluating potential publication bias. Each solid circle represents a study in the meta-analysis. The line is the regression line

Discussion

Our current meta-analysis demonstrated that sonoelastography showed a pooled sensitivity of 0.67 (95% CI 0.59–0.74) and specificity of 0.64 (95% CI 0.60–0.68) for differential diagnosis between malignant and benign parotid lesions. The pooled DOR was 8.00 (95% CI 2.96–21.63) and the AUC was 0.77. The meta-regression analysis results revealed that the assessment method was a significant factor affecting study heterogeneity (p = 0.035). However, the summary estimates did not differ between SE and SWE (p = 0.119) or between ARFI and SSI (p = 0.473).

Recently, several original studies have focused on the value of sonoelastography for differentiating between malignant and benign parotid lesions. Sonoelastography is a novel ultrasonographic technique for assessing tissue elasticity and stiffness. Theoretically, malignant parotid tumours should be stiffer than benign ones. However, the reported results are inconsistent. Some authors have reported excellent performance of sonoelastography for differentiating between malignant and benign lesions, with a sensitivity of 94% and a specificity of 89% [7]. Others have reported lower but still useful performance, with a sensitivity of 70% and specificity of 66% [15]. Still others have reported no benefit of sonoelastography for differentiating between malignant and benign tumours; only cystic lesions or cystic areas within a lesion were reliably identified [18]. Our meta-analysis ultimately revealed a pooled sensitivity of 67% and a pooled specificity of 64% for differentiating between malignant and benign parotid lesions. Therefore, we believe that the overall value of sonoelastography for differential diagnosis is limited and unsatisfactory.

Heterogeneity was revealed in our study. Therefore, a meta-regression analysis was performed to further explore the potential sources. The results showed that there was no difference between SE and SWE or between ARFI and SSI. However, the assessment method was a significant factor affecting study heterogeneity. Quantitative and semiquantitative methods performed better than qualitative ones. In this subgroup, there was a higher pooled sensitivity of 0.73, specificity of 0.83, DOR of 18.64 and an AUC of 0.88. This was probably because qualitative methods usually relied on a scoring system that was applied subjectively by operators and was thus more operator-dependent. In contrast, semiquantitative and quantitative indices were calculated automatically by the ultrasound machine and were thus less operator-dependent.

Another potential source of heterogeneity might be the histopathological variety in malignant and benign parotid lesions. Celebi and Mahmutoglu [17] indicated that the diagnostic value of sonoelastography for evaluating pleomorphic adenomas, Warthin tumours, adenoid cystic carcinoma and high-grade tumours was low, whereas the diagnostic rates for low-grade tumours, such as mucoepidermoid carcinoma, acinic cell carcinoma and metastases of basal cell carcinoma, were better. Pleomorphic adenomas contain variable proportions of chondroid and/or myxoid matrix, which contain different amounts of fluid. Warthin tumours contain different amounts of lymphatic, cellular, mucous and fluid components. Thus, these two types of benign tumours can be solid, solid and cystic, or completely cystic, which results in a wide variety in stiffness. In a small study of 20 patients in which the benign group included only pleomorphic adenomas, 50% (6/12) of the adenomas were misdiagnosed as malignancies [16]. We tried to analyse whether sonoelastography could differentiate low-grade parotid tumours from high-grade and benign ones. We also tried to analyse the effects of the different components in pleomorphic adenomas and Warthin tumours on sonoelastography. However, neither analysis could be completed because most of the studies did not record the necessary data.

A strict procedure was carried out to screen the articles and ultimately 10 relevant studies were identified. Deeks’ funnel plot showed no significant publication bias. Most of the studies were of high quality according to the QUADAS questionnaire. The meta-regression revealed that the QUADAS score was not a significant factor affecting study heterogeneity. However, sonoelastography seemed to perform better in the relatively lower quality studies, as shown in Table 4. In one study [20], it was unclear whether the observers knew the histopathological results before analysing the images. In another study, the observers were aware of histological properties before reviewing the images and videos [8]. These unblinded studies probably reported better performance and influenced the results. In addition, in all the studies it was unclear whether the histopathology reviewer knew the results of the sonoelastography evaluations, which probably caused heterogeneity and influenced the results as well. To the best of our knowledge, this is the first meta-analysis to assess the diagnostic value of sonoelastography specifically for differentiating between malignant and benign parotid lesions, rather than salivary gland masses in general [26].

There are some limitations in our study. First, relatively few studies were included (i.e. ten). Second, we failed to acquire unpublished data, and the restriction to articles published in English might have affected the reliability of the results. Third, postoperative pathology was used as the reference standard in most of the studies in this meta-analysis; however, in one study [15], only cytological and histological results from ultrasound-guided fine-needle aspiration biopsy were used as reference standards, and in another two studies, cytology results from ultrasound-guided fine-needle aspiration were used in six cases [17] and two cases [24], respectively. Although cytology and histology of fine-needle aspiration biopsy are suggested diagnostic methods for most parotid tumours, these methods have variable success, with sensitivity ranging from 57% to 98%, specificity ranging from 56% to 100% and accuracy ranging from 78% to 98% [7].

In conclusion, this meta-analysis shows that sonoelastography has a limited value for differential diagnosis between malignant and benign parotid lesions. Quantitative and semiquantitative methods performed better than qualitative ones. Further large-sample, prospective, multicentre studies evaluating these two assessment methods are needed to confirm the findings. In addition, more studies should focus on the correlation between sonoelastography and corresponding histopathological changes in the future.