Introduction

Temporomandibular disorders (TMDs) are a group of painful musculoskeletal disorders that affect the masticatory muscles, the temporomandibular joints (TMJs), and other anatomical structures in the orofacial region [1]. Disc displacement (DD), i.e., an abnormal location of the TMJ disc, is a highly prevalent derangement within the TMJs [2]. The prevalence of DD in the general population ranges from 18 to 35% [2], while that in TMD patients ranges from 38 to 73% [3,4,5]. DD is categorized based on the relation of the displaced disc to the mandibular condyle. The displacement of the disc can be anterior, anterolateral, anteromedial, lateral, medial, or posterior [6]. DD can further be classified as DD with reduction (DDWR) or DD without reduction (DDWoR) [7]. DDWR is thought to be the most common internal derangement within the human TMJs [8]. In the general population, the prevalence of DDWR is about 40%, while that of DDWoR is 4% [9]. In TMD patients, the prevalence of DDWR ranges from 24 to 47%, while that of DDWoR ranges from 11 to 26% [3,4,5]. In DDWR, which is the most frequent type, the displaced disc returns to the normal position on mouth opening, thereby producing a “reciprocal click,” while DDWoR is a non-clicking condition that may be accompanied by limited mouth opening and a deviation of the jaw to the affected side [7]. DDWR is reported to be mostly a stable, pain-free, and lifelong condition of the joint [2]. However, in a small minority of patients, the disc loses its capacity to reduce on opening [2]. The loss of disc reduction may be accompanied by signs and symptoms of a closed lock, such as painful and limited mouth opening [2]. A disc displacement may be a progressive disorder of the TMJ, starting as a DDWR early upon opening, then progressing to a DDWR late upon opening, and finally to a DDWoR [2]. In addition, there are indications that DDWR or DDWoR may either retard or arrest condylar growth and may thus be related to mandibular retrognathia and/or facial asymmetry, though these skeletal consequences are probably rare and small [2].

Following clinical examination, imaging of the TMJ can be used to confirm a possible disc displacement when indicated [10]. A proper diagnosis may inform the decision on its management [11]. In such instances, magnetic resonance imaging (MRI) is one of the most common imaging tools. It is widely regarded as the most accurate method, because it displays the anatomic details of the TMJ, and high sensitivity and specificity have been reported for it [12, 13]. However, it has several drawbacks in clinical practice. MRI cannot be carried out in patients with pacemakers or metallic prostheses, or in claustrophobic patients [14, 15]. Also, the use of MRI is limited by the need for centralized facilities, the high cost, and the long scanning time compared to other imaging modalities such as computed tomography (CT) [15]. Recently, visualization of the TMJ and its disc with ultrasonography (US) has gained attention and importance in both research and patient care, because it is less expensive, less time-consuming, and easily accessible [16]. Also, it can be used to directly observe the movement of the joint disc during opening and closing of the mouth, which is considered an advantage, because it allows the investigator to detect the disc position more accurately [17].

Although US is not yet commonly used in the diagnosis of TMJ DD, it has gained considerable attention over the last few years. However, its use in clinical practice is complicated by the wide and unexplained variation in the reported sensitivity, specificity, and accuracy, which range from 13 to 100%, from 62 to 100%, and from 52 to 100%, respectively. Moreover, the (added) diagnostic value of US for ruling in and ruling out DD in clinical practice, compared with the probability of DD before US is performed, has not been established.

Thus, the purpose of this systematic review of clinical diagnostic outcome studies is to establish the diagnostic accuracy of US in patients suspected of having TMJ DD. The findings of this review will provide a summary of the utility of US in ruling in and ruling out DD. As such, the added diagnostic value that will be reported may facilitate decision-making for patients in daily practice.

Materials and methods

This systematic review and meta-analysis was carried out in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [18].

Search strategy

Relevant publications were searched in electronic bibliographic sources, including PUBMED and EMBASE, without language restriction, up to May 2016. Relevant dental journals and reference lists of included studies were manually searched to avoid selection bias.

A combination of free text words and systematic vocabulary (Medical Subject Headings, EMTREE terms and topics) was used in the search strategies. The search strategy is presented in Table 1.

Table 1 The search strategies for the included studies

Selection criteria

Only publications reporting studies satisfying the following criteria were included:

  1. Evaluating the diagnostic accuracy of US for DDs;

  2. Patients with any symptoms or clinical signs related to a DD, without considering type of displacement, gender, age, race, or socioeconomic status;

  3. MRI used as the reference test;

  4. False-positive (FP), true-positive (TP), false-negative (FN), and true-negative (TN) values of US based on MRI for the diagnosis of DD reported, or sufficient data reported to (re)calculate these.

For US, a normal disc position is defined, in the closed-mouth position, as the location of the intermediate zone of the disc between the anterosuperior aspect of the mandibular condyle and the posteroinferior aspect of the articular eminence, and, in the open-mouth position, as the location of the intermediate zone of the disc between the condyle and the articular eminence. DD is present when, relative to these normal positions, the intermediate zone of the disc is positioned anteriorly, anteromedially, anterolaterally, medially, or laterally [17, 19].

On MRI, a normal disc position is seen when the posterior band of the disc is located between 12 and 3 o’clock or in superior position relative to the condyle. DD is defined when the posterior band of the disc is in an anterior, anteromedial, anterolateral, medial, or lateral position relative to the superior part of the condyle [17].

DDWR is defined as a displaced position of the disc in relation to the superior part of the condyle in the closed-mouth position and a normal disc position in the open-mouth position. DDWoR is defined as a displaced position of the disc in relation to the superior part of the condyle in both the closed- and open-mouth positions.

On US, the disc can be identified as hyperechoic, hypoechoic, or isoechoic. Hyperechoic means that the disc is highly reflective and echo rich compared with neighboring structures and appears as varying shades of lighter gray. Hypoechoic means that the disc is less reflective and produces fewer echoes than neighboring structures and appears as varying shades of darker gray. Isoechoic means that the disc has an echogenicity similar to that of a neighboring structure [20].

Studies which met the following criteria were excluded:

  1. Case-control studies, review articles, and case reports;

  2. Duplicate publications.

Two reviewers (SN and LY) independently assessed the titles and abstracts of all identified studies from the electronic searches. Full texts were obtained for studies that met the inclusion criteria or where a clear decision could not be made from the title and abstract alone. In the latter case, selection was based on full text reading. During subsequent quality assessments and data extraction, the reviewers were neither masked for the study findings and reported conclusions, nor for the identity of the journal, the authors, and their affiliations. Reviewers resolved initial disagreements by consensus discussion.

Data extraction

For all included studies, the following data were extracted using a standardized form: (a) authors' names; (b) centers involved in the study and their locations; (c) year of publication; (d) demographic characteristics of participants, such as age and gender; (e) study design features; (f) inclusion and exclusion criteria; (g) diagnostic criteria; (h) type of US: US was performed at closed-mouth position and maximum-mouth-opening position (static) and/or US was performed during the mouth opening movement (dynamic); (i) the US positioning (horizontal or vertical); and (j) US resolution (high or low). High-resolution US (HRUS) is defined as the resolution of US being 12 MHz or more, while low-resolution US (LRUS) is defined as the resolution being lower than 12 MHz [15].

In addition, we aimed to extract the total number of study patients included and analyzed, as well as the numbers with a true-positive, false-positive, false-negative, and true-negative finding of US based on MRI for the diagnosis of DD. These numbers were either taken as reported or recalculated from the reported data, and missing data were explored. Thereafter, we recalculated the prevalence of DD according to MRI and US, as well as the positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity. PPV was defined as the probability of the presence of DD based on MRI in those with presence of DD based on US (TP/(TP + FP)). NPV was defined as the probability of the absence of DD based on MRI in those with absence of DD based on US (TN/(FN + TN)). Sensitivity was defined as the percentage of patients with a DD, as based on MRI, correctly identified as such based on US (TP/(TP + FN)), while specificity was defined as the percentage of patients free of a DD, as based on MRI, correctly identified as such based on US (TN/(FP + TN)). The added value of US for ruling in DD was calculated by subtracting the prior probability (prevalence) from the PPV. The added value of US for ruling out DD was calculated by subtracting the complement of the prevalence (1 minus the prior probability) from the NPV.
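
The definitions above can be illustrated with a minimal Python sketch. It is not the software used in this review; the class name `TwoByTwo`, its method names, and the example counts are hypothetical and serve only to show the arithmetic for a single study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """US results (index test) cross-tabulated against MRI (reference test)."""

    tp: int  # US positive, MRI positive
    fp: int  # US positive, MRI negative
    fn: int  # US negative, MRI positive
    tn: int  # US negative, MRI negative

    def prevalence(self) -> float:
        # Prior probability of DD according to MRI.
        return (self.tp + self.fn) / (self.tp + self.fp + self.fn + self.tn)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.fn + self.tn)

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.fp + self.tn)

    def added_value_rule_in(self) -> float:
        # PPV minus the prior probability of DD.
        return self.ppv() - self.prevalence()

    def added_value_rule_out(self) -> float:
        # NPV minus the prior probability of no DD (1 - prevalence).
        return self.npv() - (1.0 - self.prevalence())


# Hypothetical counts, chosen only to illustrate the calculation.
example = TwoByTwo(tp=60, fp=10, fn=15, tn=40)
print(f"prevalence            = {example.prevalence():.3f}")
print(f"PPV                   = {example.ppv():.3f}")
print(f"NPV                   = {example.npv():.3f}")
print(f"added value, rule in  = {example.added_value_rule_in():.3f}")
print(f"added value, rule out = {example.added_value_rule_out():.3f}")
```

A positive added value means that the US result moves the probability of DD (or of no DD) beyond what was already known from the prevalence alone.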

Quality assessment

The QUADAS-2 tool was used to assess the methodological quality of the diagnostic accuracy studies [21]. The tool includes four domains: patient selection, index test, reference standard, and flow and timing. Each domain was assessed in terms of risk of bias, with the first three domains also considered in terms of applicability concerns. A “risk of bias” judgment (“high,” “low,” or “unclear”) was made for each domain. If the answers to all signaling questions within a domain were judged as “yes” (indicating low risk of bias for each question), the domain was judged to be at low risk of bias. If any signaling question was judged as “no” (indicating a high risk of bias), the domain was scored as being at high risk of bias. This was followed by a judgment about concerns regarding clinical applicability for the patient selection, index test, and reference standard domains. Concerns about applicability were rated as “low,” “high,” or “unclear.” High concern was rated when the study did not match the review question in any domain, while low concern was rated when the study matched the review question in each domain. A domain was rated as unclear only when insufficient data were reported.
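
The risk-of-bias rule described above can be sketched in Python as follows. The function name `judge_domain` is hypothetical, and returning "unclear" for every remaining combination of answers (for example, a mix of "yes" and "unclear", or no answers at all) is an assumption of this sketch rather than a rule stated in the text.

```python
from typing import Iterable


def judge_domain(answers: Iterable[str]) -> str:
    """Map the signaling-question answers of one domain to a risk-of-bias judgment."""
    normalized = [answer.strip().lower() for answer in answers]
    # Any "no" answer puts the whole domain at high risk of bias.
    if any(answer == "no" for answer in normalized):
        return "high"
    # Only a complete set of "yes" answers gives a low risk of bias.
    if normalized and all(answer == "yes" for answer in normalized):
        return "low"
    # Assumption of this sketch: everything else is treated as unclear.
    return "unclear"


print(judge_domain(["yes", "yes", "yes"]))
print(judge_domain(["yes", "no", "unclear"]))
print(judge_domain(["yes", "unclear"]))
```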

Two independent raters, blinded for each other’s ratings, performed the QUADAS-2 assessment. Reviewers resolved initial disagreements by consensus discussion.

Statistical analysis

The data analysis was conducted using RevMan software, version 5.3 (RevMan, Copenhagen, Denmark; the Nordic Cochrane Centre, the Cochrane Collaboration, 2008), and Meta-Disc 1.4 (Meta-Disc, Clinical Biostatistics Unit, Ramon y Cajal Hospital, Madrid, Spain).

A random-effects model was used to calculate the overall pooled estimates of sensitivity, specificity, positive likelihood ratio (LR+), negative likelihood ratio (LR−), and diagnostic odds ratio (DOR) with 95% confidence intervals (CIs). Forest plots and summary receiver operating characteristic (SROC) curves were generated to graphically present the relationship between sensitivity and specificity. The overall test performance was quantified using the area under the SROC curve (sAUC) and Q* (the point on the SROC curve where sensitivity equals specificity). The sAUC measures the overall capacity of the test to discriminate between participants with the disease and those without it. An sAUC of 0.5 or lower indicates no discriminative ability. The higher the sAUC, the higher the diagnostic accuracy of US.
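
For orientation, the Python sketch below shows how LR+, LR−, and DOR follow from the sensitivity and specificity of a single study, using their standard definitions. It does not reproduce the random-effects pooling performed in Meta-Disc, so pooled estimates reported in this review need not equal values computed this way from pooled sensitivity and specificity. The function name `likelihood_ratios` and the input values are hypothetical.

```python
import math
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float, float]:
    """Return (LR+, LR-, DOR) for one study; inputs are proportions in [0, 1]."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie between 0 and 1")
    # LR+ = sensitivity / (1 - specificity); unbounded when specificity is 1.
    lr_pos = sensitivity / (1.0 - specificity) if specificity < 1.0 else math.inf
    # LR- = (1 - sensitivity) / specificity; unbounded when specificity is 0.
    lr_neg = (1.0 - sensitivity) / specificity if specificity > 0.0 else math.inf
    # DOR = LR+ / LR-; unbounded when LR- is 0.
    dor = lr_pos / lr_neg if lr_neg > 0.0 else math.inf
    return lr_pos, lr_neg, dor


# Hypothetical single-study values, used only to illustrate the formulas.
lr_pos, lr_neg, dor = likelihood_ratios(sensitivity=0.8, specificity=0.8)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.2f}")
```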

Finally, clinical value of US was assessed using the prevalences (prior probabilities) and the posterior probabilities, being the positive predictive values (PPV) and negative predictive values (NPV). The PPVs and prevalences were used to assess the (added) diagnostic value of US for ruling in an increased risk of DD. The NPVs and complement of the prevalences (1 minus prevalence) were used to assess the (added) diagnostic value of US for ruling out an increased risk of DD.

Results

Results of search and selection

The initial search identified a total of 559 studies. During screening of the titles and abstracts, 532 of them did not satisfy the inclusion criteria. Another 11 studies did not satisfy the inclusion criteria after careful reading of the full-text publications. Therefore, a total of 16 studies were included in the present review (Fig. 1) [17, 19, 22,23,24,25,26,27,28,29,30,31,32,33,34,35].

Fig. 1 Flow diagram of study inclusion

Characteristics of included studies

With respect to the design of the included studies, all 16 studies were prospective follow-up studies. The centers involved in the diagnostic assessments varied. Seven of the included studies were performed at the University of Innsbruck, Austria [19, 22, 24, 26, 28, 31, 33], and another two were performed at Frankfurt University Medical Center, Germany [23, 30]. The interpretation of US imaging varied among these studies. Eight studies identified the disc as hypoechoic to isoechoic [23, 24, 26, 27, 29,30,31, 33] and another three studies as hyperechoic [17, 25, 32]. The remaining five studies did not report the imaging features of the disc on US. The other characteristics are presented in Table 2.

Table 2 Characteristics of included studies

Quality assessment

With respect to the patient selection domain, we considered the risk of bias unclear in eight studies due to poor reporting on sampling procedures [24, 26,27,28, 30, 32, 34, 35]. For the index test domain, we considered the risk of bias high in one study [35], because the index test results were interpreted without blinding to the reference standard findings. Also, we considered the risk of bias unclear in another four studies [24, 29, 32, 33], because we were unable to confirm whether the index test results were interpreted blinded to the reference standard findings. For the reference standard domain, we considered the risk of bias high in one study [35], because the reference standard findings were interpreted without blinding to the index test results. The risk of bias in another four studies was considered unclear [24, 29, 32, 33], because we were unable to confirm whether the reference standard results were interpreted blinded to the index test findings. In the flow and timing domain, we considered the risk of bias high in three studies [23, 28, 35], because not all included patients were in the final analysis. The risk of bias in another seven studies was considered unclear [17, 27, 29, 30, 32,33,34], because the time interval between the index test and the reference standard was not provided. Overall, six studies [17, 19, 22, 25, 26, 31] carried a low risk of bias in all four domains or an unclear risk of bias in only one of the four domains, and three studies [23, 28, 35] carried a high risk of bias in at least one domain (Fig. 2).

Fig. 2 Summary of risk of bias assessment based on QUADAS-2

The evaluation of the applicability of included studies for the domains of the selection of patients, the index test and the reference standard showed that for only one study [26], insufficient information was provided on the index test. Furthermore, there were no concerns on the evaluated domains of the applicability for all the other included studies (Fig. 2).

Results of meta-analysis

Disc displacement at closed-mouth position

Fourteen studies were pooled [17, 19, 22,23,24,25,26,27,28,29,30, 32, 34, 35], and the pooled prevalence (prior probability) of DD at closed-mouth position (DD-CM) was 63% (95%CI: 60–65%). The pooled PPV was 85% (95%CI: 82–87%), and the added value for ruling in an increased risk of DD-CM was 22% (95%CI: 19–26%) (Table 3). The pooled NPV was 68% (95%CI: 64–71%), and the added value for ruling out an increased risk of DD-CM was 30% (95%CI: 26–35%) (Table 3). The pooled sensitivity was 0.78 (95%CI: 0.75–0.81), and the pooled specificity was 0.77 (95%CI: 0.73–0.80) (Table 4). Besides, the pooled LR+ was 3.02 (95%CI: 1.96–4.64), and the pooled LR− was 0.32 (95%CI: 0.23–0.45) (Table 4). The summary DOR was 10.80 (95%CI: 5.46–21.38), and the sAUC was 0.836 with a Q* of 0.768 (Fig. 3 and Table 4). When the three studies with high risk of bias [23, 28, 35] were excluded, the pooled added values for ruling in and ruling out an increased risk of DD-CM of the remaining 11 studies were 23% (95%CI: 19–26%) and 33% (95%CI: 28–37%), respectively. The pooled sensitivity was 0.80 (95%CI: 0.77–0.82), and the pooled specificity was 0.79 (95%CI: 0.75–0.82). Besides, the pooled LR+ was 3.31 (95%CI: 1.93–5.66), and the pooled LR− was 0.29 (95%CI: 0.21–0.40). The summary DOR was 12.25 (95%CI: 5.89–25.45), and the sAUC was 0.851 with a Q* of 0.782. The outcomes without the three studies with high risk of bias were very similar to the outcomes with all the studies included.

Table 3 Predictive values of ultrasonography for the diagnosis of disc displacement based on MRI
Table 4 Pooled statistics of ultrasonography for the diagnosis of disc displacement based on MRI using meta-analyses
Fig. 3 Summary receiver operating characteristic (SROC) curves. Symbols represent the studies included in the meta-analyses (DD-CM disc displacement at closed-mouth position, DD-MMO disc displacement at maximum mouth opening, DDWR disc displacement with reduction, DDWoR disc displacement without reduction)

Subgroup analyses were performed for the type of US (static and/or dynamic), the resolution of US (high resolution or low resolution), and the research center (University of Innsbruck or other centers). The pooled added values for ruling in and out an increased risk of DD-CM based on the subgroups are presented in Table 3 and Fig. 4. The pooled sensitivity, specificity, LR+, LR−, DOR, and sAUC are presented in Table 4.

Fig. 4 Pooled posterior probabilities of ultrasonography for ruling in or ruling out disc displacement compared to prevalences of disc displacement (a positive predictive values of ultrasonography for ruling in the presence of disc displacement at closed-mouth position (DD-CM), disc displacement at maximum mouth opening (DD-MMO), disc displacement with reduction (DDWR), and disc displacement without reduction (DDWoR); b negative predictive values of ultrasonography for ruling out the presence of DD-CM, DD-MMO, DDWR, and DDWoR; c positive predictive values of ultrasonography for ruling in the presence of DD-CM based on subgroups; d negative predictive values of ultrasonography for ruling out the presence of DD-CM based on subgroups; e positive predictive values of ultrasonography for ruling in the presence of DD-MMO based on subgroups; f negative predictive values of ultrasonography for ruling out the presence of DD-MMO based on subgroups). The diagonals indicate where the posterior probability (PPV or NPV) is the same as the prior probability (prevalence); on these diagonals, there is no added value for ruling in or ruling out DD

Disc displacement at maximum mouth-opening position

Nine studies were pooled [19, 22,23,24,25,26,27,28, 30], and the pooled prevalence (prior probability) of DD at maximum mouth opening (DD-MMO) was 35% (95%CI: 33–38%). The pooled PPV was 77% (95%CI: 72–81%), and the added value for ruling in an increased risk of DD-MMO was 41% (95%CI: 36–47%) (Table 3). The pooled NPV was 84% (95%CI: 81–87%), and the added value for ruling out an increased risk of DD-MMO was 20% (95%CI: 16–23%) (Table 3). The pooled sensitivity was 0.70 (95%CI: 0.65–0.74), and the pooled specificity was 0.88 (95%CI: 0.86–0.91) (Table 4). Besides, the pooled LR+ was 5.08 (95%CI: 3.36–7.68), and the pooled LR− was 0.40 (95%CI: 0.26–0.63) (Table 4). The summary DOR was 13.53 (95%CI: 6.27–29.18), and the sAUC was 0.952 with a Q* of 0.894 (Fig. 3 and Table 4). When the two studies [23, 28] with high risk of bias were excluded, the pooled added values for ruling in and ruling out an increased risk of DD-MMO of the remaining seven studies were 43% (95%CI: 37–48%) and 22% (95%CI: 18–26%), respectively. The pooled sensitivity of the remaining seven studies was 0.72 (95%CI: 0.67–0.77), and the pooled specificity was 0.89 (95%CI: 0.86–0.92). Besides, the pooled LR+ was 5.95 (95%CI: 3.99–8.85), and the pooled LR− was 0.33 (95%CI: 0.23–0.47). The summary DOR was 19.74 (95%CI: 9.60–40.62), and the sAUC was 0.947 with a Q* of 0.886. The outcomes without the two studies with high risk of bias were very similar to the outcomes with all the studies included.

Subgroup analyses were performed for the type of US (static and/or dynamic), the resolution of US (high resolution or low resolution), and the research center (University of Innsbruck or other centers). The pooled added values for ruling in and out an increased risk of DD-MMO based on the subgroups are presented in Table 3 and Fig. 4. The pooled sensitivity, specificity, LR+, LR−, DOR, and sAUC are presented in Table 4.

DDWR

Four studies were pooled [19, 25, 31, 34], and the pooled prevalence (prior probability) of DDWR was 38% (95%CI: 33–43%). The pooled PPV was 73% (95%CI: 65–80%) and the added value for ruling in an increased risk of DDWR was 35% (95%CI: 26–44%) (Table 3). The pooled NPV was 82.4% (95%CI: 77–87%), and the added value for ruling out an increased risk of DDWR was 21% (95%CI: 14–28%) (Table 3). The pooled sensitivity was 0.71 (95%CI: 0.63–0.79), and the pooled specificity was 0.84 (95%CI: 0.78–0.88) (Table 4). Besides, the pooled LR+ was 3.76 (95%CI: 1.26–11.18), and the pooled LR− was 0.40 (95%CI: 0.24–0.68) (Table 4). The summary DOR was 9.79 (95%CI: 2.24–42.87), and the sAUC was 0.7946 with a Q* of 0.731 (Fig. 3 and Table 4).

DDWoR

Six studies were pooled [19, 25, 27, 31, 33, 34], and the pooled prevalence (prior probability) of DDWoR was 42% (95%CI: 38–46%). The pooled PPV was 83% (95%CI: 77–88%), and the added value for ruling in an increased risk of DDWoR was 41% (95%CI: 34–48%) (Table 3). The pooled NPV was 85% (95%CI: 81–89%), and the added value for ruling out an increased risk of DDWoR was 27% (95%CI: 21–33%) (Table 3). The pooled sensitivity was 0.79 (95%CI: 0.73–0.84) and the pooled specificity was 0.88 (95%CI: 0.84–0.92) (Table 4). Besides, the pooled LR+ was 6.39 (95%CI: 3.46–11.78), and the pooled LR− was 0.26 (95%CI: 0.15–0.46) (Table 4). The summary DOR was 29.18 (95%CI: 9.35–91.07), and the sAUC was 0.938 with a Q* of 0.874 (Fig. 3 and Table 4).

Discussion

The findings in this systematic review indicate that US has sufficient added value for both ruling in and ruling out DD. With the use of US, the probabilities for ruling in or ruling out DDs may increase, with a difference of about 20% or more between prior and posterior probability.

Imaging can be used to confirm the diagnosis of DD after clinical examination. In particular, the ability of US to rule out a DD is important in selected cases in dental practice based on the Diagnostic Criteria for Temporomandibular Disorders (DC/TMD) [10]. In the present systematic review, if a patient had symptoms and signs suggestive of DD or had a clinical diagnosis of DD, the prior probability of not having DD at the closed-mouth position and at the maximum mouth-opening position was 37.2 and 64.7%, respectively. With a negative US result, the posterior probability of not having a DD increased to 67.6 and 84.2%, respectively. Hence, negative US results have good added value for ruling out a DD and may help dentists to confirm the absence of a DD when indicated.
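
The gain from a negative US result can be made explicit with a few lines of Python. The percentages are the prior and posterior probabilities of not having DD quoted in the paragraph above; the variable names are hypothetical and the sketch only subtracts one from the other.

```python
# (prior, posterior) probability of NOT having DD, in percent, after a negative US result.
scenarios = {
    "closed-mouth position": (37.2, 67.6),
    "maximum mouth-opening position": (64.7, 84.2),
}

# Added value for ruling out DD = posterior probability minus prior probability.
gains = {label: posterior - prior for label, (prior, posterior) in scenarios.items()}

for label, (prior, posterior) in scenarios.items():
    print(f"{label}: {prior:.1f}% -> {posterior:.1f}% "
          f"(gain of {gains[label]:.1f} percentage points)")
```

These differences correspond, after rounding, to the pooled added values for ruling out DD-CM and DD-MMO reported in the Results.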

For dentists and patients, it is important to confirm the diagnosis of DD after clinical examination in selected cases, for example, in patients who experience a substantial influence of the DD on their oral health-related quality of life (OHRQoL). For policy makers, such information could provide evidence on whether US should be used in dental practice as a screening tool after clinical examination to confirm the diagnosis of DD. For researchers, US may replace MRI for the diagnosis of DD, because US is cheaper and more accessible.

Besides, in the present systematic review, the sAUCs for DD-CM, DD-MMO, DDWR, and DDWoR ranged from 0.79 to 0.95, which indicates that the diagnostic value of US for the diagnosis of DD is good to excellent [22, 36]. Also, the sensitivity and specificity of US for the diagnosis of DD were presented in the review. The sensitivity of US for the diagnosis of DD ranged from 0.70 to 0.79, while the specificity ranged from 0.77 to 0.88. Except for DD-CM, the specificity was significantly higher than the sensitivity, which suggests that US is better at excluding patients without actual DD than at identifying those with actual DD; this result is consistent with a previous review [15]. However, for clinical decision-making, the sensitivity and specificity are not very helpful, because they only indicate how well a test discerns cases from non-cases.

The present review shows that for almost all the included studies, the PPVs and NPVs for DD-CM, DD-MMO, DDWR, and DDWoR exceeded the prior probabilities to a large extent (see Fig. 5). This indicates that the added values of US for DD are sufficient. However, there was an exception that the PPV and NPV were both below the diagonal in one study in DD-MMO. This means that, compared to MRI, US resulted in major misclassification and so in a large number of false-positive and false-negative findings. This suggests that something might have gone wrong in that particular study [28]. A possible explanation might be that this was the first study (published in 1997) exploring the use of US in the diagnosis of DD. So, the researchers may still have been rather inexperienced and experimenting with the use of US and exploring its interpretation. Moreover, in this particular study, low-resolution US was used, which may have contributed to the low accuracy of US [15, 22, 36].

Fig. 5 Positive and negative predictive values of ultrasonography compared to the prevalences of disc displacement at closed-mouth position (DD-CM), disc displacement at maximum mouth opening (DD-MMO), disc displacement with reduction (DDWR), and disc displacement without reduction (DDWoR) for each included study. Red circles represent positive predictive values of US of the included studies and green circles represent negative predictive values of US of the included studies. (a DD-CM, b DD-MMO, c DDWR, d DDWoR). The red diagonals indicate where the positive predictive value is the same as the prevalence; on these diagonals, there is no added value for ruling in DD. The green diagonals indicate where the negative predictive value is the same as the complement of the prevalence (1 − prevalence); on these diagonals, there is no added value for ruling out DD

In this systematic review, we evaluated the accuracy of US for DD-CM and DD-MMO in several subsamples of included studies. Dynamic examination is regarded as “real-time,” as it allows dentists to detect the disc position in more detail during mouth opening and jaw movement [22, 36]. However, no studies have been published on the diagnostic value of dynamic examination alone. Still, the added diagnostic values of the combined static and dynamic examinations of US for ruling in and ruling out DD-CM and DD-MMO tended to be higher than those of static examination alone. High-resolution US allows better visualization of joint structures than low-resolution US [15]. Current evidence shows that the added diagnostic values of US for ruling in and ruling out DD are both higher with high-resolution US than with low-resolution US.

To date, neither a protocol for using and interpreting US in DD nor a recommendation for this has been published. Of the 16 included studies, six adopted vertical scans, three adopted horizontal scans, five adopted combined vertical and horizontal scans, and one chose 60 degrees to the Frankfort plane as the initial position to scan. In the included studies, the definitions of the disc in the US images were not the same and ranged from hyperechogenic and isoechogenic to hypoechogenic. Hence, there seems to be neither a consensus on a standardized positioning of US in the TMJ area, nor a prevailing standardized approach for its interpretation.

The performance of US in clinical practice is reported to be operator-dependent [16]. Visualization of the disc through US can be technically challenging, because the disc can be visualized only through the small gap between zygomatic process of the temporal bone and the top of the condyle [16]. However, constantly adjusting the position of the transducers and well-trained operators are thought to be helpful to obtain better visualization of the disc [16]. So, training and calibration are necessary in operating US and interpreting images in clinical practice [16]. We found that seven of the included 16 studies have been reported by the University of Innsbruck. It has been the first center to study the diagnostic utility of US for DD [28], and the investigators in this center clearly were focused on this. So they may have gained more experience in using US and interpreting its findings. Interestingly, the added diagnostic value of US in these seven studies from the University of Innsbruck tended to be higher than those performed in the other centers.

In interpreting the findings of the present systematic review, some limitations should be taken into consideration.

First, MRI was used as the reference standard in the review. However, MRI may not have perfect diagnostic accuracy for diagnosis of DD. Compared to cryosectional morphology of the TMJ on autopsy, the sensitivity of MRI ranged from 0.75 to 0.90, while the specificity ranged from 0.80 to 1.00 for the diagnosis of TMJ disc position [37,38,39]. Unfortunately, no comparison has been made with US in such autopsy studies. Still, MRI can be regarded as the reference standard for diagnosis of DDs for pragmatic reasons.

Second, while publication bias cannot be excluded, we did not search for unpublished research. Moreover, there is no registry or mechanism of registration for designs of diagnostic accuracy studies such as with randomized trials. As such, it will be difficult to retrieve unpublished diagnostic accuracy studies.

Third, of the 16 included studies of the present systematic review, seven were from the same institution (University of Innsbruck) and were published from 1997 to 2005. However, five of them did not report the start and end dates of the research, so we cannot confirm whether the same subjects participated in more than one of these five studies. If that was the case, it may have affected the results of the meta-analyses and caused potential bias.

It should be mentioned that two systematic reviews on the same topic as the present systematic review were published in 2012 and 2015 [13, 40]. These two previous systematic reviews both showed that US can be accepted as a rapid preliminary diagnostic method for DD, which is similar to the results of the present systematic review. However, these two previous reviews mainly focused on the sensitivity and specificity of US based on MRI, which are meaningful for researchers but cannot be directly used by dentists in clinical practice. In the present study, we showed that US is an acceptable method for the diagnosis of DD in clinical practice based on the added values of the PPV and NPV of US. That is, first-line dentists can make decisions about the risk of an individual having a diagnosis of DD directly based on our results. Besides, in the previous systematic reviews, both follow-up studies and case-control studies were included in the meta-analyses. In the present systematic review, however, we only included follow-up studies, because case-control studies cannot truly reflect the prevalence, PPV, NPV, sensitivity, and specificity, and can be prone to bias [41], so that the real added values of US for ruling in and ruling out DD cannot be obtained from them. So, the results of the present systematic review should be more reliable than the previous ones.

For future studies and clinical practice, a standardized protocol for the diagnostic use of US in patients suspected of DD should be established, and US operators should be well-trained and calibrated to reduce the variation in US across operators and studies. Also, additional higher-quality studies are needed for further assessment of the value of different types of US for the diagnosis of DD. Besides, in future studies, an improved reference standard that provides, with higher certainty, an accurate external benchmark for the presence of TMJ DD should be considered. If this is not available, a latent variable approach can be adopted for data analysis, by combining multiple diagnostic tests using latent class modeling (LCM) [42]. This may help to improve the reliability of US for ruling in and out TMJ DD.

Conclusions

With MRI being used as the reference standard, the added diagnostic value of US for ruling in and ruling out DD is sufficient for its use in the decision-making in dental practice.

After sufficient training and calibration in operating US and interpreting its images, US can be considered a relevant imaging tool to supplement clinical examination in selected patients with suspected DD. Combined static and dynamic examinations using high-resolution US should be preferred.