Meta-analysis is a statistical method for pooling and analyzing the results from multiple studies.1 A typical meta-analysis focuses on a single outcome measure, such as a treatment effect or the rate of an adverse event. However, applying meta-analysis to studies of diagnostic test accuracy is not straightforward. The accuracy of a diagnostic test is most often summarized by *two* outcomes, sensitivity and specificity, which cannot be expected to be independent and must therefore be analyzed together.2 Thus, meta-analytic methods specific to diagnostic tests are needed. The paper by Lee et al., published in this issue, which reports the results of a meta-analysis of F-18 FDG PET for detection of disease activity, demonstrates the use of some of these methods.3

We consider here the setting of a diagnostic test that yields a qualitative result (for example, a test designed to indicate the presence or absence of a disease). Let a positive test result indicate a patient probably has the disease, and let a negative test result indicate a patient probably does not have the disease. The *sensitivity* of a test is the probability that the test is positive, given the presence of the disease in the patient. In other words, a sensitive test is one that detects the disease when the patient has the disease. The *specificity* of a test is the probability that the test is negative, given that the patient does not have the disease. In other words, a specific test is one that rarely produces a positive result in patients who do not actually have the disease. An ideal test would have both high sensitivity and high specificity.4,5

A *true positive* is the situation when a patient has the disease and the test is positive; a *true negative* is when a patient does not have the disease and the test result is negative. A *false positive* is when the patient does not have the disease but has a positive test result; a *false negative* is when a patient has the disease but has a negative test result.4,5 A useful way to summarize a particular test result is via likelihood ratios (LRs), which are ratios involving the sensitivity and specificity. LRs describe how likely a diseased patient is to have a given (positive or negative) result, compared to a disease-free patient. Values greater than 1.0 provide evidence that the test result is related to disease presence; values less than 1.0 indicate the test result is related to disease absence. The positive LR (LR+) is the ratio of the probability of a positive result in patients with the disease to the probability of a positive result in patients who are disease free. The negative LR (LR−) is the ratio of the probability of a negative result in patients with the disease to the probability of a negative result in patients who are disease free.6 LR+ values greater than 10 and LR− values less than 0.1 are considered strong evidence to rule a diagnosis in or out, respectively.6 The diagnostic odds ratio (DOR) is the ratio of the LR+ to the LR−: the odds of a positive test result in patients with the disease relative to the odds of a positive result in patients without the disease. Values range from 0 to infinity; larger values indicate a better-performing test.7

Descriptive statistics for diagnostic tests

| Test result | Disease present | Disease absent |
|---|---|---|
| Positive | True positive (TP) | False positive (FP) |
| Negative | False negative (FN) | True negative (TN) |

True positive rate (TPR) = Sensitivity = \( \frac{\text{TP}}{\text{TP} + \text{FN}} \)

False negative rate (FNR) = 1 − TPR = \( \frac{\text{FN}}{\text{TP} + \text{FN}} \)

False positive rate (FPR) = \( \frac{\text{FP}}{\text{FP} + \text{TN}} \)

True negative rate (TNR) = 1 − FPR = Specificity = \( \frac{\text{TN}}{\text{FP} + \text{TN}} \)

Positive likelihood ratio (LR+) = \( \frac{\text{Sensitivity}}{1 - \text{Specificity}} = \frac{\text{TPR}}{\text{FPR}} \)

Negative likelihood ratio (LR−) = \( \frac{1 - \text{Sensitivity}}{\text{Specificity}} = \frac{\text{FNR}}{\text{TNR}} \)

Diagnostic odds ratio (DOR) = \( \frac{\text{LR}+}{\text{LR}-} \)

Example data for a diagnostic test

| Test result | Disease present | Disease absent | Total |
|---|---|---|---|
| Positive | 94 | 3 | 97 |
| Negative | 120 | 133 | 253 |
| Total | 214 | 136 | 350 |
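Using the counts in the example table above, the summary statistics defined earlier can be computed directly. A minimal sketch in Python:

```python
# Counts from the example 2x2 table
TP, FP = 94, 3      # positive test results: disease present / disease absent
FN, TN = 120, 133   # negative test results: disease present / disease absent

sensitivity = TP / (TP + FN)               # true positive rate
specificity = TN / (FP + TN)               # true negative rate
lr_pos = sensitivity / (1 - specificity)   # LR+
lr_neg = (1 - sensitivity) / specificity   # LR-
dor = lr_pos / lr_neg                      # equals (TP*TN)/(FP*FN)

print(f"sensitivity = {sensitivity:.3f}")  # 0.439
print(f"specificity = {specificity:.3f}")  # 0.978
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")
```

Note that the example test is quite specific (LR+ ≈ 19.9, well above 10) but not very sensitive (LR− ≈ 0.57, far from 0.1), so a positive result is strong evidence of disease while a negative result is only weak evidence against it.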

Once summary statistics for each of the studies to be included in the meta-analysis have been calculated, a forest plot can be used to display the estimates for each study (see Figure 1 for an example). A bivariate random-effects model (such as the one employed by Lee et al.3) can then be used to produce a summary point estimate of sensitivity and specificity.8 Note that a bivariate model is necessary in order to take into account the likely correlation between sensitivity and specificity across studies.9 Furthermore, while a common approach for meta-analysis of non-diagnostic studies is to consider both fixed-effect and random-effect models,1 diagnostic studies should be expected to be heterogeneous, making a fixed-effect model inappropriate.10
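To illustrate the per-study estimates that feed a forest plot, the sketch below computes each study's sensitivity with an approximate 95% confidence interval, using a Wald interval on the logit scale (a common approximation, not the bivariate model itself). The study names and counts are invented for illustration only:

```python
import math

def sens_with_ci(tp, fn, z=1.96):
    """Sensitivity with an approximate 95% CI via a Wald interval on the logit scale."""
    sens = tp / (tp + fn)
    logit = math.log(sens / (1 - sens))
    se = math.sqrt(1 / tp + 1 / fn)          # standard error of the logit
    lo, hi = logit - z * se, logit + z * se
    inv = lambda x: 1 / (1 + math.exp(-x))   # back-transform to the probability scale
    return sens, inv(lo), inv(hi)

# Hypothetical per-study counts (TP, FN) -- illustration only
studies = {"Study A": (40, 10), "Study B": (25, 25), "Study C": (60, 5)}
for name, (tp, fn) in studies.items():
    s, lo, hi = sens_with_ci(tp, fn)
    print(f"{name}: sensitivity {s:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

A real analysis would pool these study-level estimates (and the corresponding specificities) with a bivariate random-effects model, as in reference 8, rather than averaging them.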

A notable source of heterogeneity for diagnostic studies is what is known as the “threshold effect”.9 Many diagnostic tests compare a result or measurement to a pre-specified threshold, or cutoff value. The choice of threshold affects both the sensitivity and the specificity of the test. For example, consider a simple diagnostic test that measures the amount of an antibody in a blood sample, with high levels of the antibody producing a positive test result. If the threshold for a positive result is lowered (meaning a smaller amount of the antibody in the sample suffices to diagnose illness), we would expect the test to return more false positives (and therefore have lower specificity) and fewer false negatives (greater sensitivity). If the threshold were raised, we would observe fewer false positives (greater specificity) and more false negatives (lower sensitivity). A receiver operating characteristic (ROC) curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) for a diagnostic test under varying thresholds. The area under the ROC curve provides an overall summary of a diagnostic test’s accuracy, independent of the threshold effect.4,5
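The threshold effect can be illustrated numerically: sweeping the cutoff over a continuous marker trades sensitivity against specificity, and the trapezoidal area under the resulting ROC curve summarizes accuracy across all thresholds. The antibody-level values below are invented for illustration:

```python
# Hypothetical antibody levels for diseased and healthy patients (illustration only)
diseased = [6.1, 7.3, 5.8, 8.0, 6.9, 7.5, 5.2, 6.6]
healthy  = [4.0, 5.1, 3.8, 5.5, 4.6, 5.9, 4.2, 4.9]

def roc_points(diseased, healthy):
    """(FPR, TPR) pairs for every cutoff of the rule 'positive if value >= t'."""
    cutoffs = sorted(set(diseased + healthy), reverse=True)
    points = [(0.0, 0.0)]                     # cutoff above all values: nothing positive
    for t in cutoffs:                         # lowering t raises TPR and FPR together
        tpr = sum(x >= t for x in diseased) / len(diseased)
        fpr = sum(x >= t for x in healthy) / len(healthy)
        points.append((fpr, tpr))
    points.append((1.0, 1.0))                 # cutoff below all values: everything positive
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

pts = roc_points(diseased, healthy)
print(f"AUC = {auc(pts):.3f}")
```

Each cutoff yields one (FPR, TPR) point, making the sensitivity-specificity trade-off explicit; the AUC is a single threshold-independent accuracy summary, with 1.0 for a perfect test and 0.5 for a test no better than chance.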

Screening or diagnostic tests are useful for determining the presence or potential development of a disease; they are particularly valuable when the confirmatory procedure is invasive, cost-prohibitive, time-intensive, or only available at autopsy.5 When synthesizing evidence from different studies of a diagnostic test’s accuracy, meta-analysis may be used. However, meta-analytic methods specific to diagnostic tests must be used to properly summarize the study results.

## Notes

### Disclosure

Authors have no conflicts of interest to disclose.

## References

- 1. Kalra R, Arora P, Morgan C, Hage FG, Iskandrian AE, Bajaj NS. Conducting and interpreting high-quality systematic reviews and meta-analyses. J Nucl Cardiol. 2017;24:471–81.
- 2. Liu Z, Yao Z, Li C, Liu X, Chen H, Gao C. A step-by-step guide to the systematic review and meta-analysis of diagnostic and prognostic test accuracy evaluations. Br J Cancer. 2013;108:2299–303.
- 3. Lee S-W, Kim S-J, Seo Y, Jeong SY, Ahn BC, Lee J. F-18 FDG PET for assessment of disease activity of large vessel vasculitis: A systematic review and meta-analysis. J Nucl Cardiol. 2018. https://doi.org/10.1007/s12350-018-1406-5.
- 4. Rosner B. Fundamentals of biostatistics. Boston, MA: Brooks/Cole; 2011.
- 5. van Belle G, Fisher LD, Heagerty PJ, Lumley T. Biostatistics: A methodology for the health sciences. Hoboken: Wiley; 2004.
- 6. Deeks JJ, Altman DG. Diagnostic tests 4: Likelihood ratios. BMJ. 2004;329:168–9.
- 7. Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PMM. The diagnostic odds ratio: A single indicator of test performance. J Clin Epidemiol. 2003;56:1129–35.
- 8. Reitsma JB, Glas AS, Rutjes AW, Scholten RJ, Bossuyt PM, Zwinderman AH. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol. 2005;58:982–90.
- 9. Leeflang MMG. Systematic reviews and meta-analysis of diagnostic test accuracy. Clin Microbiol Infect. 2013;20:105–13.
- 10. Lee J, Kim KW, Choi SH, Huh J, Park SH. Systematic review and meta-analysis of studies evaluating diagnostic test accuracy: A practical review for clinical researchers—Part II. Statistical methods of meta-analysis. Korean J Radiol. 2015;16:1188–96.
- 11. Rutter CM, Gatsonis CA. A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Stat Med. 2001;20:2865–84.