INTRODUCTION

Lung cancer is the most commonly diagnosed visceral cancer and the leading cause of cancer death in the world, with over 2 million new cases in 2018 and nearly 1.8 million deaths.1 Because over 90% of lung cancer deaths in the USA are attributed to tobacco use, tobacco control is the most impactful strategy for reducing the burden of lung cancer.2 However, combined pharmacotherapy and behavioral interventions benefit less than 1 in 10 smokers.3 In 2011, the National Lung Screening Trial (NLST) reported that screening high-risk smokers, ages 55 to 74, with low-dose CT (LDCT) significantly reduced lung cancer and overall mortality compared with screening with chest radiography.4 Although numerous other LDCT screening trials have been conducted worldwide, only the Nederlands-Leuvens Longkanker Screenings Onderzoek (NELSON) study, conducted in the Netherlands and Belgium, subsequently also demonstrated a statistically significant lung cancer mortality benefit for LDCT screening.5 The recently published NELSON trial had the 2nd largest sample size and the longest follow-up duration among all screening trials—and provided gender-specific outcome data. Recent meta-analyses, which did not include peer-reviewed NELSON results, showed that LDCT screening significantly reduced lung cancer mortality though not overall mortality.6,7,8 We conducted a meta-analysis to evaluate the association of LDCT lung cancer screening with early-stage cancer diagnoses, lung cancer mortality, overall mortality, and screening harms, including false positive results, complications from invasive procedures among subjects with false positive results, overdiagnosis, and significant incidental findings.

METHODS

We performed a review using the rapid review and living systematic review methods supported by openMetaAnalysis.9 These methods build on previous reviews and emphasize interpreting results more than searching.10 We began by creating a reconciliation table listing the included studies and conclusions from recent systematic reviews.11 This allowed us to readily identify the consistently included studies to be used in our meta-analysis as well as inconsistently included studies that at least two investigators carefully reviewed and discussed for inclusion. Identifying these studies also helped us design high-sensitivity literature searches. Our data are maintained on the openMetaAnalysis site, enabling ongoing revisions by ourselves or others when new evidence becomes available.

We followed recommendations of PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines in conducting our meta-analysis, with the exception that the protocol was not registered at PROSPERO because it does not support living systematic reviews.12 The PRISMA checklist is in the supplemental index.

Study Eligibility Criteria and Selection

We included randomized controlled trials of computed tomography (CT) that reported lung cancer and/or overall mortality data. Studies were selected by the consensus of the four authors.

Information Sources

We first identified trials by tabulating all trials included in the most recent systematic reviews (published since 2016)6,7,8, 13, 14 as recommended by Riaz.15

Search Strategy

We executed three search strategies in PubMed. We first performed a Boolean PubMed search from January 2017, the date of the most recent comprehensive literature review,8 until April 2020 using search terms chosen to identify all studies in the reconciliation of studies table. We then executed a vector search for the same time period using PubMed’s “Find related data” option to search for studies conceptually similar to those included in the reconciliation of studies table. The vector was simultaneously seeded with all studies in the reconciliation table. Third, an experienced librarian developed an Ovid MEDLINE search using terms for lung neoplasms, screening and early detection of cancer, computed tomography, and randomized trials (supplement table 2). We limited the Ovid search to English language studies published from January 2011 until April 2020 to ensure that we retrieved all studies published after the NLST.

We also searched Google Scholar and Web of Science with a strategy which retrieved articles published since January 2017 that cited the NLST (the most highly cited study) and contained the terms “randomized” and “mortality.” We additionally searched ClinicalTrials.gov and the Cochrane Central Register of Controlled Trials. Finally, we created a monthly email alert to be notified if new articles are published in PubMed that contain our Boolean search terms or specific text words. Two reviewers screened titles and abstracts for review eligibility.

Study Selection Process

We reviewed abstracts to identify publications from randomized trials and then retrieved relevant full-text publications. We retrieved multiple publications from a given trial in order to extract comprehensive data on study design, baseline characteristics, and outcomes. At least 2 reviewers were involved in all decisions regarding retrieval of full-text manuscripts.

Data Collection Process

We collected the following clinical data from randomized controlled trials of LDCT lung cancer screening: country, year first subject enrolled, number of clinical sites, mean (median) age, percent < 65 years, percent male, mean (median) pack-years smoking, proportion currently smoking, screening and control interventions, rounds of screening, screening intervals, follow-up duration, incident cancers and proportion that were early stage, lung cancer deaths, and overall mortality. When provided, we also abstracted data by gender. We abstracted data on potential harms of screening, including false positive tests, complications from invasive procedures performed in subjects without cancer, overdiagnosis, and significant incidental findings. We entered clinical data from studies into online spreadsheets at openMetaAnalysis.9

Risk of Bias in Individual Studies

Four authors independently assessed the risk of bias by using the Cochrane Risk of Bias Tool version 1.16 These assessments were then reviewed by conference with all authors to resolve any disagreement. The risk of bias was reported as low, high, or unclear based on the items in the Cochrane tool. We considered individual studies at high risk of bias if the study had high risk of bias for one or more key domains of the Cochrane tool. We considered individual studies at unclear risk of bias if the study had an unclear risk of bias for one or more key domains and no high-risk domains.
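The study-level rule described above can be expressed as a short function. The following is a minimal sketch in Python, not the authors' code; the function name and the domain labels in the example are hypothetical and chosen only for illustration.

```python
def classify_study(domain_ratings):
    """Return the overall risk of bias for one study.

    domain_ratings maps each key domain to "low", "high", or "unclear".
    """
    ratings = set(domain_ratings.values())
    unknown = ratings - {"low", "high", "unclear"}
    if unknown:
        raise ValueError(f"unexpected rating(s): {sorted(unknown)}")
    if "high" in ratings:
        return "high"       # one or more key domains at high risk
    if "unclear" in ratings:
        return "unclear"    # unclear domain(s) and no high-risk domain
    return "low"


# Hypothetical ratings, for illustration only.
example = {
    "randomization": "low",
    "allocation concealment": "unclear",
    "outcome assessment": "low",
}
print(classify_study(example))
```

Checking for a high-risk domain first means that a study with both high and unclear domains is rated high, which matches the rule as stated.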

Summary Measures

Our primary outcomes were lung cancer-specific mortality and overall mortality. Our secondary outcomes were diagnosis of early-stage (stage I) lung cancer and harms from screening. We reported relative risks with 95% confidence intervals (CI).
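For a single trial, a relative risk and its 95% confidence interval can be computed from arm-level counts. The sketch below assumes the usual large-sample (Wald) interval on the log scale; the function name and the counts in the example are hypothetical and are not taken from any included trial.

```python
import math
from statistics import NormalDist


def relative_risk(events_trt, n_trt, events_ctl, n_ctl, level=0.95):
    """Relative risk with a Wald confidence interval on the log scale."""
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    # Large-sample standard error of log(RR).
    se = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical counts: 80 deaths among 1000 screened, 100 among 1000 controls.
rr, lower, upper = relative_risk(80, 1000, 100, 1000)
print(f"RR = {rr:.2f} (95% CI, {lower:.2f} to {upper:.2f})")
```

The formula requires at least one event in each arm; trials with zero events would need a continuity correction, which this sketch does not include.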

Synthesis of Results

We used a random-effects model with the Hartung-Knapp estimator.17 We performed all statistical analyses online at openCPU18 with the meta package of the R programming language.19 We measured heterogeneity of results with the I2 statistic.20, 21
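The pooling step can be illustrated with a short Python sketch. It is not the authors' analysis, which was run with the R meta package. It assumes a DerSimonian-Laird estimate of between-study variance (the estimator actually used is not stated here), applies the Hartung-Knapp variance to the pooled estimate, and reports I2 from Cochran's Q. The critical value of Student's t is passed in by the caller to keep the example free of external libraries; the function name and the example data are hypothetical.

```python
import math


def pool_random_effects(log_rr, var, t_crit):
    """Random-effects pooling with a Hartung-Knapp confidence interval.

    log_rr, var: per-study log relative risks and their variances.
    t_crit: two-sided critical value of Student's t with k - 1 degrees
            of freedom (for example, 2.306 for k = 9 at the 95% level).
    """
    k = len(log_rr)
    if k < 2:
        raise ValueError("at least two studies are required")
    w = [1 / v for v in var]                      # fixed-effect weights
    mu_fe = sum(wi * yi for wi, yi in zip(w, log_rr)) / sum(w)
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, log_rr))  # Cochran's Q
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)            # DerSimonian-Laird (assumed)
    w_re = [1 / (v + tau2) for v in var]          # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, log_rr)) / sum(w_re)
    # Hartung-Knapp variance of the pooled log relative risk.
    var_hk = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w_re, log_rr)) / (
        (k - 1) * sum(w_re)
    )
    half = t_crit * math.sqrt(var_hk)
    return {
        "rr": math.exp(mu),
        "lower": math.exp(mu - half),
        "upper": math.exp(mu + half),
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical log relative risks and variances for three studies (df = 2).
result = pool_random_effects([-0.5, 0.0, 0.5], [0.01, 0.01, 0.01], t_crit=4.303)
print(result)
```

With few studies the t-based Hartung-Knapp interval is typically wider than a normal-based interval, which is the main reason for preferring it in small meta-analyses.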

We used subgroup and meta-regression analyses to characterize potential modulators of screening effectiveness. We conducted subgroup analyses based on the control intervention (chest X-ray or usual care), by gender, and risk of bias. We used meta-regression to examine modulators of the screening effect on lung cancer and overall mortality, including age (mean/median and percent < 65 years), proportion of male subjects, pack-year smoking history, percent current smokers, number of screening rounds, screening intervals, proportion of cancers diagnosed at stage I, and duration of follow-up. If meta-regression identified a statistically significant modulator, we would identify the cut point (threshold) of the modulator that was associated with a statistically significant reduction in mortality.
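A meta-regression on one study-level modulator can be sketched as an inverse-variance weighted regression of the log relative risk on that modulator. The version below is a simplified fixed-effect form shown only to make the idea concrete; a random-effects meta-regression, as would typically be run in the R meta package, also estimates residual between-study variance. The function name and the example data are hypothetical, and the cut-point search described above is not implemented.

```python
import math


def meta_regression_slope(x, log_rr, var):
    """Inverse-variance weighted slope of log RR on one study-level covariate.

    Returns the slope, its fixed-effect standard error, and the z statistic
    for the null hypothesis that the slope is zero.
    """
    w = [1 / v for v in var]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, log_rr)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(
        wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, log_rr)
    )
    slope = sxy / sxx
    se = math.sqrt(1 / sxx)
    return slope, se, slope / se


# Hypothetical data: follow-up in years and log relative risks for 3 studies.
follow_up = [5.0, 8.0, 10.0]
log_rr = [-0.1 - 0.02 * years for years in follow_up]
slope, se, z = meta_regression_slope(follow_up, log_rr, [0.02, 0.01, 0.005])
print(f"slope = {slope:.3f} per year, z = {z:.2f}")
```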

Risk of Bias Across Studies

We judged the overall risk of bias across the collected studies based on the criteria developed by the Cochrane Risk of Bias Tool.16 These criteria consider the risk of bias low if less than 25% of participants were from studies with low methodological quality. The risk of bias is considered serious if more than 25% but less than 50% of participants were from studies with low methodological quality and is considered very serious if more than 50% of participants were from studies with low methodological quality. We assessed publication bias using funnel plot asymmetry and Egger’s and Rücker’s tests.20, 22, 23
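The participant-weighted thresholds described above can be written as a small function. This is a sketch under stated assumptions: the text does not say how shares of exactly 25% or 50% are handled, so the code assigns them to the more severe category, and the function name and study sizes are hypothetical.

```python
def overall_risk_of_bias(studies):
    """Classify risk of bias across studies.

    studies: list of (n_participants, is_low_quality) tuples, where
    is_low_quality marks a study judged to have low methodological quality.
    """
    total = sum(n for n, _ in studies)
    low_quality = sum(n for n, flagged in studies if flagged)
    share = low_quality / total
    if share < 0.25:
        return "low"
    if share < 0.50:        # assumption: exactly 25% counts as serious
        return "serious"
    return "very serious"   # assumption: exactly 50% counts as very serious


# Hypothetical study sizes, for illustration only.
print(overall_risk_of_bias([(8000, False), (1000, True), (1000, True)]))
```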

We summarized conclusions using the GRADE Working Group’s Evidence Profile and Summary of Findings Table.24 We adjusted ratings based on the criteria developed by the Cochrane Back Group25 and described online at openMetaAnalysis.9

Data Availability

The literature search, datasets generated and analyzed during the current study, all plots, and the tables reconciling our conclusions and the trials included with previous meta-analyses are available online (https://openmetaanalysis.github.io/lung-cancer-screening).

RESULTS

Study Selection

We identified 9 randomized controlled trials from reference lists of meta-analyses (Fig. 1).4, 5, 26,27,28,29,30,31,32 We also identified 3663 citations through extensive literature searching. However, we did not find any additional eligible randomized controlled trials beyond those identified from the meta-analyses.

Fig. 1 PRISMA flow diagram.

Study Characteristics

Studies are described in Table 1. The 9 studies enrolled a total of 96,559 subjects. Mean or median age was approximately 60 years, 64.1% of subjects were male, 51.7% were current smokers, and mean or median smoking history was generally about 40 pack-years or more. We found 7 comparisons of LDCT screening vs. usual care (one study performed a baseline chest X-ray and sputum cytology for all subjects)35 and 2 comparisons of LDCT with chest X-rays.4, 24 Aside from the pilot Lung Screening Study26 and the AME Thoracic Surgery Collaborative Group (AME) trial reporting only baseline results,32 subjects in the LDCT arms underwent 3 to 5 rounds of screening. These multi-round studies generally had a mean or median follow-up duration of at least 8 years. In the 8 studies reporting cancer incidence data, 1910 lung cancers were found in the LDCT arms and 1578 in the control arms. Overall, 48.5% of cancers in the LDCT arms were detected at stage I compared with 24.3% in the control arms. In the LDCT arms, there were 890 lung cancer deaths and 3755 overall deaths. In the control arms, there were 1062 lung cancer deaths and 3912 overall deaths.

Table 1 Study Characteristics

We evaluated only the 3446 subjects from the Multicentric Italian Lung Detection (MILD) study that were concurrently enrolled and randomly assigned to the screening arms of LDCT or usual care.29 We excluded the 653 subjects who were randomized to either annual or biennial screening but not to a control group during the first phase of the study, although the authors included them in their reports. We also included 92 subjects (46 in each arm) in the Italian study Detection and Screening of Early Lung Cancer by Novel Imaging Technology and Molecular Essays (DANTE) that investigators had excluded for being ineligible.27 We did so because these subjects had consented to participate in the study, underwent clinical assessment, and were randomly allocated to study arms. The DANTE authors provided data on lung cancer and overall mortality, so we included these subjects in an intention-to-screen analysis. Although the NELSON report focused on outcomes for male patients, our analyses incorporated the data for female patients published in the supplemental index.5

We did not include the most recently published NLST outcome data, which provided a median 12.3 years of mortality follow-up.60 These data, based on deaths occurring up to 10 years after the final scheduled screen, showed attenuation of mortality benefits. However, participants were only passively followed through state tumor registry and National Death Index linkages, and cause of death was not adjudicated—raising concerns about bias. Furthermore, as recognized by the investigators, the longer-term follow-up diluted the screening effect making results less comparable to the other studies.

Risk of Bias Within Studies

The risk of bias table is in the supplemental index. We assessed the MILD, DANTE, and AME studies as being at high risk of bias, the NLST and NELSON trials as being at low risk of bias, and the remaining studies as being at unclear risk of bias. The MILD study failed to provide baseline comparisons of contemporaneously enrolled screening and control subjects. The DANTE investigators did not use an intention-to-screen analysis and enrolled 5.5% more subjects in the screening arm than in the control arm. The AME study did not describe randomization procedures and had nearly 12% more subjects in the screening arm than in the control arm.

Synthesis of Results

When we pooled results across the 8 trials reporting data, we found that lung cancer screening with LDCT was associated with a significantly increased likelihood of detecting a stage I lung cancer, RR = 2.73 (95% CI, 1.90–3.91), but heterogeneity was high: I2 = 79% (Fig. 2). However, when we restricted the analysis to the 7 studies without a screened control group, the relative risk for diagnosing stage I cancer was 2.93 (95% CI, 2.16–3.98), with much lower heterogeneity: I2 = 19%.

Fig. 2 Forest plot: Diagnosis of stage I lung cancers by control group (usual care, chest X-ray).

Lung cancer screening with LDCT significantly reduced the risk of dying from lung cancer, with a relative risk of 0.84 (95% CI, 0.75–0.93) (Fig. 3). Heterogeneity was very low, with an I2 = 0%. Overall, the number needed to screen to prevent one lung cancer death, based on 3 to 5 rounds of screening with up to 10 years of follow-up, was 265.
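The number needed to screen is the reciprocal of the absolute risk reduction, which can be written as the control-arm risk multiplied by (1 − RR). How the reported value of 265 was derived is not stated here, so the following Python sketch only applies that standard relationship; it back-calculates the control-arm lung cancer mortality that would be consistent with the reported relative risk of 0.84 and NNS of 265. The function names are hypothetical.

```python
def number_needed_to_screen(control_risk, rr):
    """NNS = 1 / absolute risk reduction = 1 / (control_risk * (1 - rr))."""
    if not 0 < rr < 1:
        raise ValueError("rr must lie between 0 and 1 for a mortality benefit")
    return 1 / (control_risk * (1 - rr))


def implied_control_risk(nns, rr):
    """Control-arm risk that would reproduce a given NNS and relative risk."""
    if not 0 < rr < 1:
        raise ValueError("rr must lie between 0 and 1 for a mortality benefit")
    return 1 / (nns * (1 - rr))


# Values reported in the text: RR = 0.84, NNS = 265.
risk = implied_control_risk(265, 0.84)
print(f"implied control-arm lung cancer mortality: {risk:.1%}")
```

Under these assumptions the implied control-arm lung cancer mortality is about 2.4% over the follow-up period.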

Fig. 3 Forest plot: Lung cancer mortality.

We evaluated the effects of screening on lung cancer mortality stratified by gender in the 3 studies reporting these data (Fig. 4). Screening decreased lung cancer mortality among women, but the relative risk of 0.69 (95% CI, 0.40–1.21) was not significant. Results were also not significant for men, with a relative risk of 0.86 (95% CI, 0.66–1.13). A test for interaction was not significant, p = 0.11. We did not find any subgroup effects when stratifying studies by control group or risk of bias. We conducted meta-regression analyses evaluating associations between patient characteristics and study factors with lung cancer mortality. We found no significant effects by median/mean age, percent < 65 years, proportion of male subjects, pack-year history, percent current smokers, number and frequency of screening rounds, proportion of cancers found at stage I, and duration of follow-up (data available at https://openmetaanalysis.github.io/lung-cancer-screening).

Fig. 4 Forest plot: Lung cancer mortality by gender.

When pooling results across the 8 studies reporting data, we found that lung cancer screening did not significantly reduce the risk of overall mortality; the relative risk was 0.96 (95% CI, 0.91–1.01) (Fig. 5). We found no heterogeneity in study results, I2 = 0%.

Fig. 5 Forest plot: Overall mortality.

Screening harms are detailed in the supplemental index. Eight studies reported diagnostic accuracy data; the pooled false positive rate was 8% (95% CI, 4–15), I2 = 100%. Studies inconsistently reported data on complications from invasive diagnostic procedures, particularly among those without lung cancer, but the risks were low. Only the NLST provided detailed data on procedures and associated complications (which were further classified by severity). Overall, 17 in 1000 subjects with a false positive LDCT underwent an invasive diagnostic procedure and 0.4 in 1000 suffered a major complication. Six studies compared LDCT with usual care and followed patients beyond the end of the screening period. Overall, there were 515 screen-detected cancers with 171 more cancers in the LDCT arms than in the control arms, suggesting an overdiagnosis rate of 33%. The NLST reported that 7.5% of LDCT participants had significant incidental findings, most commonly emphysema and coronary artery calcification.4 No other studies reported data on incidental findings.
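The overdiagnosis estimate above is the excess number of cancers in the LDCT arms divided by the number of screen-detected cancers. The arithmetic can be checked with the short Python sketch below; the function name is hypothetical and the inputs are the counts reported in the text.

```python
def overdiagnosis_rate(excess_cancers, screen_detected_cancers):
    """Excess cancers in the screening arms as a share of screen-detected cancers."""
    return excess_cancers / screen_detected_cancers


# Counts reported in the text: 171 excess cancers, 515 screen-detected cancers.
rate = overdiagnosis_rate(171, 515)
print(f"overdiagnosis rate: {rate:.0%}")
```

The estimate depends on follow-up duration, because additional cancers continue to be diagnosed in the control arms after screening ends and reduce the excess.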

The GRADE evidence profile, summarizing quality assessments and effects, is in the supplemental index. We had high certainty about the effects of LDCT screening on the detection of stage I cancer, lung cancer mortality, and overall mortality.

Risk of Bias Across Studies

We judged the risk of bias across studies to be low using the Cochrane framework. Although the majority of individual studies were rated as unclear risk of bias, more than three-fourths of the subjects were enrolled in the two studies assessed to be at low risk of bias.4, 5 We could not assess publication bias because we found too few studies (< 10) to perform analyses.61

DISCUSSION

We evaluated 9 randomized LDCT screening trials enrolling over 96,000 subjects, most of whom were followed for at least 5 years. LDCT screening substantially increased the likelihood of detecting stage I cancer and reduced the risk of lung cancer mortality by 16%. Heterogeneity was very low, and the quality of the evidence was deemed moderate to high. When we evaluated gender-specific effects, we found that LDCT screening was associated with a non-significantly lower risk of lung cancer mortality for women than for men. Meta-regressions found no subject characteristics or study design features that modified lung cancer mortality outcome results. LDCT screening did not reduce overall mortality.

Our results are similar to previous meta-analyses, though we included the full peer-reviewed NELSON trial results to provide more accurate outcome estimates and to better evaluate gender differences. Evidence is convincing that LDCT screening reduces lung cancer mortality. LDCT screening is effective because it leads to a nearly threefold higher likelihood of diagnosing early-stage cancer compared with usual care. This stage shift is crucial, because about 60% of patients are currently diagnosed with distant-stage disease where the 5-year survival is only 5%.62 In contrast, only 16% of patients are diagnosed with early-stage cancer where the 5-year survival is 57%. Needing to screen 265 high-risk smokers to prevent one lung cancer death compares favorably with other cancer screening programs.63,64,65

Increasing the proportion of cancers detected at an early stage can also be associated with overdiagnosing—and overtreating—indolent cancers. Although 33% of cancers found by LDCT screening might be considered overdiagnosed, this estimate is unreliable because most studies lacked sufficient follow-up time. Modeling analyses suggested that the lead time for CT screening can be as long as 12 years.66 The NELSON investigators initially estimated an overdiagnosis rate of 19.7% through 10 years of follow-up; however, extending follow-up to 11 years reduced the rate to only 8.9%.5 Investigators concluded that this prolonged follow-up duration was necessary to accurately estimate overdiagnosis. We excluded NLST when estimating overdiagnosis because both arms were screened. However, trends in NLST results were consistent with NELSON findings; the overdiagnosis rate dropped from 18.5% at 6.5 years of follow-up to 3.1% at 11 years of follow-up.38, 60

False positive results are a potential harm from LDCT screening because they can lead to unnecessary diagnostic testing. The estimated false positive rate was 8%, but heterogeneity was extremely high because studies used varying criteria for categorizing abnormal tests (supplement table 4). The NLST, which primarily defined positive studies using a nodule diameter ≥ 4 mm, had a false positive rate of 23.3%.4 However, the Lung-RADS classification system, which increases the size threshold for identifying suspicious scans, has since become the standard for interpreting LDCT images.67 Post-hoc analyses suggested that applying Lung-RADS to NLST images would substantially reduce the false positive rates for both baseline (52% decrease) and follow-up (76% decrease) testing, though at the expense of reducing sensitivity.68 The NELSON trial, which had a false positive rate of only 1.2%, used volumetric criteria which are not part of Lung-RADS.5 The NLST provided the most detailed data on complications following invasive diagnostic procedures in subjects without cancer; the risk was very low, though patients were managed in academic medical centers. The NLST, the only study to provide data, found a 7.5% chance of having a significant incidental finding with LDCT imaging. However, the clinical significance of these findings is uncertain.

When we pooled results across all studies, screening was associated with a non-significant decreased risk for overall mortality. The NLST was the only trial showing lower overall mortality with LDCT screening. The authors attributed this finding to the high proportion of excess deaths from lung cancer in the radiography arm. When lung cancer deaths were excluded, the overall mortality difference was not significant. Generally, trials are unlikely to demonstrate that screening programs, which usually target average-risk subjects, decrease overall mortality because even the most common cancers account for only a small proportion of deaths. While lung cancer accounts for more deaths in the high-risk populations selected for screening, a modeling study suggested that more than 80,000 high-risk subjects would need to be randomized to a screening or a control arm and followed for at least 11 to 13 years in order to demonstrate a significant reduction in all-cause mortality.69 While we had pooled data for nearly 100,000 subjects, follow-up durations were usually less than 10 years. A further challenge in demonstrating that lung cancer screening reduces overall mortality is that the eligible population of older heavy smokers is also at high risk for competing mortality from tobacco-related cardiovascular, pulmonary, and oncologic diseases.

By including results from NELSON, which enrolled 2594 women, we were able to conduct a more robust meta-analysis of the three studies that stratified outcome data by gender.5, 31, 70 We found a larger relative risk reduction in lung cancer mortality for women than for men (31% compared with 14%). Neither risk reduction was statistically significant, and the p value for interaction was 0.11. These studies enrolled nearly twice as many men as women, so analyses for women were likely underpowered. Unfortunately, we did not have access to patient-level data for the 5 other studies that also enrolled men and women. The mortality benefit for women in NLST was attributed to better outcomes following the diagnosis of small cell and squamous cell lung cancers.70 Given that screening is not considered effective for detecting these lung cancers (and risk reductions were similar for men and women for adenocarcinoma), investigators questioned whether the findings were due to chance. However, in the much smaller LUSI trial, which found a significant 69% risk reduction for women, no small cell cancers were diagnosed in women.31 The NELSON trial, which found a non-significant 34% risk reduction for women, did not report histology.5 Further research is needed to evaluate the observed gender differences.

The negative meta-regression analyses looking at the associations between screening protocols and patient characteristics with lung cancer mortality are underpowered given the limited number of studies. However, the optimal number and frequency of LDCT screening rounds are uncertain.71 The UK Lung Cancer Screening Trial will evaluate the benefit of a single screening LDCT.72 For the 2 studies demonstrating efficacy for LDCT screening, the NLST had 3 rounds of annual screening while the NELSON trial spaced 4 rounds of screening over 5.5 years. Less frequent screening intervals, particularly following a negative baseline scan, could make screening more cost effective and, along with using Lung-RADS to improve the specificity of LDCT, reduce radiation exposure from screening and diagnostic testing.68, 73, 74 We did not find that the patient characteristics of age, pack-years of smoking, or smoking status (typically used to determine screening eligibility) were associated with the lung cancer mortality benefits seen with screening. However, the variation of these characteristics across studies was small, particularly for age and pack-years (Table 1). Tammemagi and colleagues have shown that using comprehensive risk models, which include additional socio-demographic characteristics, clinical features, and family history, to select patients for screening may be more cost effective than using the study trial inclusion criteria that have been adopted for screening guidelines.75 The U.S. Preventive Services Task Force’s recommendation for lung cancer screening is being revised and will address eligibility criteria, including age range and calculated cancer risk, as well as alternatives to annual screening.71

Our study had several additional limitations. Long-term mortality data were available only for studies conducted in Europe and North America, which may limit the generalizability of results based on screening just older, high-risk current or former smokers. Lung cancer incidence and mortality rates vary around the world, particularly in emerging economies and developing countries, related to differences in genetics, tobacco use, environmental exposures, and access to care.76 The Chinese AME trial enrolled substantial proportions of subjects whose lung cancer risk was defined as exposure to second-hand smoke, cooking oil fumes, or occupational carcinogens.32 Additionally, the observed efficacy of LDCT screening as conducted in randomized clinical trial settings may not translate into community practice.77

CONCLUSION

Our meta-analysis, utilizing the most recently published randomized controlled trial data, demonstrated that LDCT screening is associated with a significant reduction of lung cancer mortality though not overall mortality. Women appeared more likely to benefit from screening than men, but data were inconclusive. The estimated risks for false positive results, screening complications, overdiagnosis, and incidental findings were low.