Background

It has been known for many years that smoking causes lung cancer. An association was clearly documented in case–control studies conducted in Germany in the 1930s [1], and in the United States and Great Britain [2, 3] in the 1950s, and was strengthened by surveys of large cohorts. This led the US Surgeon General to conclude in 1964 [4] that “cigarette smoking is a cause of lung cancer in men, and a suspected cause of lung cancer in women”. Further reports [5, 6] have defined the relationship in more detail, and it has been estimated that, in the United States, 90% of male lung cancer deaths and 75%-80% of female lung cancer deaths are caused by smoking [7].

While some meta-analyses of the evidence have been published in recent years [8–10], none considers more than a relatively small fraction of the published evidence. We attempt to rectify this omission, though the sheer extent of the available data, and the resources available, has meant limiting attention to papers published in the last century and to studies involving at least 100 lung cancer cases. As will be seen, this still gives us an extensive database involving almost 300 studies.

Because the relationship of smoking to the two major types of lung cancer (squamous cell carcinoma and adenocarcinoma) is known to vary [5, 6], we present detailed results relating, not only to total lung cancer risk, but also to these two histological types of lung cancer. We also present some more limited results for other lung cancer types. To provide a broad description of the relationship of smoking to lung cancer, we do not concentrate on a single primary analysis, but quantify the relationships to each of a range of indices of smoking, investigating how these relationships vary according to characteristics such as sex, age, location, study design, period considered, definition of exposure and extent of confounder adjustment. The style of this systematic review is similar to one we have recently published for smoking and COPD, chronic bronchitis and emphysema [11].

Methods

Full details of the methods used are described in Additional file 1: Methods, and are summarized below. Throughout this paper, we use the term relative risk (RR) to include its various estimators, including the odds ratio and the hazard ratio.

Inclusion and exclusion criteria

Attention was restricted to epidemiological prospective or case–control studies published up to and including 1999, which involved 100 lung cancers or more, and which provided RR estimates for one or more defined major, cigarette-type or dose-related smoking indices. The “major indices” compare ever, current or ex smoking with never or non-current smoking, and refer to smoking of any product, cigarettes, pipes, cigars and combinations, or of specific types of cigarette. The “cigarette type indices” compare smokers of different types of cigarette – filter with plain, manufactured with handrolled and mentholated with non-mentholated. The “dose-related indices” concern amount smoked, age of starting to smoke, duration of smoking, duration of quitting, tar level, butt length or fraction smoked. Pack-years was not considered as it was felt more important to separate effects of extent and duration of exposure. Uncontrolled case studies were not included. There were no further exclusion criteria.

Literature searching

Between 1997 and 2001 potentially relevant papers were sought from Medline and Emtree searches, from British Library monthly bulletins, from files on smoking and health accumulated over many years by P N Lee Statistics and Computing Ltd, and from references cited in papers obtained, until ultimately no paper examined cited a paper of possible relevance not previously examined.

Identification of studies

Relevant papers were allocated to studies, noting multiple papers on the same study, and papers reporting on multiple studies. Each study was given a unique reference code (REF) of up to 6 characters (e.g. COMSTO or LUBIN2), based on the principal author’s name and distinguishing multiple studies by the same author.

Some studies were noted as having overlaps with other studies. To minimize problems in meta-analysis arising from double-counting of cases, overlapping studies were divided into two categories, as shown in Additional file 2: Studies. The first category involved minor overlap, which could not be disentangled, and which it was decided to ignore. The second category contains sets of studies which probably or definitely overlap. Here the set member containing the most comprehensive data (e.g. largest number of cases or longest follow-up) was called the ‘principal study’, other members being ‘subsidiary studies’ only considered in meta-analyses where the required RR was unavailable from the principal study.

Data recorded

Relevant information was entered onto a study database and two linked RR databases. Data entry was carried out in two stages. In 1997–2002, data were entered on the first RR database for the major smoking indices, cigarette type indices, and amount smoked. In 2009–2010, data were entered on the second RR database for the remaining dose-related indices.

The study database contains a record for each study, describing the following aspects: relevant publications; study title; study design; sexes considered; age range, race(s) and other details of the population studied; location; timing and length of follow-up; whether principal or subsidiary, with details of overlaps or links with other studies; number of cases and extent of histological confirmation; number of controls or subjects at risk; types of controls and matching factors used in case–control studies; use of proxy respondents, interview setting and response rates; confounding variables considered; availability of results by histological types; and availability of results for all smoking indices (including those indices not considered here, such as pack-years).

The RR databases hold the detailed results, typically containing multiple records for each study. Each record is linked to the relevant study and refers to a specific RR, recording the comparison made and the results. This record includes the sex, age range, race, lung cancer type, and (for prospective studies) the follow-up period. The smoking exposure of the numerator of the RR is defined by the smoking status (ever, current or ex), smoking product (e.g. any, cigarettes, cigarettes only, pipes only) and cigarette type (e.g. any, mainly hand-rolled cigarettes, filter cigarettes only, mentholated cigarettes). Similar information is recorded about the denominator of the RR. For dose-related indices, the level of exposure is recorded. The source of the RR is also recorded, as are details on adjustment variables. Results recorded include numbers of cases for the numerator and denominator, and, for unadjusted results, numbers of controls, persons at risk or person-years at risk. The RR itself and its lower and upper 95% confidence limits (LCL and UCL) are always recorded. These may be as reported, or derived by various means (see below), with the method of derivation noted.

Identifying which RRs to enter

RRs were entered relating to defined combinations of lung cancer type, smoking index (major, cigarette type or dose-related), confounders adjusted for, and strata, as described below.

Lung cancer type

Results were entered for all lung cancer, for Kreyberg I (as originally presented, or by combining squamous, small and large cell carcinoma) and Kreyberg II (as originally presented, or by combining adenocarcinoma and others not in Kreyberg I), and for squamous, small, and large cell carcinoma and for adenocarcinoma separately. Additionally, the following groups were constructed if not originally presented: all lung cancer or nearest equivalent, but at least squamous cell carcinoma and adenocarcinoma; squamous cell carcinoma or nearest equivalent; adenocarcinoma or nearest equivalent.

Major and cigarette type smoking indices

The intention was to enter RRs comparing current smokers, ever smokers or ex smokers with never or non smokers. Near-equivalent definitions were accepted when stricter definitions were unavailable, so that, for example, never smokers could include occasional smokers (or exceptionally, light smokers), while current smokers could include, and ex-smokers exclude, recent quitters. RRs were to be entered relating to smoking of defined products and, when the product related to cigarette smoking, to defined cigarette types (see also Additional file 1: Methods). If available, results (for each of current, ex and ever smoking) were entered for five comparisons: any product vs. never any product, cigarettes vs. never any product, cigarettes only vs. never any product, cigarettes vs. never cigarettes, and cigarettes only vs. never cigarettes (and also for five equivalent comparisons for current vs non smoking). Here “cigarettes” ignores whether other products (i.e. pipes and cigars) are also smoked, while “cigarettes only” excludes mixed smokers. Additionally, when the numerator related to the smoking of filter, handrolled or mentholated cigarettes, RRs were entered with the denominator defined as relating to plain, manufactured or non-mentholated smokers respectively.

Dose-related smoking indices

RRs were entered for seven measures: amount smoked, age of starting, duration of smoking, duration of quitting, tar level, butt length and fraction smoked. RRs were expressed relative to never smokers (or near equivalent), if available, or relative to non smokers otherwise. For duration of quitting, RRs were also expressed relative to current smokers. Except for amount smoked, further RRs were entered, restricted to smokers, and expressed relative to the level expected to have the lowest risk (e.g. shortest duration or latest age started).

Confounders adjusted for

For case–control studies, results were entered adjusted for the greatest number of potential confounding variables for which results were available, and also unadjusted (or adjusted for the smallest number of confounders). For prospective studies, results were entered adjusted for age and the greatest number of confounders, and for age only or age and the smallest number of confounders, with unadjusted results entered only if no age-adjusted results were available. These alternative RRs are subsequently referred to as “most-adjusted” and “least-adjusted”. For dose-related RRs restricted to smokers, results with “most adjustment” but without adjustment for other aspects of smoking were also entered if available.

Strata

Three strata were considered – sex, age and race. Results were entered for males and females separately when available, with combined sex results only entered when sex-specific results were not available. Results were entered for all ages combined and for individual age groups, and for all races and for individual racial groups.

Derivation of RRs

Adjusted RRs and their 95% CIs were entered as provided, when available. Unadjusted RRs and CIs were calculated from their 2 × 2 table, using standard methods (e.g. [12]), noting any discrepancies between calculated values and those provided by the author. Sometimes the 2 × 2 table was constructed by summing over groups (e.g. adding current and ex smokers to obtain ever smokers) or from a percentage distribution. Various other methods were used as required to provide estimates of the RR and CI. Some more commonly used methods are summarized below, fuller details being given in Additional file 1: Methods.

Correction for zero cell

If the 2 × 2 table had a zero cell, 0.5 was added to each cell and the standard formulae were applied.
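As an illustration of the unadjusted calculation and the zero-cell correction, the following minimal sketch computes an odds ratio and its 95% CI from a 2 × 2 table. It assumes the usual log-based (Woolf-type) variance formula; the "standard methods" of [12] may differ in detail, and the function name and argument order are hypothetical.

```python
import math


def odds_ratio_2x2(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and 95% CI from a 2 x 2 table.

    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    Assumes the log-based variance 1/a + 1/b + 1/c + 1/d (an assumption
    of this sketch, not necessarily the formula used in the review).
    """
    # Zero-cell correction as described in the text: add 0.5 to every cell.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    rr = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lcl = math.exp(math.log(rr) - z * se)
    ucl = math.exp(math.log(rr) + z * se)
    return rr, lcl, ucl
```

For example, `odds_ratio_2x2(10, 0, 5, 5)` is evaluated on the corrected table (10.5, 0.5, 5.5, 5.5).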

Combining independent RRs

RRs were combined over ℓ strata (e.g. from a 2 × 2 × ℓ table) using fixed-effect meta-analysis [13], giving an estimate adjusted for the stratifying variable.
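A minimal sketch of combining independent stratum-specific RRs by inverse-variance fixed-effect weighting is given below. It assumes each standard error is recovered from the reported 95% CI on the log scale; the function names are hypothetical and the code is illustrative rather than a reproduction of the software actually used.

```python
import math


def se_from_ci(lcl, ucl, z=1.96):
    """Standard error of log RR recovered from a 95% CI (assumed symmetric on the log scale)."""
    return (math.log(ucl) - math.log(lcl)) / (2 * z)


def fixed_effect(estimates, z=1.96):
    """Inverse-variance fixed-effect combination.

    estimates: list of (rr, lcl, ucl) tuples, one per stratum.
    Returns the combined (rr, lcl, ucl).
    """
    weights = []
    log_rrs = []
    for rr, lcl, ucl in estimates:
        weights.append(1.0 / se_from_ci(lcl, ucl, z) ** 2)
        log_rrs.append(math.log(rr))
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / total
    se_pooled = math.sqrt(1.0 / total)
    return (
        math.exp(pooled),
        math.exp(pooled - z * se_pooled),
        math.exp(pooled + z * se_pooled),
    )
```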

Combining non-independent RRs

The Hamling et al. method [14] was used (e.g. to derive an adjusted RR for ever smokers from available adjusted RRs for current and ex smokers, each relative to never smokers, or to combine adjusted RRs for several histological types, each relative to a single control group).

Estimating CI from crude numbers

If an adjusted RR lacked a CI or p-value but the corresponding 2 × 2 table was available, the CI was estimated assuming that the ratio UCL/LCL was the same as for the equivalent unadjusted RR.
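The calculation can be sketched as follows, under the additional assumption (not stated in the text) that the adjusted RR lies at the geometric midpoint of its CI, so that the limits are obtained by dividing and multiplying by the square root of the unadjusted UCL/LCL ratio. The function name is hypothetical.

```python
import math


def ci_from_unadjusted_ratio(adj_rr, unadj_lcl, unadj_ucl):
    """Estimate the CI of an adjusted RR from the width of the unadjusted CI.

    Keeps UCL/LCL equal to that of the unadjusted RR and assumes the
    adjusted RR is the geometric mean of its limits (an assumption of
    this sketch).
    """
    ratio = unadj_ucl / unadj_lcl
    half_width = math.sqrt(ratio)
    return adj_rr / half_width, adj_rr * half_width
```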

Data entry and checking

Master copies of all the papers in the study file were read closely, with relevant information highlighted to facilitate checking. Where multiple papers were available for a study, a principal publication was identified, although details described only in other publications were also recorded. Preliminary calculations and data entry were carried out by one author and checked by another, and automated checks of completeness and consistency were also conducted. The RRs and CIs underwent validation checks [15].

Meta-analyses conducted – overview

A pre-planned series of meta-analyses was conducted for various smoking indices for each of the three main outcomes (all lung cancer, squamous cell carcinoma, and adenocarcinoma) and also for some indices for two other outcomes (large cell carcinoma and small cell carcinoma). Nearest equivalent definitions are allowed for the three main outcomes, with the terms “squamous” and “adeno” used subsequently to distinguish these results from those specifically for these cell types. Each meta-analysis was repeated, based on most-adjusted RRs and on least-adjusted RRs. For each meta-analysis conducted, combined estimates were made first for all the RRs selected, then for RRs subdivided by level of various characteristics, testing for heterogeneity between levels.

Selecting RRs for the meta-analyses

All meta-analyses are restricted to records with available RR and CI values. The process of selecting RRs for inclusion in a meta-analysis must try to include all relevant data and to avoid double-counting. For a given analysis (e.g. of current cigarette smoking), several definitions of RR may be acceptable (e.g. cigarette smoking, or cigarette only smoking), so, for studies with multiple RRs, the one to be used is determined by a preference order defined for the meta-analysis. Preference orders may be required for smoking status, smoking product, the unexposed base, and extent of confounder adjustment. As the definitions of RR available may differ by sex (e.g. a study may provide RRs for any product smoking for males, but only for cigarette smoking for females), the RRs chosen for each sex may not necessarily have the same definition. Sexes combined results are only considered where sex-specific results are not available. Similarly RRs from a subsidiary study are only used where eligible RRs are unavailable from the principal study. When multiple preference orders are involved, the sequence of implementation may affect the selection, so preferences for the most important aspects, usually concerning smoking, are implemented first.
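The selection logic can be illustrated by a minimal sketch that applies a single preference order within each study and sex, then drops sexes-combined results where sex-specific ones exist. The record fields (`study`, `sex`, `definition`, `rr`) and the study names are hypothetical, the preference list shown is the main order for smoking of any product described in the next section, and the handling of principal versus subsidiary studies and of multiple preference orders is omitted.

```python
MAIN_PREFERENCE = [
    "any product vs never any product",
    "cigarettes vs never any product",
    "cigarettes only vs never any product",
    "cigarettes vs never cigarettes",
    "cigarettes only vs never cigarettes",
]


def select_rrs(records, preference=MAIN_PREFERENCE):
    """Pick one RR per study and sex according to a preference order.

    records: list of dicts with hypothetical keys 'study', 'sex'
    ('male', 'female' or 'combined'), 'definition' and 'rr'.
    """
    rank = {definition: i for i, definition in enumerate(preference)}
    best = {}
    for rec in records:
        if rec["definition"] not in rank:
            continue  # definition not acceptable for this analysis
        key = (rec["study"], rec["sex"])
        current = best.get(key)
        if current is None or rank[rec["definition"]] < rank[current["definition"]]:
            best[key] = rec
    # Sexes-combined results are used only where no sex-specific result exists.
    sex_specific = {study for (study, sex) in best if sex != "combined"}
    return [
        rec
        for (study, sex), rec in best.items()
        if sex != "combined" or study not in sex_specific
    ]
```

Because the choice is made separately within each sex, the selected RRs for males and females of one study may have different definitions, as noted above.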

Carrying out the meta-analyses

Fixed-effect and random-effects meta-analyses were conducted using the method of Fleiss and Gross [13], with heterogeneity quantified by H, the ratio of the heterogeneity chi-squared to its degrees of freedom, which is directly related to the statistic I² [16] by the formula I² = 100(H − 1)/H. For all meta-analyses, Egger’s test of publication bias [17] was also included.
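The quantities involved can be sketched as below. This is an illustration only: it assumes inverse-variance weights on the log RR scale and a DerSimonian-Laird-type moment estimator for the between-study variance, which is an assumption about the cited method rather than a statement of it. Negative values of I² are set to zero by convention, and Egger's test is not covered. All names are hypothetical.

```python
import math


def pool(log_rrs, variances, z=1.96):
    """Fixed-effect and random-effects estimates with H and I2.

    log_rrs: log relative risks; variances: their variances.
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_rrs)) / sw
    # Heterogeneity chi-squared and its degrees of freedom.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rrs))
    df = len(log_rrs) - 1
    h = q / df if df > 0 else float("nan")
    # I2 = 100 (H - 1) / H, truncated at zero when H <= 1.
    i2 = 100.0 * (h - 1.0) / h if df > 0 and h > 1.0 else 0.0
    # Moment estimator of between-study variance (assumed, see lead-in).
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    wr = [1.0 / (v + tau2) for v in variances]
    swr = sum(wr)
    rand = sum(wi * yi for wi, yi in zip(wr, log_rrs)) / swr

    def with_ci(est, total_weight):
        se = math.sqrt(1.0 / total_weight)
        return math.exp(est), math.exp(est - z * se), math.exp(est + z * se)

    return {
        "fixed": with_ci(fixed, sw),
        "random": with_ci(rand, swr),
        "H": h,
        "I2": i2,
        "tau2": tau2,
    }
```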

Meta-analyses were conducted in various sets (A to N) corresponding to the sub-sections of the results section of the paper. A full list of the analyses is given in Additional file 1: Methods.

The major smoking indices

For the major smoking indices, the first four sets of meta-analyses relate to: A ever smoking, B current smoking, C ever smoking (but with current smoking used if ever smoking not available), referred to subsequently as “ever/current” smoking, and D ex smoking. In what is referred to as the main analysis in each set, smoking of any product is preferred by selecting RRs in the following preference order: 1. smoking of any product vs. never smoked any product; 2. smoking of cigarettes vs. never smoked any product, 3. smoking of cigarettes only vs. never smoked any product; 4. smoking of cigarettes vs. never smoked cigarettes; 5. smoking of cigarettes only vs. never smoked cigarettes; with options 6–10 the same as options 1–5 except that “never smoked” is replaced by “never smoked near equivalent”. A variant analysis prefers cigarette smoking (by changing the preference order to 4, 5, 2, 3, 1, 9, 10, 7, 8, 6). In meta-analyses of type C, a further variant analysis reverses the preference so current smoking results are preferred to those for ever smoking, referred to subsequently as “current/ever” smoking. Other variant analyses are based on RRs for specified age ranges.

A further set of meta-analyses, E, concerns smoking of pipes and/or cigars (but not cigarettes), referred to subsequently as smoking of “pipes/cigars only”, smokers of pipes only, smokers of cigars only, and smokers of cigarettes and pipes/cigars (“mixed” smokers). Separate meta-analyses were conducted for ever smoking, current smoking, ever/current smoking, current/ever smoking and ex smoking.

The cigarette type indices

Meta-analyses were conducted, in set F, for only filter vs. only plain, ever filter vs. only plain, only filter vs. ever plain, handrolled vs. manufactured, and mentholated vs. non-mentholated. These were only conducted for ever/current smoking, and preferring RRs for cigarettes over RRs for cigarettes only. The analyses with only filter as the numerator used the preference order of filter only, always, mainly, both, equally, and ever, while the analyses with ever filter as the numerator used the reverse preference. Similar preference orders applied to the denominators. The analyses of handrolled vs. manufactured cigarettes used the preference order of any, both, mainly, and only for handrolled, and only ever, only current, any and ever for manufactured.

The dose-related smoking indices

For the dose-related indices, sets of meta-analyses were conducted for: G amount smoked, H age of starting to smoke, I duration of smoking, J duration of quitting compared to never smokers (or long-term ex smokers), K duration of quitting compared to current smokers (or short-term quitters), L tar level, and M butt length or fraction smoked (taking short butt length as being equivalent to a large fraction smoked). For any measure, a study typically provides a set of non-independent RRs, one for each dose category, expressed relative to a common base. To avoid double-counting, only one RR from each set was included in any one meta-analysis.

Two approaches were adopted. The first involves specifying a scheme with a number of levels of exposure (“key values”), then carrying out meta-analyses for each level in turn, expressed relative to never smokers. For an RR to be allocated to a key value, its dose category has to include that key value and no other. Schemes with a few, widely spaced, key values tend to involve more studies, whereas schemes with more key values, closely spaced, involve RRs from fewer studies, but ones with dose categories more closely clustered around the key value. The sets of key values used (with 999 indicating an open-ended category) were 5, 20, 45 and 1, 10, 20, 30, 40, 999 for amount smoked; 26, 18, 14 and 30, 26, 22, 18, 14, 10 for age of starting to smoke; 20, 35, 50 and 5, 20, 30, 40, 50, 999 for duration of smoking; 12, 7, 3 and 20, 12, 3 for duration of quitting vs. never; and 3, 7, 12 and 3, 12, 20 for duration of quitting vs. current. No key value analysis was conducted for tar level, or for butt length/fraction smoked. The second approach (not conducted for amount smoked) involves meta-analysis of RRs for the highest compared with the lowest categories of exposure within smokers available for each study.
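The key value allocation rule can be sketched as follows. The sketch assumes that dose categories are closed intervals with inclusive bounds and that an open-ended category is represented by an upper bound of 999, matching the convention used for the key values; both are assumptions of this illustration, and the function name is hypothetical.

```python
OPEN_ENDED = 999  # convention from the text for an open-ended category


def allocate_key_value(low, high, key_values):
    """Return the key value a dose category is allocated to, or None.

    low, high: bounds of the dose category (high=None for open-ended).
    A category is allocated only if it contains exactly one key value.
    """
    upper = OPEN_ENDED if high is None else high
    hits = [k for k in key_values if low <= k <= upper]
    return hits[0] if len(hits) == 1 else None
```

With the coarse scheme for amount smoked (5, 20, 45), a category of 15 to 24 cigarettes per day would be allocated to the key value 20, whereas a category of 20 to 49 contains both 20 and 45 and so would not be used.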

Meta-regression analyses

While full multivariable analysis of the data is considered beyond the scope of this report, meta-regression analyses were also carried out using the sets of RRs selected for the main meta-analyses for ever smoking and for current smoking. Following preliminary meta-regressions (not shown), a “fixed model” was fitted to examine the effect on the results of six different categorical variables (sex, location, start year of study, major study type, number of lung cancer cases and number of adjustment factors). Note that the number of lung cancer cases (in the study as a whole), which is referred to subsequently as “number of cases”, is used as an indicator of study size. The significance of each of these variables was estimated by an F-test based on the increase in deviance resulting from its exclusion from the basic model. A list of secondary variables was also defined (relating to more detailed aspects of location, outcome, study type and confounder adjustment, national cigarette tobacco type, the product smoked, the denominator used in the RR, use of proxy respondents, whether the study required 100% histological confirmation of lung cancer, whether the population studied worked in risky occupations, the age of the subjects, and the derivation of the RR) with the significance of adding each characteristic to the fixed model estimated by an F-test based on the increase in deviance. Fuller details are given in Additional file 1: Methods.

Additional analyses

Additional tests of the relationship of lung cancer risk to various characteristics of interest were based on corresponding pairs of RR and CI estimates within the same study for the same definition of outcome and exposure, from which the ratio of the two RRs was derived. Where the pairs involved independent sets of subjects, the variance of the ratio was also derived, and meta-analyses of the ratio were conducted. Where the pairs involved non-independent sets of subjects, the numbers of ratios greater and less than 1 were compared using the sign test. Tests of independent pairs related to sex (males vs. females), age (oldest vs. youngest age group) and race (white people vs. non-white or black people). Tests of non-independent pairs related to level of adjustment (most-adjusted vs. least-adjusted), and to comparisons of product smoked (mixed smokers vs. cigarette only smokers, and vs. smokers of pipes/cigars only). Tests were always carried out for all lung cancer and ever/current smoking. For sex, additional analyses were conducted for current and for ever smoking, for squamous and adeno, and also within level of amount smoked. For level of adjustment, two sets of analyses were run. The first, relating to RRs for ever/current smoking, was based on the most-adjusted/least-adjusted ratio, while the second, for highest vs. lowest RRs for age of starting to smoke, duration, years quit and tar level, compared RRs that were most- or least-adjusted for other aspects of smoking.
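Both kinds of test can be sketched briefly. For independent pairs, the sketch assumes the variance is calculated on the log scale as the sum of the two log RR variances recovered from the 95% CIs. For non-independent pairs, it assumes a two-sided exact binomial sign test. The text does not specify either detail, and the function names are hypothetical.

```python
import math


def ratio_of_rrs(rr1, lcl1, ucl1, rr2, lcl2, ucl2, z=1.96):
    """Ratio of two independent RRs and the variance of its logarithm."""
    var1 = ((math.log(ucl1) - math.log(lcl1)) / (2 * z)) ** 2
    var2 = ((math.log(ucl2) - math.log(lcl2)) / (2 * z)) ** 2
    # Independence means the variances of the log RRs simply add.
    return rr1 / rr2, var1 + var2


def sign_test(n_above, n_below):
    """Two-sided exact sign test p-value for counts of ratios above and below 1."""
    n = n_above + n_below
    k = min(n_above, n_below)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

The ratios and log-scale variances from independent pairs could then be combined across studies by inverse-variance weighting, in the same way as the RRs themselves.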

Software

All data entry and most statistical analyses were carried out using ROELEE version 3.1 (available from P.N. Lee Statistics and Computing Ltd, 17 Cedar Road, Sutton, Surrey SM2 5DA, UK). Some analyses were conducted using Quattro Pro 9 or Excel 2003.

Results

Studies identified

A total of 5,993 potentially relevant papers were identified, providing information on 287 eligible studies (Table 1).

Table 1 Literature searching and study identification

Table 2 presents selected details of the 287 studies while Table 3 gives the distribution of their major characteristics. Additional file 2: Studies gives fuller descriptions of the studies.

Table 2 Selected details of the 287 studies of lung cancer
Table 3 Distribution of the main characteristics of the 287 studies of lung cancer

Of the 287 studies, 267 are classified as principal, 209 (78.3%) of these being case–control studies, 52 (19.5%) prospective, 5 (1.9%) nested case–control and 1 (0.4%) case-cohort. Note that the last three study designs, where exposure was determined before diagnosis, are combined into one category in Table 3 (and the text below based on it). The other 20 studies are classified as subsidiary. Of the principal studies, 262 provide data for all lung cancer, 84 for squamous and 86 for adeno. Only rarely did these studies provide data only for squamous (1 study) or adeno (3 studies). The data come less often from case–control designs for all lung cancer (77.9%) than for squamous (86.9%) and adeno (87.2%).

Of the 267 principal studies, 158 (59.2%) provide results for both sexes, 90 (33.7%) for males only, and 19 (7.1%) for females only. One hundred and ninety-six (73.4%) of the studies included subjects who were under 30 years old (or allowed their inclusion by having no age restriction), while only 31 (11.6%) were restricted to subjects aged 40 or more. Subjects aged 80 years or more were included by 200 (74.9%), while only 16 (6.0%) were restricted to subjects aged 60 or less. Prospective studies were much more likely than case–control studies to specify age restrictions, e.g. 62.1% vs. 16.7% for age 30 years or more, and 48.3% vs. 18.7% for age less than 80 years. Eighty-nine (33.3%) principal studies were conducted in the USA or Canada, with 22 (8.2%) in the UK, 25 (9.4%) in Scandinavia, 43 (16.1%) in other parts of Europe, 37 (13.9%) in China, 18 (6.7%) in Japan, 17 (6.4%) in the rest of Asia and 16 (6.0%) elsewhere – in South or Central America, Africa or Australia. Of the 58 prospective studies, all but 12 were conducted in North America, the UK or Scandinavia. Of the principal studies, 42 (15.7%) were conducted in countries where at least 75% of cigarettes smoked are made from Virginia tobacco, with 184 (68.9%) carried out where at least 75% of cigarettes are from blended tobaccos. Forty-seven (17.6%) started before 1960. Studies starting after 1979 were predominantly (92.4%) case–control. Thirty-six (13.5%) involved at least 1,000 lung cancer cases. Seven (2.6%) were conducted in miners, with a further 11 (4.1%) conducted in other occupational groups with a known relationship with lung cancer. Proxy respondents were used for some subjects in 74 (27.7%), with full histological confirmation of cases reported to have been carried out in 68 (25.5%).

Most study groups (i.e. a principal study or one of its subsidiaries) provide some results for the major indices compared to never smokers: 240 (89.9%) for ever smokers, 134 (50.2%) for current smokers and 127 (47.6%) for ex smokers. Many studies provide results for smoking of any product (162 studies, 60.7%) or for cigarettes (147, 55.1%), but fewer do so for cigarette only smoking (55, 20.6%), smoking of pipes/cigars only (62, 23.2%), mixed smoking (29, 10.9%), or for the cigarette type indices filter/plain cigarette smoking (38, 14.2%), hand-rolled cigarette smoking (15, 5.6%), or mentholated cigarette smoking (3, 1.1%). Though dose–response data are most commonly available by amount smoked (162, 60.7%), many studies provide data by age of starting to smoke (62, 23.2%), duration (77, 28.8%), and time quit (58, 21.7%). Few studies provide data on tar level (11 studies, 4.1%), fraction smoked (9 studies, 3.4%), or butt length (2 studies, 0.7%).

Relative risks

A total of 16,616 RRs were entered, the number recorded per study varying from 1 to 1,029. Of these, 1,266 relate to subsidiary studies. Table 4 summarizes the distribution of various characteristics of the RRs by outcome, sex, study type and location.

Table 4 Distribution of the main characteristics of the relative risks

Of the total of 16,616 RRs, 71.9% relate to case–control studies, and 93.8% are sex-specific. 40.2% come from North American studies, 36.8% from Europe, 16.7% from Asia, and 6.3% from other continents. 60.9% are unadjusted for potential confounding variables and 18.7% are adjusted for sex and/or age only. 70.1% are given directly or are calculated by standard methods, the rest being derived by more complex methods.

Of the total RRs, 5,061 relate to the major smoking indices, where the denominator is never or non smoking, with 3,614 of these relating to smoking of any product or cigarettes (regardless of pipe or cigar smoking), 678 to cigarette only smoking and 769 to pipe, cigar or mixed smoking. Four hundred and forty-eight relate to cigarette type comparisons, most commonly (303 RRs) to the filter vs. plain comparison. All the 25 RRs for the mentholated/non-mentholated comparison come from North American studies, while none of those for the handrolled/manufactured comparison do. There are 10,921 RRs for dose-related indices, based mainly on 3,625 sets: 2,047 vs. never or non smoking, 1,327 vs. the low level, and 251 vs. current smoking. Sets are most numerous for amount smoked (1,145) and least numerous for butt length (5). For amount smoked, age of starting, duration of smoking, and years quit (vs. never and vs. current), there are sufficient numbers of dose–response sets to study variation in RR by sex, study type and continent.

None of the RRs included in the meta-analyses and meta-regressions show more than minor failures of the validation tests used, attributable to rounding errors or small imprecisions or uncertainties in estimating the RRs and CIs. Additional file 3: RRs provides further detail.

For dose-related indices, Additional file 4: Dose Not Meta gives results originally presented in forms unsuitable for meta-analysis.

The meta-analyses and meta-regressions

The main findings are summarized in the following sections, with tables and forest plots. Additional file 5: Detailed Analysis Tables fully presents all the meta-analyses and meta-regressions conducted. The interested reader should first see Additional file 1: Methods, which lists the other files, and describes their content and structure.

Findings are generally presented for three outcomes, referred to as “all lung cancer”, “squamous” or “adeno”. These outcomes are defined in the Methods section, and also in the footnotes to the tables, and allow the inclusion of results based on alternative similar definitions. (Note that the terms “squamous cell carcinoma” and “adenocarcinoma” are only used when reference is made to results specifically for the particular cell type).

A. Risk from ever smoking

Figures 1, 2, 3, 4 and 5 (all lung cancer), Figures 6 and 7 (squamous) and Figures 8 and 9 (adeno) present the results of the main meta-analyses for ever smoking any product (or cigarette smoking for studies without RRs for any product), based on most-adjusted RRs. Table 5 presents additional results subdivided by level of certain characteristics, while Table 6 presents results of some alternative meta-analyses of ever smoking. From these findings, various observations can be made.

Figure 1

Forest plot of ever smoking of any product and all lung cancer – part 1. Table 5 presents the results of a main meta-analysis for all lung cancer based on 328 relative risk (RR) and 95% confidence interval (CI) estimates for ever smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale in Figures 1, 2, 3, 4, 5. The studies are sorted in order of sex within study reference (REF) within start year of study (START) within continent (CONT), with the exception of study LIU4 shown at the end of Figure 5. In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight (inverse-variance of log RR). Arrows indicate where the CI extends outside the range allocated.

Figure 2

Forest plot of ever smoking of any product and all lung cancer – part 2. This is a continuation of Figure 1, presenting further individual study data included in the main meta-analysis for all lung cancer shown in Table 5. For study DORGAN separate estimates, within sex, are shown for whites then blacks. For study HUMBLE they are shown for non-Hispanic whites then Hispanics, and for study KELLER for whites then non-whites.

Figure 3

Forest plot of ever smoking of any product and all lung cancer – part 3. This is a continuation of Figure 2, presenting further individual study data included in the main meta-analysis for all lung cancer shown in Table 5.

Figure 4

Forest plot of ever smoking of any product and all lung cancer – part 4. This is a continuation of Figure 3, presenting further individual study data included in the main meta-analysis for all lung cancer shown in Table 5.

Figure 5

Forest plot of ever smoking of any product and all lung cancer – part 5. This is a continuation of Figure 4, presenting the remaining individual study data included in the main meta-analysis for all lung cancer shown in Table 5. Also shown are the combined random-effect estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI. Note that the sizes of the squares for the two estimates from study LIU4 indicate the relative weight of the male and female data, but are not comparable with the sizes of the squares for the other estimates.

Figure 6

Forest plot of ever smoking of any product and squamous – part 1. Table 5 presents the results of a main meta-analysis for squamous based on 102 relative risk (RR) and 95% confidence interval (CI) estimates for ever smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale in Figures 6, 7. The studies are sorted in order of sex within study reference (REF) within start year of study (START) within continent (CONT). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight (inverse-variance of log RR). Arrows indicate where the CI extends outside the range allocated. For study SCHWAR separate estimates, within sex, are shown for whites then blacks.

Figure 7

Forest plot of ever smoking of any product and squamous – part 2. This is a continuation of Figure 6, presenting the remaining individual study data included in the main meta-analysis for squamous shown in Table 5. Also shown are the combined random-effect estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.

Figure 8

Forest plot of ever smoking of any product and adeno – part 1. Table 5 presents the results of a main meta-analysis for adeno based on 107 relative risk (RR) and 95% confidence interval (CI) estimates for ever smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale in Figures 8, 9. The studies are sorted in order of sex within study reference (REF) within start year of study (START) within continent (CONT). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight (inverse-variance of log RR). Arrows indicate where the CI extends outside the range allocated. For study SCHWAR separate estimates, within sex, are shown for whites then blacks.

Figure 9

Forest plot of ever smoking of any product and adeno – part 2. This is a continuation of Figure 8, presenting the remaining individual study data included in the main meta-analysis for adeno shown in Table 5. Also shown are the combined random-effect estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.

Table 5 Main meta-analyses for ever smoking of any product (or cigarettes if any product not available) a
Table 6 Some alternative meta-analyses for ever smoking compared to those in Table 5

First, the RRs for all three outcomes are markedly heterogeneous. As shown in Table 5, H is estimated as 22.84 for all lung cancer, 5.17 for squamous and 8.78 for adeno (p < 0.001). Individual RRs vary up to 125.27 for all lung cancer (study STUCKE for males), 92.66 for squamous (ABRAHA/males), and 34.45 for adeno (SCHWAR/males). Based on random-effects estimates, a positive association is seen, strongest for squamous (RR 10.47, 95% CI 8.88-12.33, based on 102 RRs), but also clearly evident for all lung cancer (5.50, 5.07-5.96, n = 328) and adeno (2.84, 2.41-3.35, n = 107). Although the strength of association varies markedly by study, the consistency of direction is clear, with only two of the all lung cancer RRs, none of the 102 squamous RRs, and nine of the 107 adeno RRs below 1.0.
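The quantities quoted above can be illustrated with a minimal sketch. It assumes, as the "per d.f." wording used later in this section suggests, that H is the heterogeneity chi-squared divided by its degrees of freedom, and it uses the DerSimonian-Laird estimator for the random-effects step, which may differ from the estimator defined in the Methods. The function names and the four input estimates are hypothetical and are not taken from any study in this review.

```python
import math

def log_rr_and_weight(rr, lo, hi, z=1.96):
    """Return (log RR, inverse-variance weight) from an RR and its 95% CI."""
    se = (math.log(hi) - math.log(lo)) / (2 * z)  # SE of log RR from CI width
    return math.log(rr), 1.0 / se ** 2

def meta_analyse(estimates):
    """estimates: list of (rr, lower, upper); needs at least two entries."""
    pairs = [log_rr_and_weight(*e) for e in estimates]
    y = [p[0] for p in pairs]
    w = [p[1] for p in pairs]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Heterogeneity chi-squared (Q) and H = Q per degree of freedom (assumed definition)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    # DerSimonian-Laird between-study variance, truncated at zero
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    w_star = [1.0 / (1.0 / wi + tau2) for wi in w]
    sw_star = sum(w_star)
    rand = sum(wi * yi for wi, yi in zip(w_star, y)) / sw_star
    se_rand = math.sqrt(1.0 / sw_star)
    return {
        "fixed_rr": math.exp(fixed),
        "random_rr": math.exp(rand),
        "random_ci": (math.exp(rand - 1.96 * se_rand),
                      math.exp(rand + 1.96 * se_rand)),
        "H": q / df,
        "tau2": tau2,
    }

# Hypothetical (RR, lower 95% limit, upper 95% limit) triples, for illustration only
hypothetical = [(12.0, 8.0, 18.0), (3.0, 2.0, 4.5), (6.0, 3.0, 12.0), (2.0, 1.5, 2.67)]
result = meta_analyse(hypothetical)
print(result)
```

When H is well above 1, as here, the random-effects weights are much more even across studies than the fixed-effect weights, so the random-effects estimate is less dominated by the largest studies.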

As shown in Table 6, the overall estimates for each outcome were virtually unchanged by using least-adjusted rather than most-adjusted estimates. They were slightly increased by restricting attention to estimates using a more precise outcome definition, the random-effects estimates changing to 5.59 (5.15-6.07) for the 317 estimates specifically for all lung cancer, 11.56 (9.68-13.81) for the 74 estimates specifically for squamous cell carcinoma, and 2.99 (2.49-3.58) for the 87 estimates specifically for adenocarcinoma. The overall estimates for each outcome were virtually unchanged when RRs for ever smoking cigarettes were preferred to RRs for ever smoking any product. This is partly due to many studies providing only one type of RR, so that for all lung cancer, for example, 250 of the 328 RRs are common to both meta-analyses. A much smaller number of estimates were available for cigarette only smoking; RRs from these were slightly higher: 6.45 (5.41-7.70, n = 54) for all lung cancer, 11.50 (7.47-17.69, n = 11) for squamous, and 2.87 (1.49-5.55, n = 11) for adeno. Estimates were also extracted specifically for populations of age <56, 50–70 or 65+ years (with age determined at baseline for prospective studies). As shown in Table 6, data were rather limited for squamous and adeno, particularly for older populations. For all lung cancer, the three RRs: 6.57 (4.94-8.74, n = 38) for age <56 years, 6.46 (4.99-8.35, n = 31) for age 50–70 years, and 5.48 (4.59-6.55, n = 37) for age 65+ years were all consistent with the overall RR of 5.50, with no clear trend.

Returning to the main meta-analysis (most-adjusted and preferring ever smoking any product), there is a large variation between RRs in the weight they contribute to the analysis. This is very marked for all lung cancer. Here the 328 estimates provided a combined weight of 19,346 (mean 59.0), but the male and female estimates from study LIU4 together contributed a weight of 9,846, 50.9% of the total. Omitting these two estimates substantially reduced the heterogeneity, H falling from 22.84 to 12.54. The next largest weights were 1,550 in study STOCKW (sexes combined), 443 in BROWN2 (males) and 428 in BROWN2 (females). For squamous, the total weight was 1,000 for the 102 RRs (mean 9.8). The largest contributors to this were 164 for BROWN2/males, 90 for BROWN2/females, 52 for LUBIN2/males and 47 for LUBIN2/females, together contributing 35% of the total weight. For adeno, the total weight was 1,514 for the 107 RRs (mean 14.1). Again, BROWN2 and LUBIN2 were the largest contributors, providing, respectively, 24% and 6% of the total weight.
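The weight shares quoted in this paragraph follow from simple arithmetic on the totals given in the text. The short check below reproduces them; the variable and function names are illustrative only.

```python
def share(part, total):
    """Percentage of the total weight contributed by part."""
    return 100.0 * part / total

# All lung cancer: 328 estimates, combined weight 19,346; LIU4 contributes 9,846
mean_all = 19346 / 328
liu4_share = share(9846, 19346)

# Squamous: total weight 1,000 over 102 RRs; four largest contributors from the text
squamous_top = [164, 90, 52, 47]  # BROWN2 males/females, LUBIN2 males/females
squamous_share = share(sum(squamous_top), 1000)

# Adeno: total weight 1,514 over 107 RRs
mean_adeno = 1514 / 107

print(round(mean_all, 1))       # 59.0
print(round(liu4_share, 1))     # 50.9
print(round(squamous_share))    # 35
print(round(mean_adeno, 1))     # 14.1
```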

In investigating sources of heterogeneity, variation was studied firstly using a univariable approach, the results for the characteristics considered in Table 5 being summarized below, based on the random-effects estimates.

Sex

For all three outcomes, RRs were always somewhat lower for females than for males or for sexes combined, though the variation by sex was not significant (p ≥ 0.1) for squamous.

Location

For all three outcomes, RRs were lower from studies conducted in Europe and Asia than from studies conducted in North America. While for all lung cancer and adeno RRs were noticeably lower in Asia than in Europe, this difference was not evident for squamous. The difference in RRs by continent was very marked and highly significant (p < 0.001) for all lung cancer and adeno, but less marked, though still significant (p < 0.01) for squamous.

Start year of study

For all lung cancer and squamous, variation by start year was not significant (p ≥ 0.05) although there was some tendency for RRs to be higher in more recent studies. For adeno, the variation was significant (p < 0.01) but there was no clear trend.

Study type

For all three outcomes, RRs were somewhat lower for case–control studies than for prospective studies (or other study designs where the smoking data were collected before lung cancer diagnosis). However, the difference was never statistically significant (p ≥ 0.05).

National cigarette tobacco type

For all three outcomes, there was significant (p < 0.01 or p < 0.001) variation. This was mainly due to low estimates in the “other” group, which mainly included results from China. For all lung cancer, RRs for Virginia (6.24, 5.16-7.54, n = 50) and blended (6.30, 5.79-6.87) were quite similar. For squamous and adeno, there were limited results for Virginia, and no clear difference from blended was evident.

Any proxy use

There was some evidence that RRs were higher where proxy respondents were used for squamous (p < 0.05) and adeno (p < 0.1), but not for all lung cancer.

Full histological confirmation

RR estimates were somewhat higher where full histological confirmation of diagnosis was a study requirement, but this was only significant at p < 0.05 for all lung cancer.

Number of cases

Some tendency for RRs to increase with increasing number of cases was evident for all three outcomes, but variation in number of cases was only significant for all lung cancer (p < 0.01).

Smoking product

The analyses in Table 5 are based on a preference order of any product, cigarettes (ignoring other products) and cigarettes only. For all lung cancer, where 205 of the 328 estimates were for any product, 114 were for cigarettes and 9 for cigarettes only, there was no evidence that the RRs included varied by smoking product. For squamous and adeno (both p < 0.001), however, RRs were lowest for smoking any product, intermediate for cigarettes, and highest for cigarettes only (though based on only two RRs for cigarettes only for each outcome).

Unexposed base

RRs were somewhat higher where the unexposed base group was never cigarettes than when it was never any product, though this was only significant (p < 0.05) for adeno. This result is somewhat counter-intuitive, as lower RRs might be expected where the base (never cigarettes) includes some smokers (pipe/cigar only), and probably arises from the strong correlation between the definitions of smoking product and unexposed base. Two combinations – any product vs. never any product (n = 203) and cigarettes vs. never cigarettes (n = 90) – form a large proportion of the total RRs (with any product vs never cigarettes not a valid possibility).

Number of adjustment factors

There was no evidence that RR estimates varied by whether they were adjusted for 0, 1 or 2+ potential confounding variables.

The full meta-analysis (see Additional file 5: Detailed Analysis Tables) also includes results by levels of some other characteristics. In an attempt to evaluate the independent role of a whole range of characteristics, preliminary meta-regression analyses were conducted for each outcome (results not shown). As a result, it was decided to present findings for a fixed model involving six major characteristics (see Table 7), test the effect of each by deleting each of the six individually from the fixed model (and also by allowing each to enter a step-wise model in order of significance), and test the effect of a range of other characteristics by adding each individually into the fixed model (see Additional file 5: Detailed Analysis Tables). The main conclusions to be drawn from these analyses are summarized below.

Table 7 Meta-regression results for ever smoking of any product (or cigarettes if any product not available) a

For all lung cancer, by far the strongest source of variation was location, with the overall heterogeneity reduced from 22.84 per d.f. to 7.02 per d.f. after including only location in the model. As noted earlier, this was mainly due to relatively high RRs in North America and low RRs in Asia. Other clear effects were also associated with start year of study (p < 0.001, higher risks in later studies, much more clearly evident than in the univariable analyses in Table 5), study type (p < 0.01, higher risks in prospective studies) and number of cases (p < 0.001, higher risks in larger studies). There was no significant effect of sex, and the weakly significant (p < 0.05) effect for number of adjustment factors was associated with an erratic pattern, with lower RRs where the number of factors was 1, and higher RRs where it was 0 or 2+. The heterogeneity for the fixed model including all six characteristics in Table 7 was 4.72 per d.f., with the model explaining 80.5% of the overall variation between the RRs. Inspection of standardized residuals revealed eight estimates where the value was outside the range +/− 2.5 SEs: MILLS/males (RR 1.33, fitted 3.35), LOMBA2/females (RR 1.33, fitted 5.07), TIZZAN/males (RR 1.93, fitted 3.50), WANG4/males (RR 1.16, fitted 2.01), PERNU/males (RR 8.93, fitted 4.37), LUBIN2/males (RR 8.50, fitted 5.47), BOFFET/males (RR 14.20, fitted 7.78) and JUSSAW/males (RR 16.83, fitted 3.77). Only two other characteristics studied significantly (p < 0.05) improved the fit of the model, both related to study location. One was a variable subdividing “Other Europe” (i.e. other than UK and Scandinavia) into five smaller regions, with risk relatively low in the Balkans (Greek and Turkish studies) and relatively high in multiregional studies compared with the rest, and the other a variable subdividing “Other Asia” (i.e. other than China or Japan) into three smaller regions, with risk higher in India compared to Hong Kong and the rest of Asia (Taiwan, Thailand, Singapore and South Korea). No independent effect was evident for national cigarette tobacco type. Additional analysis (data not shown) confirmed the strong independent effect of start year of study separately within studies conducted in North America, Europe and Asia, though the tendency for higher RRs in more recent studies was stronger in North America than in Europe, and the pattern of variation was more erratic for Asia. It also confirmed the strong independent effects of location and start year of study separately for males and for females.
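The reduction in heterogeneity per d.f. when a single characteristic such as location enters the model can be sketched as follows. This assumes an inverse-variance weighted model of log RR in which each level of the characteristic is given its own weighted mean, which is one simple reading of a one-factor meta-regression and may not match the exact procedure in the Methods. The function name and the data are hypothetical.

```python
from collections import defaultdict

def heterogeneity_per_df(log_rr, weight, groups=None):
    """Weighted residual sum of squares per d.f.

    With groups=None a single overall weighted mean is fitted; otherwise each
    level of the grouping characteristic gets its own weighted mean.
    Requires more estimates than levels, so that d.f. is positive.
    """
    if groups is None:
        groups = ["all"] * len(log_rr)
    sums = defaultdict(lambda: [0.0, 0.0])  # level -> [sum of w, sum of w*y]
    for y, w, g in zip(log_rr, weight, groups):
        sums[g][0] += w
        sums[g][1] += w * y
    means = {g: swy / sw for g, (sw, swy) in sums.items()}
    q = sum(w * (y - means[g]) ** 2 for y, w, g in zip(log_rr, weight, groups))
    df = len(log_rr) - len(means)
    return q / df

# Hypothetical log RRs, weights and locations, for illustration only
log_rr = [1.0, 1.2, 2.0, 2.2]
weight = [10.0, 10.0, 10.0, 10.0]
location = ["A", "A", "B", "B"]

before = heterogeneity_per_df(log_rr, weight)            # no covariates
after = heterogeneity_per_df(log_rr, weight, location)   # location only
print(before, after)
```

A large drop from `before` to `after` indicates that the characteristic accounts for much of the between-estimate variation, which is the pattern reported above for location.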

For squamous, start year of study was the most important factor, on its own reducing the heterogeneity from 5.17 to 4.33 per d.f. (p < 0.001). Other significant characteristics included location (p < 0.001), with RRs high in North America and low in China, and number of cases (p < 0.05), with higher RRs in larger studies. Number of adjustment factors was also significant (p < 0.05), but the pattern was erratic and not the same as for all lung cancer. Though the pattern of results by study type was similar to that for all lung cancer, this characteristic did not contribute significantly to the model. The heterogeneity for the fixed model (Table 7) was 3.18 per d.f., the model explaining 49.9% of the overall variation. Two standardized residuals were outside the range +/− 2.5 SEs : STAYNE/males (RR 3.47, fitted 10.50) and LUBIN2/males (RR 16.66, fitted 8.41). Two other characteristics significantly improved the model fit. One was national cigarette tobacco type, with RRs higher where flue-cured Virginia tobacco was smoked, than where blended tobacco was smoked. Also, RRs were higher (p < 0.01) where they had been derived by a relatively complex method (see Methods) than where they were as reported originally, or derived by more standard methods.

For adeno, location was the most important factor, on its own reducing the heterogeneity from 8.78 to 4.36 per d.f. (p < 0.001), with the pattern of results (RRs high in North America and low in Asia) similar to that for all lung cancer. As for all lung cancer, there was variation by start year of study (p < 0.05) and number of cases (p < 0.05), with RRs higher for recent and larger studies. RRs were again higher for prospective studies, but here the difference was not significant (p ≥ 0.05). Here, variation by sex was significant (p < 0.05) with RRs higher for males than females, but number of adjustment factors was not (p ≥ 0.05). The heterogeneity for the fixed model (Table 7) was 3.27 per d.f., the model explaining 69.5% of the overall variation. Two standardized residuals were outside the range +/− 2.5 SEs: LOMBA2/females (RR 0.53, fitted 2.32) and WYNDER6/females (RR 13.99, fitted 6.22). Four other characteristics significantly improved the model fit. One was “Other Asia” (p < 0.05) where RRs were high in India (based on a single RR from JUSSAW) and relatively low in Hong Kong, Taiwan, Thailand, Singapore and South Korea. National cigarette tobacco type was also significant (p < 0.05), with RRs for blended higher than for Virginia, opposite to the finding for squamous. RRs were also lower where there was any use of proxy respondents (p < 0.05). Also, RRs varied (p < 0.001) by the detailed definition of adenocarcinoma used. This appeared to be mainly because of a low RR for “not squamous or undifferentiated”, a definition used only for LOMBA2/females, where the standardized residual of −3.721 SEs was the largest for any RR (see also above).
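As a rough illustration of how an observed and a fitted RR translate into a standardized residual, the sketch below uses the simplest form, the difference in log RRs multiplied by the square root of the inverse-variance weight. This form is an assumption: it ignores leverage and may differ from the definition in the Methods. The RR and fitted value are those quoted above for LOMBA2/females, but the weight of 6.0 is hypothetical, so the result is not expected to reproduce the reported −3.721.

```python
import math

def standardized_residual(rr, fitted_rr, weight):
    """(log RR - log fitted RR) in units of the SE of log RR, where SE = 1/sqrt(weight)."""
    return (math.log(rr) - math.log(fitted_rr)) * math.sqrt(weight)

def is_outlying(std_resid, threshold=2.5):
    """Flag residuals outside +/- threshold SEs, the cut-off used in the text."""
    return abs(std_resid) > threshold

# RR and fitted RR as quoted in the text; the weight is a hypothetical value
resid = standardized_residual(0.53, 2.32, weight=6.0)
print(resid, is_outlying(resid))
```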

The fixed model (Table 7) considered how RR estimates varied by six main characteristics and additional analyses (see Additional file 5: Detailed Analysis Tables) tested whether adding in further characteristics improved the model fit. Characteristics which did not improve the fit for any of the three outcomes considered included whether there was adjustment for specific factors (such as age), the age of the subjects studied, the definition of smoking product, the definition of the unexposed base, whether the study was conducted in a population working in a risky occupation, and whether the study procedures required full histological confirmation.

B. Risk from current smoking

Figures 10, 11, 12 (all lung cancer), Figure 13 (squamous) and Figure 14 (adeno) present the results of the main meta-analyses for current smoking of any product. As before, RRs for smoking of cigarettes are used if RRs for any product smoking are not available, and RRs are most-adjusted. For prospective studies, current smoking refers to smoking status as at baseline. Table 8 presents additional results by level of the same set of characteristics considered in Table 5, while Table 9 presents results of alternative meta-analyses of current smoking.

Figure 10

Forest plot of current smoking of any product and all lung cancer – part 1. Table 8 presents the results of a main meta-analysis for all lung cancer based on 195 relative risk (RR) and 95% confidence interval (CI) estimates for current smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale in Figures 10, 11, 12. The studies are sorted in order of sex within study reference (REF) within start year of study (START) within continent (CONT). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight (inverse-variance of log RR). Arrows indicate where the CI extends outside the range allocated. For study DORGAN separate estimates, within sex, are shown for whites then blacks. For study HUMBLE they are shown for non-hispanic whites then hispanics, and for study SCHWAR for whites then non-whites.

Figure 11

Forest plot of current smoking of any product and all lung cancer – part 2. This is a continuation of Figure 10, presenting further individual study data included in the main meta-analysis for all lung cancer shown in Table 8. For study KELLER separate estimates, within sex, are shown for whites then non-whites.

Figure 12

Forest plot of current smoking of any product and all lung cancer – part 3. This is a continuation of Figure 11, presenting the remaining individual study data included in the main meta-analysis for all lung cancer shown in Table 8. Also shown are the combined random-effect estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI. For study KREUZE separate estimates, within sex, are shown for age ≤ 45 and 55–69.

Figure 13

Forest plot of current smoking of any product and squamous. Table 8 presents the results of a main meta-analysis for squamous based on 41 relative risk (RR) and 95% confidence interval (CI) estimates for current smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale. The studies are sorted in order of sex within study reference (REF) within start year of study (START) within continent (CONT). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight (inverse-variance of log RR). Arrows indicate where the CI extends outside the range allocated. Also shown are the combined random-effect estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.

Figure 14

Forest plot of current smoking of any product and adeno. Table 8 presents the results of a main meta-analysis for adeno based on 44 relative risk (RR) and 95% confidence interval (CI) estimates for current smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale. The studies are sorted in order of sex within study reference (REF) within start year of study (START) within continent (CONT). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight (inverse-variance of log RR). Arrows indicate where the CI extends outside the range allocated. Also shown are the combined random-effect estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.

Table 8 Main meta-analyses for current smoking of any product (or cigarettes, if any product not available) a
Table 9 Some alternative meta-analyses for current smoking compared to those in Table 8

As for ever smoking, the RRs for all three outcomes are heterogeneous (p < 0.001), with the largest estimates seen being 104.50 for all lung cancer (STUCKE/males), 78.91 for squamous (CPSII/females), and 21.70 for adeno (OSANN/males). The random-effects estimates (all lung cancer 8.43, 95% CI 7.63-9.31, n = 195; squamous 16.91, 13.14-21.76, n = 41; adeno 4.21, 3.32-5.34, n = 44) are all clearly positive, larger than the corresponding estimates for ever smoking, and also show a stronger relationship with squamous than adeno. Similarly to ever smoking, the individual RRs are virtually all above 1.0, though varying substantially. The estimates are again little affected (Table 9) by preferring least, rather than most, adjusted RRs, by restricting to a more precise outcome definition, or by preferring RRs for current smoking of cigarettes to those for current smoking of any product. Again estimates based specifically on cigarette only smoking were slightly higher than those shown in Table 8 – 9.52 (7.89-11.49, n = 38) for all lung cancer, 20.85 (14.84-29.29, n = 8) for squamous, and 6.05 (3.69-9.92, n = 7) for adeno. More so than in Table 6, data by age were rather limited for squamous and adeno. For all lung cancer estimates were 6.57 (4.68-9.23, n = 25) for age <56 years, 9.62 (7.10-13.05, n = 24) for age 50–70 years, and 9.07 (6.83-12.04, n = 27) for age 65+ years, no clear trend being evident. Table 9 also includes results for the comparison current vs. non-current smokers. The RRs here (3.75, 3.48-4.03 for all lung cancer; 4.71, 3.84-5.79 for squamous; 2.46, 2.07-2.93 for adeno) were markedly lower than the corresponding estimates for current vs. never smokers, reflecting the increased risk in ex-smokers described later (see section D below).

For the main meta-analysis, the studies contributing most to the total weight for current smoking for all lung cancer were STOCKW/sexes combined (17.8% of the total of 6,750) followed by BROWN2/males (6.0%) and BROWN2/females (5.4%). BROWN2 was the major contributor for both squamous and adeno, with the two sex-specific results contributing 36.0% of the total weight of 646 for squamous, and 30.0% of the total weight of 1,017 for adeno. The huge LIU4 study did not provide results for current smoking.

For the characteristics considered in Table 8, the pattern of variation has a number of similarities to that for ever smoking in Table 5. Thus, as for ever smoking, RRs for all three outcomes tend to be higher for males, for North American studies, and where the unexposed base is never cigarettes, and smaller for older studies and smaller studies, with no clear variation by extent of adjustment. A tendency for RRs to be higher where data may be reported by proxy respondents seems somewhat stronger for current smoking, although based on few estimates for squamous and adeno. A tendency for RRs to be higher where the smoking product is cigarettes or cigarettes only than when it is any product is also evident, though not for squamous, whereas it was seen most clearly in squamous for ever smoking. There is also some indication that RRs are higher in prospective studies, though interestingly not for all lung cancer. Whereas for ever smoking, RRs for studies requiring full histological confirmation were higher than for those that did not for all three outcomes, the tendency was in the reverse direction for squamous and adeno for current smoking. For national cigarette tobacco type, current smoking RRs for squamous and adeno are virtually all for blended, so are unhelpful. For all lung cancer, RRs are quite similar for Virginia and blended, the significant (p < 0.001) variation shown in Table 8 arising because of the low RRs in the “Other” group, mainly for China.

As for ever smoking, meta-regression analyses were conducted to give further insight, the results from the same fixed model including six characteristics being summarized in Table 10. Based on these results, and those for other characteristics in Additional file 5: Detailed Analysis Tables, various conclusions can be drawn.

Table 10 Meta-regression analyses for current smoking of any product (or cigarettes if any product not available) a

For all lung cancer, as was the case for ever smoking RRs, by far the strongest source of variation in current smoking RRs was location with relatively high risks in North America and low risks in Asia. The overall heterogeneity reduced from 13.76 per d.f. to 6.73 per d.f. after including location only into the model. Higher risks were also seen in the fixed model in more recent studies (p < 0.001) and for males than females (p < 0.01). There was some evidence (p < 0.1) of higher RRs in larger studies and in prospective studies, but no association was seen with the number of adjustment factors. The heterogeneity for the fixed model shown in Table 10 was 4.68 per d.f., with the model explaining 69.3% of the overall variation between the current smoking RRs. Four standardized residuals were outside the range +/− 2.5 SEs : BROWN2/males (RR 11.30, fitted 15.86), TIZZAN/males (RR 1.90, fitted 3.68), CPSI/females (RR 3.20, fitted 6.59) and KREUZE/males aged 55–69 (RR 41.86, fitted 11.85). No other characteristic significantly improved the fit when added to the fixed model. Additional analysis (data not shown) confirmed the effect of start year of study separately for North America and Europe (though no such relationship was seen in Asia) and also confirmed that the effects of location and start year of study were evident separately for males and for females.

For squamous and adeno, numbers of current smoking RRs (41 and 44 respectively) were much lower than those for all lung cancer, with no data for China or the United Kingdom, or for national cigarette type “other”. For squamous, only two characteristics in the fixed model (Table 10) were significant, and then only at p < 0.05, and one of these was number of adjustment factors, where the pattern of response was erratic. Location was the other, with RRs again highest in North America and lowest in Asia. There were no estimates with large standardized residuals, and no other characteristic improved the model fit.

For adeno, three of the characteristics considered in Table 10 contributed significantly to the model, sex (p < 0.001), location (p < 0.001) and start year of study (p < 0.05), with the direction of effect similar to that noted earlier for ever smoking. There were no large standardized residuals, and the only additional characteristic which improved the model fit (p < 0.05) related to somewhat lower RRs being seen for studies with full histological confirmation.

For none of the three outcomes did characteristics associated with detailed location, national cigarette tobacco type, the precise definition of the outcome, adjustment for specific factors, the definitions of smoking product or of the unexposed base, whether the study was conducted in a population working in a risky occupation or whether proxy respondents were used, add significantly to the model.

C. Risk from ever or current smoking

In an attempt to incorporate data from a greater number of studies, additional analyses were carried out for ever/current smoking and for current/ever smoking. The meta-analysis RRs are shown in Table 11. The number of studies included increased from 236 to 242 for all lung cancer, from 73 to 78 for squamous and from 75 to 81 for adeno, compared with Table 5. Note that the slightly higher number of RR estimates in the current/ever analysis arises from inclusion there of more sex-specific results.

Table 11 Main meta-analyses for current or ever smoking of any product (or cigarettes, if not available) a

As many of the RRs are common between the specific ever smoking analyses in Table 5 and the ever/current smoking analyses in Table 11, the meta-analysis RRs tend to be quite similar. However those for current/ever smoking are intermediate between those specifically for ever smoking (Table 5) and those specifically for current smoking (Table 8). For example, for all lung cancer, random-effects estimates are 5.50 (95% CI 5.07-5.96, n = 328) for ever smoking, 5.48 (5.07-5.93, n = 342) for ever/current smoking, 6.20 (5.68-6.77, n = 344) for current/ever smoking, and 8.43 (7.63-9.31, n = 195) for current smoking. The pattern of RRs by level of the characteristics studied for both ever/current and current/ever smoking tends to be quite similar to that for the specific analyses. Results for ever or current smoking by level of selected characteristics are therefore only presented in Additional file 5: Detailed Analysis Tables.

D. Risk from ex smoking

Figures 15, 16, 17 (all lung cancer), Figure 18 (squamous) and Figure 19 (adeno) present the results of the main meta-analyses for ex smoking of any product (or cigarettes if any product was not available), based on most-adjusted RRs. Some results by levels of characteristics are shown in Table 12.

Figure 15

Forest plot of ex smoking of any product and all lung cancer – part 1. Table 12 presents the results of a main meta-analysis for all lung cancer based on 182 relative risk (RR) and 95% confidence interval (CI) estimates for ex smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale in Figures 15, 16, 17. The studies are sorted in order of sex within study reference (REF) within start year of study (START) within continent (CONT). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight (inverse-variance of log RR). Arrows indicate where the CI extends outside the range allocated. For studies DORGAN and KELLER separate estimates, within sex, are shown for whites then blacks. For study HUMBLE they are shown for non-hispanic whites then Hispanics. For study KELLER the estimate shown for females is for whites.

Figure 16

Forest plot of ex smoking of any product and all lung cancer – part 2. This is a continuation of Figure 15, presenting further individual study data included in the main meta-analysis for all lung cancer shown in Table 12. For study KELLER the estimate shown for females is for non-whites.

Figure 17

Forest plot of ex smoking of any product and all lung cancer – part 3. This is a continuation of Figure 16, presenting the remaining individual study data included in the main meta-analysis for all lung cancer shown in Table 12. Also shown are the combined random-effects estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI. For study KREUZE separate estimates, within sex, are shown for age ≤ 45 and 55–69.

Figure 18

Forest plot of ex smoking of any product and squamous. Table 12 presents the results of a main meta-analysis for squamous based on 33 relative risk (RR) and 95% confidence interval (CI) estimates for ex smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale. The studies are sorted in order of sex within study reference (REF) within start year of study (START) within continent (CONT). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight (inverse-variance of log RR). Arrows indicate where the CI extends outside the range allocated. Also shown are the combined random-effects estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.

Figure 19

Forest plot of ex smoking of any product and adeno. Table 12 presents the results of a main meta-analysis for adeno based on 34 relative risk (RR) and 95% confidence interval (CI) estimates for ex smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale. The studies are sorted in order of sex within study reference (REF) within start year of study (START) within continent (CONT). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight (inverse-variance of log RR). Arrows indicate where the CI extends outside the range allocated. Also shown are the combined random-effects estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.

Table 12 Main meta-analyses for ex smoking of any product (or cigarettes, if any product not available) a

Again the RRs are markedly heterogeneous (p < 0.001 for all three outcomes), ranging up to 135.69 for all lung cancer (STUCKE/males), 22.90 for squamous (OSANN/males) and 13.10 for adeno (OSANN/males). The random-effects estimates (all lung cancer 4.30, 95% CI 3.93-4.71, n = 182, squamous 8.74, 6.94-11.01, n = 33, and adeno 2.85, 2.20-3.70, n = 34), though all clearly positive, are smaller than the corresponding estimates for current smoking. Individual RRs are only very occasionally below 1.0 and never significantly so. Estimates are little affected by using the more specific definition of each outcome, preferring least-adjusted RRs to most-adjusted RRs, or preferring RRs for ex smoking cigarettes to those for ex smoking any product. RRs for ex smoking cigarettes only were too few for useful analysis for squamous and adeno, but for all lung cancer were similar to those for ex smoking any product. Fuller details are given in the Additional file 5: Detailed Analysis Tables.

For the main meta-analysis of ex smoking, the studies contributing most to the total weight for all lung cancer were STOCKW/sexes combined (22.4% of the total of 4,739), followed by BROWNS/males (8.5%) and BROWNS/females (6.5%). BROWNS was the major contributor for both squamous and adeno, with the two sex-specific results contributing 49.4% of the total weight of 446 for squamous, and 45.2% of the total weight of 619 for adeno.
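The weights quoted here are inverse variances of log RR, as stated in the forest plot legends. The following Python sketch shows how such a weight could be recovered from a published RR and CI, and how a percentage contribution relates to the total weight. It assumes that each CI is a symmetric 95% interval on the log scale; the function names are illustrative and are not taken from the source.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% CI


def weight_from_ci(lower: float, upper: float) -> float:
    """Inverse-variance weight of log RR, assuming a symmetric 95% CI on the log scale."""
    se = math.log(upper / lower) / (2 * Z95)
    return 1.0 / se ** 2


def share_of_total(weight: float, total_weight: float) -> float:
    """Percentage of the total meta-analysis weight contributed by one estimate."""
    return 100.0 * weight / total_weight


# Working backwards from the text: 22.4% of a total weight of 4,739
# implies a weight of a little over 1,060 for STOCKW/sexes combined.
implied_weight = 0.224 * 4739
implied_se = 1.0 / math.sqrt(implied_weight)  # implied standard error of log RR

print(round(implied_weight, 1), round(implied_se, 4))
```

A weight of this size corresponds to a standard error of log RR of about 0.03, which illustrates why a single large study can dominate an inverse-variance weighted analysis.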

For the characteristics considered in Table 12 the sources of variation for all lung cancer are generally quite similar to those seen for ever smoking in Table 5 and for current smoking in Table 8. Thus, RRs are higher for males, for North America, for more recent studies and for larger studies. Interestingly, RRs are clearly lower for prospective than for case–control studies. Numbers of ex smoking RRs are fewer for squamous (33) and for adeno (34) than for all lung cancer (182), but nevertheless some associations are evident in relation to location for adeno, to study type for squamous, to number of adjustment factors for adeno, and to number of cases, smoking product and unexposed base for both squamous and adeno. Meta-regression analyses were not attempted for ex smoking.

E. Risk from smoking specific products compared to smoking of any product

Table 13 summarizes the results of meta-analyses for all lung cancer for cigarette only smokers, smokers of pipes/cigars only, smokers of pipes only, and smokers of cigars only. In each analysis, the base is never smokers of any product. The results for ever smoking of pipes/cigars only are also shown in Figure 20.

Table 13 Meta-analyses for smoking of cigarettes, cigars and pipes (all lung cancer) a
Figure 20

Forest plot of ever pipe and/or cigar smoking and all lung cancer. Table 13 presents the results of a meta-analysis for all lung cancer based on 56 relative risk (RR) and 95% confidence interval (CI) estimates for ever pipe and/or cigar smoking. The individual study estimates are shown numerically and graphically on a logarithmic scale. The studies are sorted in order of sex within study reference (REF) within start year of study (START) within continent (CONT). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight (inverse-variance of log RR). Arrows indicate where the CI extends outside the range allocated. Also shown are the combined random-effects estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.

For ever smoking, current smoking and ex smoking the random-effects RRs are similarly elevated for pipes/cigars, pipes only and cigars only, but to a markedly lesser extent than for cigarettes only. As for cigarette smoking, RRs for pipe and cigar smoking are clearly higher for current smokers than for ex smokers.

Available results for squamous and adeno are limited, and mainly for ever smoking. For pipe and/or cigar smoking, the RR for squamous (3.72, 95% CI 1.95-7.10, n = 8) is somewhat higher than that for all lung cancer (2.92, 2.38-3.57, n = 38), but the RR for adeno is not elevated (0.93, 0.62-1.40, n = 7). The lack of association of adeno with pipe and cigar smoking is also evident in the RRs for pipes only (0.50, 0.23-1.10, n = 4) and for cigars only (0.55, 0.11-2.88, n = 3).

The results for pipe and cigar smoking mainly apply to males, as the few available estimates for females have wide variability. The increased risk in smokers of pipes and cigars is evident in each location studied, though data for Asia are extremely sparse. Unlike for cigarettes, higher RRs are seen for Scandinavia (7.02, 4.72-10.44, n = 6) and for Other Europe (5.17, 2.91-9.19, n = 8) than for North America (2.27, 1.79-2.89, n = 26) or the UK (4.32, 2.73-6.84, n = 11). These results are for ever/current smoking, with the full results given in Additional file 5: Detailed Analysis Tables.

Table 13 also shows results for lung cancer for mixed smokers. For ever, current and ex smoking, the random-effects RRs are slightly, but not significantly, higher than those for smokers of cigarettes only. Available results for squamous and adeno are again limited, and mainly for ever smokers. The RRs for squamous (9.78, 4.94-19.35, n = 6) and for adeno (2.48, 1.25-4.95, n = 6) do not clearly differ from the RRs for squamous (11.09, 7.19-17.09, n = 10) and for adeno (2.63, 1.32-5.24, n = 10) for smokers of cigarettes only.

F. Risk by type of cigarette smoked

Table 14 summarizes results by type of cigarette smoked. For filter and plain cigarette smoking results are shown for three comparisons, including, for studies where there is a choice, the nearest available equivalents to only filter vs. only plain (with results for all lung cancer also shown in Figure 21), ever filter vs. only plain, and only filter vs. ever plain. Results are also shown for the comparison of handrolled and manufactured cigarette smoking, and for mentholated vs. non-mentholated cigarette smoking, with results for all lung cancer also shown in Figures 22 and 23.

Table 14 Meta-analyses by type of cigarette smoked a
Figure 21

Forest plot of only filter vs. only plain cigarette smoking and all lung cancer. Table 14 presents the results of a meta-analysis for all lung cancer based on 42 relative risk (RR) and 95% confidence interval (CI) estimates for only filter vs. only plain cigarette smoking. The individual study estimates are shown numerically and graphically on a logarithmic scale. The studies are sorted in order of sex within study reference (REF) within start year of study (START) within continent (CONT). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight (inverse-variance of log RR). Arrows indicate where the CI extends outside the range allocated. Also shown are the combined random-effects estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.

Figure 22

Forest plot of handrolled vs. manufactured cigarette smoking and all lung cancer. Table 14 presents the results of a meta-analysis for all lung cancer based on 20 relative risk (RR) and 95% confidence interval (CI) estimates for handrolled vs. manufactured cigarette smoking. The individual study estimates are shown numerically and graphically on a logarithmic scale. The studies are sorted in order of sex within study reference (REF) within start year of study (START) within continent (CONT). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight (inverse-variance of log RR). Arrows indicate where the CI extends outside the range allocated. Also shown are the combined random-effects estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.

Figure 23

Forest plot of mentholated vs. non-mentholated cigarette smoking and all lung cancer. Table 14 presents the results of a meta-analysis for all lung cancer based on six relative risk (RR) and 95% confidence interval (CI) estimates for mentholated vs. non-mentholated cigarette smoking. The individual study estimates are shown numerically and graphically on a logarithmic scale. The studies are sorted in order of sex within study reference (REF) within start year of study (START) within continent (CONT). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight (inverse-variance of log RR). Arrows indicate where the CI extends outside the range allocated. Also shown are the combined random-effects estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.

The random-effects RRs show a reduction in risk for only filter vs. only plain cigarette smoking that is significant for all lung cancer (RR 0.69, 95% CI 0.61-0.78, n = 42), and squamous (0.52, 0.40-0.68, n = 13), though not for adeno (0.84, 0.66-1.08, n = 10). The alternative comparisons for filter and plain, where only a third to a half of the RRs included actually differ, show clear reductions for all lung cancer and squamous associated with filter cigarette smoking, though no difference for adeno (see Table 14). The reductions for all lung cancer and squamous are evident in both sexes and all continents (see Additional file 5: Detailed Analysis Tables).

The risk associated with handrolled cigarette smoking is greater than that with manufactured cigarette smoking, with RRs of 1.29 (1.12-1.49, n = 20) for all lung cancer and 1.62 (1.18-2.21, n = 5) for squamous. The RR of 2.09 (0.83-5.25, n = 4) for adeno is based on very heterogeneous estimates, varying from 0.43 to 8.76, and allows no clear conclusion. As results for females are limited, and have wide variability, the conclusions mainly apply to males. The estimated RR for all lung cancer is greater than 1 in all locations studied, though not always statistically significant. However, there are no data from North America.

Data on mentholated cigarette smoking are limited, particularly by histological type. For all lung cancer, the RR of 0.98 (0.80-1.20, n = 6) is consistent with no effect of mentholation on risk, with five RR estimates close to or below 1.0 counterbalancing one reported significant increase in males, of 1.45 (1.03-2.02), for study KAISER. There is some evidence (p < 0.05) of heterogeneity by sex, with estimates of 1.15 (0.93-1.43, n = 3) for males, and 0.78 (0.63-0.98, n = 3) for females.

G. Risk by amount smoked

Table 15 summarizes the results of meta-analyses using RRs categorized by number of cigarettes (or cigarette equivalents) smoked per day and based on data for ever/current smoking and for smoking of any product (or cigarettes if not available). These are based on those 140 studies for all lung cancer, 36 for squamous, and 34 for adeno which provided data that could be used in the meta-analyses. For all three outcomes, results are shown for one of the sets of “key values” (see Methods). For all lung cancer, squamous and adeno, a clear increase is seen for RRs for categories including 5, but not 20, cigarettes/day, with the meta-analysis RR increasing monotonically with increasing amount smoked. Random-effects estimates for categories including 45, but not 20 cigarettes/day, are 13.69 (11.80-15.89, n = 128) for all lung cancer, 27.65 (20.42-37.44, n = 37) for squamous and 4.80 (3.29-7.01, n = 34) for adeno. The increase with amount smoked is also clearly evident when an alternative set of key values (1, 10, 20, 30, 40, 999) is used, though numbers of available RRs are quite sparse for the higher key values, when least-adjusted RRs are considered, and in both sexes (see Additional file 5: Detailed Analysis Tables). The key value analyses do not use results for all the dose–response data available, as a number of the studies use broad dose–response categories (such as 1–20 or 20+ cigs/day) which span more than one of the key values. Additional file 5: Detailed Analysis Tables also includes results for alternative definitions of smoking status and product smoked, which show a similarly clear dose–response. For example, for current smoking of any product, the RRs for squamous rise from 9.92 (7.41-13.28, n = 8) for key value 5 cigs/day to 39.16 (23.67-64.79, n = 12) for key value 45 cigs/day. 
Additional file 4: Dose Not Meta also includes available results for some other studies which present dose–response data in a form that cannot readily be included in the meta-analyses (e.g. where the only available comparison is with an inappropriate base group). These results do not appear inconsistent with those summarized in Table 15.
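The way broad dose categories fall out of a key value analysis can be sketched in Python as follows. The set (5, 20, 45) is inferred from the categories quoted above, the alternative set is the one given in the text, and the closed-interval inclusion rule is an assumption, since the precise rule is defined in Methods. The function name is hypothetical.

```python
from typing import Optional, Sequence

KEY_VALUES = (5, 20, 45)                   # set inferred from the categories quoted in the text
ALT_KEY_VALUES = (1, 10, 20, 30, 40, 999)  # alternative set quoted in the text
OPEN_ENDED = float("inf")                  # upper bound for categories such as "20+"


def assign_key_value(low: float, high: float,
                     key_values: Sequence[float] = KEY_VALUES) -> Optional[float]:
    """Return the single key value lying inside [low, high], or None.

    None means the category contains no key value or spans more than one,
    so it would not contribute to that key value analysis.
    """
    inside = [k for k in key_values if low <= k <= high]
    return inside[0] if len(inside) == 1 else None


print(assign_key_value(1, 9))            # 5
print(assign_key_value(1, 20))           # None: spans 5 and 20
print(assign_key_value(20, OPEN_ENDED))  # None: spans 20 and 45
print(assign_key_value(30, OPEN_ENDED))  # 45
```

Under this rule the broad categories mentioned above (1–20 and 20+ cigs/day) are dropped, which is the reason the key value analyses use only part of the available dose–response data.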

Table 15 Meta-analyses for number of cigarettes smoked a

Dose–response by amount smoked was investigated for pipe and cigar smoking, but the number of estimates available was small, and referred only to males. However, there was some evidence of dose–response. Thus for all lung cancer, one can compare RRs for cigar only smoking for the highest (8.21, 4.36-15.49, n = 6) and lowest exposure groups (1.84, 1.22-2.79, n = 5), and can also compare RRs for pipe only smoking for the highest (5.99, 3.57-10.04, n = 9) and lowest exposure groups (3.68, 2.75-4.93, n = 8).

H. Risk by age of starting to smoke

Table 16 summarizes meta-analysis results for age of starting to smoke based on data for ever/current smoking and for smoking of any product (or cigarettes if not available). Random-effects RRs for earliest compared to latest starting, and selecting results least-adjusted for other aspects of smoking, are significantly elevated for all lung cancer (2.35, 2.08-2.65, n = 73), squamous (2.23, 1.66-2.98, n = 18) and adeno (1.99, 1.48-2.67, n = 17). Alternatively selecting results most-adjusted for other aspects of smoking, the RR for all lung cancer is 2.20 (1.96-2.47, n = 73). The increase in risk with earlier starting is consistent with the results of the key value analyses, with, for example, random-effects estimates relative to never smokers for squamous rising from 11.06 (6.87-17.81, n = 14) for categories including 26 years but not including 18 years to 31.07 (17.93-53.85, n = 6) for categories including 14, but not 18 years. As seen in Additional file 5: Detailed Analysis Tables, a similar pattern is generally seen for other definitions of smoking status and product smoked, although data for smokers of pipes and/or cigars are very limited.

Table 16 Meta-analyses for age started to smoke a

I. Risk by duration of smoking

Table 17 is laid out similarly to Table 16 and also presents results for ever/current smoking. Random-effects RRs for longest compared to shortest duration of smoking, and selecting results least adjusted for other aspects of smoking, are significantly elevated for all lung cancer (3.56, 2.90-4.35, n = 76), squamous (3.93, 3.10-4.97, n = 27) and adeno (2.64, 2.04-3.43, n = 23). Alternatively selecting results most adjusted for other aspects of smoking, the RR for all lung cancer is 3.00 (2.57-3.49, n = 77). The increase in risk with longer duration is consistent with the results of the key value analyses, with, for example, random-effects estimates for all lung cancer rising from 2.48 (2.09-2.95, n = 55) for categories including 20 years but not including 35 years to 10.13 (7.66-13.39, n = 45) for categories including 50, but not 35 years. A clear trend of risk with increasing duration is also seen for other definitions of smoking status and product smoked (see Additional file 5: Detailed Analysis Tables). Data for pipe and cigar smoking are limited, though even so there is some evidence of a trend. Thus, for all lung cancer longest to shortest RRs are elevated, both in smokers of pipes only (4.32, 1.57-11.89, n = 5) and smokers of cigars only (2.43, 1.02-5.79, n = 3).

Table 17 Meta-analyses for duration of smoking a

J. Risk by duration of quitting (vs. never smoking)

Table 18 presents results for duration of quitting (vs. never smoking) based on results for smoking of any product (or cigarettes if not available). Random-effects RRs for shortest compared to longest duration of quitting, selecting results least adjusted for other aspects of smoking, are significantly elevated for all lung cancer (3.97, 3.32-4.75, n = 65), squamous (6.22, 3.75-10.30, n = 14) and adeno (3.32, 1.98-5.58, n = 14). Alternatively selecting results most adjusted for other aspects of smoking, the RR for all lung cancer is 3.61 (3.04-4.28, n = 65). The increase in risk with shorter duration of quitting is consistent with the results of the key value analyses, with, for example, random-effects estimates relative to never smokers for adeno rising from 2.10 (1.49-2.94, n = 12) for categories including 12 years but not including 7 years to 6.73 (3.46-13.12, n = 6) for categories including 3, but not 7 years. A clear trend of risk with increasing duration of quitting is also seen for cigarette smoking (or any product if not available), and for cigarette only smoking (see Additional file 5: Detailed Analysis Tables). Data for pipe and cigar smoking were too limited for reliable conclusions.

Table 18 Meta-analyses for duration of quitting (vs. never smoked) a

K. Risk by duration of quitting (vs. current smoking)

For duration of quitting compared to current smoking, the number of data sets available is somewhat smaller than the corresponding number for duration of quitting compared to never smoking. Results included in the longest vs. shortest analysis shown in Table 19 are generally the inverse of those in the shortest vs. longest analysis in Table 18 (exceptions arising for studies which combined current smokers and recent quitters of more than 2 years). While the key value analyses shown in Table 19 echo the trends shown in Table 18, they also show that for shorter term quitting (categories including 3 but not 7 years) there is no evidence of a decline in risk from quitting. Thus the RRs for all lung cancer (0.95, 0.84-1.08, n = 41) and adeno (1.02, 0.85-1.22, n = 6) are close to 1.00, and the RR for squamous (1.15, 1.03-1.28, n = 6) is slightly elevated. Longer quit durations are, however, clearly associated with a reduction in risk. For all lung cancer, almost 40% of the RRs used in the key value analyses included short-term quitters (of up to 2 years) in the current smoker base. No difference was seen between those RRs and those with a more precisely defined current smoker base.

Table 19 Meta-analyses for duration of quitting (vs. current smoking) a

L. Risk by tar level

Due to the variety of different methods of quantifying tar levels, only highest vs. lowest analyses have been carried out. No data were available by histological type, and all data relate to cigarette smoking. For all lung cancer and for ever/current smoking of cigarettes the 14 available estimates, from 9 studies, showed some evidence of heterogeneity (H = 2.29, p < 0.01). However, 12 of the estimates showed a higher risk in the higher tar group, and the random-effects estimate (1.42, 1.18-1.71) confirmed the relationship between risk and tar level. The increase was evident for males (1.29, 1.08-1.53, n = 7) and females (1.48, 1.05-2.09, n = 6). There was no evidence of heterogeneity by any specific characteristic, including extent of adjustment, 7 of the 14 estimates being adjusted for one or more aspects of smoking. These results are based on RRs that are selected as being least adjusted for other aspects of smoking. Alternatively, using RRs selected as most adjusted for other aspects of smoking, the overall estimate was 1.34 (1.16-1.56, n = 14).

M. Risk by butt length and fraction smoked

All the available data relate to cigarette smoking. As the number of available estimates was quite limited, particularly for butt length, they have been combined into a single analysis including RRs for shortest vs. longest butt lengths and for greatest vs. smallest fraction smoked, and including results for ever smoking and current smoking. The combined estimates were 1.43 (1.14-1.79, n = 11) for all lung cancer, 1.39 (1.04-1.86, n = 7) for squamous, and 1.30 (1.07-1.58, n = 6) for adeno. There was some evidence of heterogeneity for all lung cancer (H = 2.29, p < 0.05) and for squamous (H = 2.96, p < 0.01), though not for adeno (H = 0.75), but a clear majority (18/24 = 75.0%) of the estimates indicated a higher risk associated with smoking more of the cigarette.

N. Further analyses by histological type

The results so far have been restricted to all lung cancer, squamous or adeno. Table 20 gives results for ever, current and ever/current smoking of any product (or cigarettes if not available) for small cell carcinoma and large cell carcinoma, with corresponding results also shown for all lung cancer, squamous cell carcinoma and for adenocarcinoma. For ever/current smoking, the RR for large cell carcinoma (5.33, 4.02-7.07, n = 29) is quite similar to that for all lung cancer (5.48, 5.07-5.93, n = 342), while the RR for small cell carcinoma (11.14, 8.59-14.46, n = 61) is markedly higher, and similar to that for squamous cell carcinoma (11.62, 9.80-13.78, n = 82). This pattern is also true for current smoking, where RR estimates are higher than for ever/current smoking, and for ever smoking. Additional file 5: Detailed Analysis Tables gives results by level of the various characteristics studied. As for all lung cancer, squamous and adeno, RRs for small cell and large cell carcinoma varied substantially by location, with RRs much higher in North America than in China, and no clear pattern for the other regions, some of which have sparse data. There was also a tendency for RRs to be higher where there was 100% histological confirmation. For ever/current smoking RRs and for small cell carcinoma, the RRs were 9.84 (7.19-13.45, n = 42) without such confirmation, and 14.62 (9.38-22.80, n = 19) with it (p < 0.01). For large cell carcinoma, the corresponding RRs were 3.90 (2.90-5.24, n = 19) without confirmation and 8.28 (5.89-11.65, n = 10) with it (p < 0.01). There was also some evidence for small cell carcinoma only that RRs were higher from more recent studies.

Table 20 Meta-analyses for additional lung cancer types (all lung cancer) a

O. Further analyses based on independent pairs of relative risks

Some studies provide independent RRs for males and females for the same definition of outcome and exposure. Random-effects meta-analysis of the male/female sex ratio confirms the impression already gained from the analyses shown in earlier Tables that RRs tend to be somewhat higher for males, although estimates are heterogeneous. For ever/current smoking, the sex ratio is 1.38 (1.23-1.54) for all lung cancer, based on 93 ratios, 64 higher in males; 1.31 (0.91-1.90) for squamous, based on 30 ratios, 18 higher in males; and 1.43 (1.14-1.78) for adeno, based on 33 ratios, 27 higher in males.
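For a single pair of independent estimates, a ratio of RRs and its approximate CI can be obtained on the log scale, since the variance of a difference of independent log RRs is the sum of their variances. The Python sketch below assumes symmetric 95% CIs on the log scale; the function names are illustrative. The inputs are the pooled sex-specific estimates for mentholated cigarettes from section F, borrowed purely as an illustration; the analyses in this section instead form one ratio per study and then pool the ratios by random-effects meta-analysis.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% CI


def se_log_rr(lower: float, upper: float) -> float:
    """Standard error of log RR recovered from a 95% CI."""
    return math.log(upper / lower) / (2 * Z95)


def ratio_of_rrs(rr_a, ci_a, rr_b, ci_b):
    """Ratio rr_a / rr_b of two independent RRs, with an approximate 95% CI."""
    log_ratio = math.log(rr_a) - math.log(rr_b)
    # Independence: variances of the two log RRs add.
    se = math.sqrt(se_log_rr(*ci_a) ** 2 + se_log_rr(*ci_b) ** 2)
    return (math.exp(log_ratio),
            math.exp(log_ratio - Z95 * se),
            math.exp(log_ratio + Z95 * se))


# Illustration only: males 1.15 (0.93-1.43) vs. females 0.78 (0.63-0.98)
ratio, lower, upper = ratio_of_rrs(1.15, (0.93, 1.43), 0.78, (0.63, 0.98))
print(f"male/female ratio {ratio:.2f} ({lower:.2f}-{upper:.2f})")
```

With these inputs the interval excludes 1.0, in line with the heterogeneity by sex reported for mentholated cigarettes.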

As sex differences may reflect greater cigarette consumption in males, meta-analysis estimates of the sex ratio for ever/current smokers and for all lung cancer were also calculated within levels of amount smoked (as defined in section G). The sex ratio is 1.33 (1.05-1.68) for smokers of about 5 cigs/day, based on 46 ratios, 26 higher in males, 1.59 (1.25-2.01) for smokers of about 20 cigs/day, based on 25 ratios, 20 higher in males, and 1.21 (0.99-1.49) for smokers of about 45 cigs/day, based on 26 ratios, 17 higher in males.

A number of studies provide RR estimates for ever/current smoking separately by age, and random-effects meta-analyses were conducted, based on the ratio of the estimate for the oldest age group for which data were available compared to that for the youngest. Despite only 22 of the 45 ratios (48.9%) showing a greater risk in the oldest age group, the meta-analysis showed a significantly higher risk in the oldest age group (ratio 1.17, 95% CI 1.10-1.25), the seven ratios with most weight all being greater than 1.0.

There were also eight studies, all conducted in the US, which provide comparable sex-specific results for ever/current smoking separately for white people and black people (or non-white people). Random-effects meta-analyses of the white/black race ratio showed no difference between the races (1.05, 0.90-1.23, n = 14).

P. Further analyses based on non-independent pairs of relative risks

Some studies also provide separate non-independent least-adjusted and most-adjusted RRs for the same definition of exposure. There is little evidence that adjustment reduces the RR for ever/current smoking. Using the same preferences as in Table 11, the most-adjusted estimate is lower than the least-adjusted estimate for 57 of the 126 (45.2%) pairs for all lung cancer, for 14 of the 36 (38.9%) pairs for squamous, and for 21 of the 41 (51.2%) pairs for adeno. In no case do the percentages differ from 50% (at p < 0.05), and in each case the random-effects meta-analysis estimate based on the most-adjusted pair members is similar to the corresponding estimate based on the least-adjusted pair members (data not shown).

RRs for a dose-related index of smoking may be adjusted for other such indices. For all lung cancer, and for four dose-related indices of smoking, pairs of otherwise similar highest vs lowest RRs were identified in which one of the pair was adjusted for the most available other aspects of smoking, and the other had no such adjustment. Both were also chosen as adjusted for the most possible other variables (although those other variables may differ between the pair). There was a clear tendency for the additional adjustment for other aspects of smoking, typically including amount smoked, to produce lower RR estimates. This was true for 18/22 (81.8%, p < 0.01) of the pairs of estimates for age of starting to smoke, 12/15 (80.0%, p < 0.05) of the pairs for duration of smoking, all 17 (100%, p < 0.001) of those for years quit, and 5/7 (71.4%, NS) of those for tar level.
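The significance levels attached to these proportions can be reproduced approximately with a sign test. The source does not state which test was used, so the two-sided exact binomial test below, against a null proportion of 50%, is an assumption; the function name is illustrative.

```python
from math import comb


def sign_test_p(successes: int, n: int) -> float:
    """Two-sided exact binomial p-value for the null hypothesis of a 50% proportion."""
    k = max(successes, n - successes)  # fold onto the upper tail
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# Counts of pairs in which the additional adjustment lowered the RR
pairs = [("age of starting to smoke", 18, 22),
         ("duration of smoking", 12, 15),
         ("years quit", 17, 17),
         ("tar level", 5, 7)]

for label, k, n in pairs:
    print(f"{label}: {k}/{n} ({100 * k / n:.1f}%), p = {sign_test_p(k, n):.4f}")
```

Under this assumption the four p-values fall in the same bands as those quoted in the text (p < 0.01, p < 0.05, p < 0.001 and not significant, respectively).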

Based on results for ever/current smoking and for all lung cancer, RRs for mixed smokers were compared with those for smokers of cigarettes only. For 22 of the 34 (64.7%) pairs, the RR was lower for mixed smokers, but this tendency was not significant (p = 0.12). RRs for mixed smokers were also compared with those for smokers of pipes/cigars only. Here 23 of the 24 (95.8%, p < 0.001) pairs showed a lower risk in the smokers of pipes/cigars only.

Q. Publication bias

Some results of Egger’s test [17] for publication bias are presented in Tables 5, 8 and 12, with further results given in Additional file 5: Detailed Analysis Tables, but have not previously been referred to in the text. For ever smoking there is evidence of publication bias for all lung cancer (p < 0.001) and adeno (p < 0.01), but not for squamous (p ≥ 0.1). For current smoking, some evidence of publication bias is seen for all lung cancer (p < 0.05), but not for squamous or adeno (p ≥ 0.1). For ex smoking, there is again evidence of bias for all lung cancer and for adeno (p < 0.001) but not for squamous. Figure 24 (all lung cancer), Figure 25 (squamous) and Figure 26 (adeno) show funnel plots for ever smoking. Where asymmetry is seen, this is in the direction of there being more higher-weight RRs above the mean. This is consistent with the evidence in Table 5 of higher RRs for larger studies. Inspection of a funnel plot for ex-smoking for all lung cancer (data not shown) also showed the high weight RRs tended to be above the mean.
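Egger’s test is usually formulated as a regression of the standardized effect (log RR divided by its standard error) on precision (the reciprocal of the standard error), with an intercept away from zero indicating funnel plot asymmetry. The Python sketch below follows that usual formulation using ordinary least squares; it is not necessarily the exact implementation used for the analyses reported here, and the input values are hypothetical, chosen only to show the mechanics.

```python
import math


def egger_intercept(log_rrs, ses):
    """Regress logRR/SE on 1/SE; return (intercept, standard error of intercept)."""
    x = [1.0 / s for s in ses]                 # precision
    y = [e / s for e, s in zip(log_rrs, ses)]  # standardized effect
    n = len(x)
    if n < 3:
        raise ValueError("need at least three estimates")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    resid_var = sum((yi - intercept - slope * xi) ** 2
                    for xi, yi in zip(x, y)) / (n - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, se_intercept


# Hypothetical log RRs and standard errors, for illustration only
hyp_ses = [0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
hyp_log_rrs = [1.9, 1.2, 1.7, 1.4, 1.6, 1.5]

a, se_a = egger_intercept(hyp_log_rrs, hyp_ses)
t_stat = a / se_a  # compare with a t distribution on n - 2 degrees of freedom
print(f"intercept {a:.3f}, SE {se_a:.3f}, t {t_stat:.2f}")
```

The sign of the intercept indicates the direction of asymmetry, which can be compared with the visual impression from the funnel plots.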

Figure 24

Funnel plot for ever smoking and all lung cancer. Funnel plot of the 328 relative risk estimates for ever smoking and all lung cancer included in the main meta-analysis in Table 5 against their weight (inverse-variance of log RR). The dotted vertical line indicates the fixed-effect meta-analysis estimate.

Figure 25

Funnel plot for ever smoking and squamous. Funnel plot of the 102 relative risk estimates for ever smoking and squamous included in the main meta-analysis in Table 5 against their weight (inverse-variance of log RR). The dotted vertical line indicates the fixed-effect meta-analysis estimate.

Figure 26

Funnel plot for ever smoking and adeno. Funnel plot of the 107 relative risk estimates for ever smoking and adeno included in the main meta-analysis in Table 5 against their weight (inverse-variance of log RR). The dotted vertical line indicates the fixed-effect meta-analysis estimate.

Discussion

Evidence of a relationship

The meta-analyses carried out demonstrate a clear relationship of smoking to overall lung cancer risk. This is evident for ever, current and ex smoking, for pipes and cigars, and for all types of cigarette studied. The increased risk in smokers is evident in both sexes, in younger and older subjects, in all continents studied and in prospective and case–control studies. That this relationship is causal is supported by the evidence of a dose–response, risk increasing with increasing amount smoked, duration of smoking, tar level and fraction smoked, and with earlier age of starting to smoke, and decreasing with duration of quitting. It is also supported by the similarity of results based on most-adjusted and least-adjusted RRs (though adjustment for amount smoked reduces the association with other dose–response indices of smoking). The association is clearly evident with each of the major histological types of lung cancer studied, being stronger for squamous and small cell carcinoma, intermediate for large cell carcinoma, and weakest for adenocarcinoma. Exceptionally, no relationship is seen between adenocarcinoma and pipe or cigar smoking.

Heterogeneity

The studies are remarkably consistent in reporting an increased risk in ever smokers. Only two of the 328 all lung cancer RRs, none of the 102 squamous RRs, and nine of the 107 adeno RRs considered in Figures 1 to 9 are less than 1.0. However, studies also vary markedly in the magnitude of the estimated RR, as illustrated by the high values of H seen in the meta-analyses of the major smoking indices, which often exceed 5 and sometimes exceed 20. (H values of 5, 10 and 20 are equivalent to I² values [16] of 80%, 90% and 95%.) This heterogeneity is perhaps unsurprising given the many sources of variation involved, including sex, location, timing, study design and populations, definition of outcome and type of product smoked, and extent of confounder adjustment.
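The equivalence between H and I² quoted above can be checked with a short calculation. The sketch below assumes that H is the heterogeneity chi-squared statistic divided by its degrees of freedom, so that I² = 100 × (H − 1)/H; this reading is inferred from the three value pairs given in the text, and the function name is illustrative only.

```python
def i_squared_from_h(h: float) -> float:
    """Return I^2 (as a percentage) for a heterogeneity ratio H.

    Assumption: H is the heterogeneity chi-squared per degree of freedom,
    so I^2 = 100 * (H - 1) / H. Values of H at or below 1 give I^2 = 0.
    """
    if h <= 0:
        raise ValueError("H must be positive")
    return max(0.0, 100.0 * (h - 1.0) / h)


# The three H values quoted in the text.
for h in (5, 10, 20):
    print(f"H = {h:>2}: I^2 = {i_squared_from_h(h):.0f}%")
```

Under this assumption the function reproduces the 80%, 90% and 95% figures, and it also shows that I² changes little once H is large, which is why H is the more discriminating index for the very heterogeneous analyses considered here.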

Using univariable and multivariable (meta-regression) methods, we investigated variation in risk by a number of characteristics of the study and the RR for the outcomes all lung cancer, squamous and adeno. While our “fixed” multivariable models involving six characteristics (sex, location, start year of study, study type, number of cases and number of adjustment factors) explained a substantial proportion of the variation (e.g. reducing H from 22.84 to 4.72 for all lung cancer for ever smoking), there was always substantial residual heterogeneity (with H varying from 2.43 to 4.72 in the six analyses in Tables 7 and 10). Of the six characteristics studied, location was generally the most important, with RR estimates for ever and for current smoking and for all three outcomes always highest in North America and lowest in China, and (with the exception of ever smoking for squamous) lower in the rest of Asia than in Europe, with no consistent differences seen between results for the United Kingdom, Scandinavia and the rest of Europe. Another consistently seen relationship was the tendency for RRs to vary by start year of study, with higher RRs seen in more recent studies. Three other tendencies were generally seen, though the level of significance varied according to the analysis. One was the tendency for RRs to vary by number of cases, with the lowest estimates always seen for the smaller studies (involving 100 to 249 cases); another was the tendency for RRs to be higher in prospective studies than in case–control studies; and the third was the tendency for RRs to be somewhat higher in males than females. The final characteristic included in the fixed model, number of adjustment factors, showed no clear relationship with the RR, with significance either not present or weak (0.01 < p < 0.05), and the direction of effect inconsistent.

We also tested for the effect of a number of other characteristics on the estimated RR. A number of relationships seen in the univariable models were significant. However, these mainly became non-significant in the multivariable models, presumably due to correlations between the characteristics. Where a characteristic was significant, this tended to be in only one of the six analyses, and so did not provide convincing evidence of a true effect. It would have been possible, for each of the six combinations of smoking status and outcome we considered, to present analyses of “best” models, based on forward stepwise regression, that each included a different set of predictive characteristics. However, we felt that the regressions we presented based on a fixed model were more useful. Sources of variation are discussed further in the following paragraphs.

Sex

If possible, sex-specific results are included in the meta-analyses, with combined sex results included only if not. Though variation by sex was not significant in all the main analyses, risk estimates generally tended to be higher for males than females. This is supported by additional analyses comparing RRs within study for the same outcome and exposure definition. Somewhat higher RRs were found in males even in analyses where comparisons were made within the same levels of daily cigarette consumption (about 5, 20 or 45 cigs/day). Even so, the existence of somewhat higher RRs for males does not necessarily indicate any greater susceptibility, as it may reflect their increased exposure to occupational carcinogens, or other differences in smoking history such as greater duration of smoking or increased use of plain and higher tar cigarettes. It should be noted however that in prospective studies where smoking habits were determined at baseline, the greater tendency of males to quit during follow-up may cause bias in the reverse direction. It should also be noted that comparison of smoker/never smoker RRs for men and women does not take account of possible differences in risk between male and female never smokers, the base groups for these comparisons. A detailed overall assessment of this aspect is beyond the scope of this paper, and ideally would involve direct comparison of risk in male and female smokers, with detailed adjustment for age, smoking characteristics and major potential confounding variables. We note that Bain et al. [18] concluded, based on analysis of two large prospective studies and review of results from six other such studies, that “women do not appear to have a greater susceptibility to lung cancer than men, given equal smoking exposure”.

Age

While it is clear that absolute risk of lung cancer rises markedly with age, both in smokers and never smokers, it is far less clear whether the smoker/never smoker RR also does. Predictions based on the multistage model [19] suggest that there should be a modest rise, but there is difficulty in establishing this, especially when the great majority of the studies do not give results by age. Possible effects of age were investigated in two ways. The first method (see Tables 6 and 9) was to compare RRs specific to particular age groups. Data here were limited for squamous and adeno, and for all lung cancer suggested a possible increase in RR with age for current smoking, but not for ever smoking. More reliable are the comparisons (described in results section O) of RRs for the highest and lowest age groups within study for ever/current smoking; between-study differences are automatically controlled for under this approach. These showed a 17% greater risk for the highest age group (95% CI 10% to 25%). Whether or not a RR was adjusted for age was considered as a characteristic in the meta-regression analyses, but it never added significantly to the fixed model for either ever or current smoking for any of the three outcomes.

Race

Although race-specific RRs were entered onto the database where available, few studies provided such data. For eight studies which provided pairs of comparable RRs for ever/current smoking, there was no indication that RRs for white people differed systematically from those for black people (or non-white people). This, of course, does not rule out the possibility that absolute risks for white people and black people with similar smoking habits may differ. As our concern was only with RRs for smoking, and whether these vary by other characteristics, we have not attempted to collect data comparing absolute risk according to these characteristics, such as white/black RRs within never smokers, or within smokers. Detailed analysis and discussion of racial differences in lung cancer risk between black people and white people is therefore beyond the scope of this paper. Elsewhere, Lee [20] points out that in the USA black men have a higher risk of lung cancer than do white men. However, interpretation of this difference in terms of effects of smoking is not straightforward for various reasons. Thus Lee notes that though black people are more often current smokers, are less likely to quit smoking, smoke cigarettes with a higher tar level, and have higher cotinine levels, all characteristics predictive of a higher risk of lung cancer, they are also less likely to have ever smoked, smoke fewer cigarettes a day and start to smoke later, all characteristics predictive of a lower risk. Also, little or no difference in lung cancer rate is seen between black and white women. Black people are much more likely than white people to use mentholated cigarettes, but no evidence of a difference in lung cancer risk associated with mentholation was found, either in the present analysis or in other reviews [20, 21].

Location and national cigarette tobacco type

A consistent tendency in our meta-analyses was for RRs to be highest in studies in North America, intermediate in Europe and lowest in Asia, particularly in China. There was no very clear evidence of a difference between European countries, or between other countries in Asia, though some of the analyses suggested relatively lower RRs in Greece and Turkey than in the rest of Europe, and higher RRs in India than in the rest of Asia. In an attempt to study a possible explanation for this difference we divided countries into three groups by national cigarette tobacco type. One was the countries (Australia, Canada, India, South Africa, UK and Zimbabwe) which typically use flue-cured Virginia tobacco, another was the countries (all except those in the other two groups) which typically use blended tobacco, and the third included Taiwan and China (countries which used both types quite commonly or where we lacked confirmed information). Including this variable in the meta-analyses did not consistently improve the prediction of our model, a finding which is consistent with the conclusions of other analyses we have conducted based on national data on lung cancer rates and smoking frequency [22]. There are, of course, other possible explanations of the clear differences in lung cancer RRs between continents, including genetic differences, and differences in baseline rates of the disease.

Study timing

Our meta-regressions generally showed a tendency for RRs to be lower in studies which started earlier. There may be a number of reasons for this, such as changes in the relative use of cigarettes and pipes or cigars, and improvement of study quality, with better standardization of questionnaires and definition of products smoked. However, we consider the most plausible reason to be changes in patterns of uptake of smoking, with smokers in earlier born cohorts being less likely to have a lengthy smoking career than smokers in later born cohorts.

Study type

Though this was only clearly significant in the analyses of ever smoking for all lung cancer, there was a consistent tendency for RRs to be somewhat higher from prospective studies than from case–control studies. If this is a true effect, the explanation for it is unclear.

Number of cases

To limit the considerable amount of work needed, we restricted attention to studies involving at least 100 lung cancer cases. Given that smaller studies would have contributed much less weight to the meta-analyses than would the studies that were included, we consider that this restriction is unlikely to have had any material effect on our conclusions. The meta-regression analyses did show a consistent tendency for RRs to be higher in larger studies, though this was only significant for ever smoking (all lung cancer p < 0.001, squamous and adeno p < 0.05). This tendency is in the opposite direction to that predicted from publication bias. The explanation is unclear.

Adjustment for other factors

Generally our analyses showed that adjustment for age and other factors had very little effect on the meta-analysis estimates of smoking-related RR, whether one considered the total number of adjustment factors, or the effect of specific factors. This conclusion of a minimal effect of confounding is consistent with that of a detailed analysis of data from the huge CPSII prospective study [23], and means that though the main results we report are based on most-adjusted estimates, this decision had little or no effect on our conclusions or on the magnitude of our estimates.

Adjustment for other aspects of smoking is, however, important when considering the dose-related variables. Though studies rarely, if ever, present results to allow detailed analysis of the effect of adjustment for one specific aspect of smoking on RRs for another aspect, we have shown that adjustment for other aspects of smoking (which typically includes amount smoked) consistently tends to reduce associations with age of starting to smoke, duration of smoking, years quit and tar level. This is presumably due to the tendency for earlier starters and high tar smokers to smoke more heavily than do later starters and low tar smokers, and for lighter smokers to be more ready to quit smoking. Below, we further discuss the effect of adjustment on results for type of cigarette.

Product smoked

There was consistent evidence that risk of lung cancer was higher for cigarette only smokers than for smokers of any product, and substantially higher than for smokers of pipes only, cigars only or pipes/cigars only. For current smokers, for example, RRs were 9.57 (7.90-11.59) for cigarettes only, as compared to 4.76 (3.44-6.59) for pipes/cigars only. Mixed smokers tended to have similar risks to cigarette only smokers. Interpretation of this finding is difficult as mixed smokers and cigarette only smokers may have a different total exposure to tobacco, as well as a different cigarette consumption. Data on the types of cigars or pipes smoked have not been recorded on the database, but the increased risk is evident in each continent. The results for pipes and cigars mainly apply to males and to RRs for all lung cancer. Though there are only limited results by histological type, it is interesting that there is no indication of an increased risk of adenocarcinoma for pipe and cigar smokers.

Type of cigarette smoked

The conclusions drawn from the results in Table 14 are consistent with those drawn by one of us in a review of the relationship between lung cancer and type of cigarette conducted in 2001 [24]. This is unsurprising, because the data sets considered are very similar. The conclusions are also very similar to those of a review by Kabat carried out in 2003 [25].

Comparisons between filter and plain smoking are made more difficult by the variety of ways in which different reports present their results, but based on the index most closely equivalent to only filter vs. only plain, the present report shows a reduction in risk that is significant for all lung cancer (0.69, 95% CI 0.61-0.78) and for squamous (0.52, 0.40-0.68), though not for adeno (0.84, 0.66-1.08). Significant reductions in risk for all lung cancer and squamous, but not for adeno were also evident for the alternative comparisons ever filter vs. only plain, and only filter vs. ever plain. Our analyses were based on most-adjusted RR estimates, with many of the estimates adjusted for other aspects of smoking, such as number of cigarettes smoked. In 2001, a National Cancer Institute monograph [26] claimed that apparent benefits of filter vs. plain and of low tar vs. high tar cigarettes may be illusory if RRs are adjusted for daily consumption, as switching to cigarettes with a lower machine-smoked delivery of tar and nicotine leads to “compensation” for the reduced nicotine intake by increasing numbers of cigarettes smoked. Lee and Sanders [27] investigated this claim in detail by comparing RRs for all lung cancer adjusted and unadjusted specifically for daily cigarette consumption, and concluded that “whether or not relative risk estimates are adjusted for cigarette consumption is not crucial to the conclusion of a clear advantage to filter cigarettes and tar reduction”. This analysis is more precise than that used in this report, but its conclusions are similar, as we also found adjustment not to affect our overall conclusion that filter vs. plain cigarette smoking was associated with a lower risk of all lung cancer and of squamous. It should be noted that although no significant reduction in risk for filter cigarette smoking was seen for adeno, there was also no evidence of an increase. 
This would seem to argue against the claim often made that the observed rise over time in the incidence of adenocarcinoma relative to squamous cell carcinoma seen in many countries is due to changes in cigarette design increasing the risk of smoking-related adenocarcinoma. In this context, it should be noted that though our database contains evidence by histological type for filter vs. plain cigarette smoking, no such data were found relating to tar level.

Our conclusion of a higher RR in handrolled vs. manufactured cigarette smokers is consistent with that of the 2001 review [24], with the increased risk evident, despite the limited amount of data, for squamous and adeno as well as for all lung cancer.

Our review also found no difference in risk between smokers of mentholated and non-mentholated cigarettes, though this was based on data from only three studies, only one of which provided results by histological type. Though no more recent studies have reported results by histological type, five further studies have reported results for all lung cancer, and a recently published systematic review [20] confirms the lack of apparent effect of cigarette mentholation on the lung carcinogenicity of cigarettes.

Dose–response relationships

We have investigated the relationship of lung cancer risk to various indices of the dose–response relationship. We did not record data on our database for pack-years, as we wished to investigate the separate roles of daily amount smoked and duration of smoking. Indeed, previous work (e.g. [19, 28]) has suggested that pack-years is not a valid measure, as, for example, smokers of 20 cigs/day for 40 years and smokers of 40 cigs/day for 20 years have very different smoking RRs despite their identical pack-years. For those indices that we did consider where there were substantial amounts of data – daily amount smoked, duration, age of starting to smoke, and time of quit (relative both to current smoking and to never smoking) – there was very clear evidence that greater exposure leads to greater risk, not only for all lung cancer, but also for squamous and adeno. The results by time of quit extend the observation that RRs in ex smokers are intermediate between those of never smokers and current smokers. Because dose–response results are expressed in categories of exposure which vary from study to study, there are difficulties in combining the evidence over studies. We have used two approaches. One is to consider the RR for the highest vs. lowest level of exposure (where highest and lowest refer to expected risk, so that early ages of starting, for example, are considered highest). The other is the key value approach where we consider categories including a specified level of exposure and not including another specified level. Both approaches have limitations. The highest vs. lowest approach will vary between studies in the ratio of exposures considered, while the key value approach, although combining results relating to different exposures in different studies to a lesser extent, necessarily omits results from studies with broader categories while somewhat arbitrarily selecting or discarding RRs from studies with narrow categories.
Work is ongoing on a third approach to fit a dose–response curve to the RRs and estimated dose mid-points of the categories for each study. This approach is complex, and was considered outside the scope of the current paper, which was intended more to summarize major features of the data. However, a future paper is planned which will describe the shape of these dose–response relationships including characteristics of the curves, such as the estimated time after quitting by which half the excess risk associated with continued smoking has disappeared. We note that, when considering RRs for time of quitting, the problem of “reverse causation” needs to be taken into account, as evidenced by the data in Table 19 showing no decrease in risk compared to current smokers for quitters of about 3 years. Our analyses also showed that for all lung cancer, risk increased with increasing tar level and with increasing fraction smoked (or equivalently short butt length), data here being more limited, and non-existent by histological type. As noted earlier, when discussing cigarette type, the relationship with tar level is not an artefact of inappropriate adjustment for amount smoked [27], as has been claimed [26].
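The key value approach described in this subsection can be illustrated with a small sketch. It is a simplified reading of the rule (a category is used only if it contains exactly one key value), not the procedure actually applied to the database. The key values of 5, 20 and 45 cigs/day are borrowed from the comparisons mentioned in the Sex subsection, and the category boundaries are hypothetical.

```python
KEY_VALUES = (5, 20, 45)  # cigs/day; assumed here for illustration


def assign_key_value(low, high, key_values=KEY_VALUES):
    """Return the single key value lying in [low, high], or None.

    A category is usable only if it contains exactly one key value;
    categories containing none, or two or more, are discarded.
    """
    inside = [k for k in key_values if low <= k <= high]
    return inside[0] if len(inside) == 1 else None


# Hypothetical dose categories (cigs/day) from two imagined studies.
broad_study = [(1, 24), (25, 60)]
narrow_study = [(1, 9), (10, 14), (15, 24), (25, 34), (35, 60)]

for name, categories in (("broad", broad_study), ("narrow", narrow_study)):
    for low, high in categories:
        print(name, (low, high), "->", assign_key_value(low, high))
```

In this sketch the broad 1-24 category is dropped because it spans two key values, while the narrow study loses its 10-14 and 25-34 categories because they contain none, which mirrors the two limitations noted in the text.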

Derivation of RRs

Almost a third of RRs used in meta-analyses were not directly available from the source or calculated directly from cross-tables of exposure by outcome, and required more complex methods to derive the required RR. It was reassuring that whether or not the RR was derived did not (with one minor exception) add predictive power to the main meta-regression models, suggesting that our extensive use of derived RRs caused no material bias.

Effect of studies with high RRs or large weight

The statistical analyses investigated the role of various characteristics on the estimated risk of all lung cancer, squamous and adeno in relation to ever and current smoking, but generally did not formally test the effect of exclusion of specific studies with extreme RRs or large weights. An exception was the case of study LIU4 for ever smoking and all lung cancer, this study not giving data for current smoking or by histological type. The two sex-specific RRs for this study together contributed 50.9% of the weight for the 328 available RRs from all the studies, and its exclusion increased the overall fixed-effect RR from 4.22 (95% CI 4.16-4.28) to 6.47 (95% CI 6.34-6.60). However, there was little difference in the random-effects estimates, and in the meta-regression analysis the two LIU4 RRs did not produce unusual standardized residuals, suggesting that the relatively low RRs from this study (2.76, 2.69-2.83 for males, and 2.86, 2.77-2.95 for females) were due to the characteristics of the study included in the model (in particular that it was conducted in China) and not due to its unusual results. While there are other large studies, none involved nearly as many lung cancer cases as LIU4, and we feel it unlikely that excluding other specific studies would have had a major effect on our meta-analysis estimates or on our conclusions as to how RRs varied by exposure, outcome and study and RR characteristics.
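The way a single very precise study can dominate a fixed-effect estimate can be seen in a minimal sketch of inverse-variance pooling on the log RR scale. The two LIU4 RRs and CIs are those quoted above; the other two estimates are hypothetical and stand in for the remaining studies, so the output does not reproduce the 50.9% weight share or the pooled RRs of 4.22 and 6.47. The standard error is recovered from the 95% CI assuming approximate normality of log RR.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile


def log_rr_and_weight(rr, lower, upper):
    """Return (log RR, inverse-variance weight) from an RR and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return math.log(rr), 1.0 / se ** 2


def fixed_effect(estimates):
    """Pool (rr, lower, upper) tuples; return pooled RR and weight shares."""
    pairs = [log_rr_and_weight(*e) for e in estimates]
    total = sum(w for _, w in pairs)
    pooled = math.exp(sum(y * w for y, w in pairs) / total)
    shares = [w / total for _, w in pairs]
    return pooled, shares


liu4 = [(2.76, 2.69, 2.83), (2.86, 2.77, 2.95)]  # values quoted in the text
others = [(6.0, 4.5, 8.0), (9.0, 6.0, 13.5)]     # hypothetical, illustration only

pooled_all, shares = fixed_effect(liu4 + others)
pooled_without, _ = fixed_effect(others)
print(f"with LIU4: RR {pooled_all:.2f}, LIU4 weight share {sum(shares[:2]):.1%}")
print(f"without LIU4: RR {pooled_without:.2f}")
```

Because the weights are proportional to 1/SE², the narrow CIs of the LIU4 estimates pull the fixed-effect RR towards their own values; a random-effects model adds a between-study variance to every SE², which evens out the weights and explains why those estimates changed little.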

Representativeness

We did not exclude studies on the basis of the population studied. However, most studies include subjects broadly representative of the general population. A small number of studies were conducted in miners or in other occupations with a known or suspected lung cancer risk, such as welding or foundry working. Risky occupation was considered as a characteristic in the meta-regression models but was never found to be an independent predictor of RRs associated with ever or current smoking.

Publication bias

It is well known that researchers are more likely to wish to publish, and editors more likely to accept for publication, studies finding a statistically significant association between exposure and disease. The published literature may therefore overstate any true association or produce a false-positive relationship. As part of each meta-analysis we have carried out Egger’s test of publication bias, though results are generally shown only in the detailed tables. While evidence for such bias generally is mixed, the results for all lung cancer suggest that, where significant bias is seen, it is not in the direction of smaller studies with lower-weight RRs producing higher RRs. Rather it is, as noted above, the larger studies that tend to produce higher RRs. The reason for this finding is unclear. It should also be noted that our analyses are based only on those studies satisfying the inclusion criteria, and that one of these criteria restricted attention to studies with at least 100 lung cancer cases.

We have not attempted to correct for publication bias for four reasons. First, we feel that evidence for its existence is not strong. Second, any adjustment for it seems unlikely to affect our main conclusions. Third, any adjustment for it would be complicated by the restriction on study size. Finally, any correction for publication bias would be open to question, as it inevitably involves assumptions that are impossible to verify.
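For readers unfamiliar with Egger’s test, the following is a minimal sketch of the classic regression form, in which the standardized effect (log RR divided by its standard error) is regressed on precision (the reciprocal of the standard error) and the intercept measures funnel plot asymmetry. The exact variant used for the detailed tables is not specified here, so this should be read as an assumption; the data are hypothetical and chosen so that larger studies have higher RRs.

```python
import math


def egger_regression(log_rrs, ses):
    """Regress standardized effect (y/se) on precision (1/se) by OLS.

    Returns (intercept, standard error of intercept, t statistic).
    An intercept far from zero suggests small-study asymmetry.
    """
    n = len(log_rrs)
    if n < 3:
        raise ValueError("need at least three estimates")
    x = [1.0 / s for s in ses]
    z = [y / s for y, s in zip(log_rrs, ses)]
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    resid_var = sum((zi - intercept - slope * xi) ** 2
                    for xi, zi in zip(x, z)) / (n - 2)
    se_int = math.sqrt(resid_var * (1.0 / n + x_bar ** 2 / sxx))
    t_stat = intercept / se_int if se_int > 0 else float("nan")
    return intercept, se_int, t_stat


# Hypothetical data: the more precise (larger) studies report higher RRs.
log_rrs = [math.log(r) for r in (3, 4, 5, 6, 7)]
ses = [0.40, 0.30, 0.20, 0.10, 0.05]
intercept, se_int, t_stat = egger_regression(log_rrs, ses)
print(f"intercept {intercept:.2f} (SE {se_int:.2f}), t = {t_stat:.2f}")
```

With these inputs the intercept is negative, the direction corresponding to larger studies giving higher RRs, which is the opposite of what conventional publication bias would produce. The t statistic would be referred to a t distribution with n − 2 degrees of freedom.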

Bias due to misclassification of smoking status

Another source of bias is misclassification of smoking status. Random misclassification would dilute the association, as would any tendency for cases to deny or understate their smoking more than for the general population. Any tendency for current smokers to claim to be ex-smokers, as might happen in a study conducted in a clinical setting or where patients have been advised to stop smoking, would tend to inflate the risk for ex smoking. Adjustment for misclassification would be difficult, as denial rates are likely to vary by aspects of the study design, the way questions are asked, and also by sex, age, location and other demographic variables.

Limitations

This review has various limitations, many unavoidable. Lack of access to individual subject data limits the ability to carry out meta-analyses using similar exposure indices and confounder adjustment throughout, but obtaining such data was not feasible given that many studies were conducted years ago. Obtaining a reliable definition of outcome and exposure is often hindered by incomplete information in the source papers. We do not consider that limiting attention to studies of 100 cases or more is of particular importance, as results from smaller studies would contribute little weight to the overall meta-analyses. Limiting attention to studies conducted up to 1999 may be more relevant for some exposures and issues (particularly the trend in RR over time), though we feel that our consideration of data from 287 published studies should give a very reliable overall picture. The problem is that the procedures conducted for this review were extremely time-consuming and it would take some years to update the database and include smaller and more recent studies.

It may also be argued that the analyses presented here do not make full use of all the data collected. This is inevitable, given the extensive amount of information collected and the need to present the findings in a paper of reasonable length. As noted, when discussing dose–response, we do plan further analyses. We would also be willing to make the database available to bona fide researchers for further analysis.

Conclusions

After excluding studies involving fewer than 100 lung cancer cases, we identified 287 epidemiological studies of lung cancer which provided information on risk in relation to one or more of a defined list of smoking indices [2, 3, 6, 29–689]. Of the 267 independent principal studies, 262 provided RRs relating to all lung cancer, 84 provided RRs relating to squamous cell carcinoma, and 86 provided RRs relating to adenocarcinoma (or to outcomes that are closely equivalent). One major conclusion is that for each outcome the RRs for all major smoking indices were markedly heterogeneous.

Another conclusion is that RR estimates for ever, current or ex smoking of any product (or cigarettes if not available) are clearly elevated for all three outcomes. Individual study RRs virtually all exceed 1.0, and based on random-effects meta-analyses of most-adjusted RRs, increases were seen for ever smoking (all lung cancer 5.50, 95% CI 5.07-5.96, n = 328 RRs; squamous 10.47, 8.88-12.33, n = 102; adeno 2.84, 2.41-3.35, n = 107), current smoking (all lung cancer 8.43, 7.63-9.31; squamous 16.91, 13.14-21.76; adeno 4.21, 3.32-5.34) and ex smoking (all lung cancer 4.30, 3.93-4.71; squamous 8.74, 6.94-11.01; adeno 2.85, 2.20-3.70). For all lung cancer, RRs were also elevated for cigarettes only smokers (ever smoking 6.36, 5.33-7.59) and mixed smokers of cigarettes and pipes/cigars (7.37, 5.97-9.11), though lower for smokers of pipes/cigars only (2.92, 2.38-3.57), pipes only (3.31, 2.51-4.35) and cigars only (2.95, 1.91-4.56). While pipe and cigar smoking is associated with an increased risk for squamous, there is no increase for adeno. The consistency and strength of the relationships point to a causal relationship (except for pipe and cigar smoking and adenocarcinoma). A causal relationship is also supported by the fact that estimates are generally not materially affected by adjustment for confounding variables, and by the strong evidence of a dose–response relationship, with RRs for all outcomes clearly increasing with amount smoked, duration and earlier starting age, and decreasing with time quit, and for all lung cancer increasing with tar level and fraction smoked. Relationships were also clearly seen between smoking and RRs for the other major histological types, small cell carcinoma and large cell carcinoma.

Our review also provides evidence that risk varied by type of cigarette smoked, with filter cigarette smokers having lower risks than plain cigarette smokers (a conclusion not explained by “over-adjustment” for amount smoked), and that handrolled cigarette smokers have higher risks than manufactured cigarette smokers, though mentholation of cigarettes seems unrelated to risk. It also shows that various characteristics of the study and of the RR affect risk estimates. Thus RRs were generally highest for studies in North America and lowest for Asia, particularly in China, and higher in later starting, larger and prospective studies. RRs were also somewhat higher in males than in females, though this may be related to differences in their detailed smoking habits. There is no clear tendency for the smoking/lung cancer relationship to vary with age.

This comprehensive review provides further insight into the relationship of smoking to lung cancer and its major histological types.