Graphical comparisons of relative disease burden across multiple risk factors
Abstract
Background
Population attributable fractions (PAF) measure the proportion of disease prevalence that would be avoided in a hypothetical population, similar to the population of interest, but where a particular risk factor is eliminated. They are extensively used in epidemiology to quantify and compare disease burden due to various risk factors, and directly influence public policy regarding possible health interventions. In contrast to individual-specific metrics such as relative risks and odds ratios, attributable fractions depend jointly on both risk factor prevalence and relative risk. The relative contributions of these two components are important, and are usually presented in summary tables alongside the attributable fraction calculation. However, representing PAF in an accessible graphical format that captures both prevalence and relative risk may assist interpretation.
Methods
Taylor-series approximations to PAF in terms of risk factor prevalence and log-odds ratio are derived that facilitate simultaneous representation of PAF, risk factor prevalence and risk-factor/disease log-odds ratios on a single co-ordinate axis. Methods are developed for binary, multi-category and continuous exposure variables.
Results
The methods are demonstrated using INTERSTROKE, a large international case control dataset focused on risk factors for stroke.
Conclusions
The described methods could be used as a complement to tables summarizing prevalence, odds ratios and PAF, and may convey the same information in a more intuitive and visually appealing manner. The suggested nomogram can also be used to visually estimate the effects of health interventions which only partially reduce risk factor prevalence. Finally, in the binary risk factor case, the approximations can also be used to quickly convert logistic regression coefficients for a risk factor into approximate PAFs.
Abbreviations
- OR: Odds ratio
- PAF: Population attributable fraction
- RR: Relative risk
Background
Attributable fractions [1] have become a common way of measuring the disease burden attributable to a risk factor on a population level. More precisely, they measure that portion of disease prevalence which would be avoided in a hypothetical population where a particular risk factor was entirely eliminated, but which is otherwise identical to the population of interest. Depending on the author, this quantity is referred to variously as a population attributable fraction (PAF), population attributable risk and excess fraction, although it has been given many other names [2, 3]. Such metrics are commonly reported by the media, and are often misinterpreted. To reduce this confusion, Greenland and Robins distinguish PAF from ‘etiologic fractions’, which truly represent the proportion of disease prevalence that is caused by a particular risk factor [4], a quantity that can only be estimated under certain conditions. Despite this misinterpretation, the attention garnered by PAF calculations signifies their importance both in informing public policy regarding appropriate disease interventions and in shaping public perception of what might and might not be healthy behaviour, or healthy levels of physiologic measures such as blood pressure.
Often, attributable fractions and their possible generalizations [5, 6] are used to rank the importance of the various risk factors that are involved in disease pathogenesis. As an example, we used attributable fractions to quantify and compare disease burden due to major stroke risk factors [7]; that analysis indicated that high blood pressure, physical inactivity and apolipoprotein levels were the most important risk factors contributing to stroke on a population level. Here, we use these same data to demonstrate an alternative and complementary graphical comparison of the importance of the risk factors under consideration. The suggested plots allow a quick visual assessment of the relative attributable fractions for differing risk factors, as well as risk factor prevalence and disease/risk factor odds ratios. The plots utilize approximations that facilitate graphical representation of PAF and impact fractions in terms of prevalence and odds ratio. In addition, the approximations can be used as a rule of thumb to quickly convert logistic regression coefficients into attributable fractions. Extensions of the methods to multi-category and continuous risk factors are also suggested.
Methods
Definition and previous estimators for PAF (binary exposures)
We first define PAF and possible estimators assuming a binary disease indicator, Y, and a binary risk factor (or synonymously binary disease exposure), A. We also state some approximations that will be used in the suggested plots, leaving their justification to Additional file 1. While many authors have defined PAF using conditional probabilities for Y given A, attributable fractions are causal concepts and deserve a causal definition. With this in mind, we adopt counterfactual notation [8], where the pair (Y^{a = 0}, Y^{a = 1}) denotes the potential (or counterfactual) binary disease outcomes for an individual under the two scenarios that they were not exposed to the risk factor A (a = 0), and that they were exposed to the risk factor A (a = 1). One interpretation of the pair (Y^{a = 0}, Y^{a = 1}) is that they are the disease outcomes that would be observed for that individual in two almost identical universes, which differ only according to whether that individual was exposed to the risk factor, and in the possible consequences of this exposure. In the situation that (Y^{a = 0}, Y^{a = 1}) = (0, 1), the risk factor, A, is regarded as having a causal effect on disease for that individual. In reality, we observe either Y^{a = 0} or Y^{a = 1}, but not both, as every individual (at least at a point in time) is either exposed or unexposed to A.
Definitions, assumptions and approximations for PAF when the exposure is binary, multi-category and continuous
Quantity | Binary | Multicategory | Continuous |
---|---|---|---|
Counterfactual definition of PAF | \( \frac{P\left(Y=1\right)-P\left({Y}^{a=0}=1\right)}{P\left(Y=1\right)} \) | \( \frac{P\left(Y=1\right)-P\left({Y}^{a=0}=1\right)}{P\left(Y=1\right)} \) | \( \frac{P\left(Y=1\right)-P\left({Y}^{a={j}_0}=1\right)}{P\left(Y=1\right)} \) |
Assumptions | 1. Standard causal inference assumptions: • Conditional exchangeability (counterfactual outcome Y^{a = j} and assigned risk factor A are independent random variables, within strata of observed confounders c) • Consistency of counterfactuals: Y^{a = j} = Y when A = j for all levels j of the risk factor A • Positivity: 0 < P(Y^{a = j} = 1| C = c) < 1 for all j and strata c 2. No interactions (P(Y^{a = j} = 1| C = c)/P(Y^{a = k} = 1| C = c) does not depend on c), for any possible values of exposure j and k 3. Rare disease assumption (P(Y = 1) small) | ||
Re-expression of PAF (given assumptions 1. and 2.) | P(A = 1| Y = 1)(RR − 1)/RR | \( \sum \limits_{j=1}^KP\left(A=j|Y=1\right)\left(R{R}_j-1\right)/R{R}_j \)** | \( {\int}_{-\infty}^{\infty }f\left(j|1\right)\frac{RR(j)-1}{RR(j)} dj \) ** |
^{a}Corresponding logistic model (Given assumption 3.) | logit(P(Y = 1| A = j, C = c)) =μ + β_{j} + γ(c) | logit(P(Y = 1| A = j, C = c)) = μ + β_{j} + γ(c) | logit(P(Y = 1| A = j, C = c)) = μ + β(j) + γ(c) |
Logistic approximation for PAF (given assumptions 1, 2 and 3) | \( \frac{\hat{P}\left(A=1|Y=1\right)\left({e}^{\hat{\beta}_1}-1\right)}{e^{\hat{\beta}_1}} \) | \( \sum \limits_{j=1}^K\hat{P}\left(A=j|Y=1\right)\left({e}^{\hat{\beta}_j}-1\right)/{e}^{\hat{\beta}_j} \) | \( {\int}_{-\infty}^{\infty}\hat{f}\left(j|1\right)\left({e}^{\hat{\beta}(j)}-1\right)/{e}^{\hat{\beta}(j)} dj \)*** |
Graphical approximation | \( \hat{P}\left(A=1|Y=0\right)\times {\hat{\beta}}^{ave} \) | \( \hat{P}\left(A>0|Y=0\right)\times {\hat{\beta}}^{ave} \) | \( 1\times {\hat{\beta}}^{ave} \)**** |
“Average” estimated log-odds ratio: \( {\hat{\beta}}^{ave} \) | \( \hat{\beta}_1 \) | \( \frac{\sum \limits_{j=1}^K\hat{P}\left(A=j|Y=0\right)\hat{\beta}_j}{1-\hat{P}\left(A=0|Y=0\right)} \) | \( {\int}_{-\infty}^{\infty}\hat{f}\left(j|0\right)\hat{\beta}(j) dj \) |
Note that under the same conditions other estimable expressions for E1 do exist (see (3)), but E2, an expression that was first derived in [9], has the added attraction of estimability in case-control studies. A short proof of the equality of E1 and E2 under these assumptions is provided for convenience in Additional file 1, although similar results have already been proven elsewhere [17, 18].
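As a rough numerical check of the binary column of the table above, the logistic and graphical approximations can be computed side by side. The sketch below is illustrative only: the function names and input values are hypothetical, and it assumes there are no confounders, so that the exposure prevalence in cases follows directly from the prevalence in controls and the odds ratio.

```python
import math


def case_prevalence(p_controls: float, beta: float) -> float:
    """P(A=1|Y=1) implied by P(A=1|Y=0) and the log-odds ratio beta.

    Assumption (not from the article): no confounders, so the odds of
    exposure in cases equal the odds ratio times the odds in controls.
    """
    odds_cases = math.exp(beta) * p_controls / (1.0 - p_controls)
    return odds_cases / (1.0 + odds_cases)


def paf_logistic(p_cases: float, beta: float) -> float:
    """Logistic approximation: P(A=1|Y=1) * (e^beta - 1) / e^beta."""
    return p_cases * (math.exp(beta) - 1.0) / math.exp(beta)


def paf_graphical(p_controls: float, beta: float) -> float:
    """Graphical approximation: P(A=1|Y=0) * beta."""
    return p_controls * beta


# Hypothetical inputs: 20% of controls exposed, odds ratio of 1.5.
p0 = 0.2
beta = math.log(1.5)
p1 = case_prevalence(p0, beta)

print(f"P(A=1|Y=1)              = {p1:.4f}")
print(f"logistic approximation  = {paf_logistic(p1, beta):.4f}")
print(f"graphical approximation = {paf_graphical(p0, beta):.4f}")
```

With these made-up inputs the two approximations differ by about one percentage point, which is in keeping with the claim that the graphical approximation behaves reasonably when the odds ratio is modest.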
Definition of PAF for multicategory and continuous exposures
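For a multicategory exposure, the “average” log-odds ratio in the table above is a weighted mean of the non-reference coefficients, with weights given by the prevalence of each level in controls. A minimal sketch follows; the function names are hypothetical and the prevalences and coefficients are made-up values for illustration, not INTERSTROKE estimates.

```python
def beta_ave(p_controls, betas):
    """Average log-odds ratio over the non-reference levels j = 1..K.

    p_controls[j] is P(A=j|Y=0) and betas[j] is the fitted log-odds
    ratio for level j relative to the reference level A=0.
    """
    p_exposed = sum(p_controls)  # equals 1 - P(A=0|Y=0)
    weighted = sum(p * b for p, b in zip(p_controls, betas))
    return weighted / p_exposed


def paf_graphical_multicategory(p_controls, betas):
    """Graphical approximation: P(A>0|Y=0) * beta_ave."""
    return sum(p_controls) * beta_ave(p_controls, betas)


# Hypothetical three-level exposure: reference level plus two others.
p_levels = [0.25, 0.25]   # P(A=1|Y=0), P(A=2|Y=0)
b_levels = [0.2, 0.6]     # log-odds ratios vs. the reference level

print(f"beta_ave        = {beta_ave(p_levels, b_levels):.3f}")
print(f"approximate PAF = {paf_graphical_multicategory(p_levels, b_levels):.3f}")
```

Because the prevalence of any non-reference level cancels against the denominator of the average, the approximate PAF reduces to the sum of prevalence times log-odds ratio over the non-reference levels.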
Results
Application of approximations on INTERSTROKE
Illustration of the approximations on the INTERSTROKE dataset. For binary risk factors, \( {\hat{\beta}}^{ave}=\log \left(\hat{OR}\right) \); for multicategory risk factors, \( {\hat{\beta}}^{ave} \) is a weighted average log-odds ratio summarizing the increase in risk of non-reference levels of the risk factor compared to the reference level. Confidence intervals for exact PAF are given at the 99% level and calculated using the bootstrap.
Risk factor | \( {\hat{\beta}}^{ave}\sim \log \left(\hat{OR}\right) \) | \( {e}^{{\hat{\beta}}^{ave}}\sim \hat{OR} \) | Prevalence of exposure in controls | Approximate PAF: [7] | Exact calculation PAF: [6] |
---|---|---|---|---|---|
High blood pressure (Y/N) | 1.093 | 2.98 | 47.4% | 51.8% | 47.9% (45.1–50.6) |
Lack of physical activity | 0.501 | 1.65 | 83.7% | 41.9% | 35.5% (27.7–44.7) |
ApoA, ApoB ratio (in tertiles) | 0.428 | 1.53 | 66.9% | 28.6% | 26.9% (22.2–31.9) |
Diet score (in tertiles) | 0.378 | 1.46 | 67.0% | 25.3% | 23.0% (18.2–28.9) |
Waist hip ratio (in tertiles) | 0.294 | 1.34 | 67.0% | 19.7% | 18.8% (13.3–25.3) |
Smoking (Y/N) | 0.513 | 1.67 | 22.4% | 11.5% | 12.4% (10.2–14.9) |
Cardiac causes (Y/N) | 1.156 | 3.18 | 4.9% | 5.7% | 9.1% (8.0–10.2) |
Frequency of alcohol consumption (3 levels) | 0.186 | 1.20 | 27.7% | 5.2% | 5.9% (3.4–9.7) |
Global stress (Y/N) | 0.301 | 1.35 | 14.4% | 4.3% | 5.0% (2.6–7.3) |
Diabetes (Y/N) | 0.148 | 1.16 | 12.9% | 1.9% | 2.4% (0.1–4.9) |
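The approximate PAF column can be reproduced from the first and third numeric columns, since the graphical approximation is simply the average log-odds ratio multiplied by the prevalence in controls. The short sketch below recomputes that column from the tabulated values; the variable and function names are hypothetical, and the check only confirms agreement up to the rounding used in the table.

```python
# (risk factor, beta_ave, prevalence in controls, reported approximate PAF in %)
ROWS = [
    ("High blood pressure (Y/N)", 1.093, 0.474, 51.8),
    ("Lack of physical activity", 0.501, 0.837, 41.9),
    ("ApoA, ApoB ratio (in tertiles)", 0.428, 0.669, 28.6),
    ("Diet score (in tertiles)", 0.378, 0.670, 25.3),
    ("Waist hip ratio (in tertiles)", 0.294, 0.670, 19.7),
    ("Smoking (Y/N)", 0.513, 0.224, 11.5),
    ("Cardiac causes (Y/N)", 1.156, 0.049, 5.7),
    ("Frequency of alcohol consumption (3 levels)", 0.186, 0.277, 5.2),
    ("Global stress (Y/N)", 0.301, 0.144, 4.3),
    ("Diabetes (Y/N)", 0.148, 0.129, 1.9),
]


def approximate_paf_percent(beta_ave, prevalence):
    """Graphical approximation, expressed as a percentage."""
    return 100.0 * beta_ave * prevalence


def mismatches(rows, tol=0.05):
    """Rows whose recomputed value differs from the table by more than rounding."""
    bad = []
    for name, beta, prev, reported in rows:
        computed = approximate_paf_percent(beta, prev)
        if abs(computed - reported) > tol + 1e-9:
            bad.append((name, computed, reported))
    return bad


for name, beta, prev, reported in ROWS:
    computed = approximate_paf_percent(beta, prev)
    print(f"{name:45s} computed {computed:5.1f}%  table {reported:5.1f}%")
```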
Simultaneous graphical representation of PAF, odds ratios and prevalence
Imagine now a set of N disease risk factors (binary, multi-category or continuous); we denote the inverse prevalence, log-OR pair for the i^{th} risk factor as \( \left({\hat{P}}_i^{-1},{\hat{\beta}}_i^{ave}\right) \). When the \( \left({\hat{P}}_i^{-1},{\hat{\beta}}_i^{ave}\right) \) pairs are plotted on a standard x-y co-ordinate axis, all risk factors whose inverse prevalence/log-odds ratio pairs lie on the line of slope K emanating from the origin, \( {\hat{\beta}}^{ave}=K/\hat{P} \), have the same approximate attributable fraction, \( {\widehat{PAF}}_a=K \). Note that binary, multicategory and continuous exposure variables can all be represented on this same axis, with the understanding that \( {\hat{P}}_i \) represents prevalence of the risk factor (in the binary case), and the prevalence of a ‘risk-increasing’ level of the exposure (in the multicategory and continuous cases). The resulting plot resembles a fan, with risk factors bearing heavier disease burden lying on lines of greater slope. The slope of any such line is an approximate attributable fraction. Another observation regarding equation E7, \( y={\widehat{PAF}}_a\cdot \frac{1}{\hat{P}} \), is that \( y={\widehat{PAF}}_a \) when \( \hat{P}=1 \), implying that if we move the y-axis to \( 1/\hat{P}=1 \), the y-intercept on this shifted axis of the line from (0,0) to \( \left({\hat{P}}_i^{-1},{\hat{\beta}}_i^{ave}\right) \) will be the approximate PAF.
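The geometry of the fan plot can be checked without any plotting library. In the sketch below, each risk factor is mapped to the point (1/P, β), and the slope of the line from the origin through that point is compared across risk factors. The function names are hypothetical; the first risk factor uses the high blood pressure values from the INTERSTROKE table, while the second is a made-up risk factor chosen to lie on the same line.

```python
def fan_point(prevalence, beta_ave):
    """Plot co-ordinates (x, y) = (1 / prevalence, average log-odds ratio)."""
    return (1.0 / prevalence, beta_ave)


def slope_from_origin(point):
    """Slope of the line from (0, 0) through the point: the approximate PAF."""
    x, y = point
    return y / x


def height_at_shifted_axis(point):
    """Height of that line at x = 1 (prevalence of 1), again the approximate PAF."""
    return slope_from_origin(point) * 1.0


# High blood pressure: prevalence 47.4%, beta_ave 1.093 (from the table).
hbp = fan_point(0.474, 1.093)

# Hypothetical risk factor: twice the prevalence, half the log-odds ratio.
other = fan_point(0.948, 0.5465)

print(f"high blood pressure: point {hbp}, slope {slope_from_origin(hbp):.3f}")
print(f"hypothetical factor: point {other}, slope {slope_from_origin(other):.3f}")
```

Both points fall on the same line through the origin, so they share the same approximate PAF even though one risk factor is common with a modest odds ratio and the other is less common with a larger one.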
Attributable fraction nomograms
Biases in approximations for larger odds ratios
Discussion
The graphical approaches described in this manuscript facilitate the visual assessment of relative risk factor burden according to a number of different criteria on a single axis. The plots depend on a simple approximation formula for PAF that may be of interest in and of itself, both as a quick rule of thumb to calculate PAFs and impact fractions, and in that it demonstrates that risk factor prevalence and risk factor/disease log-odds ratio contribute equally to PAF. The two proposed plots each have advantages and disadvantages. While both methods allow detection of risk factor clusters having similar prevalence and odds ratios, it is more natural to visualize clustering on a 2-dimensional x-y plane, as in Fig. 1, than on a nomogram, as in Fig. 2. Conversely, while both methods offer an explanation as to why a certain risk factor has a particular PAF, the representation given by the nomogram may lend extra intuition to epidemiologists who are already familiar with the use of likelihood ratio nomograms in diagnostic testing.
Admittedly, these plots have limitations. First, the inverse or log-scaling used may create confusion regarding the absolute differences in PAF between the different risk factors. For instance, there is a larger difference in the PAFs for hypertension and physical inactivity than Fig. 1 might suggest, since the log-scaling distorts the absolute difference in PAF. Second, the approximations derived are only valid for logistic disease models, with no effect modification between the risk factors and confounders. Third, the approximations used may be inaccurate for larger odds ratios. These limitations indicate that the plots might be best used as a visual accompaniment to, and not a replacement for, exact calculations of attributable fractions.
A final point is that the suggested graphs can be used to compare continuous and discrete risk factors on the same axis. Often, naturally continuous risk factors such as blood pressure are discretized for clinical convenience and interpretability; but whether it is fair to rank the PAFs for artificially discretized risk factors against undiscretized continuous risk factors is questionable. For instance, categorizing a naturally continuous risk factor into two groups only makes statistical sense if there is a threshold effect, where the risk suddenly ‘jumps’ at the threshold separating the categories. Otherwise discretization can be a very crude approximation and is likely to disadvantage a risk factor in a ranking compared with continuous risk factors that have not been categorized.
The approximations derived will work well in genetic settings, where odds ratios tend to be low. Even though genetic variables (such as single nucleotide polymorphisms) are not modifiable, attributable fractions are still of interest and have been used as a measure of disease heritability in some settings [6, 15]. In contrast, while the odds ratios in INTERSTROKE are larger, the approximate calculations are perhaps acceptably accurate (Table 1). However, extremely large odds ratios are possible in traditional epidemiologic applications. For instance, the odds ratio linking smoking and lung cancer was initially estimated to be roughly 9 [16]. While in these extreme cases the approximate PAF will be unacceptable as a proxy for an exact calculation (and may indeed be larger than 1), the plots suggested here may still convey a robust measure of risk factor importance, provided the absolute quantification of PAF is not of interest.
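The growth of the approximation error with the odds ratio can be illustrated numerically. The sketch below compares the graphical approximation with the re-expression P(A = 1| Y = 1)(OR − 1)/OR for a binary risk factor, treating the odds ratio as a stand-in for the relative risk (rare disease assumption) and assuming no confounders. The function names are hypothetical, and the control prevalence of 50% is an arbitrary illustrative value, not an estimate of smoking prevalence.

```python
import math


def paf_reexpression(p_controls, odds_ratio):
    """P(A=1|Y=1) * (OR - 1) / OR, with case prevalence derived from controls.

    Assumptions (for illustration only): no confounders and a rare disease,
    so that the odds ratio approximates the relative risk.
    """
    odds_cases = odds_ratio * p_controls / (1.0 - p_controls)
    p_cases = odds_cases / (1.0 + odds_cases)
    return p_cases * (odds_ratio - 1.0) / odds_ratio


def paf_graphical(p_controls, odds_ratio):
    """Graphical approximation: P(A=1|Y=0) * log(OR)."""
    return p_controls * math.log(odds_ratio)


def compare(p_controls, odds_ratios):
    """List of (OR, graphical approximation, re-expression) triples."""
    return [
        (o, paf_graphical(p_controls, o), paf_reexpression(p_controls, o))
        for o in odds_ratios
    ]


for o, approx, reexp in compare(0.5, [1.2, 2.0, 3.0, 9.0]):
    print(f"OR = {o:3.1f}: graphical {approx:.3f}, re-expression {reexp:.3f}")
```

Under these assumptions the two quantities agree closely at an odds ratio of 1.2, but at an odds ratio of 9 the graphical approximation is roughly 1.10, which exceeds 1, while the re-expression gives 0.80.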
Conclusions
The described methods could be used as a complement to tables summarizing prevalence, odds ratios and PAF, and may convey the same information in a more intuitive and visually appealing manner. The suggested nomogram can also be used to visually estimate the effects of health interventions which only partially reduce risk factor prevalence. Finally, in the binary risk factor case, the approximations can also be used to quickly convert logistic regression coefficients for a risk factor into approximate PAFs.
Notes
Acknowledgements
Not applicable.
Code and data
Code to reproduce the analysis described in the article can be obtained from the corresponding author on request.
Authors’ contributions
JF proposed the idea of the manuscript, developed and implemented methodology and wrote the manuscript; NOL helped develop and implement methodology and proofread the manuscript; MOD helped develop methodology; FM and SY helped with editing the final manuscript. All authors read and approved the final submitted manuscript.
Funding
Dr. Ferguson is supported by the HRB grant: EIA-2017-017. The HRB had no direct role in the development of methodology, the collection, analysis and interpretation of data or in writing the manuscript.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Supplementary material
References
- 1. Levin ML. The occurrence of lung cancer in man. Acta Unio Int Contra Cancrum. 1953;9:531–41.
- 2. Gefeller O. An annotated bibliography on the attributable risk. Biom J. 1992;34(8):1007–12.
- 3. Poole C. A history of the population attributable fraction and related measures. Ann Epidemiol. 2015;25(3):147–54.
- 4. Greenland S, Robins JM. Conceptual problems in the definition and interpretation of attributable fractions. Am J Epidemiol. 1988;128(6):1185–97.
- 5. Eide GE, Gefeller O. Sequential and average attributable fractions as aids in the selection of preventive strategies. J Clin Epidemiol. 1995;48(5):645–55.
- 6. Ferguson J, Alvarez-Iglesias A, Newell J, Hinde J, O’Donnell M. Estimating average attributable fractions with confidence intervals for cohort and case–control studies. Stat Methods Med Res. 2018;27(4):1141–52.
- 7. O'Donnell MJ, Chin SL, Rangarajan S, Xavier D, Liu L, Zhang H, et al. Global and regional effects of potentially modifiable risk factors associated with acute stroke in 32 countries (INTERSTROKE): a case-control study. Lancet. 2016;388(10046):761–75.
- 8. Hernán MA, Robins JM. Causal Inference. Boca Raton: Chapman & Hall/CRC; 2018. [available as pre-print].
- 9. Miettinen OS. Proportion of disease caused or prevented by a given exposure, trait or intervention. Am J Epidemiol. 1974;99(5):325–32.
- 10. Greenland S. Bias in methods for deriving standardized morbidity ratio and attributable fraction estimates. Stat Med. 1984;3(2):131–41.
- 11. Bruzzi P, Green SB, Byar DP, Brinton LA, Schairer C. Estimating the population attributable risk for multiple risk factors using case-control data. Am J Epidemiol. 1985;122(5):904–14.
- 12. Greenland S, Drescher K. Maximum likelihood estimation of the attributable fraction from logistic models. Biometrics. 1993;49:865–72.
- 13. Krishnamurthi RV, Moran AE, Feigin VL, Barker-Collo S, Norrving B, Mensah GA, et al. Stroke prevalence, mortality and disability-adjusted life years in adults aged 20-64 years in 1990-2013: data from the global burden of disease 2013 study. Neuroepidemiology. 2015;45(3):190–202.
- 14. Fagan T. Nomogram for Bayes's theorem. N Engl J Med. 1975;293:257.
- 15. Ramakrishnan V, Thacker LR. Population attributable fraction as a measure of heritability in dichotomous twin data. Commun Stat Simul Comput. 2012;41(3):405–18.
- 16. Cornfield J, Haenszel W, Hammond EC, Lilienfeld AM, Shimkin MB, Wynder EL. Smoking and lung cancer: recent evidence and a discussion of some questions. J Natl Cancer Inst. 1959;22(1):173–203.
- 17. Sjölander A. Estimation of attributable fractions using inverse probability weighting. Stat Methods Med Res. 2011;20(4):415–28.
- 18. Hernán MA. A definition of causal effect for epidemiological research. J Epidemiol Community Health. 2004;58(4):265–71.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.