Environmental Exposure Mixtures: Questions and Methods to Address Them
Purpose of This Review
This review provides a summary of statistical approaches that researchers can use to study environmental exposure mixtures. Two primary considerations are the form of the research question and the statistical tools best suited to address that question. Because the choice of statistical tools is not rigid, we make recommendations about when each tool may be most useful.
When dimensionality is relatively low, some statistical tools yield easily interpretable estimates of effect (e.g., risk ratio, odds ratio) or of intervention impacts. As dimensionality increases, it is often necessary to give up some of this interpretability in order to separate meaningful statistical signals from noise; this requires statistical tools oriented more heavily towards dimension reduction via shrinkage and/or variable selection.
The study of complex exposure mixtures has prompted development of novel statistical methods. We suggest that further validation work would aid practicing researchers in choosing among existing and emerging statistical tools for studying exposure mixtures.
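As a concrete illustration of what "shrinkage and/or variable selection" can mean in practice, the following is a minimal sketch of one such tool, the lasso, fitted by cyclic coordinate descent. It is not a method taken from this review's analyses: the simulated exposure data, the penalty values, and all function names (`soft_threshold`, `lasso_coordinate_descent`, `simulate`) are assumptions made for illustration only.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values in [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (no intercept).

    X is a list of n rows with p exposures each; y is a list of n outcomes.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of exposure j with the residual that excludes j's own fit.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def simulate(n=200, p=8, seed=0):
    """Hypothetical mixture: only the first two of p exposures affect y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    true_beta = [1.0, 0.5] + [0.0] * (p - 2)
    y = [
        sum(row[j] * true_beta[j] for j in range(p)) + rng.gauss(0, 1)
        for row in X
    ]
    return X, y


X, y = simulate()
beta_unpenalized = lasso_coordinate_descent(X, y, lam=0.0)  # least squares
beta_lasso = lasso_coordinate_descent(X, y, lam=0.2)        # shrunken fit
print("unpenalized:", [round(b, 3) for b in beta_unpenalized])
print("lasso      :", [round(b, 3) for b in beta_lasso])
```

With `lam=0.0` the routine reduces to ordinary least squares, so every exposure receives a nonzero coefficient. Raising `lam` pulls all coefficients toward zero and sets weak ones exactly to zero, which is the trade described above: coefficients become biased and harder to read as effect estimates, in exchange for a clearer separation of signal from noise.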
Keywords: Complex mixtures; Environmental epidemiology; Bayesian methods; Machine learning
Compliance with Ethical Standards
Conflict of Interest
The authors declare that they have no conflicts of interest.
Human and Animal Rights and Informed Consent
This article does not contain any studies with human or animal subjects performed by any of the authors.