Exposome-Wide Association Studies: A Data-Driven Approach for Searching for Exposures Associated with Phenotype

  • Chirag J. Patel


The promise of a unified way to measure the human exposome is the discovery of novel environmental factors associated with, and potentially causative of, disease. The human exposome has been tentatively defined as the totality of environmental exposures, such as dietary nutrients, pharmaceutical drugs, infectious agents, and pollutants, encountered from birth to death. Much as human genetics has benefited from high-throughput profiling in the form of genome-wide association studies (GWAS), a data-driven paradigm for the exposome is needed to systematically and reproducibly discover the environmental determinants of disease. In this chapter, we describe methods for associating the exposome with phenotypic states such as disease. Specifically, we present hands-on analytic examples and data for searching the exposome for correlates of phenotype, an approach called the “environment/exposome-wide association study” (EWAS). First, we describe the philosophy behind such a study, including transparency and the mitigation of selection bias. Second, we describe how to mitigate the chance of type 1 error and how to investigate the possibility of true signals in a sea of possible false positives. We also describe open-source tools for visualizing and displaying correlated data so that investigators can efficiently ascertain patterns in phenotypic associations. We end by describing a few success stories of the approach.
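As a rough illustration of the type 1 error control mentioned above, the following minimal sketch applies the Benjamini-Hochberg false discovery rate procedure to a set of per-exposure p-values. It is not the chapter's own code: the function name `benjamini_hochberg`, the exposure names, and the p-values are all hypothetical and chosen only for illustration, and the chapter may well use a different correction or a permutation-based estimate.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical exposures and p-values (illustrative only, not real results).
toy_pvalues = {
    "exposure_A": 0.0001,
    "exposure_B": 0.004,
    "exposure_C": 0.04,
    "exposure_D": 0.20,
    "exposure_E": 0.65,
}

names = list(toy_pvalues)
qvals = benjamini_hochberg([toy_pvalues[n] for n in names])

# Keep exposures whose q-value falls below an assumed FDR threshold of 0.05.
hits = [n for n, q in zip(names, qvals) if q < 0.05]
print(hits)
```

In this toy input, `exposure_C` has a nominal p-value below 0.05 but is not retained after adjustment, which is the kind of filtering an exposome-wide scan relies on when many exposures are tested at once.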


EWAS, Exposome-phenotype associations, Open source tools



Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Biomedical Informatics, Harvard Medical School, Boston, USA
