15:7

Avoiding selection bias in metabolomics studies: a tutorial

  • S. C. Boone
  • S. le Cessie
  • K. Willems van Dijk
  • R. de Mutsert
  • D. O. Mook-Kanamori
Review Article



Metabolomics techniques are increasingly applied in epidemiologic research. Many available assays are still relatively expensive, so measurements are often performed in small patient population studies, such as case series or case–control designs, with strong participant selection criteria. Subsequently, metabolomics data are frequently used to assess secondary associations for which the original study was not explicitly designed. Especially in these secondary analyses, there is a risk that the original selection criteria, and the conditioning that takes place due to this selection, are not properly accounted for, which can lead to selection bias.

Aim of review

In this tutorial, we start with a brief theoretical introduction to the issue of selection bias. We then demonstrate how selection bias can occur in metabolomics studies by investigating associations of metabolites with total body fat in a nested case–control study that was originally designed to study the effects of elevated fasting glucose.

Key scientific concepts of review

We demonstrate that standard analytical methods, such as stratification or adjustment in regression analyses, are not suited to deal with selection bias and may even induce the bias when analysing metabolite–phenotype relationships in selected groups. Finally, we show that inverse probability weighting, also known as survey weighting, can be used in some situations to make unbiased estimates of the outcomes.
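The contrast between a stratified analysis and inverse probability weighting can be illustrated with a small simulation. The sketch below is not the authors' analysis and does not use NEO data: the population size, the glucose cutoff, the 10% control sampling fraction and the helper `weighted_slope` are all assumptions chosen for illustration. Body fat and a metabolite are simulated as independent causes of glucose, so the true slope of the metabolite on body fat is zero. All "cases" (elevated glucose) and a random fraction of "controls" are then sampled, mimicking a case–control selection on glucose.

```python
import random


def weighted_slope(x, y, w):
    """Slope of the weighted least-squares line of y on x (hypothetical helper)."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


# All settings below are illustrative assumptions, not values from the study.
rng = random.Random(2018)
N = 50_000               # size of the simulated source population
GLUCOSE_CUTOFF = 1.5     # "elevated glucose" threshold defining a case
CONTROL_FRACTION = 0.10  # probability that a non-case is sampled

fat, metabolite, case_flag, weight = [], [], [], []
for _ in range(N):
    f = rng.gauss(0.0, 1.0)                 # body fat (standardised)
    m = rng.gauss(0.0, 1.0)                 # metabolite, independent of body fat
    glucose = f + m + rng.gauss(0.0, 1.0)   # glucose is a common effect (collider)
    is_case = glucose > GLUCOSE_CUTOFF
    # Selection: every case is sampled, controls with probability CONTROL_FRACTION.
    if is_case or rng.random() < CONTROL_FRACTION:
        fat.append(f)
        metabolite.append(m)
        case_flag.append(is_case)
        # Inverse probability weight = 1 / probability of being sampled.
        weight.append(1.0 if is_case else 1.0 / CONTROL_FRACTION)

# Stratified analysis: unweighted regression within the cases only.
cases_fat = [f for f, c in zip(fat, case_flag) if c]
cases_met = [m for m, c in zip(metabolite, case_flag) if c]
slope_cases = weighted_slope(cases_fat, cases_met, [1.0] * len(cases_fat))

# Inverse probability weighted regression on the full selected sample.
slope_ipw = weighted_slope(fat, metabolite, weight)

print(f"slope within cases only: {slope_cases:+.3f}")
print(f"slope with IPW:          {slope_ipw:+.3f}")
```

Under these assumptions, the slope within cases should come out clearly negative even though no association exists in the source population, because restricting to cases conditions on the collider. The weighted slope should lie close to zero, because each sampled control stands in for the controls that were not sampled. This only works because the sampling probabilities are known by design; with unknown selection probabilities the weights would have to be estimated.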


Keywords: Metabolomics; Selection bias; Collider bias; Inverse probability weighting; Epidemiology



We express our gratitude to all individuals who participate in the Netherlands Epidemiology of Obesity study. We are grateful to all participating general practitioners for inviting eligible participants. We furthermore thank P. van Beelen and all research nurses for collecting the data and P. Noordijk and her team for sample handling and storage, and I. de Jonge, MSc for data management of the NEO study.

Author contributions

SCB: performed analyses, wrote manuscript and conceived idea; SLC: conceived idea, contributed to techniques for analysis, read and approved manuscript; KWD: helped in formulation of text aimed at target audience, read and approved manuscript; RDM: read and approved manuscript; DOM: conceived idea, read and approved manuscript.


The NEO study is supported by the participating Departments, the Division and the Board of Directors of the Leiden University Medical Centre, and by the Leiden University, Research Profile Area ‘Vascular and Regenerative Medicine’. Dennis Mook-Kanamori and the metabolomics measurements are supported by the Dutch Science Organization (ZonMW-VENI Grant 916.14.023).

Compliance with ethical standards

Conflict of interest

All contributing authors declare that there is no conflict of interest involved in the creation of this manuscript.

Ethical approval

The Netherlands Epidemiology of Obesity study was approved by the medical ethical committee of the Leiden University Medical Center (LUMC).

Informed consent

All participants gave written informed consent.

Supplementary material

11306_2018_1463_MOESM1_ESM.docx (94 kb)
Supplementary material 1 (DOCX 93 KB)
11306_2018_1463_MOESM2_ESM.csv (27 kb)
Supplementary material 2 (CSV 27 KB)
11306_2018_1463_MOESM3_ESM.dta (32 kb)
Supplementary material 3 (DTA 31 KB)
11306_2018_1463_MOESM4_ESM.r (6 kb)
Supplementary material 4 (R 5 KB)
Supplementary material 5 (DO 4 KB)



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Clinical Epidemiology, Department C7-P, Leiden University Medical Center (LUMC), Leiden, The Netherlands
  2. Department of Biomedical Data Sciences, Section Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands
  3. Department of Endocrinology, Leiden University Medical Center, Leiden, The Netherlands
  4. Einthoven Laboratory for Experimental Vascular Medicine, Leiden University Medical Center, Leiden, The Netherlands
  5. Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
  6. Department of Public Health and Primary Care, Leiden University Medical Center, Leiden, The Netherlands
