Variable Selection for High Dimensional Metagenomic Data

  • Pan Wang
  • Hongmei Jiang (corresponding author)
Part of the ICSA Book Series in Statistics book series (ICSABSS)


We address the high-dimensional variable selection problem of associating microbial compositions with a phenotype such as body mass index or disease status. Because sequencing depth varies across samples, the number of reads assigned to a species or an operational taxonomic unit (OTU) is not directly comparable between samples. Metagenomic count data are therefore usually rarefied or normalized before downstream analysis. In this chapter, we employ a log contrast model, which bypasses the need for normalization. We propose a new method to identify phenotype-associated species or OTUs using penalized regression and stability selection. The proposed method can also be applied to variable selection in regression analysis with compositional covariates. We compare the performance of different methods through simulation studies and real data analysis in metagenomics.
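To illustrate why a log contrast model can sidestep normalization, here is a minimal sketch. It assumes the standard form of the model, in which the linear predictor is sum_j beta_j * log(x_j) with the coefficients constrained to sum to zero; the abstract does not spell out the chapter's exact formulation, so this form, the taxa counts, the coefficient values, and the helper names below are all illustrative assumptions rather than the authors' method. Under the zero-sum constraint, multiplying all counts in a sample by a constant (for example, a change in sequencing depth, or division by the sample total) leaves the predictor unchanged.

```python
import math
import random


def log_contrast_predictor(abundances, beta):
    """Linear predictor sum_j beta_j * log(x_j) for a single sample."""
    return sum(b * math.log(x) for b, x in zip(beta, abundances))


def center(beta):
    """Shift coefficients so that they satisfy sum_j beta_j = 0."""
    mean = sum(beta) / len(beta)
    return [b - mean for b in beta]


random.seed(0)

# Hypothetical strictly positive read counts for 5 taxa in one sample
# (zeros would need a pseudocount before taking logs).
counts = [random.randint(1, 500) for _ in range(5)]

# Hypothetical coefficients, centered to satisfy the zero-sum constraint.
beta = center([1.5, -0.5, 0.0, 2.0, -1.0])

total = sum(counts)
proportions = [c / total for c in counts]  # total-sum normalization
deeper = [10 * c for c in counts]          # same sample at 10x the depth

eta_counts = log_contrast_predictor(counts, beta)
eta_props = log_contrast_predictor(proportions, beta)
eta_deeper = log_contrast_predictor(deeper, beta)

print(eta_counts, eta_props, eta_deeper)
```

The three printed values agree up to floating-point error, because rescaling a sample by a constant c adds log(c) * sum_j beta_j = 0 to the predictor. A penalized fit would need to enforce the same constraint during estimation, which this sketch does not attempt.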


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Statistics, Northwestern University, Evanston, USA
