Abstract
We address the high-dimensional variable selection problem of associating microbial compositions with a phenotype such as body mass index or disease status. Because sequencing depth varies across samples, the number of reads assigned to a species or an operational taxonomic unit (OTU) is not directly comparable between samples. Rarefying or normalization of the metagenomic count data is therefore usually performed before downstream analysis. In this chapter, we employ a log-contrast model that bypasses the need for normalization. We propose a new method that identifies phenotype-associated species or OTUs using penalized regression and stability selection. The proposed method can also be applied to variable selection in regression analysis with compositional covariates. We compare the performance of different methods through simulation studies and real data analysis in metagenomics.
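To illustrate why a log-contrast formulation can sidestep normalization, the following minimal sketch evaluates a log contrast, i.e. a linear combination of log abundances whose coefficients sum to zero, on raw counts, on the same counts scaled to a deeper sequencing run, and on relative abundances. It is not the chapter's implementation: the function name `log_contrast`, the counts, and the coefficients are all hypothetical, and the example assumes strictly positive counts (zero counts would need separate handling, which is not shown here).

```python
import math


def log_contrast(counts, beta):
    """Return sum_j beta_j * log(counts_j) for coefficients that sum to zero."""
    if abs(sum(beta)) > 1e-12:
        raise ValueError("log-contrast coefficients must sum to zero")
    if any(c <= 0 for c in counts):
        raise ValueError("counts must be strictly positive")
    return sum(b * math.log(c) for b, c in zip(beta, counts))


# Hypothetical read counts for four taxa in one sample (illustrative only).
counts = [120, 30, 45, 5]
# Hypothetical coefficients; they sum to zero, as a log contrast requires.
beta = [0.5, -0.2, 0.0, -0.3]

# The same community sequenced ten times deeper: every count scaled by 10.
deeper = [10 * c for c in counts]

# Relative abundances (proportions) of the original sample.
total = sum(counts)
proportions = [c / total for c in counts]

value_counts = log_contrast(counts, beta)
value_deeper = log_contrast(deeper, beta)
value_props = log_contrast(proportions, beta)

# All three values agree up to floating-point rounding.
print(value_counts, value_deeper, value_props)
```

Because the coefficients sum to zero, any common scaling factor contributes the log of that factor times zero, so the sequencing depth cancels out of the linear predictor. In a regression setting the coefficients would be estimated from data under this zero-sum constraint rather than fixed by hand as they are here.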
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Wang, P., Jiang, H. (2019). Variable Selection for High Dimensional Metagenomic Data. In: Zhang, L., Chen, DG., Jiang, H., Li, G., Quan, H. (eds) Contemporary Biostatistics with Biopharmaceutical Applications. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-15310-6_2
DOI: https://doi.org/10.1007/978-3-030-15310-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15309-0
Online ISBN: 978-3-030-15310-6
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)