Abstract
Identifying disease-associated taxa is an important task in metagenomics. Many methods have been proposed for feature selection and prediction. However, existing methods either use univariate (generalized) regression to obtain a P-value for each taxon without considering interactions among taxa, or use lasso- or L0-type sparse modeling to identify the taxa that give the best predictions without providing P-values. To the best of our knowledge, no available method both accounts for taxon interactions and generates P-values.
In this chapter, we propose a sparse treatment-effect model for identifying taxa (STEMIT) and performing statistical inference with high-dimensional metagenomic data. STEMIT provides a P-value for each taxon through a two-step treatment-effect maximization, and it supports causal inference when the study is a clinical trial. We first use sparse modeling to identify taxa associated with the treatment-effect variable and with the target feature, and then estimate the P-value of the target taxon with ordinary least squares (OLS) regression. Using a real metagenomic data set, we demonstrate that the proposed method is efficient and can identify biologically important taxa. The software for L0 sparse modeling can be downloaded at https://cran.r-project.org/web/packages/l0ara/.
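The two-step logic above (sparse selection, then OLS for the P-value) can be illustrated with a minimal Python sketch of generic post-double-selection inference in the style of Belloni, Chernozhukov and Hansen, which this procedure appears to resemble. It is not the authors' STEMIT implementation. It makes several assumptions: a plain lasso fitted by coordinate descent (NumPy only) is swapped in for the L0 penalty of l0ara; the outcome `y` is continuous; taxon abundances have already been transformed to a roughly continuous scale (for example, log relative abundance); the taxon under test plays the "treatment" role; and the penalty `lam` is a fixed, hypothetical value that in practice would be tuned, for example by cross-validation. The helpers `lasso_cd` and `double_selection_pvalue` are illustrative names only.

```python
import math

import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso fitted by cyclic coordinate descent (illustrative helper).

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 and returns b.
    Assumes the columns of X are standardized and y is centered (no intercept).
    """
    n, p = X.shape
    b = np.zeros(p)
    r = np.array(y, dtype=float)          # current residual y - X b (b starts at 0)
    col_sq = (X ** 2).sum(axis=0) / n     # (1/n) * ||x_j||^2 for each column
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue                  # constant column: never selected
            # correlation of x_j with the partial residual that excludes x_j
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * (new - b[j])   # keep the residual in sync with b
            b[j] = new
    return b


def double_selection_pvalue(y, X, j, lam=0.1):
    """Post-double-selection estimate and P-value for taxon j (hypothetical helper).

    Returns (coefficient of the standardized taxon j, two-sided P-value,
    indices of the control taxa kept in the final OLS fit).
    """
    n, p = X.shape
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                     # avoid dividing by zero for constant taxa
    Z = (X - X.mean(axis=0)) / sd
    d = Z[:, j]                           # taxon under test plays the "treatment" role
    others = [k for k in range(p) if k != j]
    W = Z[:, others]                      # candidate control taxa

    # Step 1a: sparse selection of controls associated with the outcome
    s_y = np.flatnonzero(lasso_cd(W, y - y.mean(), lam))
    # Step 1b: sparse selection of controls associated with the target taxon
    s_d = np.flatnonzero(lasso_cd(W, d, lam))
    keep = sorted(set(s_y.tolist()) | set(s_d.tolist()))

    # Step 2: OLS of y on an intercept, the target taxon, and the union of controls
    design = np.column_stack([np.ones(n), d, W[:, keep]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    dof = n - design.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    t_stat = coef[1] / math.sqrt(cov[1, 1])
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))  # two-sided, normal approximation
    return coef[1], p_value, [others[k] for k in keep]


# Usage on synthetic data: taxon 1 drives both taxon 0 and the outcome (a confounder).
rng = np.random.default_rng(0)
n, p = 200, 30
X = rng.normal(size=(n, p))
X[:, 0] += 0.8 * X[:, 1]
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)
est, pval, controls = double_selection_pvalue(y, X, j=0)
print(f"taxon 0: coef={est:.2f}, P={pval:.2e}, controls={controls}")
```

Keeping every control taxon that predicts either the outcome or the target taxon, rather than the outcome alone, is what protects the OLS estimate from omitting a correlated taxon such as taxon 1 in the synthetic example. The returned coefficient refers to a one-standard-deviation change in the target taxon's transformed abundance. The P-value uses a normal approximation to avoid a SciPy dependency; with small samples, a t reference distribution with `dof` degrees of freedom would be more appropriate.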
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Liu, Z., Lin, S. (2018). Sparse Treatment-Effect Model for Taxon Identification with High-Dimensional Metagenomic Data. In: Beiko, R., Hsiao, W., Parkinson, J. (eds) Microbiome Analysis. Methods in Molecular Biology, vol 1849. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8728-3_19
DOI: https://doi.org/10.1007/978-1-4939-8728-3_19
Published:
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8726-9
Online ISBN: 978-1-4939-8728-3
eBook Packages: Springer Protocols