Sparse Treatment-Effect Model for Taxon Identification with High-Dimensional Metagenomic Data

  • Zhenqiu Liu
  • Shili Lin
Part of the Methods in Molecular Biology book series (MIMB, volume 1849)


Identifying disease-associated taxa is an important task in metagenomics. Many methods have been proposed for feature selection and prediction. However, these methods either use univariate (generalized) regression to obtain P-values without considering interactions among taxa, or use lasso or L0-type sparse modeling to identify the taxa with the best predictive performance without providing P-values. To the best of our knowledge, no available method both accounts for taxon interactions and generates P-values.

In this paper, we propose a sparse treatment-effect model for identifying taxa (STEMIT) and performing statistical inference with high-dimensional metagenomic data. STEMIT provides a P-value for a taxon through a two-step treatment-effect maximization, and it supports causal inference when the study is a clinical trial. We first use sparse modeling to identify the taxa associated with the treatment-effect variable and with the targeting feature, and then estimate the P-value of the targeting feature with ordinary least squares (OLS) regression. Using a real metagenomic data set, we demonstrate that the proposed method is efficient and can identify biologically important taxa. The software for L0 sparse modeling can be downloaded at
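The two-step idea described above (sparse selection followed by OLS inference) can be illustrated with a minimal sketch. This is not the STEMIT software: it assumes a lasso (L1) penalty in place of the L0 penalty, simulated Gaussian features in place of real taxon abundances, and a large-sample normal approximation for the P-value. All function names, the penalty value `lam`, and the simulated effect sizes are hypothetical choices made for illustration only.

```python
import math
import random


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def center_columns(X):
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def soft_threshold(z, t):
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_select(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1.
    Assumption: lasso stands in for the L0 method. Returns nonzero indices."""
    n, p = len(X), len(X[0])
    beta, resid = [0.0] * p, list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new = soft_threshold(rho, n * lam) / col_sq[j]
            if new != beta[j]:
                delta = new - beta[j]
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return {j for j, b in enumerate(beta) if b != 0.0}


def invert(A):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    k = len(A)
    M = [list(A[i]) + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        pv = M[c][c]
        M[c] = [v / pv for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[k:] for row in M]


def ols(Z, y):
    """OLS coefficients and standard errors for design rows Z."""
    n, k = len(Z), len(Z[0])
    A = [[sum(Z[i][a] * Z[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    rhs = [sum(Z[i][a] * y[i] for i in range(n)) for a in range(k)]
    A_inv = invert(A)
    beta = [sum(A_inv[a][b] * rhs[b] for b in range(k)) for a in range(k)]
    rss = sum((y[i] - sum(Z[i][a] * beta[a] for a in range(k))) ** 2 for i in range(n))
    sigma2 = rss / (n - k)
    return beta, [math.sqrt(sigma2 * A_inv[a][a]) for a in range(k)]


def post_selection_pvalue(X, target, y, lam):
    """Step 1: select features related to the outcome or to the target.
    Step 2: OLS of the outcome on the target plus the selected features."""
    Xc = center_columns(X)
    selected = sorted(lasso_select(Xc, center(y), lam) | lasso_select(Xc, center(target), lam))
    Z = [[1.0, target[i]] + [X[i][j] for j in selected] for i in range(len(y))]
    beta, se = ols(Z, y)
    z = beta[1] / se[1]
    # Two-sided P-value from a normal approximation (assumes large n).
    return beta[1], math.erfc(abs(z) / math.sqrt(2)), selected


# Simulated example: feature 0 confounds the target and the outcome.
random.seed(0)
n, p = 150, 20
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
target = [0.8 * X[i][0] + random.gauss(0, 1) for i in range(n)]
y = [target[i] + X[i][0] + 0.5 * X[i][1] + random.gauss(0, 1) for i in range(n)]
coef, p_value, selected = post_selection_pvalue(X, target, y, lam=0.2)
print("estimated effect:", coef, "P-value:", p_value, "adjusted for:", selected)
```

Taking the union of the two selected sets is a deliberate design choice in this sketch: a feature that is related to either the outcome or the targeting feature is adjusted for in the final OLS fit, so the reported P-value reflects the targeting feature's association after accounting for the other selected features.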

Key words

Treatment effect; Sparse modeling; Taxon identification; Metagenomics; Statistical inference



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, USA
  2. Department of Statistics, The Ohio State University, Columbus, USA
