Variable Selection for High Dimensional Metagenomic Data

Contemporary Biostatistics with Biopharmaceutical Applications

Part of the book series: ICSA Book Series in Statistics (ICSABSS)

Abstract

We address the high-dimensional variable selection problem of associating microbial compositions with a phenotype such as body mass index or disease status. Because sequencing depth varies across samples, the number of reads assigned to a species or an operational taxonomic unit (OTU) is not directly comparable between samples. The metagenomic count data are therefore usually rarefied or normalized before downstream analysis. In this chapter, we employ a log-contrast model, which bypasses the need for normalization. We propose a new method that identifies phenotype-associated species or OTUs using penalized regression and stability selection. The proposed method also applies to variable selection in regression analysis with compositional covariates. We compare the performance of different methods through simulation studies and real data analysis in metagenomics.
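The abstract does not state the model itself, so the following is only a minimal sketch, assuming the usual log-contrast form in which the linear predictor is an intercept plus the sum of beta_j * log(x_j), with the coefficients constrained to sum to zero. Under that assumption, multiplying all counts in a sample by a common factor (a deeper sequencing run) leaves the predictor unchanged, which is why no normalization step is needed. The function name, counts, and coefficients below are hypothetical and chosen for illustration; they are not taken from the chapter.

```python
import math


def log_contrast_predictor(counts, beta, intercept=0.0):
    """Return intercept + sum_j beta_j * log(x_j) under a zero-sum constraint.

    Assumed form of a log-contrast model; not the chapter's own code.
    """
    if abs(sum(beta)) > 1e-9:
        raise ValueError("log-contrast coefficients must sum to zero")
    if any(c <= 0 for c in counts):
        raise ValueError("counts must be positive; handle zeros before taking logs")
    return intercept + sum(b * math.log(c) for b, c in zip(beta, counts))


# Hypothetical read counts for four OTUs in one sample (illustrative only).
counts = [120, 30, 45, 5]
# Hypothetical coefficients; they sum to zero, as the constraint requires.
beta = [0.8, -0.5, 0.0, -0.3]

# Predictor computed from the raw counts.
shallow = log_contrast_predictor(counts, beta)

# Same community sequenced ten times deeper: every count scales by 10.
deep = log_contrast_predictor([10 * c for c in counts], beta)

# Same sample expressed as proportions (counts divided by the sample total).
total = sum(counts)
proportions = log_contrast_predictor([c / total for c in counts], beta)

# All three agree up to floating-point error, because the common scale factor
# contributes log(scale) * sum(beta) = 0 to the predictor.
print(math.isclose(shallow, deep, abs_tol=1e-9))
print(math.isclose(shallow, proportions, abs_tol=1e-9))
```

The sketch covers only the scale invariance of the predictor. Fitting the coefficients with a penalty and repeating the fit over subsamples for stability selection are separate steps that are not shown here.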


Author information

Corresponding author

Correspondence to Hongmei Jiang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Wang, P., Jiang, H. (2019). Variable Selection for High Dimensional Metagenomic Data. In: Zhang, L., Chen, DG., Jiang, H., Li, G., Quan, H. (eds) Contemporary Biostatistics with Biopharmaceutical Applications. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-15310-6_2
