Data Analysis for Gut Microbiota and Health

  • Xingpeng Jiang
  • Xiaohua Hu
Part of the Advances in Experimental Medicine and Biology book series (AEMB, volume 1028)


In recent years, data mining and analysis of high-throughput microbiome and metagenomic sequencing data have enabled researchers to discover biological knowledge by characterizing the composition and variation of species across environmental samples. The large amount of data accumulated in the process makes it feasible to infer the complex principles of species interactions. The interactions among microbes in a community play an important role in the microbial ecosystem. Data mining provides diverse approaches for identifying correlations between diseases and microbes and for determining how microbial species coexist and interact in host-associated or natural environments. This work is important not only for advancing basic microbiology and related fields but also for understanding the impact of microbial communities on human health and disease.


Microbiome, Data mining, Data analysis, Microbiota, Microbes, Diseases



Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  1. School of Computer, Central China Normal University, Wuhan, China
  2. College of Computing & Informatics, Drexel University, Philadelphia, USA
