Diversity Analysis in Viral Metagenomes

  • Jorge Francisco Vázquez-Castellanos
Part of the Methods in Molecular Biology book series (MIMB, volume 1838)


Viruses are the most abundant and diverse biological entities on Earth. Viral metagenomes from many ecological niches are now available and have been used to characterize new viral particles and to determine their diversity. However, viral metagenomic data are high-dimensional, compositional, and sparse, which renders many conventional multivariate statistical analyses inoperative. Fortunately, libraries and statistical packages have been developed to deal with this problem and to perform the relevant ecological and statistical analyses. In this chapter, simulated viral metagenomes, based on real human gut-associated viral metagenomes, are analyzed using different R and Python packages. The examples presented here include estimating and comparing different indexes of diversity, evenness, and richness; performing ordination and statistical analyses using different dissimilarity metrics; determining the optimal cluster configuration; and performing biomarker discovery. The scripts and the simulated datasets are in
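
As a rough illustration of the quantities named in the abstract, the sketch below computes richness, the Shannon diversity index, Pielou's evenness, and the Bray-Curtis dissimilarity for two toy count vectors. It is not taken from the chapter's scripts: the function names and the sample counts are hypothetical, and only the Python standard library is assumed.

```python
import math

def richness(counts):
    """Number of taxa observed at least once in a sample."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def pielou_evenness(counts):
    """Pielou's J = H' / ln(S); returned as 0.0 here when fewer than two taxa are present."""
    s = richness(counts)
    if s < 2:
        return 0.0
    return shannon(counts) / math.log(s)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator if denominator else 0.0

# Hypothetical counts for five viral taxa in two samples (illustrative only).
sample_a = [10, 10, 10, 10, 0]   # even community
sample_b = [37, 1, 1, 1, 0]      # community dominated by one taxon

for name, sample in (("A", sample_a), ("B", sample_b)):
    print(name, richness(sample), round(shannon(sample), 3),
          round(pielou_evenness(sample), 3))
print("Bray-Curtis A vs B:", bray_curtis(sample_a, sample_b))
```

Both toy samples have the same richness, but the dominated sample has lower Shannon diversity and evenness, which is the kind of contrast these indexes are meant to capture. The sketch does not address the compositional and sparse nature of real viral metagenomic data.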

Key words

Ordination analysis, Viral metagenomics, Alpha diversity, Beta diversity, Clustering and biomarker discovery



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Jorge Francisco Vázquez-Castellanos (1)
  1. Department of Genomics and Health, Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunitat Valenciana (Fisabio), Valencia, Spain
