
A Novel Abundance-Based Algorithm for Binning Metagenomic Sequences Using l-Tuples

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6044)

Abstract

Metagenomics is the study of microbial communities sampled directly from their natural environment, without prior culturing. Among the computational tools recently developed for metagenomic sequence analysis, binning tools attempt to classify all (or most) of the sequences in a metagenomic dataset into different bins (i.e., species), based on DNA composition patterns (e.g., tetramer frequencies) that differ among genomes. Composition-based binning methods, however, cannot be used to classify very short fragments, because DNA composition patterns vary substantially within a single genome. We developed a novel approach (AbundanceBin) for metagenomic binning that exploits the different abundances of species living in the same environment. AbundanceBin applies the Lander-Waterman model to metagenomics and is based on the l-tuple content of the reads. AbundanceBin achieved accurate, unsupervised clustering of metagenomic sequences into different bins, such that the reads classified in a bin belong to species of identical or very similar abundances in the sample. In addition, AbundanceBin gave accurate estimates of species abundances and of genome sizes, two important parameters for characterizing a microbial community. We also show that AbundanceBin performed well when the reads are very short (e.g., 75 bp) or contain sequencing errors.
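
To make the idea of abundance-based binning concrete, the following is a minimal sketch and not the authors' implementation. It assumes that, under random shotgun sampling, the number of times an l-tuple from a given species appears in the reads is roughly Poisson-distributed with a mean proportional to that species' abundance, so that l-tuple counts from a mixed sample form a Poisson mixture that can be fit by expectation-maximization. The function names (`count_ltuples`, `fit_poisson_mixture`, `assign_bin`), the number of bins, and the starting means are all hypothetical choices for illustration; the sketch also ignores that l-tuples with a count of zero are never observed.

```python
import math
from collections import Counter


def count_ltuples(reads, l):
    """Count every length-l substring (l-tuple) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - l + 1):
            counts[read[i:i + l]] += 1
    return counts


def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def fit_poisson_mixture(values, init_means, n_iter=100):
    """Fit a Poisson mixture to observed counts by EM.

    values: list of l-tuple counts; init_means: one starting mean per bin.
    Returns (means, weights), one entry per bin.
    """
    means = list(init_means)
    weights = [1.0 / len(means)] * len(means)
    for _ in range(n_iter):
        # E-step: responsibility of each bin for each observed count.
        resp = []
        for k in values:
            logs = [math.log(w) + poisson_logpmf(k, m)
                    for w, m in zip(weights, means)]
            top = max(logs)
            probs = [math.exp(x - top) for x in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate each bin's mixing weight and mean count.
        for j in range(len(means)):
            mass = sum(r[j] for r in resp)
            weighted = sum(r[j] * k for r, k in zip(resp, values))
            weights[j] = max(mass / len(values), 1e-12)
            means[j] = max(weighted / max(mass, 1e-12), 1e-9)
    return means, weights


def assign_bin(count, means, weights):
    """Return the index of the most probable bin for one l-tuple count."""
    scores = [math.log(w) + poisson_logpmf(count, m)
              for w, m in zip(weights, means)]
    return scores.index(max(scores))
```

In this sketch, a read could then be binned by summing the per-bin log-probabilities of its l-tuples, and the fitted means serve as rough relative abundance estimates for the bins. Choosing the number of bins and handling sequencing errors are left out here.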




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, YW., Ye, Y. (2010). A Novel Abundance-Based Algorithm for Binning Metagenomic Sequences Using l-Tuples. In: Berger, B. (ed.) Research in Computational Molecular Biology. RECOMB 2010. Lecture Notes in Computer Science, vol 6044. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12683-3_35


  • DOI: https://doi.org/10.1007/978-3-642-12683-3_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12682-6

  • Online ISBN: 978-3-642-12683-3

  • eBook Packages: Computer Science, Computer Science (R0)
