GMeta: A Novel Algorithm to Utilize Highly Connected Components for Metagenomic Binning

  • Conference paper
Future Data and Security Engineering (FDSE 2019)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11814))

Abstract

Metagenomic binning is the task of clustering metagenomic sequences or contigs, or of assigning taxonomy to them. Because metagenomic samples contain a massive abundance of organisms, the number of nucleotide sequences skyrockets, which increases the complexity of binning algorithms. Unsupervised classification has gained ground in recent years because the reference databases required by reference-based methods are often lacking, and various state-of-the-art unsupervised tools have been released. Exploiting the overlap information between reads has driven the success of several unsupervised methods with high accuracy. These methods rest on the observation that the average proportion of l-mers common to the genomes of different species is very small when l is sufficiently large. This paper introduces a novel algorithm for binning metagenomic sequences without requiring reference databases by utilizing highly connected components inside a weighted overlapping graph of reads. Experimental results show that its precision improves on that of other well-known binning tools for both short and long sequences.
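
The general idea of grouping reads through shared l-mers can be illustrated with a minimal Python sketch. It is not the GMeta algorithm itself, whose details the abstract does not give. The function names, the toy reads, the value l = 4, and the `min_weight` threshold are all assumptions made for illustration, and plain connected components over thresholded edges stand in for the paper's highly connected components.

```python
from collections import defaultdict
from itertools import combinations


def lmers(read, l):
    """Return the set of all length-l substrings (l-mers) of a read."""
    return {read[i:i + l] for i in range(len(read) - l + 1)}


def build_overlap_graph(reads, l):
    """Map each read pair (i, j), i < j, to the number of distinct l-mers they share."""
    index = defaultdict(set)  # l-mer -> ids of the reads that contain it
    for rid, read in enumerate(reads):
        for mer in lmers(read, l):
            index[mer].add(rid)
    weights = defaultdict(int)
    for rids in index.values():
        for i, j in combinations(sorted(rids), 2):
            weights[(i, j)] += 1
    return dict(weights)


def components(n, weights, min_weight):
    """Group reads joined by edges of weight >= min_weight (union-find).

    Assumption: simple connected components, used here as a stand-in
    for the highly connected components mentioned in the abstract.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), w in weights.items():
        if w >= min_weight:
            parent[find(i)] = find(j)
    groups = defaultdict(list)
    for rid in range(n):
        groups[find(rid)].append(rid)
    return sorted(groups.values())


# Hypothetical toy reads: the first three overlap each other, as do the last two.
reads = [
    "ACGTACGGTCA",
    "TACGGTCATTG",
    "GGTCATTGCCA",
    "TTTTGGGGAAAC",
    "GGGGAAACCCT",
]
weights = build_overlap_graph(reads, l=4)
bins = components(len(reads), weights, min_weight=3)
print(bins)
```

On these toy reads the weak edge between reads 0 and 2 (two shared 4-mers) falls below the threshold, yet both reads still land in the same group through read 1. A real tool would need a far larger l, error-tolerant matching, and a stricter notion of connectivity than this sketch uses.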

Acknowledgment

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under grant number B2019-20-06.

Author information

Corresponding author

Correspondence to Tran Van Lang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Pham, H.T., Vinh, L.V., Lang, T.V., Tran, V.H. (2019). GMeta: A Novel Algorithm to Utilize Highly Connected Components for Metagenomic Binning. In: Dang, T., Küng, J., Takizawa, M., Bui, S. (eds) Future Data and Security Engineering. FDSE 2019. Lecture Notes in Computer Science, vol 11814. Springer, Cham. https://doi.org/10.1007/978-3-030-35653-8_35

  • DOI: https://doi.org/10.1007/978-3-030-35653-8_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-35652-1

  • Online ISBN: 978-3-030-35653-8

  • eBook Packages: Computer Science, Computer Science (R0)
