MetaFlow: Metagenomic Profiling Based on Whole-Genome Coverage Analysis with Min-Cost Flows

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2016)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9649)

Abstract

High-throughput sequencing (HTS) of metagenomes is proving essential in understanding the environment and diseases. State-of-the-art methods for discovering the species and their abundances in an HTS sample are based on genome-specific markers, which can lead to skewed results, especially at species level. We present MetaFlow, the first method based on coverage analysis across entire genomes that also scales to HTS samples. We formulated this problem as an NP-hard matching problem in a bipartite graph, which we solved in practice by min-cost flows. On synthetic data sets of varying complexity and similarity, MetaFlow is more precise and sensitive than popular tools such as MetaPhlAn, mOTU, GSMer and BLAST, and its abundance estimations at species level are two to four times better in terms of \(\ell_1\)-norm. On a real human stool data set, MetaFlow identifies B. uniformis as most predominant, in line with previous human gut studies, whereas marker-based methods report it as rare. MetaFlow is freely available at http://cs.helsinki.fi/gsa/metaflow.
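The abstract compares abundance estimates by \(\ell_1\)-norm but does not spell out the computation. A common reading is the sum of absolute differences between the true and the estimated relative abundance of each species. The sketch below illustrates that reading only. The function names, the species labels, and the toy counts are hypothetical and are not taken from the paper or from the MetaFlow software.

```python
def normalize(counts):
    """Turn raw per-species counts into relative abundances that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must have a positive total")
    return {species: c / total for species, c in counts.items()}


def l1_distance(true_abund, est_abund):
    """Sum of absolute abundance differences over the union of species.

    A species missing from one profile is treated as having abundance 0,
    so both false positives and false negatives are penalised.
    """
    species = set(true_abund) | set(est_abund)
    return sum(
        abs(true_abund.get(s, 0.0) - est_abund.get(s, 0.0)) for s in species
    )


# Hypothetical toy profiles (assumed values, for illustration only).
truth = normalize({"species_A": 50, "species_B": 30, "species_C": 20})
estimate = normalize({"species_A": 40, "species_B": 30, "species_D": 30})

error = l1_distance(truth, estimate)
print(f"l1 error: {error:.3f}")
```

Under this definition the error ranges from 0 (identical profiles) to 2 (profiles with no species in common), so "two to four times better" would correspond to an error that is one half to one quarter as large.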

S. Ahmed and A.I. Tomescu—Equal contribution.

Acknowledgement

We thank Romeo Rizzi for discussions about the computational complexity of our problem. This work was partially supported by the Academy of Finland under grants 284598 (CoECGR) to A.S. and V.M. and 274977 to A.T.

Corresponding author

Correspondence to Alexandru I. Tomescu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Sobih, A., Tomescu, A.I., Mäkinen, V. (2016). MetaFlow: Metagenomic Profiling Based on Whole-Genome Coverage Analysis with Min-Cost Flows. In: Singh, M. (eds) Research in Computational Molecular Biology. RECOMB 2016. Lecture Notes in Computer Science, vol 9649. Springer, Cham. https://doi.org/10.1007/978-3-319-31957-5_8

  • DOI: https://doi.org/10.1007/978-3-319-31957-5_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31956-8

  • Online ISBN: 978-3-319-31957-5

  • eBook Packages: Computer Science (R0)
