
Cloud Computing for De Novo Metagenomic Sequence Assembly

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2013)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7875)


Abstract

In metagenomics, population sequencing is an approach for recovering genomic sequences from a genetically diverse environment. Combined with recently developed next-generation sequencing platforms, metagenomic data analysis has greatly enlarged the size of sequencing datasets and decreased the cost. The complete and accurate assembly of sequenced reads from an environmental sample improves the efficiency of functional and taxonomic classification of genomes. A common bottleneck of the available tools is the high computing requirement for efficiently assembling the vast amounts of data generated by large-scale sequencing projects. To address this limitation, we developed a parallel strategy to accelerate computation and boost accuracy. We also present an instance of this strategy for a state-of-the-art assembly tool, Genovo, on the Apache Hadoop platform. To demonstrate the capability of our approach, we compared the performance of our method with that of two other short-read assembly programs on a series of synthetic and real datasets created using the 454 platform, the largest of which has 683k reads. Under the parallel strategy, our method reconstructed bases better than the other tools, outperforming them in both speed and several assembly evaluation metrics.




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Guo, X., Ding, X., Meng, Y., Pan, Y. (2013). Cloud Computing for De Novo Metagenomic Sequence Assembly. In: Cai, Z., Eulenstein, O., Janies, D., Schwartz, D. (eds) Bioinformatics Research and Applications. ISBRA 2013. Lecture Notes in Computer Science, vol 7875. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38036-5_20


  • DOI: https://doi.org/10.1007/978-3-642-38036-5_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38035-8

  • Online ISBN: 978-3-642-38036-5

  • eBook Packages: Computer Science, Computer Science (R0)
