Skip to main content

Paired de Bruijn Graphs: A Novel Approach for Incorporating Mate Pair Information into Genome Assemblers

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2011)

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 6577))

Abstract

The recent proliferation of next generation sequencing with short reads has enabled many new experimental opportunities but, at the same time, has raised formidable computational challenges in genome assembly. One of the key advances that has led to an improvement in contig lengths has been mate pairs, which facilitate the assembly of repeating regions. Mate pairs have been algorithmically incorporated into most next generation assemblers as various heuristic post-processing steps to correct the assembly graph or to link contigs into scaffolds. Such methods have allowed the identification of longer contigs than would be possible with single reads; however, they can still fail to resolve complex repeats. Thus, improved methods for incorporating mate pairs will have a strong effect on contig length in the future.

Here, we introduce the paired de Bruijn graph, a generalization of the de Bruijn graph that incorporates mate pair information into the graph structure itself instead of analyzing mate pairs at a post-processing step. This graph has the potential to be used in place of the de Bruijn graph in any de Bruijn graph based assembler, maintaining all other assembly steps such as error-correction and repeat resolution. Through assembly results on simulated error-free data, we argue that this can effectively improve the contig sizes in assembly.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Batzoglou, S., Jaffe, D.B., Stanley, K., Butler, J., Gnerre, S., Mauceli, E., Berger, B., Mesirov, J.P., Lander, E.S.: ARACHNE: A Whole-Genome Shotgun Assembler. Genome Research 12(1), 177–189 (2002)

    Article  Google Scholar 

  2. Bentley, D.R., Balasubramanian, S., Swerdlow, H.P., Smith, G.P., Milton, J., Brown, C.G., Hall, K.P., Evers, D.J., Barnes, C.L., Bignell, H.R., et al.: Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456(7218), 53–59 (2008)

    Article  Google Scholar 

  3. Butler, J., MacCallum, I., Kleber, M., Shlyakhter, I.A., Belmonte, M.K., Lander, E.S., Nusbaum, C., Jaffe, D.B.: ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Research 18, 810–820 (2008)

    Article  Google Scholar 

  4. Chaisson, M.J., Brinza, D., Pevzner, P.A.: De novo fragment assembly with short mate-paired reads: Does the read length matter? Genome Research 19, 336–346 (2009)

    Article  Google Scholar 

  5. Chaisson, M.J., Pevzner, P.A.: Short read fragment assembly of bacterial genomes. Genome Research 18(2), 324–330 (2008)

    Article  Google Scholar 

  6. Drmanac, R., Sparks, A.B., Callow, M.J., Halpern, A.L., Burns, N.L., Kermani, B.G., Carnevali, P., Nazarenko, I., Nilsen, G.B., Yeung, G., et al.: Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327(5961), 78 (2010)

    Article  Google Scholar 

  7. Genome 10K Community of Scientists: Genome 10K: A proposal to obtain whole-genome sequence for 10000 vertebrate species. Journal of Heredity 100(6), 659–674 (2009)

    Google Scholar 

  8. Harris, T.D., Buzby, P.R., Babcock, H., Beer, E., Bowers, J., Braslavsky, I., Causey, M., Colonell, J., DiMeo, J., William Efcavitch, J., et al.: Single-molecule DNA sequencing of a viral genome. Science 320(5872), 106 (2008)

    Article  Google Scholar 

  9. Idury, R.M., Waterman, M.S.: A new algorithm for DNA sequence assembly. Journal of Computational Biology 2, 291–306 (1995)

    Article  Google Scholar 

  10. Kececioglu, J.D.: Exact and approximation algorithms for DNA sequence reconstruction. PhD thesis, University of Arizona, Tucson, AZ, USA (1992)

    Google Scholar 

  11. Medvedev, P., Brudno, M.: Ab initio whole genome shotgun assembly with mated short reads. In: Vingron, M., Wong, L. (eds.) RECOMB 2008. LNCS (LNBI), vol. 4955, pp. 50–64. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  12. Medvedev, P., Georgiou, K., Myers, G., Brudno, M.: Computability of models for sequence assembly. In: Giancarlo, R., Hannenhalli, S. (eds.) WABI 2007. LNCS (LNBI), vol. 4645, pp. 289–301. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  13. Myers, E.W.: Toward simplifying and accurately formulating fragment assembly. Journal of Computational Biology 2, 275–290 (1995)

    Article  Google Scholar 

  14. Myers, E.W.: The fragment assembly string graph. Bioinformatics 21(suppl 2), ii79–ii85 (2005)

    Article  Google Scholar 

  15. Pevzner, P.A.: L-Tuple DNA sequencing: computer analysis. J. Biomol. Struct. Dyn. 7(1), 63–73 (1989)

    Google Scholar 

  16. Pevzner, P.A., Tang, H.: Fragment assembly with double-barreled data. Bioinformatics 17(suppl 1), S223–S225 (2001)

    Article  Google Scholar 

  17. Pevzner, P.A., Tang, H., Tesler, G.: De novo repeat classification and fragment assembly. Genome Research 14(9), 1786–1796 (2004)

    Article  Google Scholar 

  18. Pevzner, P.A., Tang, H., Waterman, M.S.: An Eulerian path approach to DNA fragment assembly. Proceedings of the National Academy of Sciences of the United States of America 98(17), 9748–9753 (2001)

    Article  MathSciNet  MATH  Google Scholar 

  19. Schatz, M.C., Delcher, A.L., Salzberg, S.L.: Assembly of large genomes using second-generation sequencing. Genome Research 20(9), 1165–1173 (2010)

    Article  Google Scholar 

  20. Simpson, J.T., Wong, K., Jackman, S.D., Schein, J.E., Jones, S.J.M., Birol, I.: ABySS: A parallel assembler for short read sequence data. Genome Research 6, 1117 (2009)

    Article  Google Scholar 

  21. Weber, J.L., Myers, E.W.: Human whole-genome shotgun sequencing. Genome Research 7, 401–409 (1997)

    Article  Google Scholar 

  22. Zerbino, D.R., Birney, E.: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research 18, 821–829 (2008)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Medvedev, P., Pham, S., Chaisson, M., Tesler, G., Pevzner, P. (2011). Paired de Bruijn Graphs: A Novel Approach for Incorporating Mate Pair Information into Genome Assemblers. In: Bafna, V., Sahinalp, S.C. (eds) Research in Computational Molecular Biology. RECOMB 2011. Lecture Notes in Computer Science(), vol 6577. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20036-6_22

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-20036-6_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20035-9

  • Online ISBN: 978-3-642-20036-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics