Pathset Graphs: A Novel Approach for Comprehensive Utilization of Paired Reads in Genome Assembly

  • Son K. Pham
  • Dmitry Antipov
  • Alexander Sirotkin
  • Glenn Tesler
  • Pavel A. Pevzner
  • Max A. Alekseyev
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7262)


One of the key advances in genome assembly that has led to a significant improvement in contig lengths has been the utilization of paired reads (mate-pairs). While most assemblers use mate-pair information only in a post-processing step, the recently proposed Paired de Bruijn Graph (PDBG) approach incorporates mate-pair information directly into the structure of the assembly graph. However, the PDBG approach faces difficulties when the variation in insert sizes is high. To address this problem, we first transform mate-pairs into edge-pair histograms, which allow one to better estimate the distance between edges of the assembly graph that represent regions linked by multiple mate-pairs. We then combine the ideas of mate-pair transformation and PDBGs to construct new data structures for genome assembly: pathsets and pathset graphs.
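The mate-pair transformation described above can be illustrated with a minimal Python sketch. It assumes that each read of a mate-pair has already been aligned to an edge of the assembly graph and that the distance between two edges is measured from the start of one edge to the start of the other. The `MatePairAlignment` record and both function names are hypothetical and do not come from the paper. Each mate-pair votes for one edge-to-edge distance. A simple tolerance-based clustering, chosen here as a stand-in and not taken from the authors' estimator, then turns each histogram into one or more distance estimates. Several peaks can arise, for example, when the same pair of edges occurs at different separations in a repetitive genome.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple, Tuple


class MatePairAlignment(NamedTuple):
    """Hypothetical record: where each read of a mate-pair lands in the assembly graph."""
    edge1: str    # edge containing the left read
    offset1: int  # start of the left read within edge1
    edge2: str    # edge containing the right read
    offset2: int  # start of the right read within edge2
    insert: int   # insert size (distance between the two read starts)


def edge_pair_histograms(
    alignments: List[MatePairAlignment],
) -> Dict[Tuple[str, str], Counter]:
    """Transform mate-pairs into edge-pair histograms.

    A mate-pair implies a distance from the start of edge1 to the start of
    edge2 equal to offset1 + insert - offset2; each mate-pair adds one vote.
    """
    hist: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for a in alignments:
        dist = a.offset1 + a.insert - a.offset2
        hist[(a.edge1, a.edge2)][dist] += 1
    return dict(hist)


def estimate_distances(histogram: Counter, tolerance: int) -> List[Tuple[float, int]]:
    """Split one histogram into clusters of nearby distances.

    Consecutive observed distances that differ by at most `tolerance` share a
    cluster (a simple stand-in for peak detection). Returns a list of
    (weighted mean distance, number of supporting mate-pairs), strongest first.
    """
    clusters: List[List[Tuple[int, int]]] = []
    for dist in sorted(histogram):
        # Compare with the last distance already placed in the current cluster.
        if clusters and dist - clusters[-1][-1][0] <= tolerance:
            clusters[-1].append((dist, histogram[dist]))
        else:
            clusters.append([(dist, histogram[dist])])
    estimates = []
    for cluster in clusters:
        support = sum(w for _, w in cluster)
        mean = sum(d * w for d, w in cluster) / support
        estimates.append((mean, support))
    estimates.sort(key=lambda e: -e[1])
    return estimates


# Usage: three mate-pairs agree on roughly 270 bp; one suggests a second separation.
pairs = [
    MatePairAlignment("e1", 10, "e2", 40, 300),  # 10 + 300 - 40 = 270
    MatePairAlignment("e1", 50, "e2", 75, 290),  # 50 + 290 - 75 = 265
    MatePairAlignment("e1", 20, "e2", 30, 285),  # 20 + 285 - 30 = 275
    MatePairAlignment("e1", 5, "e2", 10, 600),   # 5 + 600 - 10 = 595
]
hist = edge_pair_histograms(pairs)
print(estimate_distances(hist[("e1", "e2")], tolerance=20))
# [(270.0, 3), (595.0, 1)]
```

Pooling many mate-pairs per edge pair is what lets individual insert-size deviations average out. A single mate-pair gives only one noisy distance, whereas a histogram peak supported by several mate-pairs yields a sharper estimate.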


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Son K. Pham (1)
  • Dmitry Antipov (1)
  • Alexander Sirotkin (2)
  • Glenn Tesler (3)
  • Pavel A. Pevzner (1, 2)
  • Max A. Alekseyev (2, 4)

  1. Dept. of Computer Science and Engineering, University of California, San Diego, USA
  2. Algorithmic Biology Laboratory, St. Petersburg Academic University, St. Petersburg, Russia
  3. Dept. of Mathematics, University of California, San Diego, USA
  4. Dept. of Computer Science and Engineering, University of South Carolina, Columbia, USA
