Pathset Graphs: A Novel Approach for Comprehensive Utilization of Paired Reads in Genome Assembly
One of the key advances in genome assembly, and one that has significantly improved contig lengths, is the utilization of paired reads (mate-pairs). While most assemblers use mate-pair information in a post-processing step, the recently proposed Paired de Bruijn Graph (PDBG) approach incorporates it directly into the structure of the assembly graph. However, the PDBG approach faces difficulties when the variation in insert sizes is high. To address this problem, we first transform mate-pairs into edge-pair histograms, which allow one to better estimate the distance between edges in the assembly graph that represent regions linked by multiple mate-pairs. Further, we combine the ideas of mate-pair transformation and PDBGs to construct new data structures for genome assembly: pathsets and pathset graphs.
Keywords: Insert Size, Genomic Walk, Genomic Distance, Comprehensive Utilization, Assembly Graph
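The idea of an edge-pair histogram can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tuple layout of a mapped mate-pair, the use of a single nominal insert size, and the choice of the histogram mode as the distance estimate are all assumptions made for illustration. Each mate-pair whose reads map to two edges implies a distance between the starts of those edges, and pooling these implied distances per edge pair yields one histogram per pair.

```python
from collections import Counter, defaultdict


def edge_pair_histograms(mate_pairs, insert_size):
    """Build one histogram of implied edge-to-edge distances per edge pair.

    mate_pairs: iterable of (edge_a, offset_a, edge_b, offset_b), where each
    offset is the start position of a read within its edge (hypothetical layout).
    insert_size: assumed nominal distance between the starts of the two reads.
    """
    histograms = defaultdict(Counter)
    for edge_a, offset_a, edge_b, offset_b in mate_pairs:
        # If the read starts are insert_size apart, then
        # start(edge_b) - start(edge_a) = insert_size + offset_a - offset_b.
        distance = insert_size + offset_a - offset_b
        histograms[(edge_a, edge_b)][distance] += 1
    return histograms


def estimate_distance(histogram):
    """Return the most frequent implied distance (assumed estimator: the mode)."""
    return histogram.most_common(1)[0][0]


# Three made-up mate-pairs linking the same two edges.
pairs = [("e1", 10, "e2", 5), ("e1", 20, "e2", 15), ("e1", 30, "e2", 24)]
hists = edge_pair_histograms(pairs, insert_size=100)
print(estimate_distance(hists[("e1", "e2")]))
```

Because many mate-pairs contribute to the same histogram, an estimate derived from it is less sensitive to the insert-size variation of any single pair, which is the motivation stated in the abstract.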