Design of an Efficient Out-of-Core Read Alignment Algorithm
New genome sequencing technologies are poised to deliver significantly higher read throughput at unprecedented speeds and lower cost per run. However, current in-memory methods for aligning a set of reads to one or more reference genomes are ill-equipped to handle the expected growth in read throughput from these technologies.
This paper reports the design of a new out-of-core read mapping algorithm, Syzygy, which can scale to large volumes of read and genome data. The algorithm is designed to run in a constant, user-stipulated amount of main memory, small enough to fit on standard desktops, irrespective of the sizes of the read and genome data. Syzygy achieves a high degree of spatial locality of reference, which allows all large data structures used by the algorithm to be maintained on disk. We compare our prototype implementation with several popular read alignment programs. Our results demonstrate that Syzygy can scale to very large read volumes while using only a fraction of the memory those programs require, without sacrificing performance.
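To make the out-of-core idea concrete, the following is a minimal sketch of a generic external merge sort that holds only a user-chosen number of records in memory and keeps everything else on disk. It is not Syzygy's algorithm, whose details are not given in this abstract; the function name `external_sort` and the parameters `max_in_memory` and `workdir` are hypothetical and chosen purely for illustration.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(records, max_in_memory, workdir):
    """Yield the strings in `records` in sorted order.

    Illustrative only: at most `max_in_memory` records are held in RAM
    during the run-building phase; sorted runs live on disk in `workdir`.
    Records are assumed to be newline-free strings such as DNA reads.
    """
    run_paths = []
    it = iter(records)
    while True:
        # Phase 1: read a bounded chunk, sort it in memory, spill it to disk.
        chunk = list(islice(it, max_in_memory))
        if not chunk:
            break
        chunk.sort()
        fd, path = tempfile.mkstemp(dir=workdir, suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(r + "\n" for r in chunk)
        run_paths.append(path)

    # Phase 2: stream-merge the sorted runs; only one line per run
    # (plus file buffers) is resident at any time.
    files = [open(p) for p in run_paths]
    try:
        for line in heapq.merge(*files):
            yield line.rstrip("\n")
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

Calling `list(external_sort(reads, 3, some_dir))` on a list of reads returns them in sorted order while never sorting more than three at once. The access pattern is sequential in both phases, which is the kind of locality that makes disk-resident data structures practical. Note that this sketch opens one file per run, so a very small memory budget on a very large input would need a multi-pass merge.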
Keywords: Main Memory, Reverse Complement, Tile Size, Radix Sort, Genome Sequencing Technology