Abstract
Next-generation sequencing (NGS) technologies have rapidly evolved in the last 5 years, leading to the generation of millions of short reads in a single run. Consequently, various sequence alignment algorithms have been developed to compare these reads to an appropriate reference in order to perform important downstream analysis. SOAP2 from the SOAP series is one of the most commonly used alignment programs to handle NGS data, and it efficiently does so using low computer memory usage and fast alignment speed. This chapter describes the protocol used to align short reads to a reference genome using SOAP2, and highlights the significance of using the in-built command-line options to tune the behavior of the algorithm according to the inputs and the desired results.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, Mende DR, Li J, Xu J, Li S, Li D, Cao J, Wang B, Liang H, Zheng H, Xie Y, Tap J, Lepage P, Bertalan M, Batto JM, Hansen T, Le Paslier D, Linneberg A, Nielsen HB, Pelletier E, Renault P, Sicheritz-Ponten T, Turner K, Zhu H, Yu C, Li S, Jian M, Zhou Y, Li Y, Zhang X, Li S, Qin N, Yang H, Wang J, Brunak S, Dore J, Guarner F, Kristiansen K, Pedersen O, Parkhill J, Weissenbach J, Bork P, Ehrlich SD, Wang J (2010) A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464(7285):59–65
Van Tassell CP, Smith TP, Matukumalli LK, Taylor JF, Schnabel RD, Lawley CT, Haudenschild CD, Moore SS, Warren WC, Sonstegard TS (2008) SNP discovery and allele frequency estimation by deep sequencing of reduced representation libraries. Nat Methods 5(3):247–252
Taylor KH, Kramer RS, Davis JW, Guo J, Duff DJ, Xu D, Caldwell CW, Shi H (2007) Ultradeep bisulfite sequencing analysis of DNA methylation patterns in multiple gene promoters by 454 sequencing. Cancer Res 67(18):8511–8518
Sultan M, Schulz MH, Richard H, Magen A, Klingenhoff A, Scherf M, Seifert M, Borodina T, Soldatov A, Parkhomchuk D, Schmidt D, O’Keeffe S, Haas S, Vingron M, Lehrach H, Yaspo ML (2008) A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science 321(5891):956–960
Guffanti A, Iacono M, Pelucchi P, Kim N, Solda G, Croft LJ, Taft RJ, Rizzi E, Askarian-Amiri M, Bonnal RJ, Callari M, Mignone F, Pesole G, Bertalot G, Bernardi LR, Albertini A, Lee C, Mattick JS, Zucchi I, De Bellis G (2009) A transcriptional sketch of a primary human breast cancer by 454 deep sequencing. BMC Genomics 10:163
Auffray C, Chen Z, Hood L (2009) Systems medicine: the future of medical genomics and healthcare. Genome Med 1(1):2
Yu X, Guda K, Willis J, Veigl M, Wang Z, Markowitz S, Adams MD, Sun S (2012) How do alignment programs perform on sequencing data with varying qualities and from repetitive regions? BioData Min 5(1):6
Flicek P, Birney E (2009) Sense from sequence reads: methods for alignment and assembly. Nat Methods 6(11 Suppl):S6–S12
Flicek P (2009) The need for speed. Genome Biol 10(3):212
Ferragina P, Manzini G (2005) Indexing compressed text. J ACM 52(4):552–581
Schbath S, Martin V, Zytnicki M, Fayolle J, Loux V, Gibrat JF (2012) Mapping reads on a genomic sequence: an algorithmic overview and a practical comparative analysis. J Comput Biol 19(6):796–813
Ruffalo M, LaFramboise T, Koyuturk M (2011) Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics 27(20):2790–2796
Hatem A, Bozdag D, Toland AE, Catalyurek UV (2013) Benchmarking short sequence mapping tools. BMC Bioinformatics 14:184
Li R, Li Y, Kristiansen K, Wang J (2008) SOAP: short oligonucleotide alignment program. Bioinformatics 24(5):713–714
Li R, Yu C, Li Y, Lam TW, Yiu SM, Kristiansen K, Wang J (2009) SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25(15):1966–1967
Liu CM, Wong T, Wu E, Luo R, Yiu SM, Li Y, Wang B, Yu C, Chu X, Zhao K, Li R, Lam TW (2012) SOAP3: ultra-fast GPU-based parallel alignment tool for short reads. Bioinformatics 28(6):878–879
Luo R, Wong T, Zhu J, Liu CM, Zhu X, Wu E, Lee LK, Lin H, Zhu W, Cheung DW, Ting HF, Yiu SM, Peng S, Yu C, Li Y, Li R, Lam TW (2013) SOAP3-dp: fast, accurate and sensitive GPU-based short read aligner. PLoS One 8(5), e65632
Minoche AE, Dohm JC, Himmelbauer H (2011) Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biol 12(11):R112
Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Mitros T, Dirks W, Hellsten U, Putnam N (2012) Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res 40(D1):D1178–D1186
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16):2078–2079
Reynoso V, Putonti C (2011) Mapping short sequencing reads to distant relatives. In: Proceedings of the 2nd ACM conference on bioinformatics, computational biology and biomedicine, 2011. ACM, Chicago, IL, p 420–424
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408(6814):796–815
Li H, Homer N (2010) A survey of sequence alignment algorithms for next-generation sequencing. Brief Bioinform 11(5):473–483
Lorenc MT, Hayashi S, Stiller J, Lee H, Manoli S, Ruperao P, Visendi P, Berkman PJ, Lai K, Batley J, Edwards D (2012) Discovery of single nucleotide polymorphisms in complex genomes using SGSautoSNP. Biology 1(2):370–382
Siragusa E, Weese D, Reinert K (2013) Fast and accurate read mapping with approximate seeds and multiple backtracking. Nucleic Acids Res 41(7), e78
Mott R, Tribe R (1999) Approximate statistics of gapped alignments. J Comput Biol 6(1):91–112
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer Science+Business Media New York
About this protocol
Cite this protocol
Hurgobin, B. (2016). Short Read Alignment Using SOAP2. In: Edwards, D. (eds) Plant Bioinformatics. Methods in Molecular Biology, vol 1374. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3167-5_13
Download citation
DOI: https://doi.org/10.1007/978-1-4939-3167-5_13
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-3166-8
Online ISBN: 978-1-4939-3167-5
eBook Packages: Springer Protocols