Short Read Alignment Using SOAP2

Hurgobin, Bhavna

doi:10.1007/978-1-4939-3167-5_13

Part of the book series: Methods in Molecular Biology (MIMB, volume 1374)

Abstract

Next-generation sequencing (NGS) technologies have evolved rapidly over the last 5 years and now generate millions of short reads in a single run. Consequently, various sequence alignment algorithms have been developed to compare these reads to an appropriate reference so that important downstream analyses can be performed. SOAP2, from the SOAP series, is one of the most commonly used alignment programs for NGS data; it aligns reads quickly while using little computer memory. This chapter describes the protocol for aligning short reads to a reference genome using SOAP2 and highlights the importance of using the built-in command-line options to tune the behavior of the algorithm according to the inputs and the desired results.

Author information

Authors and Affiliations

University of Queensland, Hartley Teakle Building 83, St Lucia, QLD, 4072, Australia
Bhavna Hurgobin
School of Plant Biology, University of Western Australia, Perth, WA, 6009, Australia
Bhavna Hurgobin

Corresponding author

Correspondence to Bhavna Hurgobin.

Editor information

Editors and Affiliations

School of Plant Biology, University of Western Australia, Perth, Australia
David Edwards

About this protocol

Cite this protocol

Hurgobin, B. (2016). Short Read Alignment Using SOAP2. In: Edwards, D. (eds) Plant Bioinformatics. Methods in Molecular Biology, vol 1374. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3167-5_13

DOI: https://doi.org/10.1007/978-1-4939-3167-5_13
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-3166-8
Online ISBN: 978-1-4939-3167-5
eBook Packages: Springer Protocols
