Comparative Assessment of Alignment Algorithms for NGS Data: Features, Considerations, Implementations, and Future

Shen, Carol; Shen, Tony; Lin, Jimmy

doi:10.1007/978-3-319-59826-0_9

Abstract

Because massively parallel sequencing relies on shorter reads, the algorithms developed for alignment have been crucial to the widespread adoption of next-generation sequencing (NGS). A variety of algorithms have been developed for different purposes, and researchers can now use sensitive and efficient alignment algorithms for a wide range of applications, including genome-wide variation studies [1], quantitative RNA-seq expression analyses [2], the study of secondary RNA structure [3], microRNA discovery [4], identification of protein-binding sites using ChIP-sequencing [5], recognition of histone modification patterns for epigenetic studies [6], simultaneous alignment of multiple genomes for comparative genomics [7], and de novo assembly of genomes and transcriptomes [8]. In clinical settings, alignment to reference genomes has led to rapid pathogen discovery [9], identification of causative mutations for rare genetic diseases [10–12], detection of chromosomal abnormalities in tumor genomes [13], and many other advances that similarly depend on rapid and cost-effective genome-wide sequencing.

References

1. Dalca, A.V., Brudno, M.: Genome variation discovery with high-throughput sequencing data. Brief. Bioinform. 11(1), 3–14 (2010)
2. Engstrom, P.G., et al.: Systematic evaluation of spliced alignment programs for RNA-seq data. Nat. Methods. 10(12), 1185–1191 (2013)
3. Zhong, C., Zhang, S.: Efficient alignment of RNA secondary structures using sparse dynamic programming. BMC Bioinformatics. 14, 269 (2013)
4. Sun, Z., et al.: CAP-miRSeq: a comprehensive analysis pipeline for microRNA sequencing data. BMC Genomics. 15, 423 (2014)
5. Johnson, D.S., et al.: Genome-wide mapping of in vivo protein-DNA interactions. Science. 316(5830), 1497–1502 (2007)
6. Hong, C., et al.: Probabilistic alignment leads to improved accuracy and read coverage for bisulfite sequencing data. BMC Bioinformatics. 14, 337 (2013)
7. Kim, J., Ma, J.: PSAR-align: improving multiple sequence alignment using probabilistic sampling. Bioinformatics. 30(7), 1010–1012 (2014)
8. Li, R., et al.: De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20(2), 265–272 (2010)
9. Naccache, S.N., et al.: A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res. 24(7), 1180–1192 (2014)
10. Ng, B.G., et al.: Mosaicism of the UDP-galactose transporter SLC35A2 causes a congenital disorder of glycosylation. Am. J. Hum. Genet. 92(4), 632–636 (2013)
11. Green, R.C., et al.: Exploring concordance and discordance for return of incidental findings from clinical sequencing. Genet. Med. 14(4), 405–410 (2012)
12. Goh, V., et al.: Next-generation sequencing facilitates the diagnosis in a child with twinkle mutations causing cholestatic liver failure. J. Pediatr. Gastroenterol. Nutr. 54(2), 291–294 (2012)
13. Schroder, J., et al.: Socrates: identification of genomic rearrangements in tumour genomes by re-aligning soft clipped reads. Bioinformatics. 30(8), 1064–1072 (2014)
14. Rizzo, J.M., Buck, M.J.: Key principles and clinical applications of “next-generation” DNA sequencing. Cancer Prev. Res. (Phila.) 5(7), 887–900 (2012)
15. Shang, J., et al.: Evaluation and comparison of multiple aligners for next-generation sequencing data analysis. Biomed. Res. Int. 2014, 16 (2014)
16. Metzker, M.L.: Sequencing technologies—the next generation. Nat. Rev. Genet. 11(1), 31–46 (2010)
17. Lander, E.S.: Initial impact of the sequencing of the human genome. Nature. 470(7333), 187–197 (2011)
18. Li, H., Homer, N.: A survey of sequence alignment algorithms for next-generation sequencing. Brief. Bioinform. 11(5), 473–483 (2010)
19. Li, R., et al.: SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics. 25(15), 1966–1967 (2009)
20. Margulies, M., et al.: Genome sequencing in microfabricated high-density picolitre reactors. Nature. 437(7057), 376–380 (2005)
21. David, M., et al.: SHRiMP2: sensitive yet practical short read mapping. Bioinformatics. 27(7), 1011–1012 (2011)
22. Li, H., Durbin, R.: Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 25(14), 1754–1760 (2009)
23. Langmead, B., Trapnell, C., Pop, M., Salzberg, S.: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10(3), R25 (2009)
24. Bentley, D.R., et al.: Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 456(7218), 53–59 (2008)
25. Smith, A.D., Xuan, Z., Zhang, M.Q.: Using quality scores and longer reads improves accuracy of Solexa read mapping. BMC Bioinformatics. 9, 128 (2008)
26. Hoffmann, S., et al.: Fast mapping of short sequences with mismatches, insertions and deletions using index structures. PLoS Comput. Biol. 5(9), e1000502 (2009)
27. Ondov, B.D., et al.: Efficient mapping of applied biosystems SOLiD sequence data to a reference genome for functional genomic applications. Bioinformatics. 24(23), 2776–2777 (2008)
28. Kim, D., et al.: TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14(4), R36 (2013)
29. Rothberg, J.M., et al.: An integrated semiconductor device enabling non-optical genome sequencing. Nature. 475(7356), 348–352 (2011)
30. Quail, M.A., et al.: A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics. 13, 341 (2012)
31. Novocraft Technologies: Novoalign 30 June 2014. Available from: http://www.novocraft.com/main/index.php (2014). Accessed 20 September 2014
32. Langmead, B., Salzberg, S.L.: Fast gapped-read alignment with Bowtie 2. Nat. Methods. 9(4), 357–359 (2012)
33. Otto, C., Stadler, P.F., Hoffmann, S.: Lacking alignments? The next-generation sequencing mapper segemehl revisited. Bioinformatics. 30(13), 1837–1843 (2014)
34. Caboche, S., et al.: Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data. BMC Genomics. 15, 264 (2014)
35. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)
36. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195–197 (1981)
37. Ma, B., Tromp, J., Li, M.: PatternHunter: faster and more sensitive homology search. Bioinformatics. 18(3), 440–445 (2002)
38. Ruffalo, M., LaFramboise, T., Koyutürk, M.: Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics. 27(20), 2790–2796 (2011)
39. Cao, X., Cheng, L.S., Tung, A.K.H.: Indexing DNA sequences using q-Grams. DASFAA, Lecture Notes in Computer Science, vol. 3453: p. 13 (2005)
40. Weese, D., et al.: RazerS—fast read mapping with sensitivity control. Genome Res. 19(9), 1646–1654 (2009)
41. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. Proceedings of the 41st symposium on foundations of computer science, Redondo Beach, CA, USA, p. 9. (2000)
42. Liu, Y., Schmidt, B., Maskell, D.L.: CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows–Wheeler transform. Bioinformatics. 28(14), 1830–1837 (2012)
43. Santana-Quintero, L., et al.: HIVE-hexagon: high-performance, parallelized sequence alignment for next-generation sequencing data analysis. PLoS One. 9(6), e99033 (2014)
44. Li, H., Durbin, R.: Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics. 26(5), 589–595 (2010)
45. Lindner, R., Friedel, C.C.: A comprehensive evaluation of alignment algorithms in the context of RNA-Seq. PLoS One. 7(12), e52403 (2012)
46. Wu, T.D., Nacu, S.: Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics. 26(7), 873–881 (2010)
47. Wang, K., et al.: MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 38(18), e178 (2010)
48. Dobin, A., et al.: STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29(1), 15–21 (2013)
49. Kertesz, M., et al.: Genome-wide measurement of RNA secondary structure in yeast. Nature. 467(7311), 103–107 (2010)
50. Underwood, J.G., et al.: FragSeq: transcriptome-wide RNA structure probing using high-throughput sequencing. Nat. Methods. 7(12), 995–1001 (2010)
51. Lucks, J.B., et al.: Multiplexed RNA structure characterization with selective 2’-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq). Proc. Natl. Acad. Sci. U. S. A. 108(27), 11063–11068 (2011)
52. Zhang, K., Shasha, D.: Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput. 18, 1245–1262 (1989)
53. Jiang, T., Wang, L., Zhang, K.: Alignment of trees–an alternative to tree edit. Theor. Comput. Sci. 143, 137–148 (1995)
54. Hochsmann, M., Toller, T., Giegerich, R., Kurtz, S.: Local similarity in RNA secondary structures. In: Proceedings of the 2nd IEEE Computer Society Bioinformatics Conference, Washington DC, pp. 159–168 (2003)
55. Li, Y., et al.: Performance comparison and evaluation of software tools for microRNA deep-sequencing data analysis. Nucleic Acids Res. 40(10), 4298–4305 (2012)
56. Krueger, F., Andrews, S.R.: Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 27(11), 1571–1572 (2011)
57. Xi, Y., Li, W.: BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics. 10, 232 (2009)
58. Coarfa, C., et al.: Pash 3.0: A versatile software package for read mapping and integrative analysis of genomic and epigenomic variation using massively parallel DNA sequencing. BMC Bioinformatics. 11, 572 (2010)
59. Lim, J.Q., et al.: BatMeth: improved mapper for bisulfite sequencing reads on DNA methylation. Genome Biol. 13(10), R82 (2012)
60. Chen, P.Y., Cokus, S.J., Pellegrini, M.: BS Seeker: precise mapping for bisulfite sequencing. BMC Bioinformatics. 11, 203 (2010)
61. Kunde-Ramamoorthy, G., et al.: Comparison and quantitative verification of mapping algorithms for whole-genome bisulfite sequencing. Nucleic Acids Res. 42(6), e43 (2014)
62. Schatz, M.C., Langmead, B., Salzberg, S.L.: Cloud computing and the DNA data race. Nat. Biotechnol. 28(7), 691–693 (2010)
63. Maji, R.K., et al.: PVT: an efficient computational procedure to speed up next-generation sequence analysis. BMC Bioinformatics. 15, 167 (2014)
64. Onsongo, G., et al.: Implementation of cloud based next generation sequencing data analysis in a clinical laboratory. BMC Res. Notes. 7, 314 (2014)
65. Reid, J.G., et al.: Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline. BMC Bioinformatics. 15(1), 30 (2014)
66. Oldach, L.: Edico genome makes first sale of NGS processor. In: Bio-IT World, Cambridge Healthtech Institute, 2014
67. Kalari, K.R., et al.: MAP-RSeq: Mayo Analysis Pipeline for RNA sequencing. BMC Bioinformatics. 15(1), 224 (2014)
68. Chin, C.-S., et al.: Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods. 10(6), 563–569 (2013)
69. English, A.C., et al.: Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology. PLoS One. 7(11), e47768 (2012)
70. Branton, D., et al.: The potential and challenges of nanopore sequencing. Nat. Biotechnol. 26(10), 1146–1153 (2008)
71. Laszlo, A.H., et al.: Decoding long nanopore sequencing reads of natural DNA. Nat. Biotechnol. 32(8), 829–833 (2014)
72. Ummat, A., Bashir, A.: Resolving complex tandem repeats with long reads. Bioinformatics. 30(24), 3491–3498 (2014)

Author information

Authors and Affiliations

School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
Carol Shen & Tony Shen
Rare Genomics Institute, St. Louis, MO, USA
Jimmy Lin

Corresponding author

Correspondence to Carol Shen.

Editor information

Editors and Affiliations

LaTICE, Tunis, Tunisia
Mourad Elloumi

About this chapter

Cite this chapter

Shen, C., Shen, T., Lin, J. (2017). Comparative Assessment of Alignment Algorithms for NGS Data: Features, Considerations, Implementations, and Future. In: Elloumi, M. (eds) Algorithms for Next-Generation Sequencing Data. Springer, Cham. https://doi.org/10.1007/978-3-319-59826-0_9

DOI: https://doi.org/10.1007/978-3-319-59826-0_9
Published: 19 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59824-6
Online ISBN: 978-3-319-59826-0
eBook Packages: Computer Science, Computer Science (R0)
