Abstract
Because massively parallel sequencing produces short reads, alignment algorithms have been crucial to the widespread adoption of next-generation sequencing (NGS). A variety of algorithms have been developed for different purposes, and researchers can now use sensitive and efficient alignment algorithms for a wide range of applications, including genome-wide variation studies [1], quantitative RNA-seq expression analyses [2], the study of secondary RNA structure [3], microRNA discovery [4], identification of protein-binding sites using ChIP-sequencing [5], recognition of histone modification patterns for epigenetic studies [6], simultaneous alignment of multiple genomes for comparative genomics [7], and de novo assembly of genomes and transcriptomes [8]. In clinical settings, alignment to reference genomes has led to rapid pathogen discovery [9], identification of causative mutations for rare genetic diseases [10–12], detection of chromosomal abnormalities in tumor genomes [13], and many other advances that similarly depend on rapid and cost-effective genome-wide sequencing.
References
Dalca, A.V., Brudno, M.: Genome variation discovery with high-throughput sequencing data. Brief. Bioinform. 11(1), 3–14 (2010)
Engstrom, P.G., et al.: Systematic evaluation of spliced alignment programs for RNA-seq data. Nat. Methods. 10(12), 1185–1191 (2013)
Zhong, C., Zhang, S.: Efficient alignment of RNA secondary structures using sparse dynamic programming. BMC Bioinformatics. 14, 269 (2013)
Sun, Z., et al.: CAP-miRSeq: a comprehensive analysis pipeline for microRNA sequencing data. BMC Genomics. 15, 423 (2014)
Johnson, D.S., et al.: Genome-wide mapping of in vivo protein-DNA interactions. Science. 316(5830), 1497–1502 (2007)
Hong, C., et al.: Probabilistic alignment leads to improved accuracy and read coverage for bisulfite sequencing data. BMC Bioinformatics. 14, 337 (2013)
Kim, J., Ma, J.: PSAR-align: improving multiple sequence alignment using probabilistic sampling. Bioinformatics. 30(7), 1010–1012 (2014)
Li, R., et al.: De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20(2), 265–272 (2010)
Naccache, S.N., et al.: A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res. 24(7), 1180–1192 (2014)
Ng, B.G., et al.: Mosaicism of the UDP-galactose transporter SLC35A2 causes a congenital disorder of glycosylation. Am. J. Hum. Genet. 92(4), 632–636 (2013)
Green, R.C., et al.: Exploring concordance and discordance for return of incidental findings from clinical sequencing. Genet. Med. 14(4), 405–410 (2012)
Goh, V., et al.: Next-generation sequencing facilitates the diagnosis in a child with twinkle mutations causing cholestatic liver failure. J. Pediatr. Gastroenterol. Nutr. 54(2), 291–294 (2012)
Schroder, J., et al.: Socrates: identification of genomic rearrangements in tumour genomes by re-aligning soft clipped reads. Bioinformatics. 30(8), 1064–1072 (2014)
Rizzo, J.M., Buck, M.J.: Key principles and clinical applications of “next-generation” DNA sequencing. Cancer Prev. Res. (Phila.) 5(7), 887–900 (2012)
Shang, J., et al.: Evaluation and comparison of multiple aligners for next-generation sequencing data analysis. Biomed. Res. Int. 2014, 16 (2014)
Metzker, M.L.: Sequencing technologies—the next generation. Nat. Rev. Genet. 11(1), 31–46 (2010)
Lander, E.S.: Initial impact of the sequencing of the human genome. Nature. 470(7333), 187–197 (2011)
Li, H., Homer, N.: A survey of sequence alignment algorithms for next-generation sequencing. Brief. Bioinform. 11(5), 473–483 (2010)
Li, R., et al.: SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics. 25(15), 1966–1967 (2009)
Margulies, M., et al.: Genome sequencing in microfabricated high-density picolitre reactors. Nature. 437(7057), 376–380 (2005)
David, M., et al.: SHRiMP2: Sensitive yet Practical Short Read Mapping. Bioinformatics. 27(7), 1011–1012 (2011)
Li, H., Durbin, R.: Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 25(14), 1754–1760 (2009)
Langmead, B., Trapnell, C., Pop, M., Salzberg, S.: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10(3), R25 (2009)
Bentley, D.R., et al.: Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 456(7218), 53–59 (2008)
Smith, A.D., Xuan, Z., Zhang, M.Q.: Using quality scores and longer reads improves accuracy of Solexa read mapping. BMC Bioinformatics. 9, 128 (2008)
Hoffmann, S., et al.: Fast mapping of short sequences with mismatches, insertions and deletions using index structures. PLoS Comput. Biol. 5(9), e1000502 (2009)
Ondov, B.D., et al.: Efficient mapping of applied biosystems SOLiD sequence data to a reference genome for functional genomic applications. Bioinformatics. 24(23), 2776–2777 (2008)
Kim, D., et al.: TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14(4), R36 (2013)
Rothberg, J.M., et al.: An integrated semiconductor device enabling non-optical genome sequencing. Nature. 475(7356), 348–352 (2011)
Quail, M.A., et al.: A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics. 13, 341 (2012)
Novocraft Technologies: Novoalign 30 June 2014. Available from: http://www.novocraft.com/main/index.php (2014). Accessed 20 September 2014
Langmead, B., Salzberg, S.L.: Fast gapped-read alignment with Bowtie 2. Nat. Methods. 9(4), 357–359 (2012)
Otto, C., Stadler, P.F., Hoffmann, S.: Lacking alignments? The next-generation sequencing mapper segemehl revisited. Bioinformatics. 30(13), 1837–1843 (2014)
Caboche, S., et al.: Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data. BMC Genomics. 15, 264 (2014)
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)
Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195–197 (1981)
Ma, B., Tromp, J., Li, M.: PatternHunter: faster and more sensitive homology search. Bioinformatics. 18(3), 440–445 (2002)
Ruffalo, M., LaFramboise, T., Koyutürk, M.: Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics. 27(20), 2790–2796 (2011)
Cao, X., Cheng, L.S., Tung, A.K.H.: Indexing DNA sequences using q-Grams. DASFAA, Lecture Notes in Computer Science, vol. 3453: p. 13 (2005)
Weese, D., et al.: RazerS—fast read mapping with sensitivity control. Genome Res. 19(9), 1646–1654 (2009)
Ferragina, P., Manzini, G.: Opportunistic data structures with applications. Proceedings of the 41st symposium on foundations of computer science, Redondo Beach, CA, USA, p. 9. (2000)
Liu, Y., Schmidt, B., Maskell, D.L.: CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows–Wheeler transform. Bioinformatics. 28(14), 1830–1837 (2012)
Santana-Quintero, L., et al.: HIVE-hexagon: high-performance, parallelized sequence alignment for next-generation sequencing data analysis. PLoS One. 9(6), e99033 (2014)
Li, H., Durbin, R.: Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics. 26(5), 589–595 (2010)
Lindner, R., Friedel, C.C.: A comprehensive evaluation of alignment algorithms in the context of RNA-Seq. PLoS One. 7(12), e52403 (2012)
Wu, T.D., Nacu, S.: Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics. 26(7), 873–881 (2010)
Wang, K., et al.: MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 38(18), e178 (2010)
Dobin, A., et al.: STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29(1), 15–21 (2013)
Kertesz, M., et al.: Genome-wide measurement of RNA secondary structure in yeast. Nature. 467(7311), 103–107 (2010)
Underwood, J.G., et al.: FragSeq: transcriptome-wide RNA structure probing using high-throughput sequencing. Nat. Methods. 7(12), 995–1001 (2010)
Lucks, J.B., et al.: Multiplexed RNA structure characterization with selective 2’-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq). Proc. Natl. Acad. Sci. U. S. A. 108(27), 11063–11068 (2011)
Zhang, K., Shasha, D.: Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput. 18, 1245–1262 (1989)
Jiang, T., Wang, L., Zhang, K.: Alignment of trees–an alternative to tree edit. Theor. Comput. Sci. 143, 137–148 (1995)
Hochsmann, M., Toller, T., Giegerich, R., Kurtz, S.: Local similarity in RNA secondary structures. In: Proceedings of the 2nd IEEE Computer Society Bioinformatics Conference, Washington DC, pp. 159–168 (2003)
Li, Y., et al.: Performance comparison and evaluation of software tools for microRNA deep-sequencing data analysis. Nucleic Acids Res. 40(10), 4298–4305 (2012)
Krueger, F., Andrews, S.R.: Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 27(11), 1571–1572 (2011)
Xi, Y., Li, W.: BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics. 10, 232 (2009)
Coarfa, C., et al.: Pash 3.0: A versatile software package for read mapping and integrative analysis of genomic and epigenomic variation using massively parallel DNA sequencing. BMC Bioinformatics. 11, 572 (2010)
Lim, J.Q., et al.: BatMeth: improved mapper for bisulfite sequencing reads on DNA methylation. Genome Biol. 13(10), R82 (2012)
Chen, P.Y., Cokus, S.J., Pellegrini, M.: BS Seeker: precise mapping for bisulfite sequencing. BMC Bioinformatics. 11, 203 (2010)
Kunde-Ramamoorthy, G., et al.: Comparison and quantitative verification of mapping algorithms for whole-genome bisulfite sequencing. Nucleic Acids Res. 42(6), e43 (2014)
Schatz, M.C., Langmead, B., Salzberg, S.L.: Cloud computing and the DNA data race. Nat. Biotechnol. 28(7), 691–693 (2010)
Maji, R.K., et al.: PVT: an efficient computational procedure to speed up next-generation sequence analysis. BMC Bioinformatics. 15, 167 (2014)
Onsongo, G., et al.: Implementation of cloud based next generation sequencing data analysis in a clinical laboratory. BMC Res. Notes. 7, 314 (2014)
Reid, J.G., et al.: Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline. BMC Bioinformatics. 15(1), 30 (2014)
Oldach, L.: Edico genome makes first sale of NGS processor. In: Bio-IT World, Cambridge Healthtech Institute, 2014
Kalari, K.R., et al.: MAP-RSeq: Mayo Analysis Pipeline for RNA sequencing. BMC Bioinformatics. 15(1), 224 (2014)
Chin, C.-S., et al.: Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods. 10(6), 563–569 (2013)
English, A.C., et al.: Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology. PLoS One. 7(11), e47768 (2012)
Branton, D., et al.: The potential and challenges of nanopore sequencing. Nat. Biotechnol. 26(10), 1146–1153 (2008)
Laszlo, A.H., et al.: Decoding long nanopore sequencing reads of natural DNA. Nat. Biotechnol. 32(8), 829–833 (2014)
Ummat, A., Bashir, A.: Resolving complex tandem repeats with long reads. Bioinformatics. 30(24), 3491–3498 (2014)
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Shen, C., Shen, T., Lin, J. (2017). Comparative Assessment of Alignment Algorithms for NGS Data: Features, Considerations, Implementations, and Future. In: Elloumi, M. (eds) Algorithms for Next-Generation Sequencing Data. Springer, Cham. https://doi.org/10.1007/978-3-319-59826-0_9
DOI: https://doi.org/10.1007/978-3-319-59826-0_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59824-6
Online ISBN: 978-3-319-59826-0
eBook Packages: Computer Science, Computer Science (R0)