Comparative Assessment of Alignment Algorithms for NGS Data: Features, Considerations, Implementations, and Future

  • Chapter in: Algorithms for Next-Generation Sequencing Data

Abstract

Because massively parallel sequencing produces short reads, alignment algorithms have been crucial to the widespread adoption of Next-Generation Sequencing (NGS). Considerable progress has been made in developing algorithms for different purposes, and researchers can now apply sensitive and efficient aligners to a wide range of applications, including genome-wide variation studies [1], quantitative RNA-seq expression analysis [2], the study of RNA secondary structure [3], microRNA discovery [4], identification of protein-binding sites by ChIP-sequencing [5], recognition of histone modification patterns in epigenetic studies [6], simultaneous alignment of multiple genomes for comparative genomics [7], and de novo assembly of genomes and transcriptomes [8]. In clinical settings, alignment to reference genomes has enabled rapid pathogen discovery [9], identification of causative mutations in rare genetic diseases [10–12], detection of chromosomal abnormalities in tumor genomes [13], and many other advances that likewise depend on rapid, cost-effective genome-wide sequencing.
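Several of the cited short-read aligners, including BWA [22], Bowtie [23] and SOAP2 [19], index the reference with the Burrows–Wheeler transform and the FM-index [41]. As a minimal sketch of that idea (not the implementation used by any of these tools), the Python below builds a toy index of a short reference and finds exact occurrences of a read by backward search. It assumes exact matching only, a '$' sentinel appended to the reference, and naive suffix sorting; the function names build_fm_index and backward_search are illustrative.

```python
def build_fm_index(reference: str):
    """Toy FM-index: suffix array, C table and Occ table for `reference`.

    Assumes `reference` contains no '$'; '$' is appended as a sentinel that
    sorts before A/C/G/T. Suffix-array construction is naive (sort all
    suffixes), which is fine for illustration but not for real genomes.
    """
    text = reference + "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (text[-1] is '$').
    bwt = "".join(text[i - 1] for i in sa)

    # C[c]: number of characters in `text` that sort strictly before c.
    c_table, running = {}, 0
    for ch in sorted(set(text)):
        c_table[ch] = running
        running += text.count(ch)

    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in c_table}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, c_table, occ


def backward_search(read: str, sa, c_table, occ):
    """Return sorted 0-based positions where `read` occurs exactly."""
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    for ch in reversed(read):  # extend the match one character to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    index = build_fm_index("ACGTACGTTACG")
    print(backward_search("ACG", *index))  # [0, 4, 9]
```

Each loop iteration narrows the range of suffix-array rows using only table lookups, so the search cost grows with the read length rather than the reference length; this is what lets BWT-based aligners scale to large genomes. Tolerating mismatches and gaps requires backtracking or seed-and-extend steps built on top of this exact-match core.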

References

  1. Dalca, A.V., Brudno, M.: Genome variation discovery with high-throughput sequencing data. Brief. Bioinform. 11(1), 3–14 (2010)

  2. Engström, P.G., et al.: Systematic evaluation of spliced alignment programs for RNA-seq data. Nat. Methods. 10(12), 1185–1191 (2013)

  3. Zhong, C., Zhang, S.: Efficient alignment of RNA secondary structures using sparse dynamic programming. BMC Bioinformatics. 14, 269 (2013)

  4. Sun, Z., et al.: CAP-miRSeq: a comprehensive analysis pipeline for microRNA sequencing data. BMC Genomics. 15, 423 (2014)

  5. Johnson, D.S., et al.: Genome-wide mapping of in vivo protein-DNA interactions. Science. 316(5830), 1497–1502 (2007)

  6. Hong, C., et al.: Probabilistic alignment leads to improved accuracy and read coverage for bisulfite sequencing data. BMC Bioinformatics. 14, 337 (2013)

  7. Kim, J., Ma, J.: PSAR-align: improving multiple sequence alignment using probabilistic sampling. Bioinformatics. 30(7), 1010–1012 (2014)

  8. Li, R., et al.: De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20(2), 265–272 (2010)

  9. Naccache, S.N., et al.: A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res. 24(7), 1180–1192 (2014)

  10. Ng, B.G., et al.: Mosaicism of the UDP-galactose transporter SLC35A2 causes a congenital disorder of glycosylation. Am. J. Hum. Genet. 92(4), 632–636 (2013)

  11. Green, R.C., et al.: Exploring concordance and discordance for return of incidental findings from clinical sequencing. Genet. Med. 14(4), 405–410 (2012)

  12. Goh, V., et al.: Next-generation sequencing facilitates the diagnosis in a child with twinkle mutations causing cholestatic liver failure. J. Pediatr. Gastroenterol. Nutr. 54(2), 291–294 (2012)

  13. Schröder, J., et al.: Socrates: identification of genomic rearrangements in tumour genomes by re-aligning soft clipped reads. Bioinformatics. 30(8), 1064–1072 (2014)

  14. Rizzo, J.M., Buck, M.J.: Key principles and clinical applications of “next-generation” DNA sequencing. Cancer Prev. Res. (Phila.) 5(7), 887–900 (2012)

  15. Shang, J., et al.: Evaluation and comparison of multiple aligners for next-generation sequencing data analysis. Biomed. Res. Int. 2014, 16 (2014)

  16. Metzker, M.L.: Sequencing technologies—the next generation. Nat. Rev. Genet. 11(1), 31–46 (2010)

  17. Lander, E.S.: Initial impact of the sequencing of the human genome. Nature. 470(7333), 187–197 (2011)

  18. Li, H., Homer, N.: A survey of sequence alignment algorithms for next-generation sequencing. Brief. Bioinform. 11(5), 473–483 (2010)

  19. Li, R., et al.: SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics. 25(15), 1966–1967 (2009)

  20. Margulies, M., et al.: Genome sequencing in microfabricated high-density picolitre reactors. Nature. 437(7057), 376–380 (2005)

  21. David, M., et al.: SHRiMP2: Sensitive yet Practical Short Read Mapping. Bioinformatics. 27(7), 1011–1012 (2011)

  22. Li, H., Durbin, R.: Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 25(14), 1754–1760 (2009)

  23. Langmead, B., Trapnell, C., Pop, M., Salzberg, S.: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10(3), R25 (2009)

  24. Bentley, D.R., et al.: Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 456(7218), 53–59 (2008)

  25. Smith, A.D., Xuan, Z., Zhang, M.Q.: Using quality scores and longer reads improves accuracy of Solexa read mapping. BMC Bioinformatics. 9, 128 (2008)

  26. Hoffmann, S., et al.: Fast mapping of short sequences with mismatches, insertions and deletions using index structures. PLoS Comput. Biol. 5(9), e1000502 (2009)

  27. Ondov, B.D., et al.: Efficient mapping of applied biosystems SOLiD sequence data to a reference genome for functional genomic applications. Bioinformatics. 24(23), 2776–2777 (2008)

  28. Kim, D., et al.: TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14(4), R36 (2013)

  29. Rothberg, J.M., et al.: An integrated semiconductor device enabling non-optical genome sequencing. Nature. 475(7356), 348–352 (2011)

  30. Quail, M.A., et al.: A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics. 13, 341 (2012)

  31. Novocraft Technologies: Novoalign 30 June 2014. Available from: http://www.novocraft.com/main/index.php (2014). Accessed 20 September 2014

  32. Langmead, B., Salzberg, S.L.: Fast gapped-read alignment with Bowtie 2. Nat. Methods. 9(4), 357–359 (2012)

  33. Otto, C., Stadler, P.F., Hoffmann, S.: Lacking alignments? The next-generation sequencing mapper segemehl revisited. Bioinformatics. 30(13), 1837–1843 (2014)

  34. Caboche, S., et al.: Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data. BMC Genomics. 15, 264 (2014)

  35. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)

  36. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195–197 (1981)

  37. Ma, B., Tromp, J., Li, M.: PatternHunter: faster and more sensitive homology search. Bioinformatics. 18(3), 440–445 (2002)

  38. Ruffalo, M., LaFramboise, T., Koyutürk, M.: Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics. 27(20), 2790–2796 (2011)

  39. Cao, X., Cheng, L.S., Tung, A.K.H.: Indexing DNA sequences using q-Grams. DASFAA, Lecture Notes in Computer Science, vol. 3453: p. 13 (2005)

  40. Weese, D., et al.: RazerS—fast read mapping with sensitivity control. Genome Res. 19(9), 1646–1654 (2009)

  41. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science, Redondo Beach, CA, USA, pp. 390–398 (2000)

  42. Liu, Y., Schmidt, B., Maskell, D.L.: CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows–Wheeler transform. Bioinformatics. 28(14), 1830–1837 (2012)

  43. Santana-Quintero, L., et al.: HIVE-hexagon: high-performance, parallelized sequence alignment for next-generation sequencing data analysis. PLoS One. 9(6), e99033 (2014)

  44. Li, H., Durbin, R.: Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics. 26(5), 589–595 (2010)

  45. Lindner, R., Friedel, C.C.: A comprehensive evaluation of alignment algorithms in the context of RNA-Seq. PLoS One. 7(12), e52403 (2012)

  46. Wu, T.D., Nacu, S.: Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics. 26(7), 873–881 (2010)

  47. Wang, K., et al.: MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 38(18), e178 (2010)

  48. Dobin, A., et al.: STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29(1), 15–21 (2013)

  49. Kertesz, M., et al.: Genome-wide measurement of RNA secondary structure in yeast. Nature. 467(7311), 103–107 (2010)

  50. Underwood, J.G., et al.: FragSeq: transcriptome-wide RNA structure probing using high-throughput sequencing. Nat. Methods. 7(12), 995–1001 (2010)

  51. Lucks, J.B., et al.: Multiplexed RNA structure characterization with selective 2’-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq). Proc. Natl. Acad. Sci. U. S. A. 108(27), 11063–11068 (2011)

  52. Zhang, K., Shasha, D.: Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput. 18, 1245–1262 (1989)

  53. Jiang, T., Wang, L., Zhang, K.: Alignment of trees–an alternative to tree edit. Theor. Comput. Sci. 143, 137–148 (1995)

  54. Höchsmann, M., Töller, T., Giegerich, R., Kurtz, S.: Local similarity in RNA secondary structures. In: Proceedings of the 2nd IEEE Computer Society Bioinformatics Conference, Washington DC, pp. 159–168 (2003)

  55. Li, Y., et al.: Performance comparison and evaluation of software tools for microRNA deep-sequencing data analysis. Nucleic Acids Res. 40(10), 4298–4305 (2012)

  56. Krueger, F., Andrews, S.R.: Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 27(11), 1571–1572 (2011)

  57. Xi, Y., Li, W.: BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics. 10, 232 (2009)

  58. Coarfa, C., et al.: Pash 3.0: A versatile software package for read mapping and integrative analysis of genomic and epigenomic variation using massively parallel DNA sequencing. BMC Bioinformatics. 11, 572 (2010)

  59. Lim, J.Q., et al.: BatMeth: improved mapper for bisulfite sequencing reads on DNA methylation. Genome Biol. 13(10), R82 (2012)

  60. Chen, P.Y., Cokus, S.J., Pellegrini, M.: BS Seeker: precise mapping for bisulfite sequencing. BMC Bioinformatics. 11, 203 (2010)

  61. Kunde-Ramamoorthy, G., et al.: Comparison and quantitative verification of mapping algorithms for whole-genome bisulfite sequencing. Nucleic Acids Res. 42(6), e43 (2014)

  62. Schatz, M.C., Langmead, B., Salzberg, S.L.: Cloud computing and the DNA data race. Nat. Biotechnol. 28(7), 691–693 (2010)

  63. Maji, R.K., et al.: PVT: an efficient computational procedure to speed up next-generation sequence analysis. BMC Bioinformatics. 15, 167 (2014)

  64. Onsongo, G., et al.: Implementation of cloud based next generation sequencing data analysis in a clinical laboratory. BMC Res. Notes. 7, 314 (2014)

  65. Reid, J.G., et al.: Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline. BMC Bioinformatics. 15(1), 30 (2014)

  66. Oldach, L.: Edico genome makes first sale of NGS processor. In: Bio-IT World, Cambridge Healthtech Institute, 2014

  67. Kalari, K.R., et al.: MAP-RSeq: Mayo Analysis Pipeline for RNA sequencing. BMC Bioinformatics. 15(1), 224 (2014)

  68. Chin, C.-S., et al.: Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods. 10(6), 563–569 (2013)

  69. English, A.C., et al.: Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology. PLoS One. 7(11), e47768 (2012)

  70. Branton, D., et al.: The potential and challenges of nanopore sequencing. Nat. Biotechnol. 26(10), 1146–1153 (2008)

  71. Laszlo, A.H., et al.: Decoding long nanopore sequencing reads of natural DNA. Nat. Biotechnol. 32(8), 829–833 (2014)

  72. Ummat, A., Bashir, A.: Resolving complex tandem repeats with long reads. Bioinformatics. 30(24), 3491–3498 (2014)

Author information

Corresponding author

Correspondence to Carol Shen.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Shen, C., Shen, T., Lin, J. (2017). Comparative Assessment of Alignment Algorithms for NGS Data: Features, Considerations, Implementations, and Future. In: Elloumi, M. (eds) Algorithms for Next-Generation Sequencing Data. Springer, Cham. https://doi.org/10.1007/978-3-319-59826-0_9

  • DOI: https://doi.org/10.1007/978-3-319-59826-0_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59824-6

  • Online ISBN: 978-3-319-59826-0

  • eBook Packages: Computer Science, Computer Science (R0)