Algorithms and Data Structures in Next-Generation Sequencing

Next Generation Sequencing Technologies and Challenges in Sequence Assembly

Part of the book series: SpringerBriefs in Systems Biology (BRIEFSBIOSYS, volume 7)

Abstract

This chapter gives an overview of the data structures and algorithms commonly used in bioinformatics, with emphasis on those employed in next-generation sequence assembly.


Copyright information

© 2014 The Authors

About this chapter

Cite this chapter

El-Metwally, S., Ouda, O.M., Helmy, M. (2014). Algorithms and Data Structures in Next-Generation Sequencing. In: Next Generation Sequencing Technologies and Challenges in Sequence Assembly. SpringerBriefs in Systems Biology, vol 7. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0715-1_2
