Abstract
This chapter surveys the data structures and algorithms most commonly used in bioinformatics, with particular emphasis on those employed in next-generation sequence assembly.
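As an illustration of one such data structure, here is a minimal Python sketch of a de Bruijn graph. This example was added here and is not reproduced from the chapter. Each k-mer in a read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. The function name `de_bruijn_graph` and the toy reads are hypothetical. Real assemblers also handle sequencing errors, reverse complements and memory-efficient k-mer storage, none of which this sketch attempts.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn multigraph from a list of reads (hypothetical helper).

    Nodes are (k-1)-mers. Every k-mer occurrence in a read adds one edge from
    the k-mer's (k-1)-mer prefix to its (k-1)-mer suffix. Repeated edges are
    kept, so an edge's multiplicity reflects how often that k-mer was observed.
    """
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy example: two overlapping reads from the sequence ACGTCAG.
reads = ["ACGTC", "GTCAG"]
graph = de_bruijn_graph(reads, k=3)
print(graph)
```

In this example the edge `GT -> TC` appears twice because the 3-mer `GTC` occurs in both reads. Keeping such multiplicities, rather than collapsing them, is one simple way to retain coverage information in the graph.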
Copyright information
© 2014 The Authors
About this chapter
Cite this chapter
El-Metwally, S., Ouda, O.M., Helmy, M. (2014). Algorithms and Data Structures in Next-Generation Sequencing. In: Next Generation Sequencing Technologies and Challenges in Sequence Assembly. SpringerBriefs in Systems Biology, vol 7. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0715-1_2
DOI: https://doi.org/10.1007/978-1-4939-0715-1_2
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-0714-4
Online ISBN: 978-1-4939-0715-1
eBook Packages: Biomedical and Life Sciences; Biomedical and Life Sciences (R0)