Algorithms and Data Structures in Next-Generation Sequencing

El-Metwally, Sara; Ouda, Osama M.; Helmy, Mohamed

doi:10.1007/978-1-4939-0715-1_2

Part of the book series: SpringerBriefs in Systems Biology (BRIEFSBIOSYS, volume 7)

Abstract

This chapter gives an overview of the data structures and algorithms commonly used in bioinformatics, with emphasis on those employed in next-generation sequence assembly.

Author information

Authors and Affiliations

Department of Computer Science, Mansoura University, Mansoura, Egypt
Sara El-Metwally & Osama M. Ouda
Department of Information Technology, Michigan State University (MSU), East Lansing, MI, USA
Osama M. Ouda
Botany Department and Biotechnology Department, Al-Azhar University, Cairo, Egypt
Mohamed Helmy
The Donnelly Centre for Cellular and Biomolecular Research, University of Toronto (UofT), Toronto, Canada
Mohamed Helmy

About this chapter

Cite this chapter

El-Metwally, S., Ouda, O.M., Helmy, M. (2014). Algorithms and Data Structures in Next-Generation Sequencing. In: Next Generation Sequencing Technologies and Challenges in Sequence Assembly. SpringerBriefs in Systems Biology, vol 7. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0715-1_2

DOI: https://doi.org/10.1007/978-1-4939-0715-1_2
Published: 21 March 2014
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-0714-4
Online ISBN: 978-1-4939-0715-1
eBook Packages: Biomedical and Life Sciences (R0)
