Abstract
With the rapid development of next-generation sequencing (NGS) technologies, de novo genome assembly has become increasingly common. However, inherent features of NGS data pose great challenges for de novo genome assembly. Many genomes, such as that of Brassica rapa, which has undergone three paleo-polyploidy events, have a high repeat content, which makes assembly from NGS data even harder. In the past several years, numerous algorithms have been developed to address the challenges of de novo genome assembly from NGS reads. Here we summarize the main approaches to genome assembly and describe several algorithms for each approach. In addition, we compare the performance of existing assemblers in terms of the accuracy and contiguity of their assemblies. The comparative analysis shows that no single assembler performs best on all the observed measures, and that the results also depend on the dataset used.
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Liu, M., Liu, D., Zheng, H. (2015). De Novo Genome Assembly of Next-Generation Sequencing Data. In: Wang, X., Kole, C. (eds) The Brassica rapa Genome. Compendium of Plant Genomes. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-47901-8_4
DOI: https://doi.org/10.1007/978-3-662-47901-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-47900-1
Online ISBN: 978-3-662-47901-8
eBook Packages: Biomedical and Life Sciences (R0)