Arthropod Genome Sequencing and Assembly Strategies

Stephen Richards
Part of the Methods in Molecular Biology book series (MIMB, volume 1858)


As in any endeavor, the strategy applied to a genome project can mean the difference between success and failure. This is especially important because limited funding often means that only one approach can be tried at a time. Recent advances across genomics and transcriptomics have produced an embarrassment of riches. Even so, methods in the field have not yet reached turn-key production status for all species, although they are closer than ever. Here I compare and contrast the technical approaches to genome projects in the hope of enabling strategy choices with a higher probability of success. Finally, I review new technologies that, although not yet widely available, are revolutionizing the future of genomics.

Key words

Genome project strategy; Genome assembly; Genome sequencing; Genomics; Insect genomes; Oxford Nanopore; Pacific Biosciences; 10× Genomics



I thank Susan Brown and Michael Pfrender for the invitation to author this article. This work was funded by NHGRI grant U54 HG003273 to Richard A. Gibbs.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

Human Genome Sequencing Center, Baylor College of Medicine, Houston, USA
