Long Range Sequencing and Validation of Insect Genome Assemblies

  • Surya Saha
Part of the Methods in Molecular Biology book series (MIMB, volume 1858)


Advances in long-read and long-range sequencing technologies have enabled chromosome-length resolution for de novo genome assemblies, even in the absence of complementary resources such as physical maps. Herein, I introduce a few methods for quality control and discuss potential pitfalls when assembling insect genomes with long reads.

Key words

Assembly, Scaffolding, Next-generation sequencing, De Bruijn graph, PacBio, Nanopore



I would like to thank Susan Brown and Michael Pfrender for the invitation to author this article. I would also like to thank my colleagues Prashant Hosmani and Mirella Flores for insightful discussions on troubleshooting assembly issues. This work was funded by USDA NIFA grant 2015-70016-23028 awarded to Susan Brown and Lukas Mueller.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Sol Genomics Network, Boyce Thompson Institute, Ithaca, USA
