
OMGS: Optical Map-Based Genome Scaffolding

  • Weihua Pan
  • Tao Jiang
  • Stefano Lonardi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11467)

Abstract

Due to the current limitations of sequencing technologies, de novo genome assembly is typically carried out in two stages, namely contig (sequence) assembly and scaffolding. While scaffolding is computationally easier than sequence assembly, the scaffolding problem can be challenging due to the high repetitive content of eukaryotic genomes, possible mis-joins in assembled contigs and inaccuracies in the linkage information. Genome scaffolding tools use either paired-end/mate-pair/linked/Hi-C reads or genome-wide maps (optical, physical or genetic) as linkage information. Optical maps (in particular Bionano Genomics maps) have been used extensively in many recent large-scale genome assembly projects (e.g., goat, apple, barley, maize, quinoa and sea bass, among others). However, the most commonly used scaffolding tools have a serious limitation: they can only deal with one optical map at a time, forcing users to alternate or iterate over multiple maps. In this paper, we introduce a novel scaffolding algorithm called OMGS that, for the first time, can take advantage of multiple optical maps. OMGS solves several optimization problems to generate scaffolds with optimal contiguity and correctness. Extensive experimental results demonstrate that our tool outperforms existing methods when multiple optical maps are available, and produces comparable scaffolds when using a single optical map. OMGS can be obtained from https://github.com/ucrbioinfo/OMGS.

Keywords

De novo genome assembly · Scaffolding · Optical maps · Combinatorial optimization

Notes

Acknowledgements

This work was supported in part by National Science Foundation grants IIS-1814359, IOS-1543963, IIS-1526742 and IIS-1646333, the Natural Science Foundation of China grant 61772197 and the National Key Research and Development Program of China grant 2018YFC0910404.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, University of California, Riverside, USA
