Limits to Sequencing and de novo Assembly: Classic Benchmark Sequences for Optimizing Fungal NGS Designs

  • Conference paper
Advances in Computational Biology

Abstract

Planning of pipelines for next-generation sequencing (NGS) projects could be facilitated by using simple DNA sequence benchmarks, i.e., standard test sequences that could monitor or help to predict ease or difficulty of (a) short-read sequencing and (b) de novo assembly of the sequenced reads. We propose that familiar, gene-sized sequences, including but not limited to nuclear protein-coding genes, would provide feasible consensus benchmarks allowing simple visualization. We illustrate our proposal for fungi with candidates from ribosomal DNA (rDNA, used in phylogeny and identification/diagnostics), mitochondrial DNA (mtDNA), and combinatorially constructed conceptual (synthetic) DNA sequences. The exploratory analysis of such familiar candidate loci could be a step toward finding, testing and establishing familiar, biologically interpretable consensus benchmark sequences for fungal and other eukaryotic genomes.

Author information

Corresponding author

Correspondence to José Fernando Muñoz.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Muñoz, J.F., Misas, E., Gallo, J.E., McEwen, J.G., Clay, O.K. (2014). Limits to Sequencing and de novo Assembly: Classic Benchmark Sequences for Optimizing Fungal NGS Designs. In: Castillo, L., Cristancho, M., Isaza, G., Pinzón, A., Rodríguez, J. (eds) Advances in Computational Biology. Advances in Intelligent Systems and Computing, vol 232. Springer, Cham. https://doi.org/10.1007/978-3-319-01568-2_32

  • DOI: https://doi.org/10.1007/978-3-319-01568-2_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-01567-5

  • Online ISBN: 978-3-319-01568-2

  • eBook Packages: Engineering, Engineering (R0)
