Abstract
Planning pipelines for next-generation sequencing (NGS) projects could be made easier with simple DNA sequence benchmarks, i.e., standard test sequences that could monitor, or help predict, the ease or difficulty of (a) short-read sequencing and (b) de novo assembly of the sequenced reads. We propose that familiar, gene-sized sequences, including but not limited to nuclear protein-coding genes, would provide feasible consensus benchmarks that allow simple visualization. We illustrate our proposal for fungi with candidates from ribosomal DNA (rDNA, used in phylogeny and in identification/diagnostics), mitochondrial DNA (mtDNA), and combinatorially constructed conceptual (synthetic) DNA sequences. Exploratory analysis of such candidate loci could be a step toward finding, testing and establishing familiar, biologically interpretable consensus benchmark sequences for fungal and other eukaryotic genomes.
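The abstract does not say how the combinatorially constructed synthetic sequences are built. As one illustrative possibility, assumed here and not necessarily the authors' construction, a de Bruijn sequence of order k over {A, C, G, T} contains each of the 4^k possible k-mers exactly once when read as a circle, which makes it a natural synthetic probe of how word (read or k-mer) length interacts with repeats. The Python sketch below builds one with the standard Lyndon-word (FKM) recursion and then counts k-mers. The function names `de_bruijn`, `linear_de_bruijn`, `kmer_counts` and `repeated_kmers` are hypothetical helpers introduced only for this example.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def de_bruijn(k, alphabet=NUCLEOTIDES):
    """Cyclic de Bruijn sequence of order k over `alphabet`.

    Read as a circle, every k-mer over the alphabet occurs exactly once.
    Built by concatenating, in lexicographic order, the Lyndon words whose
    length divides k (the FKM recursion)."""
    n = len(alphabet)
    a = [0] * (k + 1)  # a[1..k] holds the current prenecklace
    out = []

    def db(t, p):
        if t > k:
            # a[1..p] is a Lyndon word; keep it only if p divides k
            if k % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in out)


def linear_de_bruijn(k, alphabet=NUCLEOTIDES):
    """Linear version: append the first k-1 symbols so that every k-mer
    appears exactly once without wrapping around."""
    s = de_bruijn(k, alphabet)
    return s + s[:k - 1]


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def repeated_kmers(seq, k):
    """k-mers occurring more than once: the repeats that can make
    short-read assembly at this word length ambiguous."""
    return {kmer: c for kmer, c in kmer_counts(seq, k).items() if c > 1}


if __name__ == "__main__":
    k = 3
    seq = linear_de_bruijn(k)
    print(len(seq))                               # 66 = 4**3 + 3 - 1
    print(len(repeated_kmers(seq, k)))            # 0: no 3-mer repeats
    print(min(kmer_counts(seq, k - 1).values()))  # 4: every 2-mer repeats
```

For example, `de_bruijn(2)` returns `AACAGATCCGCTGGTT`. In the linearized sequence no k-mer is repeated, yet every (k-1)-mer occurs at least four times, so a sequence that is repeat-free at one word length can be highly repetitive at a slightly shorter one.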
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Muñoz, J.F., Misas, E., Gallo, J.E., McEwen, J.G., Clay, O.K. (2014). Limits to Sequencing and de novo Assembly: Classic Benchmark Sequences for Optimizing Fungal NGS Designs. In: Castillo, L., Cristancho, M., Isaza, G., Pinzón, A., Rodríguez, J. (eds) Advances in Computational Biology. Advances in Intelligent Systems and Computing, vol 232. Springer, Cham. https://doi.org/10.1007/978-3-319-01568-2_32
DOI: https://doi.org/10.1007/978-3-319-01568-2_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01567-5
Online ISBN: 978-3-319-01568-2
eBook Packages: Engineering, Engineering (R0)