Limits to Sequencing and de novo Assembly: Classic Benchmark Sequences for Optimizing Fungal NGS Designs

Muñoz, José Fernando; Misas, Elizabeth; Gallo, Juan Esteban; McEwen, Juan Guillermo; Clay, Oliver Keatinge

doi:10.1007/978-3-319-01568-2_32

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 232)

Abstract

Planning of pipelines for next-generation sequencing (NGS) projects could be facilitated by using simple DNA sequence benchmarks, i.e., standard test sequences that could monitor or help to predict ease or difficulty of (a) short-read sequencing and (b) de novo assembly of the sequenced reads. We propose that familiar, gene-sized sequences, including but not limited to nuclear protein-coding genes, would provide feasible consensus benchmarks allowing simple visualization. We illustrate our proposal for fungi with candidates from ribosomal DNA (rDNA, used in phylogeny and identification/diagnostics), mitochondrial DNA (mtDNA), and combinatorially constructed conceptual (synthetic) DNA sequences. The exploratory analysis of such familiar candidate loci could be a step toward finding, testing and establishing familiar, biologically interpretable consensus benchmark sequences for fungal and other eukaryotic genomes.
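
The abstract does not say how the conceptual (synthetic) sequences are constructed. As an illustration only, one classical combinatorial construction is a de Bruijn sequence over the alphabet A, C, G, T, in which every possible k-mer occurs exactly once when the sequence is read cyclically. The sketch below uses the standard Lyndon-word concatenation method. The function names `de_bruijn` and `linearize` are hypothetical, and nothing here is claimed to be the authors' procedure.

```python
def de_bruijn(alphabet: str, n: int) -> str:
    """Return a cyclic de Bruijn sequence of order n over `alphabet`.

    Illustrative assumption: built by concatenating, in lexicographic
    order, the Lyndon words whose length divides n.
    """
    k = len(alphabet)
    a = [0] * (k * n)   # working buffer of symbol indices
    out = []            # symbol indices of the final sequence

    def db(t: int, p: int) -> None:
        if t > n:
            # Emit the current prefix only if its period divides n.
            if n % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in out)


def linearize(cyclic: str, n: int) -> str:
    """Append the first n-1 bases so a linear read sees every n-mer."""
    return cyclic + cyclic[:n - 1]


if __name__ == "__main__":
    seq = de_bruijn("ACGT", 3)           # 4**3 = 64 bases
    linear = linearize(seq, 3)           # 66 bases
    kmers = {linear[i:i + 3] for i in range(len(seq))}
    print(len(seq), len(kmers))
```

Because such a sequence contains no repeated k-mer, it could serve as a repeat-free reference point against which repeat-rich natural loci are compared; that use is a suggestion here, not a result from the chapter.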

Author information

Authors and Affiliations

Cellular and Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia
José Fernando Muñoz, Elizabeth Misas, Juan Esteban Gallo, Juan Guillermo McEwen & Oliver Keatinge Clay
Institute of Biology, Universidad de Antioquia, Medellín, Colombia
José Fernando Muñoz & Elizabeth Misas
Doctoral Program in Biomedical Sciences, Universidad del Rosario, Bogotá, Colombia
Juan Esteban Gallo
School of Medicine, Universidad de Antioquia, Medellín, Colombia
Juan Guillermo McEwen
School of Medicine and Health Sciences, Universidad del Rosario, Bogotá, Colombia
Oliver Keatinge Clay

Corresponding author

Correspondence to José Fernando Muñoz.

Editor information

Editors and Affiliations

University of Caldas, Manizales, Colombia
Luis F. Castillo
Cenicafé - Centro Nacional de Investigaciones del Café en Colombia, Chinchiná, Colombia
Marco Cristancho
University of Caldas, Manizales, Colombia
Gustavo Isaza
BIOS - Centro Bioinformática y Biologia Computacional de Colombia, Manizales, Colombia
Andrés Pinzón
Department of Computer Science School of Science, University of Salamanca, Salamanca, Spain
Juan Manuel Corchado Rodríguez

Cite this paper

Muñoz, J.F., Misas, E., Gallo, J.E., McEwen, J.G., Clay, O.K. (2014). Limits to Sequencing and de novo Assembly: Classic Benchmark Sequences for Optimizing Fungal NGS Designs. In: Castillo, L., Cristancho, M., Isaza, G., Pinzón, A., Rodríguez, J. (eds) Advances in Computational Biology. Advances in Intelligent Systems and Computing, vol 232. Springer, Cham. https://doi.org/10.1007/978-3-319-01568-2_32

DOI: https://doi.org/10.1007/978-3-319-01568-2_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01567-5
Online ISBN: 978-3-319-01568-2
eBook Packages: Engineering, Engineering (R0)
