Abstract
Progress in viromics research has led to the accumulation of a large number of sequences from many types of viruses obtained from diverse sources. Most virus databases are specific to particular species or types of viruses. However, raw sequences, as deposited in reliable online collections, provide a valuable asset for the exploration of genomic and metagenomic datasets.
The International Nucleotide Sequence Database Collaboration (INSDC) is the largest coordinated effort for compiling, sharing, and maintaining the most comprehensive collections of nucleic acid sequences deposited in the major public databases. The compendium includes different types of data, such as complete genomes, genes, expressed sequence tags, and data generated by whole-genome shotgun sequencing, spanning all domains of life. It also holds the most complete collection of viral sequences available online.
This chapter presents simplified computational methods for automating the retrieval of viral nucleotide sequences from the online repositories of the INSDC databases, including all available sequences except synthetic ones. The subsequent steps can be used to obtain the taxonomy of each sequence (including the ranks virus type, Baltimore classification, order, family, subfamily, genus, and species) and to split the database into species subsets, so that the sequences can be dereplicated for other downstream applications. Only basic computational knowledge is required.
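The species-splitting and dereplication steps can be illustrated with a minimal Python sketch. It assumes the sequences have already been downloaded as FASTA text and that the taxonomy step has produced a mapping from accession to species name. The `taxonomy` dict, the accessions, and the species labels below are hypothetical example data. Dropping records labelled "synthetic construct" stands in for the exclusion of synthetic sequences; that label-based rule is an assumption, not the chapter's filtering method. Here, dereplication means keeping one representative (the first accession seen) for each identical full-length sequence within a species. This is not necessarily the exact procedure used in the chapter.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA lines into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            # Upper-case so that case differences do not create spurious unique sequences
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_and_dereplicate(
    records: List[Tuple[str, str]], taxonomy: Dict[str, str]
) -> Dict[str, List[Tuple[str, str]]]:
    """Group sequences by species, keeping one accession per identical sequence."""
    # species -> {sequence: first accession seen with that sequence}
    subsets: Dict[str, Dict[str, str]] = defaultdict(dict)
    for header, seq in records:
        accession = header.split()[0]  # assumes the accession is the first header token
        species = taxonomy.get(accession, "unclassified")
        if species == "synthetic construct":
            continue  # assumption: synthetic sequences are flagged by this label
        subsets[species].setdefault(seq, accession)
    return {
        species: [(acc, seq) for seq, acc in unique.items()]
        for species, unique in subsets.items()
    }


# Hypothetical example data (invented accessions and species labels)
example = """>AB000001.1 example virus A
ACGT
acgt
>AB000002.1 example virus A, identical copy
ACGTACGT
>AB000003.1 example virus B
TTTT
>XY000004.1 synthetic sequence
GGGG
"""
taxonomy = {
    "AB000001.1": "Species A",
    "AB000002.1": "Species A",
    "AB000003.1": "Species B",
    "XY000004.1": "synthetic construct",
}

subsets = split_and_dereplicate(parse_fasta(example.splitlines()), taxonomy)
for species, reps in sorted(subsets.items()):
    print(species, [acc for acc, _ in reps])
# Species A ['AB000001.1']
# Species B ['AB000003.1']
```

Exact full-length dereplication is the simplest choice, because a dictionary keyed on the sequence does all the work. Clustering at an identity threshold below 100% would instead require a dedicated clustering tool.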
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
García-López, R. (2018). Construction of a Comprehensive Database from the Existing Viral Sequences Available from the International Nucleotide Sequence Database Collaboration. In: Moya, A., Pérez Brocal, V. (eds) The Human Virome. Methods in Molecular Biology, vol 1838. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8682-8_16
DOI: https://doi.org/10.1007/978-1-4939-8682-8_16
Published:
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8681-1
Online ISBN: 978-1-4939-8682-8
eBook Packages: Springer Protocols