Construction of a Comprehensive Database from the Existing Viral Sequences Available from the International Nucleotide Sequence Database Collaboration

  • Protocol
The Human Virome

Part of the book series: Methods in Molecular Biology (MIMB, volume 1838)

Abstract

Progress in viromics research has led to the accumulation of a large number of sequences from different types of viruses obtained from different sources. Most databases are specific to particular species or types of viruses. However, raw sequences, as deposited in reliable online collections, provide a valuable asset in the exploration of genomic and metagenomic datasets.

The International Nucleotide Sequence Database Collaboration (INSDC) is the largest coordinated effort for compiling, sharing, and maintaining the most comprehensive collections of nucleic acids deposited throughout the most important public databases. The compendium includes different types of data such as complete genomes, genes, expressed sequence tags, and data generated by whole genome shotgun analyses spanning all domains of life, as well as the most complete collection of viral sequences available online.

This chapter presents simplified computational methods for automating the retrieval of viral nucleotide sequences from the online repositories of the INSDC databases, including all available sequences except synthetic ones. The subsequent steps can be used to obtain the taxonomy (including the ranks virus type, Baltimore classification, order, family, subfamily, genus, and species) and to split the database into species subsets in order to dereplicate the sequences for other downstream applications. Only basic computational knowledge is required.
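
The splitting and dereplication steps mentioned above can be illustrated with a minimal Python sketch. It is not the chapter's protocol: the function names, the toy records, and the accession-to-species mapping `species_of` are all hypothetical, and the accession is assumed to be the first whitespace-delimited token of each FASTA header. Only exact, same-strand, full-length duplicates are collapsed.

```python
from collections import defaultdict


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_and_dereplicate(fasta_text, species_of):
    """Group records by species and keep one record per distinct sequence.

    species_of is a hypothetical mapping from accession to species name
    (an assumption for this sketch, not something defined by the chapter).
    """
    # species -> {uppercased sequence: first accession seen with it}
    subsets = defaultdict(dict)
    for header, seq in parse_fasta(fasta_text):
        parts = header.split()
        accession = parts[0] if parts else ""
        species = species_of.get(accession, "unclassified")
        subsets[species].setdefault(seq.upper(), accession)
    # Return species -> {accession: sequence} for the retained records.
    return {sp: {acc: seq for seq, acc in seqs.items()}
            for sp, seqs in subsets.items()}


# Toy input: A1 and A2 carry the same sequence (differing only in case).
example = """>A1 toy record
ACGT
ACGT
>A2 toy record
acgtacgt
>B1 toy record
GGCC
"""
species = {"A1": "Species alpha", "A2": "Species alpha", "B1": "Species beta"}
result = split_and_dereplicate(example, species)
```

Here `result` holds one subset per species, and within "Species alpha" only the first of the two identical records (A1) is retained. For a database of realistic size, a dedicated clustering or dereplication tool would be a more practical choice than holding every sequence in memory.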

Author information

Corresponding author

Correspondence to Rodrigo García-López.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

García-López, R. (2018). Construction of a Comprehensive Database from the Existing Viral Sequences Available from the International Nucleotide Sequence Database Collaboration. In: Moya, A., Pérez Brocal, V. (eds) The Human Virome. Methods in Molecular Biology, vol 1838. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8682-8_16

  • DOI: https://doi.org/10.1007/978-1-4939-8682-8_16

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-8681-1

  • Online ISBN: 978-1-4939-8682-8

  • eBook Packages: Springer Protocols
