Sequence Databases

  • Vivek Kumar Chaturvedi
  • Divya Mishra
  • Aprajita Tiwari
  • V. P. Snijesh
  • Noor Ahmad Shaik
  • M. P. Singh


The continuous growth of genomic data has driven the adoption of computational approaches to store, analyze, annotate, and interpret it. This creates a need for well-designed databases through which large-scale data can be submitted, stored, edited, and updated. A crucial requirement of a biological database is that it provides an up-to-date, high-quality data collection. Over the years, publicly available data repositories and resources have been developed to hold and cope with the staggering volume of records. A database is a computerized archive used to store and organize records so that the data can be retrieved easily through a variety of search criteria. A large number of databases are now available that offer valuable data resources for biological research. In general, biological databases are categorized into primary, secondary, and specialized databases. Many databases today host specific information about particular genes and proteins from diverse biological sources such as humans, animals, plants, and microbes. These databases are continuously enriched and updated with the latest research information generated by cutting-edge biotechnological methods, and they serve as useful platforms for extracting, analyzing, and interpreting multidimensional molecular information. Various computer-based programs and web tools, including BLAST, FASTA, Entrez, MAGE, Chime, RasMol, CASP, CAFASPI, PDB3D browse, and GCG, are widely used for algorithmic analysis, modeling, and visualization of the genomic and proteomic data held in these databases.


Keywords: Primary databases, Bioinformatics, BLAST, Genomics



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Vivek Kumar Chaturvedi (1)
  • Divya Mishra (2)
  • Aprajita Tiwari (1)
  • V. P. Snijesh (3)
  • Noor Ahmad Shaik (4)
  • M. P. Singh (1)

  1. Centre of Biotechnology, University of Allahabad, Allahabad, India
  2. Centre of Bioinformatics, University of Allahabad, Allahabad, India
  3. Innov4Sight Health and Biomedical System Private Limited, Bangalore, India
  4. Department of Genetic Medicine, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
