Bioinformatics pp 381-401

Data and Databases

  • Daniel Damian


Owing to recent advances in technology and to the growth in the number and size of projects tasked with collecting and assembling biological and genomic information, a highly heterogeneous collection of databases has become available to the community in the last two decades. Because progress throughout the field has been rapid and distributed, bioinformatics databases are provided in a variety of formats and specifications. This chapter discusses the data formats most frequently encountered in bioinformatics and the tools used to access these data.


Keywords: Relational Database; Regular Expression; Document Type Definition; European Molecular Biology Laboratory; Flat File



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. BioWisdom Ltd., Cambridge, UK
