DNA Sequence Databases

  • David Edwards
  • David Hansen
  • Jason E. Stajich


The ability to sequence the DNA of an organism has become one of the most important tools in modern biological research. DNA sequencing began as a manual process in which a few tens or hundreds of nucleotides were sequenced at a time; it is now performed by high-throughput sequencing machines, with billions of bases of DNA sequenced daily around the world. The recent development of “next generation” sequencing technology increases the throughput of sequence production manyfold and reduces costs by orders of magnitude. This will eventually enable the sequencing of the whole genome of an individual for under 1,000 dollars. However, mechanisms for sharing, analysing, and efficiently storing this data will become more critical as the amount of data being collected grows. Most importantly for biologists around the world, the analysis of this data will depend on the quality of the sequence data and annotations that are maintained in the public databases.

In this chapter, we give an overview of how sequencing technology has changed over time, including some of the new technologies that will enable the sequencing of personal genomes. We then discuss the public DNA databases that collect, check, and publish DNA sequences from around the world. Finally, we describe how to access this data.


Whole Genome Shotgun, Document Type Definition, European Molecular Biology Laboratory, Model Organism Database, International Nucleotide Sequence Database Collaboration



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Australian Centre for Plant Functional Genomics, Institute for Molecular Biosciences and School of Land, Crop and Food Sciences, University of Queensland, Brisbane, Australia
