DNA Sequence Databases

Bioinformatics

Abstract

The ability to sequence the DNA of an organism has become one of the most important tools in modern biological research. Beginning as a manual process in which DNA was sequenced a few tens or hundreds of nucleotides at a time, DNA sequencing is now performed by high-throughput sequencing machines, with billions of bases sequenced daily around the world. The recent development of “next generation” sequencing technology increases sequencing throughput manyfold and reduces costs by orders of magnitude. This will eventually enable the sequencing of an individual's whole genome for under 1,000 dollars. However, mechanisms for sharing, analysing, and efficiently storing this data will become more critical as the amount of data collected grows. Most importantly for biologists around the world, the analysis of this data will depend on the quality of the sequence data and annotations maintained in the public databases.

In this chapter we give an overview of how sequencing technology has changed over time, including some of the new technologies that will enable the sequencing of personal genomes. We then discuss the public DNA databases, which collect, check, and publish DNA sequences from around the world. Finally, we describe how to access this data.

Author information

Corresponding author

Correspondence to David Edwards.

Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Edwards, D., Hansen, D., Stajich, J.E. (2009). DNA Sequence Databases. In: Edwards, D., Stajich, J., Hansen, D. (eds) Bioinformatics. Springer, New York, NY. https://doi.org/10.1007/978-0-387-92738-1_1
