Using GenBank

Wheeler, David

doi:10.1007/978-1-59745-535-0_2

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 406)

Summary

GenBank® is a comprehensive database of publicly available DNA sequences for more than 205,000 named organisms, among them more than 60,000 within the Embryophyta (land plants), obtained through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Daily data exchange with the European Molecular Biology Laboratory (EMBL) in Europe and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through Entrez, the National Center for Biotechnology Information (NCBI) retrieval system, which integrates data from the major DNA and protein sequence databases with taxonomy, genome, mapping, protein structure, and domain information, and with the biomedical journal literature through PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete releases of the GenBank database, issued every two months, and daily updates are available through FTP. This chapter discusses GenBank usage scenarios ranging from local analyses of the data available through FTP to online analyses supported by the NCBI Web-based tools. To access GenBank and its related retrieval and analysis services, go to the NCBI home page at http://www.ncbi.nlm.nih.gov.
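Retrieval through Entrez can also be scripted. The sketch below is a minimal illustration under stated assumptions, not a method taken from this chapter: it assumes the NCBI E-utilities endpoints esearch.fcgi and efetch.fcgi under https://eutils.ncbi.nlm.nih.gov/entrez/eutils/, the Entrez Nucleotide database name "nuccore", and the EFetch options rettype=gb and retmode=text for GenBank flat-file output. The helper names (esearch_url, efetch_genbank_url, parse_locus) are hypothetical. Only request URLs are built, so nothing is sent to NCBI. parse_locus reads the first (LOCUS) line of a GenBank flat-file record, the format used in the FTP releases, assuming whitespace-separated fields.

```python
from urllib.parse import urlencode

# Assumed base URL of the NCBI Entrez Programming Utilities (E-utilities).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term: str, db: str = "nuccore", retmax: int = 20) -> str:
    """Hypothetical helper: ESearch URL returning the UIDs that match an Entrez query."""
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode percent-encodes brackets and turns spaces into '+'.
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def efetch_genbank_url(ids: list[str], db: str = "nuccore") -> str:
    """Hypothetical helper: EFetch URL returning records as GenBank flat-file text."""
    params = {"db": db, "id": ",".join(ids), "rettype": "gb", "retmode": "text"}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


def parse_locus(line: str) -> dict:
    """Hypothetical helper: locus name, length and molecule type from a LOCUS line.

    Assumes whitespace-separated fields: LOCUS <name> <length> bp <molecule> ...
    """
    fields = line.split()
    if len(fields) < 5 or fields[0] != "LOCUS":
        raise ValueError("not a GenBank LOCUS line: " + line)
    return {"name": fields[1], "length": int(fields[2]),
            "unit": fields[3], "mol_type": fields[4]}


if __name__ == "__main__":
    print(esearch_url("Arabidopsis thaliana[Organism]"))
    print(efetch_genbank_url(["U49845"]))
    print(parse_locus("LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999"))
```

To actually run a query, a URL built this way could be opened with urllib.request.urlopen; NCBI publishes usage guidelines for the E-utilities, including request-rate limits, that a real client should follow.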

Author information

Authors and Affiliations

National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD
David Wheeler

Editor information

Editors and Affiliations

Primary Industries Research Victoria, Plant Genetics and Genomics, Victorian AgriBiosciences Centre, La Trobe University, Bundoora, Victoria, Australia
David Edwards

About this protocol

Cite this protocol

Wheeler, D. (2007). Using GenBank. In: Edwards, D. (ed.) Plant Bioinformatics. Methods in Molecular Biology™, vol 406. Humana Press. https://doi.org/10.1007/978-1-59745-535-0_2

DOI: https://doi.org/10.1007/978-1-59745-535-0_2
Publisher Name: Humana Press
Print ISBN: 978-1-58829-653-5
Online ISBN: 978-1-59745-535-0
eBook Packages: Springer Protocols