Summary
GenBank® is a comprehensive database of publicly available DNA sequences for more than 205,000 named organisms, including more than 60,000 within the Embryophyta, obtained through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Daily data exchange with the European Molecular Biology Laboratory (EMBL) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through Entrez, the retrieval system of the National Center for Biotechnology Information (NCBI), which integrates data from the major DNA and protein sequence databases with taxonomy, genome, mapping, protein structure, and domain information, and with the biomedical journal literature through PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available through FTP. This chapter discusses GenBank usage scenarios ranging from local analyses of the data available through FTP to online analyses supported by the NCBI Web-based tools. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at http://www.ncbi.nlm.nih.gov.
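As a rough illustration of the "local analysis" end of that range, the sketch below reads records in GenBank flat-file format and pulls out the accession and the sequence. It assumes only the basic layout of the format (an ACCESSION line, an ORIGIN block of numbered sequence lines, and a `//` terminator). The function names and the sample record are hypothetical and made up for this example; real records carry many more fields, and an established parser such as Biopython's `Bio.SeqIO` is likely a better choice for actual work.

```python
from typing import Dict, Iterator, List


def iter_genbank_records(text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per flat-file record; each record ends with a '//' line.

    Hypothetical helper for illustration. Lines after the last '//' are ignored.
    """
    lines: List[str] = []
    for line in text.splitlines():
        if line.strip() == "//":
            if lines:
                yield _parse_record(lines)
            lines = []
        else:
            lines.append(line)


def _parse_record(lines: List[str]) -> Dict[str, str]:
    """Extract the primary accession and the sequence from one record."""
    accession = ""
    seq_parts: List[str] = []
    in_origin = False
    for line in lines:
        if line.startswith("ACCESSION"):
            fields = line.split()
            accession = fields[1] if len(fields) > 1 else ""
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif in_origin:
            # Sequence lines look like '        1 gatcctccat atacaacggt':
            # drop the leading position number, keep the base blocks.
            seq_parts.extend(line.split()[1:])
    return {"accession": accession, "sequence": "".join(seq_parts).upper()}


# Made-up record, NOT a real GenBank entry.
SAMPLE = """\
LOCUS       TOY00001                  20 bp    DNA     linear   PLN 01-JAN-2000
DEFINITION  Made-up record for illustration only.
ACCESSION   TOY00001
ORIGIN
        1 gatcctccat atacaacggt
//
"""

records = list(iter_genbank_records(SAMPLE))
for rec in records:
    print(rec["accession"], len(rec["sequence"]))
```

The same generator could be fed the text of a downloaded release file, although files of that size would be better streamed line by line than read into a single string.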
Copyright information
© 2007 Humana Press Inc.
About this protocol
Cite this protocol
Wheeler, D. (2007). Using GenBank. In: Edwards, D. (ed.) Plant Bioinformatics. Methods in Molecular Biology™, vol 406. Humana Press. https://doi.org/10.1007/978-1-59745-535-0_2
DOI: https://doi.org/10.1007/978-1-59745-535-0_2
Publisher Name: Humana Press
Print ISBN: 978-1-58829-653-5
Online ISBN: 978-1-59745-535-0
eBook Packages: Springer Protocols