Protein Sequence Databases

Rebhan, Michael

doi:10.1007/978-1-60327-241-4_3

Part of the book series: Methods in Molecular Biology (MIMB, volume 609)

Abstract

Protein sequence databases contain not just the sequence of the protein itself but also annotation that reflects our knowledge of its function and of the residues that contribute to it. In this chapter, we discuss various public protein sequence databases, with a focus on those that are generally applicable. Special attention is paid to the reliability of both sequence and annotation, as these are fundamental to many questions researchers will ask. Using both well-annotated and scarcely annotated human proteins as examples, we show what information about such targets can be collected from freely available Internet resources and how this information can be used. The results are then summarized in a simple graphical model of the protein's sequence architecture that highlights its structural and functional modules.
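As a rough illustration of the kind of summary described above, the sketch below stores a protein record as its sequence plus a list of annotated features and renders a scaled, one-line text map of its modules. The `Feature` and `ProteinRecord` classes, the `architecture_map` helper, the accession `DEMO_0001`, and the feature coordinates are all hypothetical, chosen only for illustration; they are not the chapter's method or any database's actual schema. Coordinates are assumed to be 1-based and inclusive, as is common in protein feature tables.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical annotated region; 1-based, inclusive residue coordinates."""
    start: int
    end: int
    label: str  # single character used to draw this module in the map


@dataclass
class ProteinRecord:
    """Hypothetical record: the sequence itself plus its annotation."""
    accession: str
    sequence: str
    features: list[Feature] = field(default_factory=list)


def architecture_map(record: ProteinRecord, width: int = 60) -> str:
    """Draw features as a scaled one-line diagram; '-' marks unannotated stretches.

    Later features overwrite earlier ones where they overlap.
    """
    length = len(record.sequence)
    track = ["-"] * width
    for feat in record.features:
        if not 1 <= feat.start <= feat.end <= length:
            raise ValueError(f"feature {feat.label!r} lies outside the sequence")
        # Scale the residue interval [start - 1, end) onto diagram columns,
        # always drawing at least one column per feature.
        first = (feat.start - 1) * width // length
        last = max(first, feat.end * width // length - 1)
        for col in range(first, last + 1):
            track[col] = feat.label
    return "".join(track)


# Hypothetical 100-residue protein with two annotated modules.
demo = ProteinRecord(
    accession="DEMO_0001",
    sequence="M" + "A" * 99,
    features=[Feature(1, 20, "T"), Feature(61, 90, "D")],
)
print(architecture_map(demo, width=20))  # TTTT--------DDDDDD--
```

Keeping annotation as explicit coordinate ranges, separate from the sequence string, makes it easy to check each feature against the sequence length, which is one simple guard against the annotation-reliability problems the chapter emphasizes.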

Author information

Authors and Affiliations

Head Bioinformatics Support, Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
Michael Rebhan

Editor information

Editors and Affiliations

Max F. Perutz Laboratories GmbH, Universität Wien, Dr. Bohr-Gasse 9, Wien, 1030, Austria
Oliviero Carugo
Agency for Science, Technology and Research (A*STAR), Biopolis Street 30, Singapore, 138671, Singapore
Frank Eisenhaber

About this protocol

Cite this protocol

Rebhan, M. (2010). Protein Sequence Databases. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 609. Humana Press. https://doi.org/10.1007/978-1-60327-241-4_3

DOI: https://doi.org/10.1007/978-1-60327-241-4_3
Published: 30 October 2009
Publisher Name: Humana Press
Print ISBN: 978-1-60327-240-7
Online ISBN: 978-1-60327-241-4
eBook Packages: Springer Protocols
