Resources for functional annotation

Bridge, A. J.; Veuthey, A.-Lise; Mulder, N. J.

doi:10.1007/978-3-211-75123-7_8

Abstract

The continued success of genome sequencing projects has led to an explosion in the availability of sequence data. The Genomes On Line Database (GOLD) currently lists more than 2000 ongoing and completed genome projects, and this number is continuously increasing (Liolios et al. 2008). In order for this sequence information to be useful in the formulation and testing of biological hypotheses, these genome sequences must be adequately annotated.

References

Andreeva A, Howorth D, Brenner SE, Hubbard TJ, Chothia C, Murzin AG (2004) SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res 32: D226–D229
Attwood TK, Bradley P, Flower DR, Gaulton A, Maudling N, Mitchell AL, Moulton G, Nordle A, Paine K, Taylor P, Uddin A, Zygouri C (2003) PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Res 31: 400–402
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL (2003) GenBank. Nucleic Acids Res 31: 23–27
Berman H, Henrick K, Nakamura H, Markley JL (2007) The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res 35: D301–D303
Bieri T, Blasiar D, Ozersky P, Antoshechkin I, Bastiani C, Canaran P, Chan J, Chen N, Chen WJ, Davis P, Fiedler TJ, Girard L, Han M, Harris TW, Kishore R, Lee R, McKay S, Muller HM, Nakamura C, Petcherski A, Rangarajan A, Rogers A, Schindelman G, Schwarz EM, Spooner W, Tuli MA, Van Auken K, Wang D, Wang X, Williams G, Durbin R, Stein LD, Sternberg PW, Spieth J (2007) WormBase: new content and better access. Nucleic Acids Res 35: D506–D510
Bru C, Courcelle E, Carrere S, Beausse Y, Dalmar S, Kahn D (2005) The ProDom database of protein domain families: more emphasis on 3D. Nucleic Acids Res 33: D212–D215
Camon E, Magrane M, Barrell D, Lee V, Dimmer E, Maslen J, Binns D, Harte N, Lopez R, Apweiler R (2004) The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with gene ontology. Nucleic Acids Res 32: D262–D266
Christie KR, Weng S, Balakrishnan R, Costanzo MC, Dolinski K, Dwight SS, Engel SR, Feierbach B, Fisk DG, Hirschman JE, Hong EL, Issel-Tarver L, Nash R, Sethuraman A, Starr B, Theesfeld CL, Andrada R, Binkley G, Dong Q, Lane C, Schroeder M, Botstein D, Cherry JM (2004) Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms. Nucleic Acids Res 32: D311–D314
Cooper CA, Joshi HJ, Harrison MJ, Wilkins MR, Packer NH (2003) GlycoSuiteDB: a curated relational database of glycoprotein glycan structures and their biological sources. 2003 update. Nucleic Acids Res 31: 511–513
Crosby MA, Goodman JL, Strelets VB, Zhang P, Gelbart WM (2007) FlyBase: genomes by the dozen. Nucleic Acids Res 35: D486–D491
Durbin R, Eddy S, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge, UK
Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, Eddy SR, Sonnhammer EL, Bateman A (2006) Pfam: clans, web tools and services. Nucleic Acids Res 34: D247–D251
Fleischmann W, Moller S, Gateau A, Apweiler R (1999) A novel method for automatic functional annotation of proteins. Bioinformatics 15: 228–233
Garavelli JS (2004) The RESID Database of Protein Modifications as a resource and annotation tool. Proteomics 4: 1527–1533
Gasteiger E, Gattiker A, Hoogland C, Ivanyi I, Appel RD, Bairoch A (2003) ExPASy: The proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res 31: 3784–3788
Gattiker A, Michoud K, Rivoire C, Auchincloss AH, Coudert E, Lima T, Kersey P, Pagni M, Sigrist CJ, Lachaize C, Veuthey AL, Gasteiger E, Bairoch A (2003) Automated annotation of microbial proteomes in SWISS-PROT. Comput Biol Chem 27: 49–58
Gene Ontology Consortium (2006) The Gene Ontology (GO) project in 2006. Nucleic Acids Res 34: D322–D326
Greene LH, Lewis TE, Addou S, Cuff A, Dallman T, Dibley M, Redfern O, Pearl F, Nambudiry R, Reid A, Sillitoe I, Yeats C, Thornton JM, Orengo CA (2007) The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res 35: D291–D297
Gribskov M, Luthy R, Eisenberg D (1990) Profile analysis. Methods Enzymol 183: 146–159
Hubbard TJ, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Herrero J, Holland R, Howe K, Johnson N, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Melsopp C, Megy K, Meidl P, Ouverdin B, Parker A, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Severin J, Slater G, Smedley D, Spudich G, Trevanion S, Vilella A, Vogel J, White S, Wood M, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Flicek P, Kasprzyk A, Proctor G, Searle S, Smith J, Ureta-Vidal A, Birney E (2007) Ensembl 2007. Nucleic Acids Res 35: D610–D617
Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJ (2006) The PROSITE database. Nucleic Acids Res 34: D227–D230
Karp PD, Ouzounis CA, Moore-Kochlacs C, Goldovsky L, Kaipa P, Ahren D, Tsoka S, Darzentas N, Kunin V, Lopez-Bigas N (2005) Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res 33: 6083–6089
Kerrien S, Alam-Faruque Y, Aranda B, Bancarz I, Bridge A, Derow C, Dimmer E, Feuermann M, Friedrichsen A, Huntley R, Kohler C, Khadake J, Leroy C, Liban A, Lieftink C, Montecchi-Palazzi L, Orchard S, Risse J, Robbe K, Roechert B, Thorneycroft D, Zhang Y, Apweiler R, Hermjakob H (2007) IntAct — open source resource for molecular interaction data. Nucleic Acids Res 35: D561–D565
Kersey P, Bower L, Morris L, Horne A, Petryszak R, Kanz C, Kanapin A, Das U, Michoud K, Phan I, Gattiker A, Kulikova T, Faruque N, Duggan K, McLaren P, Reimholz B, Duret L, Penel S, Reuter I, Apweiler R (2005) Integr8 and genome reviews: integrated views of complete genomes and proteomes. Nucleic Acids Res 33: D297–D302
Kersey PJ, Duarte J, Williams A, Karavidopoulou Y, Birney E, Apweiler R (2004) The International Protein Index: an integrated database for proteomics experiments. Proteomics 4: 1985–1988
Kopp J, Schwede T (2006) The SWISS-MODEL repository: new features and functionalities. Nucleic Acids Res 34: D315–D318
Kretschmann E, Fleischmann W, Apweiler R (2001) Automatic rule generation for protein annotation with the C4.5 data mining algorithm applied on SWISS-PROT. Bioinformatics 17: 920–926
Krogh A, Brown M, Mian IS, Sjolander K, Haussler D (1994) Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol 235: 1501–1531
Kuhn RM, Karolchik D, Zweig AS, Trumbower H, Thomas DJ, Thakkapallayil A, Sugnet CW, Stanke M, Smith KE, Siepel A, Rosenbloom KR, Rhead B, Raney BJ, Pohl A, Pedersen JS, Hsu F, Hinrichs AS, Harte RA, Diekhans M, Clawson H, Bejerano G, Barber GP, Baertsch R, Haussler D, Kent WJ (2007) The UCSC genome browser database: update 2007. Nucleic Acids Res 35: D668–D673
Leinonen R, Diez FG, Binns D, Fleischmann W, Lopez R, Apweiler R (2004) UniProt archive. Bioinformatics 20: 3236–3237
Lenhard B, Hayes WS, Wasserman WW (2001) GeneLynx: a gene-centric portal to the human genome. Genome Res 11: 2151–2157
Letunic I, Copley RR, Pils B, Pinkert S, Schultz J, Bork P (2006) SMART 5: domains in the context of genomes and networks. Nucleic Acids Res 34: D257–D260
Liolios K, Mavromatis K, Tavernarakis N, Kyrpides NC (2008) The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res 36: D475–D479
McKusick VA (2007) Mendelian Inheritance in Man and its online version, OMIM. Am J Hum Genet 80: 588–604
Mi H, Guo N, Kejariwal A, Thomas PD (2007) PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways. Nucleic Acids Res 35: D247–D252
Miyazaki S, Sugawara H, Gojobori T, Tateno Y (2003) DNA Data Bank of Japan (DDBJ) in XML. Nucleic Acids Res 31: 13–16
Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Buillard V, Cerutti L, Copley R, Courcelle E, Das U, Daugherty L, Dibley M, Finn R, Fleischmann W, Gough J, Haft D, Hulo N, Hunter S, Kahn D, Kanapin A, Kejariwal A, Labarga A, Langendijk-Genevaux PS, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Nikolskaya AN, Orchard S, Orengo C, Petryszak R, Selengut JD, Sigrist CJ, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C (2007) New developments in the InterPro database. Nucleic Acids Res 35: D224–D228
Olsen JV, Blagoev B, Gnad F, Macek B, Kumar C, Mortensen P, Mann M (2006) Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 127: 635–648
Pruitt KD, Tatusova T, Maglott DR (2007) NCBI reference sequences (RefSeq): a curated nonredundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35: D61–D65
Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R, Lopez R (2005) InterProScan: protein domains identifier. Nucleic Acids Res 33: W116–W120
Safran M, Solomon I, Shmueli O, Lapidot M, Shen-Orr S, Adato A, Ben-Dor U, Esterman N, Rosen N, Peter I, Olender T, Chalifa-Caspi V, Lancet D (2002) GeneCards 2002: towards a complete, object-oriented, human gene compendium. Bioinformatics 18: 1542–1543
Selengut JD, Haft DH, Davidsen T, Ganapathy A, Gwinn-Giglio M, Nelson WC, Richter AR, White O (2007) TIGRFAMs and genome properties: tools for the assignment of molecular function and biological process in prokaryotic genomes. Nucleic Acids Res 35: D260–D264
Stoesser G, Baker W, van den Broek A, Garcia-Pastor M, Kanz C, Kulikova T, Leinonen R, Lin Q, Lombard V, Lopez R, Mancuso R, Nardone F, Stoehr P, Tuli MA, Tzouvara K, Vaughan R (2003) The EMBL Nucleotide Sequence Database: major new developments. Nucleic Acids Res 31: 17–22
Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH (2007) UniRef: comprehensive and nonredundant UniProt reference clusters. Bioinformatics 23: 1282–1288
Tagari M, Tate J, Swaminathan GJ, Newman R, Naim A, Vranken W, Kapopoulou A, Hussain A, Fillon J, Henrick K, Velankar S (2006) E-MSD: improving data deposition and structure quality. Nucleic Acids Res 34: D287–D290
Tamaki S, Arakawa K, Kono N, Tomita M (2007) Restauro-G: a rapid genome re-annotation system for comparative genomics. Genomics Proteomics Bioinformatics 5: 53–58
The International HapMap Consortium (2005) A haplotype map of the human genome. Nature 437: 1299–1320
UniProt Consortium (2007) The Universal Protein Resource (UniProt). Nucleic Acids Res 35: D193–D197
Vastrik I, D’Eustachio P, Schmidt E, Joshi-Tope G, Gopinath G, Croft D, de Bono B, Gillespie M, Jassal B, Lewis S, Matthews L, Wu G, Birney E, Stein L (2007) Reactome: a knowledge base of biologic pathways and processes. Genome Biol 8: R39
Wilson D, Madera M, Vogel C, Chothia C, Gough J (2007) The SUPERFAMILY database in 2007: families and functions. Nucleic Acids Res 35: D308–D313
Wu CH, Nikolskaya A, Huang H, Yeh LS, Natale DA, Vinayaka CR, Hu ZZ, Mazumder R, Kumar S, Kourtesis P, Ledley RS, Suzek BE, Arminski L, Chen Y, Zhang J, Cardenas JL, Chung S, Castro-Alvear J, Dinkov G, Barker WC (2004) PIRSF: family classification system at the Protein Information Resource. Nucleic Acids Res 32: D112–D114
Yeats C, Maibaum M, Marsden R, Dibley M, Lee D, Addou S, Orengo CA (2006) Gene3D: modelling protein structure, function and evolution. Nucleic Acids Res 34: D281–D284
Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A (2004) The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants. Hum Mutat 23: 464–470
Zdobnov EM, Apweiler R (2001) InterProScan — an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17: 847–848

Author information

Authors and Affiliations

Swiss-Prot, Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, 1211 Geneva 4, Switzerland
A. J. Bridge & A.-Lise Veuthey
EMBL Outstation, European Bioinformatics Institute, Hinxton, Cambridge, UK
N. J. Mulder

Corresponding author

Correspondence to A. J. Bridge.

Editor information

Editors and Affiliations

Wissenschaftszentrum Weihenstephan, TU München, Freising, Germany
Dmitrij Frishman
Structural and Computational Programme, Spanish National Cancer Research Centre, Madrid, Spain
Alfonso Valencia

About this chapter

Cite this chapter

Bridge, A.J., Veuthey, AL., Mulder, N.J. (2008). Resources for functional annotation. In: Frishman, D., Valencia, A. (eds) Modern Genome Annotation. Springer, Vienna. https://doi.org/10.1007/978-3-211-75123-7_8

DOI: https://doi.org/10.1007/978-3-211-75123-7_8
Publisher Name: Springer, Vienna
Print ISBN: 978-3-211-75122-0
Online ISBN: 978-3-211-75123-7
eBook Packages: Biomedical and Life Sciences (R0)