Infrastructure for distributed protein annotation

Reeves, G. A.; Prlic, A.; Jimenez, R. C.; Kulesha, E.; Hermjakob, H.

doi:10.1007/978-3-211-75123-7_18

Abstract

Understanding human variation and disease often requires knowledge of a broad array of biomolecular data, down to the role of an individual amino acid in a protein and how mutations or alternative splicing events can change function and phenotype. A number of key databases collect biomolecular information: the EMBL DNA database (Cochrane et al. 2006) and Ensembl (Flicek et al. 2007) collect annotations on genomic sequence features, the UniProt knowledge base (Bairoch et al. 2005) provides detailed annotation on protein sequences, and the Worldwide PDB member databases (Berman et al. 2007) provide protein structural information. Whilst these databases house a great deal of information on sequences and structures, the advent of high-throughput methods in genome sequencing and structural genomics initiatives has produced an explosion in the quantity of uncharacterised data. As a result, tools that annotate these sequences and structures, by prediction or by transfer of information from homologous relatives, have also grown in number and diversity. These methods are crucial for filling the functional space between characterised and uncharacterised protein sequences and structures.

These authors contributed equally.

References

Abascal F, Valencia A (2003) Automatic annotation of protein function based on family identification. Proteins 53: 683–692
Andreeva A, Howorth D, Brenner SE, Hubbard TJ, Chothia C, Murzin AG (2004) SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res 32: D226–D229
Andreeva A, Prlic A, Hubbard TJ, Murzin AG (2007) SISYPHUS — structural alignments for proteins with non-trivial relationships. Nucleic Acids Res 35: D253–D259
Attwood TK, Bradley P, Flower DR, Gaulton A, Maudling N, Mitchell AL, Moulton G, Nordle A, Paine K, Taylor P, Uddin A, Zygouri C (2003) PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Res 31: 400–402
Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O’Donovan C, Redaschi N, Yeh LS (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res 33(Database Issue): D154–D159
Bendtsen JD, Nielsen H, von Heijne G, Brunak S (2004) Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 340: 783–795
Berman H, Henrick K, Nakamura H, Markley JL (2007) The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res 35: D301–D303
Blom N, Gammeltoft S, Brunak S (1999) Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol 294: 1351–1362
Bryson K, McGuffin LJ, Marsden RL, Ward JJ, Sodhi JS, Jones DT (2005) Protein structure prediction servers at University College London. Nucleic Acids Res 33: W36–W38
Cochrane G, Aldebert P, Althorpe N, Andersson M, Baker W, Baldwin A, Bates K, Bhattacharyya S, Browne P, van den Broek A, Castro M, Duggan K, Eberhardt R, Faruque N, Gamble J, Kanz C, Kulikova T, Lee C, Leinonen R, Lin Q, Lombard V, Lopez R, McHale M, McWilliam H, Mukherjee G, Nardone F, Pastor MP, Sobhany S, Stoehr P, Tzouvara K, Vaughan R, Wu D, Zhu W, Apweiler R (2006) EMBL nucleotide sequence database: developments in 2005. Nucleic Acids Res 34: D10–D15
Day-Richter J, Harris MA, Haendel M, Lewis S (2007) OBO-Edit — an ontology editor for biologists. Bioinformatics 23: 2198–2200
Dowell RD, Jokerst RM, Day A, Eddy SR, Stein L (2001) The distributed annotation system. BMC Bioinformatics 2: 7
Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, Ashburner M (2005) The sequence ontology: a tool for the unification of genome annotations. Genome Biol 6: R44
Emanuelsson O, Nielsen H, Brunak S, von Heijne G (2000) Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol 300: 1005–1016
Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, Eddy SR, Sonnhammer EL, Bateman A (2006) Pfam: clans, web tools and services. Nucleic Acids Res 34: D247–D251
Finn RD, Stalker JW, Jackson DK, Kulesha E, Clements J, Pettett R (2007) ProServer: a simple, extensible Perl DAS server. Bioinformatics 23: 1568–1570
Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Eyre T, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Holland R, Howe KL, Howe K, Johnson N, Jenkinson A, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Megy K, Meidl P, Overduin B, Parker A, Pritchard B, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Slater G, Smedley D, Spudich G, Trevanion S, Vilella AJ, Vogel J, White S, Wood M, Birney E, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Herrero J, Hubbard TJ, Kasprzyk A, Proctor G, Smith J, Ureta-Vidal A, Searle S (2007) Ensembl 2008. Nucleic Acids Res 36: D707–D714
Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJ (2006) The PROSITE database. Nucleic Acids Res 34: D227–D230
Jones DT, Bryson K, Coleman A, McGuffin LJ, Sadowski MI, Sodhi JS, Ward JJ (2005a) Prediction of novel and analogous folds using fragment assembly and fold recognition. Proteins 61(Suppl 7): 143–151
Jones DT, Taylor WR, Thornton JM (1994) A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry 33: 3038–3049
Jones P, Vinod N, Down T, Hackmann A, Kahari A, Kretschmann E, Quinn A, Wieser D, Hermjakob H, Apweiler R (2005b) Dasty and UniProt DAS: a perfect pair for protein feature visualization. Bioinformatics 21: 3198–3199
Julenius K, Molgaard A, Gupta R, Brunak S (2005) Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites. Glycobiology 15: 153–164
Letunic I, Copley RR, Pils B, Pinkert S, Schultz J, Bork P (2006) SMART 5: domains in the context of genomes and networks. Nucleic Acids Res 34: D257–D260
Moult J, Fidelis K, Kryshtafovych A, Rost B, Hubbard T, Tramontano A (2007) Critical assessment of methods of protein structure prediction-Round VII. Proteins 69(Suppl 8): 3–9
Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bradley P, Bork P, Bucher P, Cerutti L, Copley R, Courcelle E, Das U, Durbin R, Fleischmann W, Gough J, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McDowall J, Mitchell A, Nikolskaya AN, Orchard S, Pagni M, Ponting CP, Quevillon E, Selengut J, Sigrist CJ, Silventoinen V, Studholme DJ, Vaughan R, Wu CH (2005) InterPro, progress and status in 2005. Nucleic Acids Res 33: D201–D205
Pearl F, Todd A, Sillitoe I, Dibley M, Redfern O, Lewis T, Bennett C, Marsden R, Grant A, Lee D, Akpor A, Maibaum M, Harrison A, Dallman T, Reeves G, Diboun I, Addou S, Lise S, Johnston C, Sillero A, Thornton J, Orengo C (2005) The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucleic Acids Res 33: D247–D251
Prlic A, Down TA, Hubbard TJ (2005) Adding some SPICE to DAS. Bioinformatics 21(Suppl 2): ii40–ii41
Prlic A, Down TA, Kulesha E, Finn RD, Kahari A, Hubbard TJ (2007) Integrating sequence and structural biology with DAS. BMC Bioinformatics 8: 333
Torrance JW, Bartlett GJ, Porter CT, Thornton JM (2005) Using a library of structural templates to recognise catalytic sites and explore their evolution in homologous families. J Mol Biol 347: 565–581

Author information

Authors and Affiliations

European Molecular Biology Laboratory Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
G. A. Reeves, R. C. Jimenez, E. Kulesha & H. Hermjakob
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
A. Prlic
Bioinformatics and Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
R. C. Jimenez

Corresponding author

Correspondence to H. Hermjakob.

Editor information

Editors and Affiliations

Wissenschaftszentrum Weihenstephan, TU München, Freising, Germany
Dmitrij Frishman
Structural and Computational Programme, Spanish National Cancer Research Centre, Madrid, Spain
Alfonso Valencia

About this chapter

Cite this chapter

Reeves, G.A., Prlic, A., Jimenez, R.C., Kulesha, E., Hermjakob, H. (2008). Infrastructure for distributed protein annotation. In: Frishman, D., Valencia, A. (eds) Modern Genome Annotation. Springer, Vienna. https://doi.org/10.1007/978-3-211-75123-7_18

DOI: https://doi.org/10.1007/978-3-211-75123-7_18
Publisher Name: Springer, Vienna
Print ISBN: 978-3-211-75122-0
Online ISBN: 978-3-211-75123-7
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)
