Infrastructure for distributed protein annotation

Abstract

Understanding human variation and disease often requires knowledge of a broad array of biomolecular data, down to the role of an individual amino acid in a protein and how mutations or alternative splicing events can change function and phenotype. A number of key databases collect biomolecular information: the EMBL DNA database (Cochrane et al. 2006) and Ensembl (Flicek et al. 2007) collect annotations on genomic sequence features, the UniProt knowledge base (Bairoch et al. 2005) provides detailed annotation on protein sequences, and the Worldwide PDB member databases (Berman et al. 2007) provide protein structural information. Whilst these databases house a great deal of information on sequences and structures, the advent of high-throughput genome sequencing and structural genomics initiatives has produced an explosion in the quantity of uncharacterised data. As a result, tools that annotate these sequences and structures by prediction, or by transfer of information from homologous relatives, have also increased in number and diversity. These methods are crucial for filling in the functional space between characterised and uncharacterised protein sequences and structures.

These authors contributed equally.

References

  • Abascal F, Valencia A (2003) Automatic annotation of protein function based on family identification. Proteins 53: 683–692

  • Andreeva A, Howorth D, Brenner SE, Hubbard TJ, Chothia C, Murzin AG (2004) SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res 32: D226–D229

  • Andreeva A, Prlic A, Hubbard TJ, Murzin AG (2007) SISYPHUS — structural alignments for proteins with non-trivial relationships. Nucleic Acids Res 35: D253–D259

  • Attwood TK, Bradley P, Flower DR, Gaulton A, Maudling N, Mitchell AL, Moulton G, Nordle A, Paine K, Taylor P, Uddin A, Zygouri C (2003) PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Res 31: 400–402

  • Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O’Donovan C, Redaschi N, Yeh LS (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res 33(Database Issue): D154–D159

  • Bendtsen JD, Nielsen H, von Heijne G, Brunak S (2004) Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 340: 783–795

  • Berman H, Henrick K, Nakamura H, Markley JL (2007) The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res 35: D301–D303

  • Blom N, Gammeltoft S, Brunak S (1999) Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol 294: 1351–1362

  • Bryson K, McGuffin LJ, Marsden RL, Ward JJ, Sodhi JS, Jones DT (2005) Protein structure prediction servers at University College London. Nucleic Acids Res 33: W36–W38

  • Cochrane G, Aldebert P, Althorpe N, Andersson M, Baker W, Baldwin A, Bates K, Bhattacharyya S, Browne P, van den Broek A, Castro M, Duggan K, Eberhardt R, Faruque N, Gamble J, Kanz C, Kulikova T, Lee C, Leinonen R, Lin Q, Lombard V, Lopez R, McHale M, McWilliam H, Mukherjee G, Nardone F, Pastor MP, Sobhany S, Stoehr P, Tzouvara K, Vaughan R, Wu D, Zhu W, Apweiler R (2006) EMBL nucleotide sequence database: developments in 2005. Nucleic Acids Res 34: D10–D15

  • Day-Richter J, Harris MA, Haendel M, Lewis S (2007) OBO-Edit — an ontology editor for biologists. Bioinformatics 23: 2198–2200

  • Dowell RD, Jokerst RM, Day A, Eddy SR, Stein L (2001) The distributed annotation system. BMC Bioinformatics 2: 7

  • Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, Ashburner M (2005) The sequence ontology: a tool for the unification of genome annotations. Genome Biol 6: R44

  • Emanuelsson O, Nielsen H, Brunak S, von Heijne G (2000) Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol 300: 1005–1016

  • Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, Eddy SR, Sonnhammer EL, Bateman A (2006) Pfam: clans, web tools and services. Nucleic Acids Res 34: D247–D251

  • Finn RD, Stalker JW, Jackson DK, Kulesha E, Clements J, Pettett R (2007) ProServer: a simple, extensible Perl DAS server. Bioinformatics 23: 1568–1570

  • Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Eyre T, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Holland R, Howe KL, Howe K, Johnson N, Jenkinson A, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Megy K, Meidl P, Overduin B, Parker A, Pritchard B, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Slater G, Smedley D, Spudich G, Trevanion S, Vilella AJ, Vogel J, White S, Wood M, Birney E, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Herrero J, Hubbard TJ, Kasprzyk A, Proctor G, Smith J, Ureta-Vidal A, Searle S (2007) Ensembl 2008. Nucleic Acids Res 36: D707–D714

  • Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJ (2006) The PROSITE database. Nucleic Acids Res 34: D227–D230

  • Jones DT, Bryson K, Coleman A, McGuffin LJ, Sadowski MI, Sodhi JS, Ward JJ (2005a) Prediction of novel and analogous folds using fragment assembly and fold recognition. Proteins 61(Suppl 7): 143–151

  • Jones DT, Taylor WR, Thornton JM (1994) A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry 33: 3038–3049

  • Jones P, Vinod N, Down T, Hackmann A, Kahari A, Kretschmann E, Quinn A, Wieser D, Hermjakob H, Apweiler R (2005b) Dasty and UniProt DAS: a perfect pair for protein feature visualization. Bioinformatics 21: 3198–3199

  • Julenius K, Molgaard A, Gupta R, Brunak S (2005) Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites. Glycobiology 15: 153–164

  • Letunic I, Copley RR, Pils B, Pinkert S, Schultz J, Bork P (2006) SMART 5: domains in the context of genomes and networks. Nucleic Acids Res 34: D257–D260

  • Moult J, Fidelis K, Kryshtafovych A, Rost B, Hubbard T, Tramontano A (2007) Critical assessment of methods of protein structure prediction-Round VII. Proteins 69(Suppl 8): 3–9

  • Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bradley P, Bork P, Bucher P, Cerutti L, Copley R, Courcelle E, Das U, Durbin R, Fleischmann W, Gough J, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McDowall J, Mitchell A, Nikolskaya AN, Orchard S, Pagni M, Ponting CP, Quevillon E, Selengut J, Sigrist CJ, Silventoinen V, Studholme DJ, Vaughan R, Wu CH (2005) InterPro, progress and status in 2005. Nucleic Acids Res 33: D201–D205

  • Pearl F, Todd A, Sillitoe I, Dibley M, Redfern O, Lewis T, Bennett C, Marsden R, Grant A, Lee D, Akpor A, Maibaum M, Harrison A, Dallman T, Reeves G, Diboun I, Addou S, Lise S, Johnston C, Sillero A, Thornton J, Orengo C (2005) The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucleic Acids Res 33: D247–D251

  • Prlic A, Down TA, Hubbard TJ (2005) Adding some SPICE to DAS. Bioinformatics 21(Suppl 2): ii40–ii41

  • Prlic A, Down TA, Kulesha E, Finn RD, Kahari A, Hubbard TJ (2007) Integrating sequence and structural biology with DAS. BMC Bioinformatics 8: 333

  • Torrance JW, Bartlett GJ, Porter CT, Thornton JM (2005) Using a library of structural templates to recognise catalytic sites and explore their evolution in homologous families. J Mol Biol 347: 565–581

Author information

Corresponding author

Correspondence to H. Hermjakob.

Copyright information

© 2008 Springer-Verlag/Wien

About this chapter

Cite this chapter

Reeves, G.A., Prlic, A., Jimenez, R.C., Kulesha, E., Hermjakob, H. (2008). Infrastructure for distributed protein annotation. In: Frishman, D., Valencia, A. (eds) Modern Genome Annotation. Springer, Vienna. https://doi.org/10.1007/978-3-211-75123-7_18
