Bioinformatics pp 243-258

Classification of Information About Proteins

  • Amandeep S. Sidhu
  • Matthew I. Bellgard
  • Tharam S. Dillon


Advanced high-throughput technology applied to proteomics produces large volumes of information-rich data. These data require considerable knowledge management so that biologists and bioinformaticians can access and understand the information in the context of their experiments. As the volume of data increases, the results from these high-throughput experiments will provide the foundations for advancing proteome biology.

In this chapter, we consider the challenges of information integration in proteomics from the perspective of researchers who use information technology as an integral part of their discovery process. We first describe the information about proteins that is collected from high-throughput experimentation and how it is managed. We then describe how protein ontologies can be used to classify this information. Finally, we discuss some uses of protein classification systems and the biological challenges in proteomics that they help to resolve.


Keywords: Generic Concept; Resource Description Framework; Site Group; Unify Medical Language System; Origin Recognition Complex



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Amandeep S. Sidhu (1)
  • Matthew I. Bellgard
  • Tharam S. Dillon
  1. Centre for Comparative Genomics, Murdoch University, Perth, Australia
