Comparison of Protein Descriptors Used in Hierarchical Multi-label Classification Based on Gene Ontology

  • Conference paper
ICT Innovations 2011

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 150)

Abstract

Proteins are the main building blocks of the cell and are responsible for its biological processes. Therefore, precise knowledge of protein function is of great significance. Many methods exist for comparing proteins and determining their function: some use structure alignment, others use sequence alignment, and others use protein descriptors. Here, we use two protein descriptors, the Voxel and Ray-based descriptors, to encode the structural and biological features of proteins. In biology there is a tendency to organize things hierarchically, such as protein functions, cell components and the whole living world. Many classification systems arrange proteins in a tree structure. However, because a protein often has more than one parent, a Directed Acyclic Graph (DAG) hierarchy is used instead. Gene Ontology (GO) is a system for the structured, hierarchical representation of proteins and gene products that supports a DAG hierarchy. CLUS, in turn, is a system that handles hierarchical data. In this paper, we present a comparison of the two protein descriptors mentioned above for predicting protein function. First, the protein descriptors are extracted from the structural coordinates found in the Protein Data Bank (PDB) and from the protein backbone, respectively. Afterwards, the GO class hierarchy is added to each protein that has descriptor data. The resulting file is used as input to the CLUS system. CLUS generates a decision tree model trained on the protein structure data. The output of the system is the set of GO classes to which the protein belongs. The results show that predicting protein function with the Voxel protein descriptor gives better results than predicting it with the Ray protein descriptor.
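
The reason a DAG is needed rather than a tree can be illustrated with a minimal sketch. It assumes a toy hierarchy with placeholder term names (T1 to T4, not real GO accessions) and hypothetical helper functions; it is not the file format or code used with CLUS in the paper. It only shows how a protein annotated with one class implicitly belongs to every ancestor class, including ancestors reached through more than one parent.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy DAG: child term -> list of parent terms.
# The identifiers are placeholders, not real GO accessions.
PARENTS: Dict[str, List[str]] = {
    "T1": ["root"],
    "T2": ["root"],
    "T3": ["T1", "T2"],  # two parents, so a tree cannot represent it
    "T4": ["T3"],
}

ALL_TERMS: List[str] = ["root", "T1", "T2", "T3", "T4"]


def with_ancestors(terms: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return the given terms plus every ancestor reachable in the DAG."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term in closed:
            continue  # already visited through another parent path
        closed.add(term)
        stack.extend(parents.get(term, []))
    return closed


def label_vector(terms: Iterable[str], parents: Dict[str, List[str]],
                 all_terms: List[str]) -> List[int]:
    """Encode a protein's annotations as a 0/1 vector over all classes."""
    closed = with_ancestors(terms, parents)
    return [1 if t in closed else 0 for t in all_terms]


if __name__ == "__main__":
    # A protein annotated only with T4 also belongs to T3, T1, T2 and root.
    print(sorted(with_ancestors(["T4"], PARENTS)))
    print(label_vector(["T1"], PARENTS, ALL_TERMS))
```

In a hierarchical multi-label setting, a vector of this kind would be the prediction target attached to each protein's descriptor; how the hierarchy is actually written into the CLUS input file is described in the CLUS user's manual rather than here.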

Author information

Corresponding author

Correspondence to Natasha Pavlovikj.

Copyright information

© 2012 Springer-Verlag GmbH Berlin Heidelberg

About this paper

Cite this paper

Pavlovikj, N., Ivanoska, I., Kalajdziski, S. (2012). Comparison of Protein Descriptors Used in Hierarchical Multi-label Classification Based on Gene Ontology. In: Kocarev, L. (eds) ICT Innovations 2011. ICT Innovations 2011. Advances in Intelligent and Soft Computing, vol 150. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28664-3_6

  • DOI: https://doi.org/10.1007/978-3-642-28664-3_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28663-6

  • Online ISBN: 978-3-642-28664-3

  • eBook Packages: Engineering, Engineering (R0)
