Abstract
Proteins are the main building blocks of the cell and are responsible for its biological processes. Precise knowledge of protein function is therefore of great significance. Many methods exist for comparing proteins and determining their function: some use structure alignment, others use sequence alignment, and others use protein descriptors. Here, we use two protein descriptors, the Voxel descriptor and the Ray-based descriptor, to encode the structural and biological features of proteins. In biology there is a trend to organize things hierarchically, such as protein functions, cell components, and the living world as a whole. Many classification systems organize proteins in a tree structure. However, because a protein often has more than one parent, a Directed Acyclic Graph (DAG) hierarchy is used instead. Gene Ontology (GO) is a system for the structured, hierarchical representation of proteins and gene products that supports a DAG hierarchy. CLUS, in turn, is a system that handles hierarchical data. In this paper, we compare the two protein descriptors mentioned above for predicting protein function. First, the protein descriptors are extracted from the structural coordinates found in the Protein Data Bank (PDB) and from the protein backbone, respectively. Afterwards, the GO class hierarchy is added to each protein that has descriptor data. The resulting file is used as input to the CLUS system. CLUS generates a decision tree model that is trained on the protein structure data. The output of the system is the set of GO classes to which the protein belongs. The results show that predicting protein function with the Voxel descriptor gives better results than predicting it with the Ray descriptor.
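The DAG hierarchy described above can be illustrated with a small sketch. It is not the CLUS input format or the authors' code; it only shows the usual hierarchy constraint in hierarchical multi-label classification, where a protein annotated with a class also belongs to all ancestors of that class. The term names, the `PARENTS` table, and the helper functions are hypothetical placeholders, not real GO identifiers.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy: each term maps to its direct parents.
# The names are placeholders, not real GO terms.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "ion_binding": ["binding"],
    "metal_enzyme": ["ion_binding", "catalysis"],  # two parents, so a DAG, not a tree
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the term together with every term reachable via parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def label_vector(annotations: List[str],
                 parents: Dict[str, List[str]],
                 order: List[str]) -> List[int]:
    """Expand annotations to all ancestors and encode them as a 0/1 vector."""
    expanded: Set[str] = set()
    for term in annotations:
        expanded |= ancestors(term, parents)
    return [1 if term in expanded else 0 for term in order]


# Fixed column order for the label vector (alphabetical, chosen for illustration).
ORDER = sorted(PARENTS)

if __name__ == "__main__":
    print(ORDER)
    print(label_vector(["ion_binding"], PARENTS, ORDER))
    print(label_vector(["metal_enzyme"], PARENTS, ORDER))
```

In a setup like the one in the abstract, a vector of this kind would sit next to the descriptor features of each protein; how the labels are actually serialized for CLUS is left to its manual.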
Copyright information
© 2012 Springer-Verlag GmbH Berlin Heidelberg
About this paper
Cite this paper
Pavlovikj, N., Ivanoska, I., Kalajdziski, S. (2012). Comparison of Protein Descriptors Used in Hierarchical Multi-label Classification Based on Gene Ontology. In: Kocarev, L. (ed.) ICT Innovations 2011. Advances in Intelligent and Soft Computing, vol 150. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28664-3_6
DOI: https://doi.org/10.1007/978-3-642-28664-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28663-6
Online ISBN: 978-3-642-28664-3
eBook Packages: Engineering, Engineering (R0)