Decision Tree Classifier for Classification of Proteins Using the Protein Data Bank
Identifying the family of an unknown protein is a challenging problem in computational biology and bioinformatics. Our aim is to classify proteins into families and to identify the family of an unknown protein, using the surface roughness of the protein as the criterion. The Protein Data Bank (PDB) is the repository for protein structure data and contains the Cartesian coordinates of the amino acid residues that form each protein. However, PDB coordinates give no indication of the orientation of the protein, which must be known in order to determine the surface roughness. We therefore designed an invariant coordinate system (ICS) whose origin is the protein's center of gravity (CG). The PDB provides the coordinates of all residues in the protein, but only the surface coordinates are needed to determine surface similarity. We therefore developed a methodology to identify the surface residues and recorded their coordinates. We then divided those coordinates into eight octants according to the signs of the x, y and z coordinates. For the residues in each octant, we computed the standard deviation of the coordinates and defined the result as a parameter called the surface-invariant coordinate (SIC). Thus, every protein is described by eight SIC values.
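The octant step described above can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not settle: the CG is taken as the unweighted centroid of the residue coordinates, a coordinate of exactly zero is counted as non-negative, the "standard deviation of the coordinates" is read as the population standard deviation of all x, y and z values pooled within an octant, and an empty octant is given the value 0.0. The function names are hypothetical, and the surface-residue detection and any axis orientation of the ICS are not modelled here.

```python
from statistics import pstdev


def centre_on_cg(coords):
    """Translate coordinates so their unweighted centroid sits at the origin.

    Assumption: the centre of gravity is approximated by the plain mean of
    the residue coordinates (no mass weighting).
    """
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def octant_index(point):
    """Map the sign pattern of (x, y, z) to an index in 0..7.

    Assumption: a coordinate equal to zero is treated as non-negative.
    """
    x, y, z = point
    return (x < 0) * 4 + (y < 0) * 2 + (z < 0) * 1


def sic_values(surface_coords):
    """Return eight values, one per octant, for CG-centred surface coordinates.

    Assumption: each value is the population standard deviation of all
    x, y and z values pooled over the residues in that octant; an empty
    octant yields 0.0.
    """
    buckets = [[] for _ in range(8)]
    for p in surface_coords:
        buckets[octant_index(p)].extend(p)
    return [pstdev(b) if b else 0.0 for b in buckets]
```

In use, one would centre all residue coordinates with `centre_on_cg`, keep only the entries identified as surface residues, and pass those to `sic_values` to obtain the eight-value descriptor for the protein.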
Keywords: Protein classification; Structural classification of proteins (SCOP); Protein Data Bank (PDB); Surface-invariant coordinate (SIC); Decision tree classifier