Classification of Proteins Using Naïve Bayes Classifier and Surface-Invariant Coordinates
Protein classification is a challenging problem in computational biology and bioinformatics. Our aim is to classify proteins into families using the similarity of their surface roughness as the criterion. Because Protein Data Bank (PDB) (http://www.rcsb.org/pdb/) coordinates give no indication of the orientation of the protein, we designed an invariant coordinate system (ICS) whose origin is the protein’s center of gravity (CG). We obtained the surface residue coordinates from the PDB and divided them into eight octants according to the signs of the x, y and z coordinates. For the residues in each octant, we computed the standard deviation of the coordinates and defined it as a parameter called the surface-invariant coordinate (SIC). Thus, every protein yields eight SIC values. We also made use of the Structural Classification of Proteins (SCOP) (http://scop.mrc-lmb.cam.ac.uk/scop/) database. SCOP classifies proteins on the basis of their surface structure. Because this is a classification problem, we used the naïve Bayes classifier algorithm, with the aim of improving classification results.
Keywords: Protein classification; Structural classification of proteins; SCOP; Protein data bank; PDB; Surface-invariant coordinate; SIC; Naïve Bayes classifier
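The SIC construction and the classification step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the center of gravity is weighted, which quantity the standard deviation is taken over, or which naïve Bayes variant is used. The sketch assumes an unweighted centroid of the supplied points, the standard deviation of each residue's distance from the CG, and Gaussian likelihoods. All function names are hypothetical.

```python
import math
from statistics import pstdev


def octant_index(x, y, z):
    """Map the signs of (x, y, z) to an octant number 0..7 (zero counts as positive)."""
    return (x < 0) * 4 + (y < 0) * 2 + (z < 0) * 1


def surface_invariant_coordinates(coords):
    """Return eight SIC values for a list of (x, y, z) surface-residue coordinates."""
    n = len(coords)
    # Assumption: the CG is the unweighted centroid of the supplied points.
    cg = [sum(p[i] for p in coords) / n for i in range(3)]
    octants = [[] for _ in range(8)]
    for p in coords:
        x, y, z = (p[i] - cg[i] for i in range(3))
        # Assumption: each residue contributes its distance from the CG.
        octants[octant_index(x, y, z)].append(math.sqrt(x * x + y * y + z * z))
    # Assumption: an empty octant gets SIC = 0.0.
    return [pstdev(d) if d else 0.0 for d in octants]


def fit_gaussian_nb(features, labels):
    """Estimate the prior, per-feature means and variances for each class."""
    model = {}
    for label in set(labels):
        rows = [f for f, l in zip(features, labels) if l == label]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        # A small floor on the variance avoids division by zero.
        variances = [max(pstdev(col) ** 2, 1e-9) for col in zip(*rows)]
        model[label] = (len(rows) / len(labels), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def log_posterior(params):
        prior, means, variances = params
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))
```

In use, each protein would be reduced to its eight-element SIC vector, and the vectors for proteins with known family labels (for example, taken from SCOP) would be passed to `fit_gaussian_nb` before calling `predict_gaussian_nb` on an unlabeled protein.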