Correlating Binding Site Residues of the Protein and Ligand Features to Its Functionality

  • B. Ravindra Reddy
  • T. Sobha Rani
  • S. Durga Bhavani
  • Raju S. Bapi
  • G. Narahari Sastry
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7077)


Machine learning tools are employed to establish a relationship between the characteristics of the protein-ligand binding site and enzyme class. Enzyme classification is a challenging problem from a data mining perspective because of (i) class imbalance and (ii) the difficulty of selecting appropriate features. We address the problem by choosing novel features from the protein binding site. The Protein Ligand Interaction Database (PLID), which gives a comprehensive view of the binding sites in a protein along with other contact information, is updated and presented here as PLID v1.1. The database facilitates the study of protein-ligand interaction. Novel features arising from protein-ligand interaction, including chemical compound features as well as fraction of contact and tightness, are investigated for the classification task. The weighted classification accuracy for the data set with binding site residues as features is found to be 56% using a Random Forest classifier. It may be concluded that either the binding site features do not adequately represent the enzyme class information or the low accuracy is caused by class imbalance. This needs further investigation.
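The abstract does not say how the weighted classification accuracy was computed. As a rough illustration of why class imbalance matters for such a figure, the sketch below assumes one common reading: per-class recall averaged with weights proportional to class size, compared against an unweighted (macro) average. The function names and the toy labels are hypothetical and are not taken from the paper or from PLID.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Map each class to the fraction of its samples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def weighted_accuracy(y_true, y_pred):
    """Per-class recall averaged with weights proportional to class size
    (an assumed definition; the paper may use a different one)."""
    totals = Counter(y_true)
    recall = per_class_recall(y_true, y_pred)
    n = len(y_true)
    return sum(recall[c] * totals[c] / n for c in totals)


def macro_accuracy(y_true, y_pred):
    """Per-class recall averaged with equal weight for every class."""
    recall = per_class_recall(y_true, y_pred)
    return sum(recall.values()) / len(recall)


# Hypothetical toy data: class 1 dominates, classes 2 and 3 are rare.
y_true = [1] * 8 + [2] + [3]
# A degenerate classifier that always predicts the majority class.
y_pred = [1] * 10

weighted = weighted_accuracy(y_true, y_pred)
macro = macro_accuracy(y_true, y_pred)
print(f"weighted: {weighted:.3f}  macro: {macro:.3f}")
```

Under this reading, a classifier that ignores the rare classes can still score well on the size-weighted figure while the macro average exposes the failure, which is one reason a single accuracy number is hard to interpret on imbalanced enzyme classes.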


Keywords: Random Forest; Protein Data Bank; Enzyme Commission; Enzyme Commission Number; Enzyme Class
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • B. Ravindra Reddy (1)
  • T. Sobha Rani (1)
  • S. Durga Bhavani (1)
  • Raju S. Bapi (1)
  • G. Narahari Sastry (2)
  1. Computational Intelligence Lab, Department of Computer and Information Sciences, University of Hyderabad, Hyderabad, India
  2. Molecular Modeling Group, Indian Institute of Chemical Technology, Hyderabad, India
