An Empirical Investigation of Discretization Techniques on the Classification of Protein–Protein Interaction

  • Dilip Singh Sisodia
  • Maheep Singh
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 748)


Protein–protein interaction is a biological process that plays a vital role in the metabolic processes of an organism. More than 80% of proteins do not function alone but act in combination with other proteins. The function of an uncharacterized protein can often be inferred from its interactions with proteins whose functions are already known. Protein–protein interactions (PPI) and protein–protein non-interactions (PPNI) grow at different rates, and the number of PPI is significantly greater than that of PPNI. This imbalance increases the cost of constructing a balanced data set. In this paper, the effect of several discretization techniques, namely Ameva, Class-Attribute Interdependence Maximization (CAIM), ChiMerge, and FUSINTER, is investigated in combination with different classification techniques. Under 10-fold cross-validation, CAIM discretization combined with an SVM has a significant impact on the results compared with a standard SVM. Experiments on E. coli and H. sapiens protein datasets give average accuracies of 92.8% and 93.8%, with AUC values of 80.7% and 82.1%, respectively, for CAIM discretization with the SVM classifier.
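
To make concrete what a supervised discretizer such as CAIM optimizes, the following minimal sketch computes the CAIM criterion for one candidate set of cut points on a single attribute: the average, over intervals, of the squared majority-class count divided by the interval size. The function name `caim_score`, the toy feature values, and the labels are illustrative assumptions and are not taken from the paper's code or data. The full CAIM algorithm also searches greedily over candidate boundaries, which is omitted here.

```python
from bisect import bisect_left
from collections import Counter


def caim_score(values, labels, cut_points):
    """CAIM criterion for one attribute: mean over intervals of max_r^2 / M_r.

    max_r is the count of the majority class inside interval r and M_r is the
    number of samples in interval r. Intervals are right-closed, so a value
    equal to a cut point falls into the interval on its left.
    """
    cuts = sorted(cut_points)
    # One class-count table per interval (the columns of the quanta matrix).
    counts = [Counter() for _ in range(len(cuts) + 1)]
    for v, y in zip(values, labels):
        counts[bisect_left(cuts, v)][y] += 1
    total = 0.0
    for c in counts:
        m_r = sum(c.values())
        if m_r:  # an empty interval contributes nothing to the sum
            total += max(c.values()) ** 2 / m_r
    return total / len(counts)


# Hypothetical toy feature: low values are non-interacting, high values interacting.
values = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = ["PPNI", "PPNI", "PPNI", "PPI", "PPI", "PPI"]

clean = caim_score(values, labels, [0.5])   # cut separates the classes
mixed = caim_score(values, labels, [0.25])  # cut leaves one interval impure
print(clean, mixed)
```

A cut that separates the two classes scores higher than one that leaves an interval with mixed labels, which is the property a CAIM-style search exploits when choosing boundaries before the classifier is trained.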


Discretization; Metabolic; SVM; C4.5; Protein–protein interaction



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. National Institute of Technology Raipur, Raipur, India
  2. Guru Ghasidas Central University, Bilaspur, India
