Multiobjective evolutionary-based multi-kernel learner for realizing transfer learning in the prediction of HIV-1 protease cleavage sites

  • Deepak Singh
  • Dilip Singh Sisodia
  • Pradeep Singh
Methodologies and Application


Because patients are scarce and labeling is expensive, many real-world biomedical problems suffer from a shortage of annotated data. This is particularly true of the HIV-1 protease specificity problem, for which only a few experimentally verified cleavage sites are available. The challenge then is to exploit auxiliary data. However, the problem becomes more complicated when the underlying training and test data are generated from different distributions. To deal with these challenges, we formulate HIV-1 protease cleavage site prediction as a bi-objective optimization problem and solve it with a multiobjective evolutionary-based multi-kernel model. A solution to the optimization problem determines the optimal number of base kernels together with the best pairing of features. The bi-objective criteria encourage the ensemble to contain different individual kernels, which mitigates the effect of the distribution difference between training and test data while keeping an ideal number of base kernels. In this paper, we consider eight feature descriptors and three kernel variants of support vector machines to generate the optimal multi-kernel learning model. The non-dominated sorting genetic algorithm II (NSGA-II) is employed with two objectives: maximizing the area under the receiver operating characteristic curve (AUC) and, simultaneously, minimizing the number of features. To validate the effectiveness of the model, experiments were performed on four HIV-1 protease datasets. The model is compared with fifteen state-of-the-art techniques on average accuracy and AUC to assess the improvement it offers. We then apply the Friedman test and post hoc tests to show that the improvement is statistically significant. The results of these extensive experiments show that the bi-objective multi-kernel model outperforms the other state-of-the-art techniques in both within-learning and cross-learning settings.
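
The trade-off between the two objectives named in the abstract can be illustrated with a minimal sketch of Pareto dominance, the comparison that non-dominated sorting is built on. This is not the authors' implementation: the function names `dominates` and `pareto_front` are hypothetical, the candidate values are invented for illustration, and the full NSGA-II procedure (ranking into successive fronts, crowding distance, crossover, mutation) is omitted. Each candidate is assumed to be summarized by a pair (area under the ROC curve, number of features).

```python
from typing import List, Tuple

# Each candidate is (auc, n_features): AUC is maximized, feature count is minimized.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical (AUC, number of features) values, invented for illustration only.
population = [(0.90, 120), (0.92, 200), (0.88, 80), (0.90, 150), (0.85, 90)]
front = pareto_front(population)
print(front)
```

In this toy population, (0.90, 150) is dropped because (0.90, 120) reaches the same AUC with fewer features, and (0.85, 90) is dropped because (0.88, 80) is better on both objectives. The surviving candidates represent different compromises between predictive performance and model size.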


Keywords: HIV-1 protease, Multi-kernel, Multiobjective evolutionary algorithm, Transfer learning


Compliance with ethical standards

Conflict of interest

The authors declare that there is no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Deepak Singh (1), corresponding author
  • Dilip Singh Sisodia (1)
  • Pradeep Singh (1)
  1. Department of Computer Science and Engineering, National Institute of Technology, Raipur, Raipur, India
