Ensemble Technique for Prediction of T-cell Mycobacterium tuberculosis Epitopes

  • Divya Khanna
  • Prashant Singh Rana
Original Research Article


Developing an effective machine-learning model for T-cell Mycobacterium tuberculosis (M. tuberculosis) epitopes saves biologists time and effort in identifying epitopes in a targeted antigen. Existing servers such as NetMHC 2.2, NetMHC 2.3, NetMHC 3.0 and NetMHC 4.0 estimate the binding capacity of a peptide, but predicting whether a given peptide is an M. tuberculosis epitope or a non-epitope remains a challenge for them. One server, CTLpred, works in this category, but it is limited to peptides of length 9 (9-mers). This work therefore proposes a direct method of predicting whether a peptide is an M. tuberculosis epitope or a non-epitope, which also overcomes the limitations of the above servers: it works with variable-length epitopes, including those longer than 9-mers. Identification of T-cell or B-cell epitopes in the targeted antigen is the main goal in designing epitope-based vaccines, immunodiagnostic tests and antibody production, so a reliable system that may help in the diagnosis of M. tuberculosis is important. In the present study, computational intelligence methods are used to classify T-cell M. tuberculosis epitopes. The caret feature selection approach is used to find the set of relevant features. The ensemble model is designed by combining three models and is used to predict M. tuberculosis epitopes of variable length (7–40-mers). The proposed ensemble model achieves 82.0% accuracy, 0.89 specificity and 0.77 sensitivity; under repeated k-fold cross-validation its average accuracy is 80.61%. The proposed ensemble model has been validated and compared with the NetMHC 2.3 and NetMHC 4.0 servers and the CTLpred T-cell prediction server.
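To make the reported quantities concrete, the following minimal Python sketch shows one way three classifiers' outputs could be combined and scored. It is an illustration only: the abstract does not state how the three models are combined, so majority voting is an assumption here, and the function names, labels and toy predictions are hypothetical rather than taken from the study.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample.

    `predictions` is a list of equal-length label lists, one per model.
    Majority voting is an assumed combination rule, not the paper's stated one.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def binary_metrics(y_true, y_pred, positive="epitope"):
    """Return accuracy, sensitivity and specificity for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: fraction of true epitopes that were predicted as epitopes.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of true non-epitopes predicted as non-epitopes.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical toy labels and model outputs, for illustration only.
    y_true = ["epitope", "epitope", "non-epitope", "non-epitope"]
    model_a = ["epitope", "non-epitope", "non-epitope", "epitope"]
    model_b = ["epitope", "epitope", "non-epitope", "non-epitope"]
    model_c = ["non-epitope", "non-epitope", "non-epitope", "epitope"]

    ensemble = majority_vote([model_a, model_b, model_c])
    print(ensemble)
    print(binary_metrics(y_true, ensemble))
```

With three models and two classes a vote can never tie, which is one practical reason an odd number of base models is convenient for this kind of ensemble.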


T-cell epitopes; Mycobacterium tuberculosis; Machine-learning models; Ensemble model; Feature selection



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Computer Science and Engineering Department, Thapar Institute of Engineering & Technology, Patiala, India
