Drug Development for Hepatitis C Virus Infection: Machine Learning Applications

  • Sajitha Lulu Sudhakaran
  • Deepa Madathil
  • Mohanapriya Arumugam
  • Vino Sundararajan


Hepatitis C virus (HCV) infection is one of the leading causes of mortality and morbidity and is widely associated with the development of liver cirrhosis, hepatocellular cancer, and liver failure. Most reported cases of hepatitis C progress to a chronic form of the infection, which poses a major threat to public health and can be prevented by evading or eradicating the virus through effective drug development. Conventional medicines that are both safe and affordable have not yet been developed for the treatment of chronic HCV infection. Beyond identifying novel drugs, it is equally important to establish their effectiveness by ascertaining drug target accuracy, which is a crucial part of any drug development program. Moreover, it is critical to understand the activity of candidate drugs and the molecular basis of resistance to them, as some drugs may retain activity against a broad spectrum of drug-resistant viral variants. Drug discovery and design are highly complex, time-consuming, and expensive endeavors, so it is crucial to incorporate new technologies into the process. Modern drug design strategies include ligand-based (LBDD) and structure-based (SBDD) drug design methods for developing new drug candidates. Machine learning (ML) approaches are extensively applied in drug design for HCV. The most common applications include classifying drug targets as druggable or non-druggable, prioritizing drug targets, discovering novel inhibitors, predicting disease by using risk factors as classifiers, and in silico ADMET prediction. A few studies have also reported ML approaches for prediction of biological activity from multivariate models, prediction of binding site secondary structural modes of docking, and virtual screening.

The ML techniques most commonly applied in HCV drug discovery include random forest, support vector machine (SVM), decision tree, genetic algorithms, k-nearest neighbors, naive Bayesian classifiers, particle swarm optimization, and multilinear regression models. These tools are widely used in drug discovery studies because they are readily accessible in both open-source and commercial distributions, statistically consistent, computationally efficient, and relatively straightforward to implement and interpret. Moreover, data-mining software enables users to run these algorithms through graphical user interfaces, and the algorithms can also be written and executed in environments such as R, Matlab, and Octave. Data mining and machine learning approaches therefore appear to be promising aids for drug development studies on HCV infection.
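As a rough illustration of how one of the listed techniques works, the sketch below implements a k-nearest neighbors classifier using only the Python standard library and applies it to the druggable versus non-druggable labelling task mentioned above. The two-dimensional descriptor vectors, the labels attached to them, and the function name knn_predict are all hypothetical and were invented for this example; they do not come from the chapter or from any real data set.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank the training examples by Euclidean distance to the query point.
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    # Count the labels of the k closest examples and return the most frequent.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (descriptor vector, label). Values are made up.
train = [
    ((0.9, 0.8), "druggable"),
    ((0.8, 0.9), "druggable"),
    ((0.7, 0.7), "druggable"),
    ((0.2, 0.1), "non-druggable"),
    ((0.1, 0.3), "non-druggable"),
    ((0.3, 0.2), "non-druggable"),
]

print(knn_predict(train, (0.75, 0.85)))
```

In practice the descriptors would be computed from target or compound properties, and k together with the distance metric would be chosen by cross-validation rather than fixed as here.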


HCV; Drug resistance; Machine learning methods; SVM; Decision tree; Genetic algorithms; K-nearest neighbors; Naive Bayesian classifiers; Particle swarm optimization; Multilinear regression models



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Sajitha Lulu Sudhakaran (1), corresponding author
  • Deepa Madathil (2)
  • Mohanapriya Arumugam (1)
  • Vino Sundararajan (3)

  1. Department of Biotechnology, School of BioSciences and Technology, Vellore Institute of Technology, Vellore, India
  2. Department of Sensor and Biomedical Technology, School of Electronics Engineering, Vellore Institute of Technology, Vellore, India
  3. Department of Biosciences, School of BioSciences and Technology, Vellore Institute of Technology, Vellore, India
