Drug Development for Hepatitis C Virus Infection: Machine Learning Applications

Abstract

Hepatitis C virus (HCV) infection is one of the leading causes of mortality and morbidity and is widely reported to be associated with the development of liver cirrhosis, hepatocellular carcinoma, and liver failure. Most reported cases of hepatitis C progress to chronic infection, a major public-health threat that effective drug development could address by preventing or eradicating the infection. Conventional medicines that are both safe and easily affordable have not yet been developed for the treatment of chronic HCV infection. Beyond identifying novel drugs, it is equally important to establish their effectiveness by confirming drug-target accuracy, a crucial part of any drug development program. It is also critical to understand the activity of various drugs and the molecular basis of resistance to them, since some drugs may retain activity against a broad spectrum of drug-resistant viral variants.

Drug discovery and design are highly complex, time-consuming, and expensive endeavors, so it is crucial to incorporate new technologies into the process. Modern drug design strategies include ligand-based drug design (LBDD) and structure-based drug design (SBDD) methods for developing new drug candidates. Machine learning (ML) approaches are extensively applied in HCV drug design; the most common applications include classifying drug targets as druggable or non-druggable, prioritizing drug targets, discovering novel inhibitors, predicting disease from risk factors used as classifier inputs, and in silico ADMET (absorption, distribution, metabolism, excretion, and toxicity) prediction. However, only a few ML studies have been reported for predicting biological activity from multivariate models, predicting binding-site secondary structure and docking modes, and virtual screening.
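Ligand-based virtual screening, one of the LBDD applications mentioned above, commonly starts from a similarity score between molecular fingerprints. The following is a minimal sketch under stated assumptions, not the chapter's method: fingerprints are modeled as Python sets of "on" bit indices (a real workflow would typically compute them with a cheminformatics toolkit such as RDKit), the Tanimoto coefficient is the similarity measure, and each candidate is scored by its maximum similarity to a set of known actives. All compound names and bit patterns are hypothetical toy data.

```python
"""Minimal ligand-based virtual screening sketch using Tanimoto similarity.

Assumption: fingerprints are represented as sets of "on" bit indices.
The compounds and bit patterns below are hypothetical, not real HCV inhibitors.
"""


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient: size of the intersection divided by size of the union."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: treat as no evidence of similarity
    return len(a & b) / union


def rank_library(actives: dict[str, set[int]],
                 library: dict[str, set[int]]) -> list[tuple[str, float]]:
    """Score each library compound by its maximum similarity to any known
    active (a simple "max fusion" rule) and return compounds best-first."""
    scored = []
    for name, fp in library.items():
        best = max(tanimoto(fp, ref) for ref in actives.values())
        scored.append((name, best))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


if __name__ == "__main__":
    known_actives = {  # hypothetical reference inhibitors
        "active_1": {1, 4, 7, 9, 12},
        "active_2": {2, 4, 7, 10, 12},
    }
    screening_library = {  # hypothetical candidates to prioritize
        "cand_A": {1, 4, 7, 9, 13},
        "cand_B": {3, 5, 8, 11},
        "cand_C": {2, 4, 7, 10},
    }
    for name, score in rank_library(known_actives, screening_library):
        print(f"{name}: {score:.2f}")
```

Run as a script, this prints `cand_C: 0.80`, `cand_A: 0.67`, and `cand_B: 0.00`, so the candidate sharing no bits with any active drops to the bottom of the list. Max fusion is only one option; averaging similarities across actives, or feeding the scores into a trained classifier, are common alternatives.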

The ML techniques most commonly applied in HCV drug discovery include random forests, support vector machines (SVMs), decision trees, genetic algorithms, k-nearest neighbors (k-NN), naive Bayes classifiers, particle swarm optimization, and multiple linear regression (MLR) models. These tools are widely used in drug discovery studies because they are readily available in both open-source and commercial distributions, statistically consistent, computationally efficient, and relatively straightforward to implement and interpret. Data-mining software also lets users apply these algorithms through graphical user interfaces, and the algorithms can be written and executed in environments such as R, MATLAB, and Octave. Data mining and ML approaches therefore appear to be a promising aid to drug development studies on HCV infection.
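Multiple linear regression is the simplest of the techniques listed and is often used for QSAR (quantitative structure-activity relationship) modeling. The sketch below assumes a QSAR-style setup in which each compound is described by a few numeric descriptors and the response is an activity value such as pIC50; the descriptors, activities, and the function names `fit_mlr` and `predict` are hypothetical. It fits the model by ordinary least squares, solving the normal equations with Gaussian elimination in plain Python so it has no dependencies. A real study would more likely use R, MATLAB, or a numerical library, and would validate the model on held-out compounds.

```python
"""Minimal multiple linear regression (MLR) sketch for a QSAR-style model.

Fits y = b0 + b1*x1 + ... + bk*xk by ordinary least squares, solving the
normal equations (A^T A) b = A^T y with Gaussian elimination.
Descriptor values and activities used below are hypothetical toy numbers.
"""


def fit_mlr(X: list[list[float]], y: list[float]) -> list[float]:
    """Return coefficients [b0, b1, ..., bk]; b0 is the intercept."""
    # Prepend a column of ones so the first coefficient is the intercept.
    A = [[1.0] + list(row) for row in X]
    n = len(A[0])
    # Normal-equation system M b = v, with M = A^T A and v = A^T y.
    M = [[sum(r[i] * r[j] for r in A) for j in range(n)] for i in range(n)]
    v = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(n)]
    # Forward elimination with partial pivoting on the augmented matrix [M | v].
    aug = [M[i] + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear descriptors or too few compounds")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution from the last coefficient to the first.
    b = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(aug[i][j] * b[j] for j in range(i + 1, n))
        b[i] = (aug[i][n] - rest) / aug[i][i]
    return b


def predict(b: list[float], x: list[float]) -> float:
    """Predicted activity for one compound's descriptor vector x."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))


if __name__ == "__main__":
    # Hypothetical descriptors (two per compound) and activities generated
    # exactly from y = 4 + 0.5*x1 + 0.25*x2, so the fit should recover them.
    descriptors = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
    activities = [5.0, 5.25, 6.5, 6.75, 7.75]
    coefs = fit_mlr(descriptors, activities)
    print([round(c, 3) for c in coefs])
    print(round(predict(coefs, [2.0, 2.0]), 3))
```

Because the toy activities were generated without noise, the script prints `[4.0, 0.5, 0.25]` and then `5.5` for a new compound with descriptors (2, 2). With real assay data the residuals would be nonzero, and the normal-equation approach becomes numerically fragile when descriptors are highly correlated, which is one reason QSAR studies often pair MLR with descriptor selection (for example, by a genetic algorithm).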

Author information

Correspondence to Sajitha Lulu Sudhakaran.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Sudhakaran, S.L., Madathil, D., Arumugam, M., Sundararajan, V. (2019). Drug Development for Hepatitis C Virus Infection: Machine Learning Applications. In: Shapshak, P., et al. Global Virology III: Virology in the 21st Century. Springer, Cham. https://doi.org/10.1007/978-3-030-29022-1_6
