Abstract
Hepatitis C virus (HCV) infection is one of the leading causes of mortality and morbidity, and is widely reported for its association with the development of liver cirrhosis, hepatocellular cancer, and liver failure. Most reported cases of hepatitis C progress to a chronic form of the infection, which poses a major threat to public health and could be prevented by evading or eradicating the virus through effective drug development. Conventional medicines that are both safe and affordable have not yet been developed for the treatment of chronic HCV infection. Beyond identifying novel drugs, it is equally important to assess their effectiveness by ascertaining drug target accuracy, which is a crucial part of any drug development program. It is also critical to understand the activity of candidate drugs and the molecular basis of resistance to them, as some may retain activity against a broad spectrum of drug-resistant viral variants. Drug discovery and design are highly complex, time-consuming, and expensive endeavors, so it is crucial to incorporate new technologies into the process. Modern drug design strategies include ligand-based drug design (LBDD) and structure-based drug design (SBDD) methods for developing new drug candidates. Machine learning (ML) approaches are extensively applied in drug design for HCV; the most common applications include classifying drug targets as druggable or non-druggable, prioritizing drug targets, discovering novel inhibitors, predicting disease by using risk factors as classifiers, and in silico ADMET prediction. However, only a few studies using ML approaches have been reported for prediction of biological activity from multivariate models, prediction of binding site secondary structural modes of docking, and virtual screening.
The ML techniques most commonly applied in HCV drug discovery include random forests, support vector machines (SVM), decision trees, genetic algorithms, k-nearest neighbors, naive Bayesian classifiers, particle swarm optimization, and multilinear regression models. These tools are widely used in drug discovery studies because they are readily accessible in both open-source and commercial distributions, statistically consistent, computationally efficient, and relatively straightforward to implement and interpret. Moreover, data-mining software lets users run these algorithms through graphical user interfaces, and the algorithms can also be written and executed in environments such as R, MATLAB, and Octave. Data mining and machine learning approaches therefore appear to be promising aids for drug development studies on HCV infection.
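To make one of the listed techniques concrete, the following is a minimal sketch of k-nearest neighbor classification, assuming compounds are described by two numeric descriptors and labeled as active or inactive. The training data, the descriptor values, and the function name `knn_predict` are hypothetical and chosen only for illustration; they are not taken from the chapter or from any HCV data set.

```python
from collections import Counter
from math import dist

# Hypothetical training set: (descriptor vector, activity label).
# The two descriptors and all values are invented for illustration only.
TRAINING = [
    ((0.9, 0.8), "active"),
    ((1.0, 0.7), "active"),
    ((0.8, 0.9), "active"),
    ((0.1, 0.2), "inactive"),
    ((0.2, 0.1), "inactive"),
    ((0.3, 0.3), "inactive"),
]


def knn_predict(query, training=TRAINING, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Rank training compounds by Euclidean distance to the query descriptors.
    neighbours = sorted(training, key=lambda item: dist(item[0], query))[:k]
    # Majority vote over the labels of the k closest compounds.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_predict((0.85, 0.75)))
    print(knn_predict((0.15, 0.15)))
```

With two classes, an odd `k` avoids tied votes. A real study would use validated molecular descriptors, scaled features, and held-out data to choose `k`.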
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Sudhakaran, S.L., Madathil, D., Arumugam, M., Sundararajan, V. (2019). Drug Development for Hepatitis C Virus Infection: Machine Learning Applications. In: Shapshak, P., et al. Global Virology III: Virology in the 21st Century. Springer, Cham. https://doi.org/10.1007/978-3-030-29022-1_6
DOI: https://doi.org/10.1007/978-3-030-29022-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29021-4
Online ISBN: 978-3-030-29022-1
eBook Packages: Biomedical and Life Sciences (R0)