Drug Development for Hepatitis C Virus Infection: Machine Learning Applications

Sudhakaran, Sajitha Lulu; Madathil, Deepa; Arumugam, Mohanapriya; Sundararajan, Vino

doi:10.1007/978-3-030-29022-1_6

Chapter
First Online: 23 November 2019

Abstract

Hepatitis C virus (HCV) infection is one of the leading causes of mortality and morbidity and is widely reported for its association with the development of liver cirrhosis, hepatocellular cancer, and liver failure. Most reported cases of hepatitis C progress to a chronic form of the infection, which poses a major threat to public health; this threat can be reduced by preventing or eradicating the virus through effective drug development. Conventional medicines that are both safe and easily affordable have not yet been developed for the treatment of chronic HCV infection. Beyond identifying novel drugs, it is equally important to establish their effectiveness by ascertaining drug target accuracy, which is a crucial part of any drug development program. Moreover, it is critical to understand the activity of candidate drugs and the molecular basis of resistance to them, as some may retain activity against a broad spectrum of drug-resistant viral variants. Drug discovery and design are highly complex, time-consuming, and expensive endeavors, so it is crucial to incorporate new technologies into the process. Modern drug design strategies include ligand-based drug design (LBDD) and structure-based drug design (SBDD) methods for developing new drug candidates. Machine Learning (ML) approaches are extensively applied in drug design for HCV; the most common applications include classifying drug targets as druggable or non-druggable, prioritizing drug targets, discovering novel inhibitors, predicting disease using risk factors as classifiers, and in silico ADMET (absorption, distribution, metabolism, excretion, and toxicity) prediction. However, only a few studies have reported Machine Learning approaches for predicting biological activity from multivariate models, predicting binding site secondary structural modes of docking, and virtual screening.

The ML techniques most commonly applied in HCV drug discovery include random forests, support vector machines (SVM), decision trees, genetic algorithms, k-nearest neighbors, naive Bayesian classifiers, particle swarm optimization, and multilinear regression models. These tools are widely used in drug discovery studies because they are readily accessible in both open-source and commercial distributions, statistically consistent, computationally efficient, and relatively straightforward to implement and interpret. Moreover, data-mining software lets users run these algorithms through graphical user interfaces, and the algorithms can also be written and executed in environments such as R, Matlab, and Octave. Data mining and Machine Learning approaches therefore appear to be promising aids for drug development studies on HCV infection.
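To make one of the listed techniques concrete, the following is a minimal sketch of a k-nearest neighbor classifier that labels a compound as active or inactive from two numeric descriptors. It is an illustration only and is not taken from the chapter: the descriptor values, the labels, and the names `TRAINING`, `euclidean`, and `knn_predict` are all hypothetical, and Python is used here merely as a convenient alternative to the R, Matlab, and Octave environments mentioned above.

```python
import math
from collections import Counter

# Hypothetical training set: each compound is a pair of scaled descriptors
# (for example, normalized molecular weight and logP) plus an activity label.
# The numbers are invented for illustration and are not real assay data.
TRAINING = [
    ((0.20, 0.30), "inactive"),
    ((0.25, 0.20), "inactive"),
    ((0.30, 0.35), "inactive"),
    ((0.80, 0.75), "active"),
    ((0.85, 0.90), "active"),
    ((0.70, 0.80), "active"),
]


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(query, training=TRAINING, k=3):
    """Label a query vector by majority vote among its k nearest neighbors."""
    # Sort the labeled compounds by distance to the query and keep the k closest.
    neighbors = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    # Count the labels of those neighbors and return the most frequent one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_predict((0.75, 0.85)))
    print(knn_predict((0.22, 0.28)))
```

With this toy data, a query near the second cluster is labeled active and a query near the first cluster is labeled inactive. In practice the choice of descriptors, distance metric, and k would need to be validated on real activity data, which this sketch does not attempt.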

Author information

Authors and Affiliations

Department of Biotechnology, School of BioSciences and Technology, Vellore Institute of Technology, Vellore, Tamilnadu, India
Sajitha Lulu Sudhakaran & Mohanapriya Arumugam
Department of Sensor and Biomedical Technology, School of Electronics Engineering, Vellore Institute of Technology, Vellore, Tamilnadu, India
Deepa Madathil
Department of Biosciences, School of BioSciences and Technology, Vellore Institute of Technology, Vellore, Tamilnadu, India
Vino Sundararajan

Corresponding author

Correspondence to Sajitha Lulu Sudhakaran.

Editor information

Editors and Affiliations

Department of Internal Medicine, University of South Florida, Tampa, FL, USA
Paul Shapshak
Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
Seetharaman Balaji
Biomedical Informatics 17, Irulan Sandy Annex, Pondicherry, Pondicherry, India
Pandjassarame Kangueane
Oral Biology and Medicine, CHS 63-090, UCLA School of Dentistry, Los Angeles, CA, USA
Francesco Chiappelli
Department of Internal Medicine, University of South Florida, Tampa, FL, USA
Charurut Somboonwit
Department of Internal Medicine, University of South Florida, Tampa, FL, USA
Lynette J. Menezes
Department of Internal Medicine, University of South Florida, Tampa, FL, USA
John T. Sinnott

About this chapter

Cite this chapter

Sudhakaran, S.L., Madathil, D., Arumugam, M., Sundararajan, V. (2019). Drug Development for Hepatitis C Virus Infection: Machine Learning Applications. In: Shapshak, P., et al. Global Virology III: Virology in the 21st Century. Springer, Cham. https://doi.org/10.1007/978-3-030-29022-1_6

DOI: https://doi.org/10.1007/978-3-030-29022-1_6
Published: 23 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29021-4
Online ISBN: 978-3-030-29022-1
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)