Abstract
Machine learning (ML) is a field of computer science in which algorithms interrogate and learn from data and, through statistical inference, predict unseen data or events. ML has been a high-profile topic for many years and is ubiquitous in daily life, from e-mail spam and malware filtering to search result refinement, online customer service, and fraud detection. More recently, ML has become pervasive in modeling complex nonlinear phenomena in the pharmaceutical and medical sciences. Although it has been used to model chemical data sets for two decades, it has only recently become a useful approach for improving healthcare diagnoses and providing personalized medical treatments. The rapid growth in data collection and integration, together with increasingly accessible computing power, especially through cloud services, explains this unforeseen capacity to transform data into information, information into knowledge, and knowledge into wisdom (see Fig. 7.1). In this section, we briefly introduce the concepts and types of ML and its application to drug discovery, drug product development, and clinical use. The literature in these fields and the importance and challenges of interpreting ML results are also discussed.
Copyright information
© 2020 American Association of Pharmaceutical Scientists
About this chapter
Cite this chapter
Hickey, A.J., Smyth, H.D.C. (2020). Computational Modeling of Nonlinear Phenomena Using Machine Learning. In: Pharmaco-complexity. AAPS Introductions in the Pharmaceutical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-42783-2_7
DOI: https://doi.org/10.1007/978-3-030-42783-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-42782-5
Online ISBN: 978-3-030-42783-2
eBook Packages: Biomedical and Life Sciences (R0)