Opportunities for Artificial Intelligence in Advancing Precision Medicine
Purpose of Review
We critically evaluate the future potential of machine learning (ML), deep learning (DL), and artificial intelligence (AI) in precision medicine. The goal of this work is to show progress in ML in digital health, to exemplify future needs and trends, and to identify any essential prerequisites of AI and ML for precision health.
High-throughput technologies are delivering growing volumes of biomedical data, such as large-scale genome-wide sequencing assays, libraries of medical images, and drug perturbation screens of healthy, developing, and diseased tissue. Multi-omics data in biomedicine are deep and complex, offering an opportunity for data-driven insights and automated disease classification. Learning from these data will broaden our understanding and refine our definition of healthy baselines and disease signatures. State-of-the-art applications of deep neural networks include digital image recognition, single-cell clustering, and virtual drug screens, demonstrating the breadth and power of ML in biomedicine.
Significantly, AI and systems biology have embraced big data challenges and may enable novel biotechnology-derived therapies to facilitate the implementation of precision medicine approaches.
Keywords: Machine learning, Deep learning, Digital pathology, Digital health, Multi-omics, Single-cell transcriptomics, Spatial transcriptomics, Systems biology, Precision medicine, AI, ML, DL, DNN
In the past decade, advances in genetic disease and precision oncology have resulted in an increased demand for predictive assays that enable the selection and stratification of patients for treatment. The enormous divergence of signaling and transcriptional networks mediating the cross talk between healthy, diseased, stromal, and immune cells complicates the development of functionally relevant biomarkers based on a single gene or protein.
Unexpectedly, the completion of the human genome sequence did not translate into a burst of new drugs. Rather, the pharmaceutical industry reported a declining number of new drug approvals despite increasing commercial investment in drug research and development [2, 3]. In contrast, machine learning (ML) as well as network and systems biology are delivering impactful discoveries and are now starting to be integrated into the biomedical discovery pipeline.
A major ambition of medical artificial intelligence (AI) lies in translating patient data to successful therapies. Machine learning models face particular challenges in biomedicine, such as the size of the library available to train the model, data input conversion problems, transfer, overfitting, neglect of confounders, and many more [5, 6, 7]. They may require new infrastructure while possibly rendering only recently established workflows obsolete. On the other hand, deep neural network (DNN) approaches may offer distinct benefits. Such opportunities for deep learning (DL) in biomedicine include scalability, handling of extreme data heterogeneity, the capacity for transfer learning, and, if desired, the possibility of learning without any data supervision at all.
Enabling Synergies Between Artificial Intelligence and Digital Pathology
Advances in pattern recognition and image processing have enabled synergies between AI technology and modern pathology [10, 11•]. In particular, DL architectures such as deep convolutional neural networks have achieved unprecedented performance in image classification and gaming tasks [13, 14, 15, 16]. The expression “digital pathology” was coined to refer to advanced slide-scanning techniques in combination with AI-based approaches for the detection, segmentation, scoring, and diagnosis of digitized whole-slide images.
In pathology, quantifying and standardizing clinical outcome remains a challenge. Accurate grading, staging, classification, and quantification of treatment response by computer-assisted technologies are important recent initiatives [12, 18]. Neural network algorithms perform well in settings where either large amounts of input data or high-quality training sets are available. A digital archive of more than 100,000 clinical images of skin disease fulfilled these prerequisites, and a deep convolutional neural network was successfully trained to classify skin lesions at a level comparable with current quality standards in pathology. For such intuitive image-based analysis, a mechanistic understanding of the convolutional layers is not necessary, and the approach could be transferred to patient-based mobile phone platforms to enhance early detection and cancer prevention [20, 21, 22]. In the future, specific DNN modules will replace selected steps of the traditional pathology workflow. Across computational image-recognition tasks, particularly strong performance of DL is already observed today in the segmentation of nuclei, epithelia, or tubules; the assessment of immune infiltration by lymphocyte classification; cell cycle characterization and mitosis quantification; and the grading of tumors. Over time, the transition toward the digital pathology lab will lead to more accurate drug response prediction and prognosis of the underlying disease.
Digital Healthcare and Clinical Health Records
ML can learn from almost any data type, even unstructured medical text, such as patient records, medical notes, prescriptions, audio interview transcripts, or pathology and radiology reports. Future day-to-day applications will embrace ML methods to organize a growing volume of scientific literature, facilitating access to it and the extraction of meaningful knowledge content. In the clinic, ML can harness the potential of electronic health records to accurately predict medical events. By implementing a ranking function in the content network, one can overcome the heterogeneity of clinical or healthcare provider–specific electronic health records that is inherent to current medical practice around the world.
A defined goal of precision medicine is to predict the best treatment strategy for the patient. Drug responses in combination with genomic, epigenomic, transcriptomic, proteomic, and metabolomic profiling data enable accurate prediction of the network response to a perturbation. Using multi-omics data, including somatic copy number alterations, somatic exome mutations, methylomes, and transcriptomes of 1000 cell lines, ML can be utilized in a modeling exercise to identify genomic features predictive of cellular processes and drug response. Top-performing methods exploit ML, integrate multiple profiling data sets, and enhance scoring by regression models to predict drug sensitivities [28, 29, 30]. Given the convoluted and non-linear relationships among transcriptomic, epigenomic, and metabolic functions, future ML applications can be tasked with resolving intricate multi-omics patterns. Precision oncology has been showcased by implementing patient-derived cancer cell lines. Such bench-to-bedside models can provide real-time drug response predictions and often create massive knowledge banks accessible to ML workup. In the future, the ability to screen patient-derived avatars will inform about resistance mechanisms and facilitate evidence-based medicine, even for complex traits.
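To make the idea of integrating multiple profiling data sets into a regression model concrete, the following is a minimal sketch, not the method of any cited study. It assumes three hypothetical omics blocks (copy number, methylation, expression) filled with synthetic random numbers, a synthetic drug sensitivity score, and plain ridge regression as the scoring model; all names, block sizes, and the penalty value are illustrative assumptions.

```python
import numpy as np

def standardize(block, reference):
    """Z-score each feature using statistics of the reference (training) rows."""
    mean = reference.mean(axis=0)
    std = reference.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant features
    return (block - mean) / std

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    weights = np.linalg.solve(gram, Xc.T @ yc)
    return weights, y_mean - x_mean @ weights

rng = np.random.default_rng(0)
n_lines = 200                                   # hypothetical number of cell lines
train, test = np.arange(150), np.arange(150, n_lines)

# Synthetic stand-ins for real profiling data (assumption: one row per cell line).
blocks = {
    "copy_number": rng.normal(size=(n_lines, 10)),
    "methylation": rng.normal(size=(n_lines, 15)),
    "expression": rng.normal(size=(n_lines, 20)),
}

# Synthetic drug sensitivity driven by one expression and one methylation feature.
sensitivity = (2.0 * blocks["expression"][:, 0]
               - 1.5 * blocks["methylation"][:, 3]
               + rng.normal(scale=0.3, size=n_lines))

# Integrate the blocks by standardizing each one and concatenating the features.
X = np.hstack([standardize(b, b[train]) for b in blocks.values()])

weights, intercept = fit_ridge(X[train], sensitivity[train], alpha=10.0)
predicted = X[test] @ weights + intercept
correlation = np.corrcoef(predicted, sensitivity[test])[0, 1]
print(f"held-out Pearson r = {correlation:.2f}")
```

Standardizing with training-set statistics only keeps the held-out cell lines out of the fitting step, so the reported correlation is an honest estimate for this toy setting. Real multi-omics blocks differ widely in scale and sparsity, and block-specific preprocessing would likely be needed.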
Machine Detection of Resistance Signatures
Somatic alterations in cancer frequently escape recognition by the endogenous immune system, creating resistance. Even though excellent efficacy and some complete remissions have been seen in a limited number of melanoma patients, some of whom may be regarded as cured of cancer, many malignancies show resistance or lack a durable response to immune checkpoint inhibitors. Predicting tumor responses to immune checkpoint blockade remains a major challenge and an active field of research fueled by systems biology and AI approaches.
Deciphering Epigenomic Networks
Epigenomics of oncogenic networks can accurately predict regulome function, epigenomic-transcriptomic cooperation, and disease progression. Then again, epigenetic modifications on chromatin, DNA, and RNA are complex and often context-specific, making their mechanistic understanding challenging. Elastic net is a shrinkage method that combines ridge and lasso regularization to prevent over-fitting; it can handle ultra-high-dimensional regression and is suitable for epigenomic data. Using such methods, metabolic and epigenomic data have been used to establish biomarkers and predictive clocks of aging [37, 38]. Enhanced by ML methods, epigenetic marks including promoter methylation are utilized as a continuous readout of transcriptional accessibility and of the molecular processes that guide development, tissue maintenance, disease states, and eventually aging. Given progress in multiplex barcoding, new data challenges in the field of epigenomics are imminent. Frontiers include processing and machine integration of sequencing and chromatin accessibility information derived from the transcriptome and epigenome of the same cell [39•].
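The combination of ridge and lasso penalties in elastic net can be illustrated with a short sketch. The code below is a minimal cyclic coordinate-descent solver written for illustration, not the implementation used in the cited aging-clock studies. It assumes centered inputs, a synthetic matrix of 500 hypothetical methylation sites measured in only 60 samples, and a synthetic outcome that depends on two of those sites; the penalty parameters alpha and l1_ratio are arbitrary choices.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink a scalar toward zero; the lasso part that yields exact zeros."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def elastic_net(X, y, alpha=0.5, l1_ratio=0.5, n_sweeps=100):
    """Cyclic coordinate descent for the elastic net objective

        (1 / 2n) * ||y - Xw||^2
        + alpha * l1_ratio * ||w||_1               (lasso: sparsity)
        + 0.5 * alpha * (1 - l1_ratio) * ||w||^2   (ridge: shrinkage)

    Assumes the columns of X and the vector y are centered (no intercept).
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    col_scale = (X ** 2).sum(axis=0) / n_samples
    residual = y.astype(float).copy()            # equals y - X @ w for w = 0
    for _ in range(n_sweeps):
        for j in range(n_features):
            residual += X[:, j] * w[j]           # remove feature j's contribution
            rho = X[:, j] @ residual / n_samples
            w[j] = (soft_threshold(rho, alpha * l1_ratio)
                    / (col_scale[j] + alpha * (1.0 - l1_ratio)))
            residual -= X[:, j] * w[j]           # add the updated contribution back
    return w

rng = np.random.default_rng(0)
n_samples, n_sites = 60, 500                     # far more features than samples
X = rng.normal(size=(n_samples, n_sites))        # synthetic methylation values
X -= X.mean(axis=0)

# Synthetic outcome (for example, an age-like score) driven by sites 5 and 42.
y = 3.0 * X[:, 5] - 2.0 * X[:, 42] + rng.normal(scale=0.1, size=n_samples)
y -= y.mean()

w = elastic_net(X, y)
selected = np.flatnonzero(w)
print(f"{selected.size} of {n_sites} sites kept; "
      f"largest weights at sites {np.argsort(-np.abs(w))[:2].tolist()}")
```

The lasso term sets most coefficients exactly to zero, which is what makes regression feasible when features vastly outnumber samples, while the ridge term stabilizes the fit when features are correlated. In practice the two penalty parameters would be chosen by cross-validation rather than fixed in advance.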
Visualizing and Exploring Cellular Heterogeneity at Single-Cell Resolution
In single-cell biology, ML and DL are frequently utilized to investigate the diversity and complexity of cell populations. In cancer, single-cell methods provide a view of heterogeneity that recognizes the impact of diverse cell states and types in the tumor microenvironment. Further, cancer is a dynamic and highly heterogeneous disease composed of a mix of clones characterized by distinct genotypes, pushing bulk sequencing methods to their limits. Profiling of copy numbers, transcripts, or chromatin accessibility together with cluster analysis can uncover differences, even in seemingly homogeneous tissues, and resolve subclonal complexity. Dimensionality reduction and clustering are typical ML techniques employed to visualize single-cell transcriptomics (scRNA-Seq) data. In particular, the Louvain community detection algorithm is robust for high-dimensional data like scRNA-Seq matrices. The Human Cell Atlas, whose primary goal is to establish, discover, and catalogue different cell populations ab initio, creates unsupervised maps, serving as a resource for subsequent disease-directed studies. In addition, it is possible to predict cell cycle, disease progression, and perturbation responses using deep network approaches [41•, 42•, 43, 44, 45].
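The two steps named above, dimensionality reduction followed by clustering, can be sketched on simulated data. This is a simplified illustration under stated assumptions: the count matrix is synthetic (three hypothetical cell types with 20 marker genes each), principal component analysis is computed by singular value decomposition, and k-means with a farthest-point initialization stands in for Louvain community detection, which would require building a neighbor graph and a graph library. All function names and parameters are illustrative.

```python
import numpy as np

def log_normalize(counts, scale=1e4):
    """Scale each cell to a common sequencing depth, then log-transform."""
    depth = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / depth * scale)

def pca(X, n_components):
    """Project cells onto the leading principal components via SVD."""
    centered = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def farthest_point_init(X, k):
    """Deterministic seeding: repeatedly pick the point farthest from all seeds."""
    centers = [X[0]]
    for _ in range(k - 1):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    return np.array(centers)

def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm; a stand-in here for graph-based Louvain clustering."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

rng = np.random.default_rng(0)
n_per_type, n_genes, n_types = 50, 100, 3

# Simulated cells-by-genes counts: each cell type over-expresses 20 marker genes.
truth = np.repeat(np.arange(n_types), n_per_type)
rates = np.full((n_types * n_per_type, n_genes), 5.0)
for t in range(n_types):
    rates[truth == t, t * 20:(t + 1) * 20] *= 10.0
counts = rng.poisson(rates)

embedding = pca(log_normalize(counts), n_components=5)
labels = kmeans(embedding, k=n_types)
print("cells per cluster:", np.bincount(labels, minlength=n_types).tolist())
```

Unlike k-means, Louvain community detection does not need the number of clusters in advance, which is one reason graph-based methods are popular for atlas-scale data where the number of cell populations is unknown. Real scRNA-Seq matrices are also far sparser and noisier than this simulation.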
Spatial transcriptomics (spRNA-Seq) combines the benefits of traditional histopathology with single-cell gene expression profiling. The ability to connect the spatial organization of molecules in cells and tissues with their gene expression state enables mapping of specific disease pathology [46, 47]. ML can decode molecular proximities from sequencing information and construct images of gene transcripts at sub-cellular resolution.
Artificial Intelligence in Chemical Informatics and Drug Discovery
Chemical informatics can predict novel drug targets, quantify ADME and toxicology, match drugs with targets and biological activities, model physicochemical properties, accelerate data mining, predict biological targets for compounds on a large scale, design new chemicals and syntheses, and analyze large virtual chemical spaces. Such a new paradigm enables medicinal chemists to process billions of molecules in virtual screens [51, 52]. By tightly integrating database knowledge, AI, and lab automation, it is possible to accelerate the drug discovery pipeline and select structures that can be prepared on automated systems and made available for biological testing, allowing for timely hypothesis testing and validation.
Computational analyses of drug-perturbation assays can predict the activities of compounds on seemingly unrelated biological processes. ML can provide insight into drug mechanism, create correlative bridges between disjoint nodes, establish biomarkers, repurpose existing drugs, optimize drug candidates, design clinical trials, and even recruit for clinical trials. Image-based drug fingerprints were demonstrated to enable biological activity prediction for drug discovery, even when a chemical library in combination with high-content image screening was repurposed. Potential applications of the predictions delivered by the implemented computational models reached far beyond the intended target of the original compound screen [54•].
The biomedical sciences of genomic signatures, image processing, and drug discovery have rapidly adopted big data opportunities and new learning-based technologies. From traditional approaches relying on leads from nature to brute-force screening using robotics, and following the introduction of several other disruptive technologies, artificial intelligence marks yet another pivotal moment toward a rationalized, data-driven process in the healthcare and pharmaceutical industries. Machine intelligence and deep networks are changing our approach to medical bioinformatics at an unprecedented speed. As a result, the decision-making processes in precision medicine will shift from algorithm-centric to data-centric insight.
F.V.F. is grateful for the support by grants CA154887 from the National Institutes of Health, National Cancer Institute, GM115293, NIH Bridges to the Doctorate, NSF GRFP Graduate Research Fellowship Program, CRN-17-427258, by the University of California, Office of the President, Cancer Research Coordinating Committee, and the Science Alliance on Precision Medicine and Cancer Prevention by the German Federal Foreign Office, implemented by the Goethe-Institute, Washington, DC, USA, and supported by the Federation of German Industries (BDI), Berlin, Germany. This work is inspired by the curiosity and creativity of Franziska Violet and Leland Volker.
Compliance with Ethical Standards
Dr. Filipp declares no conflict of interest.
Human and Animal Rights and Informed Consent
This article does not contain any studies with human or animal subjects performed by any of the authors.
Papers of particular interest, published recently, have been highlighted as: • Of importance
- 2. Smietana K, Quigley D, Van de Vyver B, Møller M. The fragmentation of biopharmaceutical innovation. Nature Reviews Drug Discovery. 2019.
- 11.• Zhang Z, Chen P, McGough M, Xing F, Wang C, Bui M, et al. Pathologist-level interpretable whole-slide cancer diagnosis with deep learning. Nat Mach Intell. 2019;1(5):236–45. Use of neural networks to analyze microscope slides and presentation of detailed diagnostic results that can easily be reviewed by a pathologist.
- 17. Bera K, Schalper KA, Rimm DL, Velcheti V, Madabhushi A. Artificial intelligence in digital pathology-new tools for diagnosis and precision oncology. Nat Rev Clin Oncol. 2019.
- 24. Brown P, RELISH Consortium, Zhou Y. Large expert-curated database for benchmarking document similarity detection in biomedical literature search. Database. 2019.
- 39.• Chen S, Lake BB, Zhang K. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat Biotechnol. 2019. Single-cell multiplex barcoding technology recording single-nucleus chromatin accessibility and mRNA expression sequencing, SNARE-seq, linking transcriptome and epigenomic chromatin accessibility of the same cell.
- 41.• Tian T, Wan J, Song Q, Wei Z. Clustering single-cell RNA-seq data with a model-based deep learning approach. Nat Mach Intell. 2019;1(4):191–8. Deep clustering method for embedding and multidimensionality reduction, which simultaneously learns feature representation and clusters via explicit modelling of scRNA-seq data generation.
- 54.• Simm J, Klambauer G, Arany A, Steijaert M, Wegner JK, Gustin E, et al. Repurposing High-Throughput Image Assays Enables Biological Activity Prediction for Drug Discovery. Cell Chem Biol. 2018;25(5):611–8 e3. Scalable method predicting compound activity from high-content cellular image library.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.