1 Introduction

World Health Organization (WHO) Risk Group (RG-) 4 pathogens are a relatively small group of high-consequence viral pathogens that can cause serious or life-threatening disease in humans or other animals and for which effective medical countermeasures (MCMs) are usually not available [1]. Handling replicative forms of these pathogens typically requires maximum (biosafety level 4 [BSL-4]) containment facilities (Table 1), of which there are only around three dozen globally [2]. Notorious examples of RG-4 pathogens include viruses that are associated with acute disease outbreaks, such as Ebola virus (EBOV), which recently caused a human disease outbreak in Western Africa encompassing more than 28,000 cases and more than 11,000 deaths [3], or Lassa virus, which in 2018 infected ≈1,500 people in Nigeria [4, 5], and viruses that cause temporally isolated small case clusters, such as Kyasanur Forest disease virus (9,594 human infections from 1957 to 2017) [6]. Several of these viruses are considered potential source material for the development of biological weapons [7, 8] and are therefore considered research priorities within national public health and biodefense programs [9, 10]. Accelerated and increasingly focused efforts to develop MCMs for the prevention and/or treatment of RG-4 pathogens are undertaken to alleviate potential community suffering and the associated socioeconomic impact.

Table 1 Examples of Risk Group 4 pathogens requiring maximum (biosafety level 4) containment in the US [32]

Research on RG-4 pathogens generally involves cell culture methods (in vitro) with live viruses at BSL-4 or surrogate systems (e.g., minigenomes, virion-like particles, recombinant expression of individual viral proteins, virion pseudotyping) at BSL-2/3 and/or animal models (in vivo) to investigate viral pathogenesis and host responses to infection. Cell culture has been used as a simple tool to research specific aspects of viral infection, such as screening candidate therapeutics for antiviral activity [11,12,13,14,15,16,17,18,19,20,21] or quantifying host immune responses, such as virus-neutralizing antibody titers [22, 23]. However, as current common cell culture methods cannot model the complexities of a body system, animal BSL-4 (ABSL-4) models, such as rodents or nonhuman primates, are generally used to model disease [24,25,26,27,28].

In addition to increased security measures to prevent unauthorized entry or agent misuse or theft [29,30,31], (A)BSL-4 laboratories have multiple layers of redundant safety precautions, including positive-pressure suits, Class III biological safety cabinets, and validated methods to inactivate pathogens to protect the laboratory worker from accidental, and potentially lethal, infections [32,33,34,35]. The enhanced regulatory and biosafety environment can encumber research physically and limit the talent pool of researchers who are permitted to work at the (A)BSL-4 facilities. Therefore, technological advancements in RG-4 pathogen research become especially important to maximize data output and lessen the time required for research.

Additionally, research performed at field laboratories in outbreak and virus-endemic zones with permission from internal review boards in the affected countries has provided important insights into disease course associated with human RG-4 pathogen infections. Monitoring of patients has refined pathogenic key events, including serum chemical and hematological value aberrations during disease, thereby providing guidance to clinicians and researchers of disease progression and disease model development, respectively [36,37,38,39,40,41]. Other research has focused on MCM testing for some RG-4 pathogens and occasionally shown considerable promise in clinical trials or ring vaccinations [42, 43]. However, as RG-4 pathogen disease outbreaks often occur in underdeveloped countries and/or geographically remote areas, research can be hampered by limited access to resources, transportation, or a skilled and local technician pool. For these reasons, advancements in research tools and the simplification of test methodology could help to alleviate the challenges associated with performing research.

2 Medical Imaging

2.1 Infectious Disease Imaging and Artificial Intelligence

Advanced imaging modalities, such as computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), single photon emission computed tomography (SPECT), and ultrasound (US), are being developed at one active BSL-4 facility to study immune and other host system responses to infection with RG-4 pathogens [44,45,46,47]. Imaging has the advantage of being non-invasive and can detect signs of infectious disease at earlier timepoints than through clinical signs alone. Findings from qualitative radiology reports can be quantified according to standardized methods, including longitudinal image registration and organ/lesion segmentation, which measure morphological and physiological changes due to disease. However, these quantitative methods often require time-consuming manual tracing of regions of interest, and tracings are subjective. Therefore, development of automated methods is needed to decrease time requirements, reduce variability, and increase accuracy, and such methods are now on the horizon.

For instance, artificial intelligence (AI) has considerable promise in the advancement of the medical imaging field. AI algorithms that learn from data can be unsupervised or supervised (e.g., “deep learning”), with the latter algorithm trained prior to use on large pools of data. However, imaging data alone are not sufficient to train supervised deep learning algorithms of neural networks as imaging data must be labeled (e.g., pneumonia vs. normal x-ray, lesions vs. normal tissue) for proper algorithm identification. This labeling process is often quite time-consuming and prone to human error.

To avoid these errors, another field of AI, natural language processing, utilizes text-based reports generated by radiologists to associate findings with images for training of deep learning algorithms [48]. Alternatively, if the quantity of data is insufficient to train AI algorithms, data augmentation methods, such as rotation, horizontal flips, or random crops, can be performed [49]. In addition, a neural network architecture called generative adversarial networks (GANs) can be used to generate synthetic images [50]. Because acquisition of adequate training data for all pathologic imaging phenotypes can be challenging, a one-class classification neural network has been developed that trains only on normal images and can detect abnormal images [51].
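The augmentation operations named above (rotation, horizontal flip, random crop) can be illustrated with a minimal sketch. The function names, the 3 × 3 toy image, and the "normal" label are assumptions made for illustration only; they are not taken from [49], and real pipelines typically operate on full-size image arrays with a dedicated library.

```python
import random

def rotate90(img):
    """Rotate a 2D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def random_crop(img, height, width, rng):
    """Cut a height x width window at a random position."""
    top = rng.randint(0, len(img) - height)
    left = rng.randint(0, len(img[0]) - width)
    return [row[left:left + width] for row in img[top:top + height]]

def augment(img, label, crop_size, rng):
    """Return the original plus three transformed copies, all with the same label."""
    height, width = crop_size
    return [
        (img, label),
        (rotate90(img), label),
        (hflip(img), label),
        (random_crop(img, height, width, rng), label),
    ]

# Hypothetical 3 x 3 "image" with a hypothetical label.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
samples = augment(image, "normal", (2, 2), random.Random(0))
```

Each transformed copy keeps the label of its source image, which is why augmentation enlarges a labeled training set without additional manual annotation.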

A type of deep learning neural network called a convolutional neural network integrates image feature extraction within artificial neural networks containing many hidden layers to both classify and segment images. These deep learning neural networks can be designed in various ways and, currently, VGG16, GoogLeNet, Inception-v4, and Inception-ResNet are popular architectures for image classification, whereas 2D/3D U-Net and V-Net are popular architectures for image segmentation [52, 53].

2.2 Infectious Disease Imaging and Artificial Intelligence in a BSL-4 Environment

In one active BSL-4 facility, medical imaging can be performed on RG-4 pathogen-infected animals within containment [44,45,46,47]. Whereas AI is not yet used to analyze all available imaging modalities, an unsupervised AI algorithm called “fuzzy c-means clustering” is already utilized in MRI to segment brain tissues into grey matter, white matter, and cerebrospinal fluid. In combination with digital brain atlas registration, the brain is robustly segmented into multiple sub-regions that are co-located over longitudinal scans (Fig. 1).

Fig. 1

Brain Segmentation. (a) Synthetic T1-weighted axial MRI scan of a nonhuman primate brain. (b) “Fuzzy c-means clustering” analysis with voxels classified as cerebrospinal fluid (black), grey matter (grey), and white matter (white). (c) Contours representing grey matter (red) and white matter (blue) overlaid on the original MRI scan. (d) Contours of grey and white matter and digital atlas-based contour of caudate (green) overlaid on the quantitative T1 map
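The fuzzy c-means step used for tissue classification can be sketched on scalar voxel intensities as follows. This is a minimal illustration, not the implementation used at the facility: the function name, the fuzzifier value m = 2, the evenly spaced starting centers, and the nine toy intensities are all assumptions.

```python
def fuzzy_c_means(values, n_clusters=3, m=2.0, n_iter=100, tol=1e-6):
    """Cluster scalar intensities; return (centers, memberships of the last pass)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centers spread evenly across the intensity range.
    centers = [lo + (hi - lo) * (k + 0.5) / n_clusters for k in range(n_clusters)]
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for v in values:
            dists = [abs(v - c) for c in centers]
            if min(dists) == 0.0:
                # Value coincides with a center: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                u = [h / sum(hits) for h in hits]
            else:
                # Membership falls off with distance; each row sums to 1.
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                u = [x / sum(inv) for x in inv]
            memberships.append(u)
        new_centers = []
        for k in range(n_clusters):
            # Each center is the membership-weighted mean of all values.
            weights = [u[k] ** m for u in memberships]
            total = sum(weights)
            weighted = sum(w * v for w, v in zip(weights, values))
            new_centers.append(weighted / total if total > 0 else centers[k])
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships

# Hypothetical T1-weighted intensities: dark, intermediate, and bright voxels.
voxels = [10, 12, 11, 50, 52, 49, 90, 88, 91]
centers, memberships = fuzzy_c_means(voxels)
# Hard label per voxel = cluster with the highest membership.
labels = [max(range(3), key=lambda k: u[k]) for u in memberships]
```

Unlike hard clustering, every voxel receives a graded membership in each class, which suits voxels that contain a mixture of tissues; a hard label is taken only at the end.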

MR images are composed of a three-dimensional grid of voxels that are assigned signal values throughout the image. Since voxel values in MRI scans have arbitrary units (unlike CT scans, which are characterized by pixel values that are directly proportional to the density of the imaged tissue), parametric maps, which assign a quantity to each voxel, are created from multiple MRI sequences. Parametric maps provide images in the form of, for example, T1 and T2 relaxometry maps [54]. These physical quantities, which represent the relaxation of proton spins in tissue, change with tissue composition. As an example, if brain edema results from a viral infection, the T1 values will increase due to the addition of fluid in the tissue. The regions delineated with “fuzzy c-means clustering” and “atlas-based registration” can be directly applied to these parametric maps, and changes in disease status, such as accumulation of fluid or blood in the brain, can be predicted from abnormal findings. MRI of the brain of experimental animals will likely be exceptionally useful to refine the sequence of pathogenetic events in diseases caused by encephalitic RG-4 pathogens, such as Hendra virus, Nipah virus, or tick-borne encephalitis virus. For example, brain MRI scans of patients infected with Nipah virus have shown acute encephalitis with multiple lesions visible using T2-weighted and fluid attenuated inversion recovery (FLAIR) images [55, 56]. These same MR imaging sequences performed in human studies may be performed and refined in animal models using clinical MR scanners within the BSL-4 environment.

Many RG-4 pathogens diffusely affect multiple organs within the human body, including lung abnormalities caused by Nipah virus infection [57], liver damage, and disseminated intravascular coagulopathy during EBOV infection [58]. Therefore, segmenting liver, kidney, spleen, lungs, and lymph nodes in collected images is needed to detect structural or physiologic changes. Deep learning convolutional neural networks can automatically segment multiple abdominal organs in CT and MRI modalities using the processes described above. Given this robust segmentation, radiometric features, such as texture and histogram analysis of voxels within the region of interest, can be used to classify stages of the disease process and correlate these features with clinical parameters, such as liver enzyme concentrations or virus titers. For instance, a method to quantify lung abnormalities in various disease models is now available [59]. Initially, the lungs are automatically segmented from a chest CT scan. Segmenting the lung field without any pathologic condition is done with standard image analysis methods, such as region growing, which starts with the initialization of a seed point within the region of interest and includes areas in the vicinity of the seed point based on whether the signal intensity is within a given threshold. This iterative process ends when values of neighboring voxels are not within the threshold. However, the region growing algorithm will fail in the initial lung segmentation when hyper-dense voxels in the lung are outside the threshold, requiring manual correction. Recently, an AI method was established to fully automate the lung field segmentation process when the lungs contain hyper-dense pathological areas [60]. This deep learning algorithm was trained with thousands of normal and abnormal human CT lung images to segment images of lung fields. 
Post-processing with morphological operators such as erosions (removal of areas) and dilations (inclusion of areas) is then needed to finalize the segmentation process. By modifying the post-processing parameters, accurate lung field segmentation was achieved in a nonhuman primate (Fig. 2), whereas the lung field was originally overestimated.
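The region-growing procedure and the morphological operators described above can be sketched on a toy image. This is an illustrative sketch only and not the deep learning method of [60]: the Hounsfield-like values, the threshold window, the 4-connected neighborhood, and the function names are assumptions.

```python
from collections import deque

def region_grow(image, seed, low, high):
    """Return the set of (row, col) pixels connected to seed with low <= value <= high."""
    rows, cols = len(image), len(image[0])
    region = set()
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        if (r, c) in region or not (0 <= r < rows and 0 <= c < cols):
            continue
        if not (low <= image[r][c] <= high):
            continue  # Outside the intensity window: growth stops here.
        region.add((r, c))
        queue.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return region

# The pixel itself plus its four direct neighbors.
NEIGHBORS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

def dilate(region, shape):
    """Include every in-bounds pixel that touches the region."""
    rows, cols = shape
    return {(r + dr, c + dc) for r, c in region for dr, dc in NEIGHBORS
            if 0 <= r + dr < rows and 0 <= c + dc < cols}

def erode(region):
    """Keep only pixels whose four neighbors are all in the region."""
    return {(r, c) for r, c in region
            if all((r + dr, c + dc) in region for dr, dc in NEIGHBORS)}

# Toy slice: soft tissue (40) around aerated lung (-800) with one
# hyper-dense lesion pixel (-100) in the middle.
ct = [
    [40,   40,   40,   40, 40],
    [40, -800, -800, -800, 40],
    [40, -800, -100, -800, 40],
    [40, -800, -800, -800, 40],
    [40,   40,   40,   40, 40],
]
grown = region_grow(ct, seed=(1, 1), low=-1000, high=-500)
closed = erode(dilate(grown, (5, 5)))  # dilation followed by erosion
```

In this toy case, region growing excludes the hyper-dense pixel because its value lies outside the threshold window, and a dilation followed by an erosion fills that hole without enlarging the outer contour.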

Fig. 2

Lung field segmentation. (a) Axial slice of nonhuman primate chest CT scan with no apparent lung abnormalities. (b) Contour of the lung field segmented by a “deep learning” algorithm trained on human CT scans (lung field is over-estimated). (c) Contour of a nonhuman primate lung field more accurately segmented after modification with post-processing methods. (d) Accurately segmented nonhuman primate lung field from a CT scan with apparent lung abnormalities

Current methods of analyses also involve correlations between imaging biomarkers and clinical measures, such as cytokine profiles, viral DNA/RNA concentrations, and blood composition testing. In the future, AI methods could be applied to integrate the collected imaging and clinical data to generate predictive models of disease outcome. Machine-learned features in images may be used to predict abnormal status prior to clinical manifestations of these abnormalities even though they may not be visible to the human eye. Integrating imaging and non-imaging measures may predict survival and efficacy of novel vaccines or therapies.

2.3 Molecular Imaging Probe Development

Molecular imaging is used to gain an understanding of cellular and molecular status compared with anatomic imaging, such as standard CT and MRI, which provide structural information on a larger scale. Molecular probes, such as fluorodeoxyglucose (18F-FDG; a marker of cellular glycolytic activity), can indicate increased cell metabolic activity in organs during infection with RG-4 pathogens [61]. Unfortunately, probes such as 18F-FDG are not entirely specific, and development of agent/disease-specific probes is urgently needed. Examples of more specific probes that have been investigated to study host responses in infectious disease imaging include PET fluorine radioisotopes such as fluoro-thymidine (18F-FLT) [62], fluorine-18 radio-labeled serum albumin (18F-albumin) [63], and 18F-N,N-diethyl-2-[4-(2-fluoroethoxy)phenyl]-5,7-dimethylpyrazolo[1,5-a]pyrimidine-3-acetamide (18F-DPA)-714 [64]. 18F-FLT can be used to investigate cellular proliferation during cancers such as lymphoma or during infectious diseases. 18F-albumin can be used to detect vessel leakage. 18F-DPA-714 is a selective ligand for the translocator protein (TSPO) to investigate over-expression of activated macrophages and serves as a biomarker for neuroinflammation. This marker could prove useful in the study of disease caused by RG-4 pathogens such as Ebola virus as macrophage activation and increased vessel leakage are key pathogenic events of Ebola virus disease. In contrast to host-specific probes such as those mentioned above, probes that can attach to virions or reporter-encoding open reading frames that can be inserted into an RG-4 pathogen genome could directly localize a virus to specific areas of the body. Development of such reporter viruses has just begun. For instance, a gene encoding the solute carrier family 5 member 5 (SLC5A5, aka sodium/iodide symporter) was inserted into the Middle East respiratory syndrome coronavirus genome (an RG-3 pathogen) and resulted in viable virus [65].
In the future, SLC5A5 and other imaging reporters could be incorporated into a variety of viral vectors to obtain in vivo visualization of location and aid in the evaluation of vaccine and therapeutic development. However, a major hurdle to overcome is virus attenuation after reporter gene insertion.

3 Pathology: Tissue and Pathogen Imaging

System-wide responses required to overcome exposure to RG-4 pathogens involve complex interactions between resident tissue cells and infiltrating immune cells, yet the identification of specific cell types in tissue sections is hindered by the limitations of traditional immunofluorescence. Spectral overlap of fluorophores typically restricts immunofluorescence studies to a maximum of around four antibody channels, thereby precluding simultaneous identification of multiple highly specialized cell types and invading pathogens in a single tissue section. Though the development of multiple multiplexed imaging modalities [66,67,68,69] has been vital in overcoming these limitations, we will focus on only a few new advancements in pathological imaging.

3.1 Fluorescence-Based Multiplexed Tissue Imaging Tools

A new technique called CO-detection by indexing (CODEX) bypasses the limits of immunofluorescent antibody channels by using antibodies labeled with indexed DNA tags. With this technology, a cocktail of upwards of 50 DNA-indexed antibodies can stain a tissue section prior to iterative fluorescent visualization cycles to assemble a single 50+ parameter image [70]. CODEX is a highly effective multiplexing technique because a single antibody binding step eliminates much of the signal degradation that would otherwise be associated with stripping and re-staining of antibodies. The commercially available CODEX instrument automatically exchanges buffers needed to accomplish iterative imaging cycles. This instrument has a relatively small footprint and may be practical for use inside BSL-4 containment or after optimization of reagents to use with inactivated samples in RG-4 pathogen studies.

3.2 Metal Tag-Based Multiplexed Tissue Imaging Tools

Another technique called multiplexed ion beam imaging (MIBI) utilizes secondary ion mass spectrometry to generate high-dimensional images through mass spectrometry analysis of lanthanide-labeled antibodies on a pixel-by-pixel level [71]. This commercially available technology has thus far been leveraged for deep spatial understanding of archival breast cancer tissues [72]. A key feature of metal-tagged tissue imaging is the highly stable nature of the isotopes. Labeled samples can be archived theoretically indefinitely, for instance allowing reacquisition of target sample regions after analysis or reimaging with higher resolution instruments years later. In the MIBI workflow, inactivated tissues (e.g., formalin-fixed paraffin-embedded [FFPE]) are processed following conventional immunohistochemistry (IHC) protocols with the exception of the antibody cocktail. Routine tissue staining consists of 40 or more lanthanide-tagged antibodies, compared to the conventional one or two antibodies in IHC. A parallel method, termed Imaging Mass Cytometry (IMC), utilizing laser ablation coupled to a cytometry by time of flight (CyTOF) mass cytometer is also commercially available [73]. The antibodies and reagents for sample preparation are mostly cross-compatible.

3.3 Pathogen Detection in Tissue Sections

Current methods for the detection of pathogens in tissues can be divided into (1) antibody-based detection and (2) nucleic acid (NA)-based detection. Antibody-based methods (IHC) are severely limited by the availability of specific antibody clones and by the conservation of the targeted epitope. Although NA-based methods, such as in situ hybridization (ISH), are ideal for identification of sequence-specific targets, these methods also have disadvantages, such as necessary signal amplification of targets, challenging experimental protocols, and complex probe design to achieve specificity and sensitivity. These disadvantages have largely been circumvented by the development of next-generation ISH methods and probe design software [74,75,76,77,78,79,80,81]. RNAscope, an example of next-generation ISH, has been successfully implemented for the detection of RG-4 viruses (e.g., Marburg virus [MARV], EBOV [82, 83]). RNAscope has also been used to follow single-integration events of simian immunodeficiency virus in tissues [76], demonstrating the sensitivity of the technology. Alternative enzymatic-based, virus-specific ISH has also been adapted for the surveillance of hepatitis delta virus [76, 84]. Currently, RNAscope as a method has demonstrated an ability to work on the IMC for the detection of highly abundant copies of RNA in FFPE tissues [81]. However, even with advances in these technologies, IHC- and ISH-based methods continue to suffer from the limits to the number of markers that can be examined simultaneously even if they are combined.

Future work to couple multiplexed imaging techniques (e.g., IMC, MIBI, CODEX) with sensitive ISH technologies will be instrumental for the mechanistic dissection of virus-infected cells in the context of their tissue microenvironment and of viral reservoirs in the broader environment. Such coupling will increase understanding of the dynamics of viral infection and replication in RG-4 pathogen studies.

3.4 Pathological Imaging in Transparent Animal Models

Whereas the current and futuristic technologies described earlier are useful for targeted imaging of tissue sections, sectioning of tissues makes it difficult to unbiasedly investigate pathogen distribution and the effects of infection on an organism as a whole. Sectioning is required, however, as mammalian tissues are naturally opaque, impeding any imaging deeper into the tissue than a few hundred micrometers [85]. To combat this limitation, researchers have begun developing chemical methods to render tissues transparent (tissue clearing), including entire adult mouse bodies following skin removal [86,87,88,89,90], and image these transparent tissues optically. For example, tissue clearing was used in combination with an antibody signal-boosting technique to produce high-resolution neuronal projection maps of adult mouse brains [91]. Additionally, these techniques were used in conjunction to detect cancer metastases in a transparent mouse and assess therapeutic antibody targeting of these cancer cells at the single-cell level [92].

Though yet to be realized for RG-4 pathogens, research using the combination of tissue clearing and optical imaging could be used to investigate RG-4 pathogen infection, host response, and treatment efficacy. Optical reporters, such as fluorescent proteins inserted into RG-4 pathogen genomes [19, 93,94,95] or optically labeled antibody systems that bind to pathogens [96,97,98,99,100] or specific immune cells (e.g., antibodies for flow cytometry), can be used to investigate pathogen or immune cell distribution throughout the host at varying time-points during disease. Additionally, treatment efficacy could be assessed against RG-4 pathogens similarly to the cancer treatment assessment described previously [92].

4 Cell Marker Analysis in Solution

4.1 Multiplexed Analysis of Cells in Solution

Flow cytometry analysis of single cells in solution has been the cornerstone of advances in our understanding of immune responses to infection over the past few decades. However, spectral overlap of the fluorescent marker-tagged antibodies used in flow cytometry limits simultaneous examination of a large number of cellular features. Replacing these fluorescent tags with metal ion antibody tags enabled the development of CyTOF, which overcomes traditional multiplexing limitations in blood or other dissociated cell profiling [101]. CyTOF has been applied to monitor immune responses to disease and vaccination [102]. Expanding the multiplexing capabilities of single-cell measurements offers exponential returns for profiling the complexity of immune cells [103]. Whereas flow cytometry is rarely performed with more than 12 parameters, limiting analysis to a subset of cell types or signaling readouts, CyTOF comfortably identifies 40 or more parameters on single cells. A single antibody panel can identify all major immune cell subsets in a blood sample, in addition to quantifying activation status, cytokine production, or signaling states of each of those cell types. In the case of RG-4 pathogen studies, sample volumes are frequently in limited supply as they are either sourced from rare human disease outbreaks or from the relatively few experimentally infected animals in the limited number of BSL-4 facilities. A fringe benefit of highly multiplexed technologies, including CyTOF, is that these small samples can now produce a greater number of measurements for evaluation of more hypotheses simultaneously. For instance, the gap between existing experimental and clinical data for RG-4 pathogens might be narrowed by carefully planned CyTOF studies in which rare patient samples are examined simultaneously for multiple cellular features observed in experimental models.

The dividends of CyTOF in shedding light on infectious disease are underscored by the findings of a number of recent studies [104,105,106,107], but use of CyTOF has yet to be realized for RG-4 pathogen research. One of the challenges for use of CyTOF in RG-4 studies is the currently unsuitable design of components of the CyTOF instrument for operation inside maximum containment environments (compressed gas and high-volume exhaust requirements, glassware, and superheated components). To address this challenge, workflows are currently under development that will enable CyTOF analysis of virus-inactivated samples derived from RG-4 pathogen studies. A set of CyTOF reagents for direct comparison of immune system responses in humans, laboratory mice, and non-human primate animal models frequently used in RG-4 pathogens research is already available [108, 109]. Similar reagents should be considered for development of guinea pig, hamster, ferret, and other models. This expanding toolset will likely contribute towards a framework for future in-depth RG-4 pathogen studies across different animal species, thereby also informing the choice of which animal model to use for divergent scientific questions.

4.2 Computational Tools for Analysis of High-Throughput and High-Dimensional Data

CyTOF, single-cell sequencing, and high-dimensional imaging all yield large datasets that are time-consuming to analyze exhaustively. The same general analysis principle applies to most of these datasets: partition of cells by phenotype and subsequent analysis of their functions, behaviors and/or relationships. A large number of tools have been developed or adapted from other fields to perform these tasks in automated or semi-automated fashions [110]. In RG-4 pathogen research, a key benefit of computational tools for multiparameter single-cell data is the possibility to identify biomarkers consisting of unanticipated combinations of parameters that would be missed by manual approaches (e.g., gating of cytometry data).

In many cases, tools used for CyTOF can be directly applied to segmented single-cell data from multi-parameter imaging. However, the addition of spatial position to multiplexed single-cell data created from imaging results in an enormously higher depth of information. Tools to address how the structure of cellular neighborhoods within tissues impacts health and disease are now being developed [70, 72, 111].
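One simple way to make use of spatial position is to summarize, for every segmented cell, the phenotypes of its nearest neighbors. The sketch below is a generic k-nearest-neighbor illustration and is not the method of [70, 72, 111]; the coordinates, phenotype names, and the choice k = 2 are assumptions.

```python
from collections import Counter
from math import dist

def neighborhood_composition(cells, k):
    """For each cell, count the phenotypes among its k nearest neighbors."""
    result = []
    for i, (pos, _) in enumerate(cells):
        # Distance from this cell to every other cell, with that cell's phenotype.
        others = [(dist(pos, p), ph) for j, (p, ph) in enumerate(cells) if j != i]
        others.sort(key=lambda item: item[0])
        result.append(Counter(ph for _, ph in others[:k]))
    return result

# Hypothetical segmented cells: ((x, y) centroid, phenotype).
cells = [
    ((0, 0), "infected"),
    ((1, 0), "macrophage"),
    ((0, 1), "macrophage"),
    ((10, 10), "T cell"),
    ((11, 10), "T cell"),
    ((10, 11), "T cell"),
]
composition = neighborhood_composition(cells, k=2)
```

The resulting per-cell composition vectors can then be clustered with the same tools used for CyTOF data, so that recurring neighborhoods, such as infected cells surrounded by macrophages, emerge as groups.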

5 Virus and Patient Sequencing

Widespread adoption of next-generation sequencing (NGS) has revolutionized virtually every facet of molecular biology and human health, including the study of RG-4 pathogens. Determination of the first human genome sequence took a decade and was performed predominantly by Sanger technology [112, 113]. Since then, NGS instrumentation has blossomed, and NGS data output grows exponentially every year. Generating 20 billion reads in a single sequencing run is now possible. Each currently available NGS platform has advantages and disadvantages [114,115,116], but large sequencing centers now regularly generate a single human genome sequence every few minutes using many of these platforms. In contrast to the Sanger method, NGS can sequence millions of DNA strands at the single-molecule level, which also allows researchers to obtain accurate sequence information from smaller genomes such as RG-4 pathogens. In the future, extensively cataloguing human and pathogen genomic variation will provide better insight into host-pathogen interactions and thereby enable personalized medicine approaches even in exotic disease outbreaks.

Researchers studying human biology have devised upstream workflows that take advantage of the ability of NGS to sequence millions of reads and then mine these data for answers to new scientific questions. Those seeking to study RG-4 pathogens can leverage most of these NGS-based assays. Though replication-competent RG-4 pathogens must be handled in BSL-4 facilities, NGS protocols can be performed at a lower BSL (e.g., BSL-2) if samples have been appropriately inactivated and their nucleic acids (NAs) extracted. From a plethora of NGS applications, we describe three broad categories of NGS-based assays: detecting unknown NAs (see Section 5.1); marking unknown features of NAs (structure, modified bases) and mapping their locations (see Section 5.2); and quantifying biomolecules in a sample (see Section 5.3).

5.1 Detection of Unknown Pathogens

Metagenomic NGS (mNGS) is a powerful tool for identifying pathogens in clinical or environmental samples [117, 118]. Diseases caused by many RG-4 pathogens are generally challenging to diagnose, as patients with these diseases often present with non-specific (“influenza-like”) clinical signs and even highly replicative viruses typically comprise a minority (<1%) of the NAs in a sample. Rather than testing for every pathogen individually, scientists can use mNGS to sequence millions of molecules from all NAs in a sample, revealing any low frequency NAs (from pathogens or non-pathogenic organisms). Supplemental methods for manipulating NAs can further enhance sensitivity and cost-effectiveness of mNGS, including multiplex polymerase chain reaction (PCR) [119,120,121], hybrid capture [122, 123], and clustered regularly interspaced short palindromic repeats (CRISPR)-based methods [124, 125], in ways that are not currently possible for other biomolecules like proteins or lipids.
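The idea of sorting all reads in a sample by origin, rather than testing for one pathogen at a time, can be illustrated with a toy k-mer matching sketch. This is a deliberate simplification and not a published mNGS pipeline: the sequences are invented strings rather than real genome fragments, and the function names, k = 5, and the minimum of two shared k-mers are assumptions.

```python
from collections import Counter

def kmers(seq, k):
    """Return the set of all substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify_reads(reads, references, k=5, min_shared=2):
    """Assign each read to the reference sharing the most k-mers with it."""
    index = {name: kmers(seq, k) for name, seq in references.items()}
    counts = Counter()
    for read in reads:
        read_kmers = kmers(read, k)
        best_name, best_hits = "unclassified", 0
        for name, ref_kmers in index.items():
            hits = len(read_kmers & ref_kmers)
            if hits > best_hits:
                best_name, best_hits = name, hits
        # Reads with too little support stay unclassified.
        counts[best_name if best_hits >= min_shared else "unclassified"] += 1
    return counts

# Invented toy sequences, not real host or viral genomes.
references = {
    "host": "ACGTACGTTAGCCGATAGCTAGGCTA",
    "virus": "TTGACCGGTTAACCGGATTCAGGCAT",
}
reads = [
    "ACGTACGTTAGC",  # drawn from the toy host sequence
    "CGATAGCTAGGC",  # drawn from the toy host sequence
    "GACCGGTTAACC",  # drawn from the toy virus sequence
    "GGGGGGGGGGGG",  # matches neither reference
]
counts = classify_reads(reads, references)
viral_fraction = counts["virus"] / len(reads)
```

In real samples the viral fraction is typically far smaller than in this toy example, which is why the enrichment methods listed above are used to increase sensitivity.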

Recent uses of mNGS for RG-4 pathogens include analyzing the sequences from the 2013–2016 Ebola virus disease (EVD) epidemic in Western Africa [126,127,128,129,130,131,132], the two most recent EVD outbreaks in the Democratic Republic of the Congo [133, 134], and recent Lassa fever outbreaks in Nigeria [4, 5]. In each case, multiple research groups collaborated with African partners to collect clinical samples, inactivate them with guanidinium-based reagents, extract viral RNA, reverse transcribe RNA to cDNA, and sequence by mNGS. In these cases, as specific causal pathogens were suspected, it was possible to perform multiplex PCR to enrich (concentrate) EBOV or Lassa virus (LASV) content and then to sequence on the universal serial bus (USB)-sized Oxford Nanopore MinION device in the field [120]. In contrast, other groups have used non-targeted amplification methods [126], which can reveal intriguing co-infection dynamics between pathogens of interest and other pathogens that would normally be removed by the enrichment process (e.g., EBOV and Plasmodium [135] or EBOV and GB virus C [136]).

Decisions on appropriate public health responses can also be greatly informed by the collection and cataloguing of hundreds or thousands of viral genome sequences during an outbreak. Molecular epidemiology (i.e., use of viral sequencing data to identify disease transmission chains) can provide key insights on outbreak information such as animal-to-human or human-to-human transmission [4, 5, 126, 137], instances of suspected “super spreading” [138], and transmission from persistently infected disease survivors [139,140,141]. Sequencing data also inform models of factors that influence outbreak scale and severity [132] and/or viral evolutionary rate [142], and facilitate identification of high frequency mutations that have functional impact on viral infectivity [143, 144].

5.2 Mapping Features of Nucleic Acid Sequences

In addition to identifying unknown sequences, NGS can also map the locations of unknown functional features of NAs, such as epigenomic and epitranscriptomic characteristics including interactions between nucleic acids and proteins or complex secondary and tertiary structure formations [145, 146]. Though these features have been more thoroughly studied in human NAs, viral NAs also fold into complex structures for replication [147] and/or harbor modifications that can dampen immunity [148].

Though physical methods such as crystallography and nuclear magnetic resonance remain powerful tools for elucidating NA form and function, chemical and enzymatic methods aid in identifying features in a high-throughput manner when coupled with NGS [149]. Enzymes that cleave specific NA features or chemicals that modify specific bases can terminate reverse transcription or produce errors at structured regions [150] or biomarkers [151, 152] that are sensitively and accurately quantified by NGS. Though evaluation of NA features of RG-4 pathogens has been less frequent, these studies can be very informative. One study used NGS to map the RNA structure of an EBOV minigenome and found that binding of heat-shock protein A8 to the trailer non-coding region promoted minigenome replication [153]; better understanding of host-virus interactions essential for replication could identify new targets for MCMs. Though live EBOV was not used in this study, another group used a similar NGS approach for the RG-2/3 viruses Sindbis virus and Venezuelan equine encephalitis virus. Although it was expected that the genomes of these two alphaviruses fold into different RNA structures, it is noteworthy that these two structures directly led to differing viral infectivity [154]. In addition to these technologies, “third generation sequencing” technologies (see Section 5.5) have the potential to facilitate sequencing of DNA and RNA base modifications directly [155,156,157], distinguishing unmodified from modified bases by measuring changes in electrical currents.
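The quantification step in such probing experiments can be sketched as follows, assuming a mutational-profiling style readout in which per-position mismatch counts and read depths are already available for a probe-treated and an untreated control sample. The function name, the depth threshold, and all counts are hypothetical and serve only to illustrate the idea of a background-subtracted reactivity profile.

```python
def reactivity_profile(treated_mut, treated_depth,
                       control_mut, control_depth, min_depth=100):
    """Per-nucleotide reactivity as a background-subtracted mismatch rate.

    Positions with fewer than min_depth reads in either sample are
    reported as None; negative differences are clipped to zero.
    """
    profile = []
    for t_mut, t_dep, c_mut, c_dep in zip(
        treated_mut, treated_depth, control_mut, control_depth
    ):
        if t_dep < min_depth or c_dep < min_depth:
            profile.append(None)
            continue
        rate = t_mut / t_dep - c_mut / c_dep
        profile.append(max(rate, 0.0))
    return profile


# Hypothetical counts for a five-nucleotide window
treated_mut = [2, 40, 3, 55, 1]
treated_depth = [1000, 1000, 1000, 1000, 50]
control_mut = [1, 2, 3, 5, 0]
control_depth = [1000, 1000, 1000, 1000, 50]

profile = reactivity_profile(
    treated_mut, treated_depth, control_mut, control_depth
)
```

Here the second and fourth positions show elevated reactivity, which in many probing chemistries would suggest flexible or unpaired nucleotides, while the last position is masked for lack of coverage.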

5.3 Biomolecule Quantification

Aside from identifying and characterizing virus genomes directly, NGS can also be used to read DNA and DNA barcodes for functional assays ranging from large screens to single-molecule and single-cell sequencing. Human genome scientists have developed genome-wide protein knock-out or knock-down (e.g., CRISPR, RNA interference, small hairpin RNA, small molecule) screens and massively parallel reporter assays [158,159,160] to screen thousands of DNAs simultaneously. For example, researchers generated a vesicular stomatitis Indiana virus expressing LASV glycoprotein [161, 162]. They then created cells with randomly knocked-out genes using a retroviral gene-trap vector and exposed these cells to the recombinant virus to identify host entry factors for LASV. By NGS of retroviral insertion sites of uninfected cells, they found that genes critical for glycosylating α-dystroglycan (the major LASV cell-surface receptor) were also required for LASV entry during infection.

One promising technology that is starting to be used for virology is single-cell RNA sequencing (scRNA-seq) [163,164,165,166,167]. During scRNA-seq, individual cells are isolated, and unique DNA barcodes are applied to each cell’s RNA followed by NGS to associate each RNA with its cognate cell. Home-brewed methods like Drop-seq [168] and commercial options like 10X Genomics are becoming increasingly popular because scientists can profile thousands of cells in mixed populations, such as peripheral blood mononuclear cells (PBMCs) or even dissociated tissues. Researchers can also use scRNA-seq to measure heterogeneity in viral replication. For instance, scientists sequenced 3,000–4,000 single cells infected at low multiplicity of infection with influenza A virus (FLUAV). Most cells contained <1% FLUAV mRNA, but a number of cells had ≈50% FLUAV reads, indicating extreme heterogeneity of infection, partly attributed to variability of the FLUAV replicative machinery [164]. However, as Drop-seq and 10X Genomics are droplet-based and require specialized equipment, alternative platforms that are microchip- [169] or microwell-based [170], such as Seq-Well [171, 172], are advantageous. These platforms are portable, have minimal equipment requirements, and can be easily decontaminated and discarded. A new technique, Slide-seq, allows spatially resolved single-cell RNA sequencing after transferring RNA from tissue sections onto a new surface covered in DNA-barcoded beads. Using Slide-seq, cell types and their activation states can be directly determined using standard histological work-up [173]. Seq-Well and related technologies could therefore be used in BSL-4 laboratories or in the field, thereby facilitating functional and single-cell studies for RG-4 pathogens.
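The per-cell viral load measurement described above can be sketched in a few lines, assuming that reads have already been demultiplexed into (cell barcode, gene) pairs and that viral transcripts carry a recognizable name prefix. The prefix, barcodes, and gene labels below are hypothetical placeholders rather than the conventions of any particular pipeline.

```python
from collections import defaultdict


def viral_fraction_per_cell(reads, viral_prefix="FLUAV_"):
    """Return {cell_barcode: fraction of that cell's reads that are viral}.

    reads is an iterable of (cell_barcode, gene_name) pairs, one per
    deduplicated read; genes whose names start with viral_prefix are
    counted as viral.
    """
    total = defaultdict(int)
    viral = defaultdict(int)
    for barcode, gene in reads:
        total[barcode] += 1
        if gene.startswith(viral_prefix):
            viral[barcode] += 1
    return {barcode: viral[barcode] / total[barcode] for barcode in total}


# Hypothetical reads from two cells
reads = [
    ("AAAC", "ACTB"), ("AAAC", "GAPDH"),
    ("AAAC", "FLUAV_NP"), ("AAAC", "FLUAV_HA"),
    ("TTTG", "ACTB"), ("TTTG", "GAPDH"),
]
fractions = viral_fraction_per_cell(reads)
```

Plotting the distribution of these fractions across thousands of barcodes is one simple way to visualize the kind of cell-to-cell heterogeneity reported for FLUAV.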

5.4 Databases and Bioinformatics for Sequencing

The new wave of NGS technologies has spurred new public databases and bioinformatic tools that comprehensively and quickly analyze millions of short reads or thousands of long reads. Because RG-4 pathogens are relatively rare, NGS data generation and sharing are critical. In the US, the NIH supports the National Center for Biotechnology Information’s Sequence Read Archive for raw sequencing data, GenBank for consensus sequences from humans and viruses, and a range of other databases for processed data. The NIH also supports the Virus Pathogen Resource [174], which collates sequence data and experiments from host factor assays. Smaller data portals like virological.org and nextstrain.org [175] also exist and can rapidly disseminate pre-publication data. Advances in algorithms and computing power, described in other chapters of this book, will certainly facilitate searching [176], classifying [177], and processing these massive data sets. These data will require advanced modeling methods in conjunction with basic molecular biology and public health efforts.

5.5 “Third Generation Sequencing” Methods

Some of the newest NGS technologies, often dubbed third-generation sequencing, have been driven by nanotechnology and possess unique properties that further expand the molecular biology toolkit [114,115,116]. In contrast to Illumina, Roche 454, and Ion Torrent short-read methodologies that involve cleaving genomic material and sequencing-by-synthesis in cycles (≤600 bases per read), Pacific Biosciences and Oxford Nanopore methodologies both rely on nanoscale pores to processively read entire DNA strands, producing reads up to hundreds of kilobases long. Though the error rate is often much higher than that of short-read methodologies, long reads are particularly useful for de novo genome assembly, i.e., the assembly of an unknown genome sequence [178, 179] and reconstruction of large haplotypes with multiple mutations or variants [179]. Nanoscale pores also possess other unique benefits. Using Pacific Biosciences technology, a researcher can continuously sequence the same DNA strand in a circle, creating a circular consensus sequence, and thus reducing the error rate [180, 181]. Using Oxford Nanopore technology, a researcher can sequence RNA directly [155,156,157] and also identify DNA and RNA base modifications [182]. Additionally, the direction of the pores can also be reversed [183,184,185], and the same DNA strand is read twice for an improved consensus sequence. Perhaps the biggest advantages of Oxford Nanopore devices are their small sizes, typically resembling a USB drive, and the ease of their use [5, 120]. Continued developments in nanotechnology will further reduce instrumentation size and sample requirements and improve the error rate and selectivity of NGS.
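The error-reducing effect of reading the same molecule several times can be illustrated with a deliberately simplified majority vote, assuming that the repeated passes have already been aligned to equal length and that errors in each pass are independent. The function name and sequences are hypothetical; production consensus callers use alignment and probabilistic models rather than a plain vote.

```python
from collections import Counter


def consensus(passes):
    """Majority-vote consensus across equal-length, pre-aligned passes."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be non-empty and of equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )


# Hypothetical passes over the same strand; three contain one error each
passes = [
    "ACGTACGT",
    "ACGAACGT",
    "ACGTACCT",
    "TCGTACGT",
    "ACGTACGT",
]
result = consensus(passes)
```

Although three of the five passes contain an error, the errors fall at different positions, so every column still has a clear majority and the vote recovers the underlying sequence.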

At present, a Star Trek-style tricorder device for universal diagnosis remains unobtainable, yet the holy grail of assigning etiological agents to fevers of unknown origin is within sight. In many cases, access to technology, rather than the technology itself, is the limiting factor, and point-of-care devices are increasingly sought. For example, paper-based lateral flow assays are standard for pregnancy and antibody/antigen testing [186]. Numerous isothermal methods are under development as alternatives to PCR and are continuously improved because DNA/RNA tests offer a complementary approach for detecting pathogens [187,188,189]. NGS equipment, in particular Oxford Nanopore, is already portable and has been utilized during numerous outbreaks around the world [120, 121, 190, 191] and even in space aboard the International Space Station [192].

6 Disease Modeling

Accurate model systems are of the utmost importance for studying normal physiology and pathobiology of human diseases. For the past several decades, animal models have been the standard systems used to emulate human disease processes, with conventional two-dimensional (2D) in vitro systems complementing animal models by reducing system complexity and increasing throughput. If no suitable animal model is yet identified, research is often limited to in vitro systems. However, no one model system is truly capable of reproducing the complex biological processes observed in humans. Translating findings from animal models to human subjects can be quite challenging as large biological differences, altered disease severity, and altered susceptibility to pathogens exist between humans and other animals [193,194,195]. In addition, conventional 2D in vitro systems often only recreate cell-cell interactions and fail to maintain the complexity of tissue-tissue and organ-organ communication, which is of critical importance to disease processes in vivo.

The following sections highlight the use of organs-on-chips and organoids as models of complex disease states, as screening platforms for new biomarkers, and as advantageous systems for the study of infectious diseases and therapeutic interventions. Whereas the examples outlined below primarily represent work with RG-2 pathogens, organs-on-chips and organoids could relatively easily be utilized for the study of RG-4 agents. Importantly, with these systems, the study of highly pathogenic agents using human tissues in a complex, dynamic setting could closely resemble relevant in vivo systems.

6.1 Organs-on-Chips

To bridge the gap between animal models and basic in vitro systems, advances in microengineering and microfluidics were channeled to create organ-on-chip technology. Organs-on-chips are microfluidic cell-culture devices, fabricated using soft lithography from inert, gas-permeable polymers [196]. These biomimetic systems recreate tissue-tissue interfaces and biophysical properties of organs, including mechanical torsion (e.g., cyclic “breathing” motion associated with expansion and contraction of alveolar and capillary interfaces) and shear force from blood flow [196,197,198].

Originally used to recreate the lung alveolus, organs-on-chips have been adapted to recreate the human small airway, liver, intestine, kidney, bone, blood vessels, bone marrow, neuronal tissue, cardiac muscle, and cornea [199]. Given their flexibility in design, well-defined architecture, and wide range of sources for cellular materials, organs-on-chips represent an excellent and adaptable model system to study a wide array of diseases, including chronic obstructive pulmonary disease (COPD), asthma, liver disease, cardiovascular disease, and malignancies [200,201,202,203,204,205].

Another growing application of this technology is modeling the pathogenesis of infectious diseases. In particular, research in the area of respiratory infections has been greatly propelled using lung-on-a-chip and small airway-on-a-chip technologies [200, 201, 206,207,208,209,210]. For instance, a small airway-on-a-chip was used to model respiratory infection through use of a toll-like receptor 3 agonist, poly-inosinic-poly-cytidylic acid (poly-IC), thereby mimicking cellular events during viral infection of lung epithelial cells [201]. This model replicated complex disease states, such as viral exacerbation of disease in patients suffering from COPD and asthma, and helped identify new potential biomarkers for COPD exacerbation, such as macrophage colony-stimulating factor [201]. Meanwhile, additional lung model systems have been used to study fungal and bacterial infections in the lung. For instance, a multi-compartment human bronchiole was created to investigate the production of inflammatory cytokines during colonization with a eurotiomycete (Aspergillus fumigatus) and a gammaproteobacterium (Pseudomonas aeruginosa) [211]. Interestingly, this work showed that colonization of the artificial bronchiole with a less virulent A. fumigatus strain results in increased production of inflammatory cytokines and recruitment of leukocytes, a finding that would be less likely to be made if the experiments were performed on cell monolayers in vitro [211]. Moreover, inflammatory cytokine production differed when the bronchioles were exposed to volatile compounds produced from co-cultures of P. aeruginosa and A. fumigatus compared to monocultures of either microbe [211].

As described above, lung-on-chip and related models have been widely used to study respiratory infections [200, 201, 206,207,208,209,210,211]. However, organ-on-chip technology is not restricted to the lung and has been applied to study infection of other organ systems including the liver, the central nervous system, and the intestine. For instance, primary human hepatocytes were used in combination with organ-on-chip technology to facilitate the study of hepatitis B virus infection in vitro [212]. The hepatitis B virus life cycle and host immune responses (e.g., cytokine responses) to infection were successfully modeled. In addition, cutting-edge micro-extrusion three-dimensional (3D) printing techniques were adapted to develop a “3D nervous system-on-a-chip” for the study of viral infections of the central nervous system [213]. Using this system, it was found that Schwann cells were refractory to pseudorabies virus infection, but that these cells nevertheless participated in axon-to-cell spread of the virus. Additionally, infection with a human enterovirus, coxsackievirus B1, was successfully modeled using a human gut-on-a-chip. The virus infected and replicated within intestinal epithelium and stimulated inflammatory cytokine release in a polarized fashion [214].

Organ-on-chip systems are highly applicable for bridging gaps left between animal and 2D in vitro models, particularly with respect to the drug discovery process. The leading causes of candidate drug attrition during clinical trials are failures in drug efficacy and safety [215, 216]. Organ-on-chip platforms provide more biologically complex environments that are better suited than conventional 2D systems for testing drug activities prior to clinical trials [199, 217]. Further, by mimicking several of the complex characteristics of whole organ systems, organs-on-chips can help reduce the use of animal models in drug discovery, which would in turn reduce its cost.

6.2 Organoids

Organs-on-chips are clearly advantageous for modeling human infectious disease. However, these chips are not the only advanced micro-engineered system suitable for this task. Organoids are 3D organ structures consisting of organ-specific cells grown from (induced, embryonic, or adult) stem cells via self-organizing mechanisms [218, 219]. One of the advantages of organoids is the capacity to mimic some of the complexities and functions of natural organs [219]. Given the structural and functional similarities to natural organs, organoids have been used extensively to study infectious disease with human samples [199, 217, 220]. Human respiratory syncytial virus, Helicobacter pylori, hepatitis C virus, and Zika virus infection have all been modeled using organoids derived from lung, gastric, liver, and nervous tissues [220]. Recently, human airway organoids were used as a screening platform to study the infectivity of emerging FLUAVs [221]. FLUAV strains known to be highly infectious for humans were associated with higher replication rates in the organoids compared to strains known to poorly infect humans [221].

Similar to organ-on-chip technology, organoids have utility beyond simple disease modeling, as they have been extensively used in the drug discovery process [217, 220]. A cerebral organoid was used to model Zika virus infection and to identify potential therapeutic compounds to abrogate damage associated with infection [222]. Three compounds that were previously identified as having protective effects during flavivirus infection (oxytetracycline, ivermectin, and azithromycin) limited Zika virus infection of the organoids and reduced tissue damage, suggesting these compounds may be good candidates for limiting Zika virus infection and associated damage in vivo [222].

Overall, while helpful in reproducing several salient features of tissue and organ pathophysiology, there is still room for improvement in both organoids and organs-on-chips. For instance, organoids often exhibit high variability in size and shape, do not experience naturally occurring mechanical forces (e.g., breathing airflow in the lung airways or rhythmic expansion-retraction of alveoli during inhalation-exhalation), and lack microvascular blood-like flow for circulation of immune cells and continuous nutrient and oxygen supply. In addition, accurately accessing the luminal content of organoids for biochemical analysis is challenging if not impossible. Integrating emerging genetic engineering tools such as CRISPR-associated protein 9 (Cas9) or transcription activator-like effector nucleases (TALENs) with organoids or organs-on-chips would be of high value. With this integration, researchers can introduce new sensitivity to pathogens (when absent) or dissect underlying mechanisms of host protection or organ injury during host-pathogen interactions (e.g., through CRISPR-Cas9-mediated deletion of genes proposed to play roles in protection from and susceptibility to infection with pathogens).

6.3 Improved Design of Experimental and In Silico Studies

Technological advancements in disease modeling, including organs-on-chips and organoids, in part aim to increase experimental productivity and reduce the number of animals required to meet research goals. Computationally aided design of experiments (DOE) is another approach with the potential to improve the efficiency and yield of studies while also significantly reducing the burden of research. Improved efficiency is especially relevant given the expense and general difficulty associated with performing research in BSL-4 facilities. Although not a new field, DOE is generally applied to chemistry and pharmaceutical development more than biology, and its core concepts have recently been enhanced by machine learning concepts [223]. For example, fractional factorial DOE could be used for multivariate drug screens to reduce the number of actual conditions required to be measured, without reducing the experimental yield. The same technique could possibly be used to reduce the number of animals required in in vivo studies. Inference-based and machine learning-based methods for drug repurposing are improving (reviewed in [224]). Given the low incidence of most RG-4 pathogen infections, drug repurposing is an important source of potential therapeutics.
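To make the fractional factorial idea concrete, the following minimal sketch builds a two-level half-fraction design for four factors by setting the level of the last factor to the product of the other three, which yields 8 runs instead of the 16 required by a full factorial. The factor names are hypothetical, with -1 and +1 standing for, e.g., absence and presence of a compound. In this particular design, main effects are aliased with three-factor interactions and two-factor interactions are aliased with one another, so it is informative only when higher-order interactions can be assumed to be small.

```python
from itertools import product


def half_fraction(factors):
    """Two-level half-fraction design.

    The first len(factors) - 1 factors take every combination of -1/+1;
    the level of the last factor is the product of the other levels.
    """
    *base, generated = factors
    runs = []
    for levels in product((-1, 1), repeat=len(base)):
        last = 1
        for level in levels:
            last *= level
        run = dict(zip(base, levels))
        run[generated] = last
        runs.append(run)
    return runs


# Hypothetical four-compound combination screen
design = half_fraction(["drug_A", "drug_B", "drug_C", "drug_D"])
```

Each factor appears at its high and low level in exactly half of the runs, so main effects can still be estimated from the reduced set of conditions.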

7 Conclusions: Incorporating Futuristic Technologies into Risk Group 4 Research

Advanced research tools, such as those described here, are constantly under development and provide new and exciting ways to investigate RG-4 pathogens. The opportunity to further define aspects of disease pathogenesis and host response to infection can help to tease out new therapeutic and vaccine targets to combat these diseases. However, although some of these technologies already constitute a marked advancement in RG-4 pathogen research, their use in a BSL-4 environment comes with a few noteworthy complications.

BSL-4 laboratories require yearly maintenance, at a minimum, to replace air filters and to perform required servicing for laboratory infrastructure components. If these futuristic technologies are housed within the BSL-4 laboratory spaces, required machinery will have to be able to withstand repeated, yearly decontamination processes (e.g., paraformaldehyde gassing with subsequent neutralization, Microchem Plus surface decontamination) as the laboratory space is prepared for servicing. Additionally, as use of a BSL-4 facility requires extensive training and registration, outside technicians do not generally have the ability or permission to enter laboratory spaces and service instruments. Instruments, therefore, should be simple enough that a BSL-4 scientist can troubleshoot them effectively. Alternatively, these instruments can also be housed in a lower biosafety level if test samples can be safely inactivated and removed from the BSL-4 laboratory. Researchers have used this method as part of multiple techniques, including many diagnostic tests that begin with an inactivation step and sample removal from containment before testing [225,226,227]. As technologies are not generally developed with inactivation methods in mind, a viral inactivation method for safe handling of samples in a lower BSL laboratory that maintains the integrity of sample components will need to be identified. After identification, the effects of these inactivation methods on test integrity (e.g., sample dilutions, signal loss) will require further study.

Additionally, RG-4 research is not confined only to the controlled environment of a BSL-4 laboratory space. Deployment of these technologies to outbreak regions provides the opportunity to further characterize human disease progression and correlates of disease outcome, without the caveats associated with disease models. Use of CyTOF and single-cell sequencing on patient samples collected during disease, for example, could provide valuable data on immune responses that lead to disease survival. Additionally, as autopsies of deceased humans infected with RG-4 pathogens are rare, utilizing CODEX and MIBI could glean valuable information about human disease, even with a single autopsy. However, as outbreaks generally occur in developing countries, challenges to overcome may include lack of infrastructure (including electricity), transportation, and/or staffing. Newly developed technologies will need to be robust enough to counter changes in humidity, temperature, and many other potential stressors occurring in an outbreak setting. Despite these complications, the futuristic technologies described in this chapter provide the opportunity to advance the understanding of highly pathogenic and consequential viruses and the diseases they cause.