Detection of Viral RNA Splicing in Diagnostic Virology
This chapter is the first one to introduce the detection of viral RNA splicing as a new tool for clinical diagnosis of virus infections. These include various infections caused by influenza viruses, human immunodeficiency viruses (HIV), human T-cell leukemia viruses (HTLV), Torque teno viruses (TTV), parvoviruses, adenoviruses, hepatitis B virus, polyomaviruses, herpesviruses, and papillomaviruses. Detection of viral RNA splicing for active viral gene expression in a clinical sample is a nucleic acid-based detection. The interpretation of the detected viral RNA splicing results is straightforward without concern for carry-over DNA contamination, because the spliced RNA is smaller than its corresponding DNA template.
Although many methods can be used, a simple method to detect viral RNA splicing is reverse transcription-polymerase chain reaction (RT-PCR). In principle, the detection of spliced RNA transcripts by RT-PCR depends on amplicon selection and primer design. The most common approach is amplification across an intron region with a pair of primers located in the flanking exons. A spliced mRNA always gives a smaller RT-PCR product than its unspliced RNA because the intron sequences are removed by RNA splicing. A product larger than the size predicted for the spliced RNA therefore generally represents unspliced RNA or contaminating viral genomic DNA. Contaminating viral DNA can be identified by an amplification performed without reverse transcriptase (minus-RT PCR). Alternatively, a spliced RNA can be amplified specifically by using an exon-exon junction primer, because the sequence at the exon-exon junction is present neither in the unspliced RNA nor in the viral genomic DNA.
Keywords: RNA splicing, Virus infections, Viral diseases, Clinical virology, Diagnostic virology, Clinical diagnosis, RT-PCR, Viruses, RNA viruses, DNA viruses
Diagnostic virology is the process in which the viral etiologic cause of infection is identified from a patient’s clinical sample. In the past, diagnostic virology relied on three classical techniques to diagnose viral infection: (a) virus isolation by direct virus cultivation, (b) viral antigen detection, and (c) indirect detection of virus-specific antibodies. While these remain important tools in diagnostic virology laboratories today, they are time-consuming and require specific reagents and methods such as cultivation media, cell or tissue cultures, antibodies, or purified antigens. In the past several decades, the number of new molecular methods has increased rapidly, and these methods have become widely used in diagnostic virology laboratories. At their core are techniques based on nucleic acid detection by specific amplification, hybridization, and/or sequencing (reviewed in ). The majority of these nucleic acid-based diagnostic methods are simple, speedy, sensitive, and specific and thus meet the gold “four-S standard” for application in any diagnostic laboratory. The methods are simple and speedy because the laboratory needs only a specific primer pair and a PCR machine, and identification of a viral pathogen takes only a few hours. They are sensitive and specific because they require only a small amount of a patient’s clinical specimen to detect a specific nucleotide sequence region. In general, these techniques can be used to detect almost all types of viral pathogens and can even identify multiple viral pathogens or their variants at the same time. In this chapter, we will focus on detection of viral RNA splicing as a new tool for diagnostic virology.
Principle of RNA Splicing
Definition of RNA Splicing
RNA splicing was discovered 40 years ago by Susan Berget and Louise Chow; these investigators mapped adenovirus transcription and identified intervening sequences (introns) in type 2 adenovirus primary transcripts. Subsequently, RNA splicing was recognized as an essential nuclear event for mammalian gene expression and for virus replication of almost all DNA viruses and some RNA viruses. Most mammalian genes consist of multiple segments called exons, which are separated by noncoding or intervening sequences named introns. Genes that are composed of exons and introns are “split” genes. After transcription, a nascent or primary transcript (pre-mRNA) contains both exons and introns. The introns are removed from the pre-mRNA by a molecular process called “RNA splicing”, resulting in production of spliced mature mRNA. RNA splicing takes place in both coding and noncoding primary transcripts. RNA splicing has been considered a posttranscriptional event; however, recent studies have demonstrated that RNA splicing often occurs co-transcriptionally [4, 5]. Only those transcripts that are fully processed are eventually exportable from the nucleus to the cytoplasm for protein synthesis.
Molecular Mechanism of RNA Splicing
RNA splicing is catalyzed by the cellular splicing machinery, which consists of (1) small nuclear ribonucleoproteins (snRNPs: U1, U2, U4, U5, and U6) and (2) splicing factors. During the initial step, the U1 and U2 snRNPs recognize the intron sequences at the 5’ splice site and the branch point, respectively, via complementary base-pairing, with the involvement of the U2 auxiliary factor subunits U2AF65 and U2AF35, which associate with the polypyrimidine tract and the 3’ splice site, respectively (Fig. 1a). Intron recognition is a signal for the formation of a large protein complex called the “spliceosome”, where intron removal takes place [8, 9] by two transesterification reactions (Fig. 1b). First, the primary transcript is cleaved at the intron 5’ end (5’ splice site) to leave the upstream exon free; this step is followed by branching of the cleaved intron 5’ end to the branch point (usually A) to create a looped structure named the “lariat intermediate”. In the second step, the hydroxyl group of the free exon attacks the intron 3’ splice site, leading to cleavage at the 3’ splice site and release of the intron lariat. Simultaneously, a covalent bond is created between the two exons to create a mature mRNA. In general, the lariats are quickly released from the spliceosome and degraded in the nucleus.
The efficiency of RNA splicing is regulated at multiple levels, and both RNA cis-elements and cellular splicing factors play major roles in the regulation of RNA splicing. As described above, the level of conservation of the sequences at the splice sites and the branch point affects the strength of binding of the core splicing factors and thus determines the splicing efficiency. RNA splicing is also modulated by a large family of cellular splicing factors that includes serine-arginine-rich proteins (SR proteins) and heterogeneous nuclear ribonucleoproteins (hnRNPs). Most of the splicing factors that are differentially expressed in specific tissues and/or at specific developmental stages are RNA-binding proteins, which bind to specific RNA cis-elements (splicing enhancers or silencers) located within introns and exons. It is now well documented that splicing factors binding to the cis-elements either increase or decrease RNA splicing efficiency depending on the type of splicing factor, the positions of the binding sites, and the overall spliceosome composition [11, 12]. Current studies demonstrate that, in addition to splicing factors, other processes such as the RNA polymerase elongation rate and chromatin structure also affect RNA splicing [13, 14].
Alternative RNA Splicing
Molecular Methods for Detection of RNA Splicing
The mRNA generated by RNA splicing differs from its pre-mRNA in two ways. First, the mRNA is smaller than its pre-mRNA because RNA splicing removes the introns found in the pre-mRNA. In contrast, the pre-mRNA is not only larger than the spliced mRNA but also has the same size as its DNA template. Second, the mRNA contains exon-exon junctions whose sequence is not present in the DNA or in the primary transcript, which allows the design of primers or probes that specifically identify a particular mRNA isoform produced by alternative RNA splicing. Although an alternatively spliced mRNA may be translated into a truncated protein, which could be detectable with a specific antibody, molecular techniques based on detection of nucleic acids are more commonly used to detect RNA splicing.
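The two differences described above can be illustrated with a minimal sketch. The sequences below are invented toy strings, not taken from any virus, and the helper names `splice` and `junction` are hypothetical; the point is only that the spliced product is shorter and carries a junction sequence absent from the unspliced template.

```python
# Toy sequences (hypothetical, for illustration only).
EXON1 = "ATGGCTAGCAAG"
INTRON = "GTAAGTCTGACTTTTTTCCTCAG"
EXON2 = "GACCTGAAATAA"


def splice(exon1, intron, exon2):
    """Return (pre_mrna, mrna) for a single-intron transcript."""
    pre_mrna = exon1 + intron + exon2   # same length as the DNA template
    mrna = exon1 + exon2                # intron removed
    return pre_mrna, mrna


def junction(exon1, exon2, flank=6):
    """Sequence spanning the exon-exon junction, `flank` nt on each side."""
    return exon1[-flank:] + exon2[:flank]


pre_mrna, mrna = splice(EXON1, INTRON, EXON2)
junc = junction(EXON1, EXON2)

print("pre-mRNA length:", len(pre_mrna))
print("mRNA length:    ", len(mrna))
print("junction in mRNA:    ", junc in mrna)
print("junction in pre-mRNA:", junc in pre_mrna)
```

In this toy case the junction sequence is found only in the spliced product, which is the property that junction-specific primers and probes rely on.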
RNase Protection Assay
The RNase protection assay (RPA) requires 32P-labeled single-stranded antisense RNA probes complementary to the transcripts of interest. The prepared probe is then hybridized with sample RNA to form an RNA-RNA hybrid. Unhybridized single-stranded RNA is removed by RNases A and T1, which digest single-stranded RNA only. The protected RNA fragments are separated by gel electrophoresis, and their sizes are determined by comparison with molecular markers. To distinguish a spliced RNA product, the probe should contain at least part of an intron region; because the spliced mRNA lacks the intron sequence, this part of the probe remains single-stranded and is digested. As a result, the probe fragment protected by the corresponding exon regions of the detected mRNA is shorter and runs faster in the gel (Fig. 3b). In general, RPA is more sensitive than Northern blot in detection of RNA splicing.
Both Northern blot and RPA are commonly used in research laboratories. Their main advantage is high specificity. However, both methods are laborious and low-throughput: they require isolation of a large amount (usually a few micrograms) of total RNA from samples and preparation of specific probes, often labeled with radioisotopes, which limits their use in clinical diagnostics.
RT-PCR (reverse transcription-polymerase chain reaction) is one of the most commonly used methods for detection and quantification of RNA molecules. During RT-PCR, RNA transcripts are converted into complementary DNA (cDNA) by reverse transcription using random hexamers, oligo-dT primers (short sequences of deoxythymidine nucleotides), or transcript-specific primers. The resulting cDNA is then used as a template in a subsequent PCR with a pair of transcript-specific primers.
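When the transcript-specific primers sit in the exons flanking an intron, the expected product sizes and their interpretation can be sketched as follows. All coordinates, the 10-bp tolerance, and the function names are hypothetical choices for illustration; a real assay would use the positions of its own primers and splice sites.

```python
def expected_products(fwd_start, rev_end, intron_start, intron_end):
    """Product sizes (bp) for primers in flanking exons.

    Coordinates are 1-based and inclusive on the unspliced template;
    the intron occupies intron_start..intron_end.
    """
    unspliced = rev_end - fwd_start + 1
    intron_len = intron_end - intron_start + 1
    return {"unspliced": unspliced, "spliced": unspliced - intron_len}


def interpret(band_bp, sizes, seen_in_minus_rt, tolerance=10):
    """Classify one gel band; `tolerance` is an assumed sizing error in bp."""
    def near(target):
        return abs(band_bp - target) <= tolerance

    if near(sizes["spliced"]) and not seen_in_minus_rt:
        return "spliced mRNA"
    if near(sizes["unspliced"]):
        # Without reverse transcriptase, only DNA can serve as template.
        return "contaminating viral DNA" if seen_in_minus_rt else "unspliced RNA"
    return "unexpected product"


sizes = expected_products(fwd_start=100, rev_end=899,
                          intron_start=300, intron_end=699)
print(sizes)
print(interpret(405, sizes, seen_in_minus_rt=False))
print(interpret(800, sizes, seen_in_minus_rt=True))
```

The minus-RT flag carries the key distinction: a band of unspliced size that also appears without reverse transcriptase cannot have come from RNA.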
Introduction of real-time quantitative RT-PCR (RT-qPCR) with a broad (10 ) dynamic range has significantly improved the sensitivity of RT-PCR. Because of its high sensitivity, real-time RT-qPCR is able to detect and amplify RNA directly from a single cell without RNA extraction. In addition, it automates the quantification and does not require electrophoretic separation of RT-PCR products.
Each detection chemistry has its own advantages and disadvantages, which need to be considered during experimental design. SYBR green is simple and easy to use and is the most economical real-time RT-PCR method. Its disadvantage is that it binds non-specifically to any double-stranded DNA, including primer dimers and non-specific PCR products; therefore, it is not useful for multiplex amplification of several products in the same reaction. In contrast, TaqMan™, Molecular Beacons, and Scorpions™ probes detect only a PCR product complementary to the probe sequence, making it possible to distinguish specific from non-specific products. The disadvantage of these detection chemistries is that each PCR product requires synthesis of its own specific probe, which increases the cost per reaction. On the other hand, labeling individual probes with fluorophores of different emission spectra allows multiplexing with simultaneous detection of several products and thus reduces cost and labor.
The use of real-time RT-PCR for splicing detection requires special considerations. Since real-time RT-PCR techniques omit electrophoretic separation, the spliced product cannot be distinguished based on size. Therefore, it is important that only the desired product is amplified. In this case, SYBR green chemistry is the most challenging because of its lack of specificity. Probe-based methods provide higher specificity because the probe hybridizes to selected sequences that are not present in non-specific products. To detect only the desired spliced product, the probe or a primer should be complementary to a specific exon-exon junction or to an alternatively spliced region. Many commercial manufacturers of synthetic oligos provide free online tools for designing optimal and specific primer pairs based on the provided template sequence. Several other factors must be considered when using RT-PCR-based techniques in diagnostics; these include (1) RNA sample quality and preparation, (2) contaminants in clinical samples that inactivate Taq polymerase, and (3) amplification bias. Other considerations are false positivity and PCR cross-contamination.
DNA microarrays (also known as DNA chips) are composed of a large number of probes (often several thousand) spotted on a very small area in 2D format on a solid surface (glass or plastic). The probes are DNA oligos of various lengths and chemistries. Each probe has a specific DNA sequence that allows detection of DNA with a complementary sequence. Currently, there are two major technologies for manufacturing microarrays: (a) direct synthesis of probes on the array and (b) printing arrays from a library of pre-synthesized probes. Each DNA microarray allows rapid profiling of a large number of DNA molecules at the same time. Today DNA microarrays are widely used to study gene expression profiles and RNA posttranscriptional modifications including RNA splicing.
There are two different approaches to probe design for the study of RNA splicing using DNA microarrays: (1) tiling arrays and (2) exon arrays (Fig. 6b). In tiling arrays, a set of overlapping probes covers the full length of the nascent primary transcript including exons and introns. The analysis of fluorescence for each probe allows the identification of exons and introns based on the difference in signal intensity (Fig. 6bi). The advantage of tiling arrays is their ability to identify known splice events as well as new ones. Therefore, tiling arrays are often used as discovery tools. Their disadvantage is the requirement for a large number of probes, which makes data analysis time-consuming. Exon arrays are more commonly used but require prior knowledge of the splicing events. Several types of probes hybridizing to flanking exons, introns, and exon-exon junctions are designed to detect each splicing event (Fig. 6bii). The fluorescence intensity is detected for each probe, and a mathematical model is applied to determine the occurrence of a splicing event. The advantage of exon arrays is the smaller number of probes required, which means simpler data analysis. However, exon arrays detect only known or predicted splicing variants. Because of their large capacity, exon arrays can be designed to detect splicing in multiple viral pathogens simultaneously.
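As a rough illustration of how tiling-array signal can separate exons from introns, the sketch below merges consecutive low-signal probes into candidate intron segments. The probe coordinates, intensities, threshold, and the function name `call_introns` are all invented for this example; real array analysis uses normalization and statistical models that are not described here.

```python
def call_introns(probes, threshold):
    """Merge runs of consecutive low-signal probes into candidate introns.

    probes: list of (start, end, intensity) tuples sorted by start.
    Returns a list of (start, end) segments whose probes all fall
    below `threshold`.
    """
    segments = []
    current = None
    for start, end, intensity in probes:
        if intensity < threshold:
            if current is None:
                current = [start, end]   # open a new low-signal run
            else:
                current[1] = end         # extend the current run
        elif current is not None:
            segments.append(tuple(current))
            current = None
    if current is not None:
        segments.append(tuple(current))
    return segments


# Hypothetical overlapping 25-nt probes hybridized with spliced RNA.
probes = [
    (1, 25, 900), (21, 45, 950),                  # exon 1: high signal
    (41, 65, 120), (61, 85, 80), (81, 105, 100),  # intron: low signal
    (101, 125, 870), (121, 145, 910),             # exon 2: high signal
]
candidates = call_introns(probes, threshold=300)
print(candidates)
```

Because the probes overlap, the segment boundaries are only approximate; the true splice sites lie somewhere within the first and last low-signal probes.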
In Situ Hybridization
Tissue sections historically represent an important tool for the diagnosis of pathological changes during viral infection as well as the detection of viral pathogens at the cellular level. There are two major types of tissue sections: frozen and formalin-fixed, paraffin-embedded (FFPE). Both are routinely used for the detection of viral antigens by various types of staining, but their use in the detection of nucleic acids including spliced transcripts is still relatively rare. The improved sensitivity of current nucleic acid isolation and amplification techniques allows the recovery of nucleic acid from tissue sections for further analysis by PCR and RT-PCR, and selective isolation of only the cells of interest by laser capture microdissection adds a further level of specificity [23, 24]. However, the detection of nucleic acids by in situ hybridization (ISH) directly on tissue sections can provide additional information about gene expression linked with the spatial distribution of specific RNA transcripts in a morphological context, often at the cellular or even subcellular level. In the past, nucleic acid molecules, including RNA transcripts, were detected in ISH by DNA probes labeled with radioisotopes (35S, 33P, 3H), which were later replaced by nonradioactive DNA probes labeled with biotin or digoxigenin and detected by chromogenic methods using enzyme-labeled antibodies (CISH). Labeling probes specific for different transcripts with different fluorophores (FISH) allows detection of multiple targets on the same tissue section. However, sensitivity was always a limiting factor of ISH techniques. This was caused mainly by the use of DNA probes, which have low affinity for complementary RNA targets and form RNA-DNA hybrids that are sensitive to degradation by RNase H. Development of tyramide signal amplification (TSA) has dramatically improved the sensitivity of DNA probes.
Further improvement was seen with the introduction of locked nucleic acid (LNA) and peptide nucleic acid (PNA) probes with high affinity for RNA molecules and resistance to RNase H degradation [28, 29, 30]. Detection of RNA splicing by ISH requires a probe that binds specifically to spliced mRNA without binding to unspliced pre-mRNA or to the genomic DNA in the sample. Historically, this was achieved by designing a probe over an exon-exon junction containing sequences present only in spliced transcripts (Fig. 6ci). Another approach to the detection of spliced transcripts by ISH is co-hybridization of two probes labeled with donor and acceptor fluorophores and the generation of signals by FRET. In principle, each splicing event is monitored by a set of two probes complementary to exonic sequences flanking an intron region. One probe carries a fluorophore acceptor, while the second probe is labeled with a fluorophore donor. When the probes bind to genomic DNA or to an unspliced nascent transcript, their binding sites are separated by the intron, and the distance between the donor and acceptor is too large for the two fluorophores to engage in FRET. However, intron removal by splicing brings the probe binding sites close enough for FRET to occur, resulting in generation of measurable fluorescence (Fig. 6cii). This results in high specificity and low background. Using a set of probes with different fluorophores allows detection of multiple spliced transcripts or various spliced isoforms of a transcript. In summary, ISH methods provide a useful tool for investigating the distribution not only of protein-encoding transcripts but also of the rapidly growing number of virus-encoded noncoding RNAs, whose role in viral pathogenesis often remains elusive. In situ hybridization methods could be especially suitable for retrospective analysis of archived samples in collections.
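The distance dependence that makes the two-probe FRET design specific is commonly described by the Förster relation, shown here as a general illustration rather than as part of the original protocol. The symbol r is the donor-acceptor distance and R_0 is the Förster radius of the chosen dye pair (the distance at which transfer is 50% efficient); neither value is given in this chapter.

```latex
E = \frac{1}{1 + \left( r / R_0 \right)^{6}}
\qquad\Rightarrow\qquad
E(r = R_0) = \tfrac{1}{2}, \quad
E(r = 2R_0) = \tfrac{1}{65} \approx 0.015
```

Because efficiency falls with the sixth power of distance, merely doubling the separation beyond R_0 reduces transfer to under 2%, which is consistent with the low background expected when the probes are held apart by an unspliced intron.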
Next-generation sequencing (NGS) represents a new generation of analytical tools for genome and transcriptome analysis. This method is based on the generation of a large number of short sequences in parallel sequencing reactions. Advantages of NGS are the small amount of starting sample required, deep coverage, and single-nucleotide resolution. NGS also does not require any previous knowledge of the sequence to be detected. Currently the main platforms for high-throughput sequencing are the Illumina HiSeq 2500 and HiSeq 3000/HiSeq 4000, which generate sequencing reads of various lengths.
Sequencing of RNA samples converted to cDNA is called RNA-seq. RNA-seq provides a comprehensive picture of the whole transcriptome and has been successfully used for analysis of gene expression and posttranscriptional processing including RNA splicing. NGS may be costly and time-consuming and requires sophisticated data analysis; this currently makes NGS less suitable for clinical diagnostics. On the other hand, RNA-seq does not require any prior knowledge of the composition of the sequence to be detected and therefore allows detection of unknown or unpredicted RNA sequences. This may be especially beneficial in the discovery of new pathogens including viruses. In addition, RNA-seq directly analyzes a transcriptome, including spliced transcripts, in any type of cell or tissue.
RNA Splicing in Clinical Virology
RNA splicing does not occur in prokaryotes and is a hallmark of eukaryotic gene expression. In eukaryotes the proportion of genes that undergo splicing varies widely from organism to organism, from only about 5% of all genes in yeasts to 95% in humans [35, 36]. Viruses, as intracellular parasites, replicate inside host cells and hijack many cellular processes, including RNA splicing, for their replication. By using constitutive and/or alternative RNA splicing, most DNA viruses and some RNA viruses increase the complexity of their proteome without requiring additional genetic material.
Detection of spliced viral mRNAs in clinical samples would provide several benefits. While detection of viral genomes in clinical samples indicates virus infection, the result does not provide information about the stage and dynamics of the infection. In many cases the progress of viral replication can be inferred from changes in viral load, but this approach requires multiple sampling during infection and varies between individuals. One major advantage of detecting spliced viral transcripts is that viral RNA splicing reflects viral gene expression and thus indicates active viral infection, providing important information about the status of infection without requiring multiple sampling. The production of viral transcripts and their RNA splicing products is often the first sign of virus replication, detectable before the increase of viral load or the appearance of virus-specific antigens or antibodies. Therefore, the detection of active viral infection by RNA splicing may be particularly important for early diagnosis of viral infection and may be critical for successful treatment. Because spliced viral transcripts are directly associated with the level of active viral replication, monitoring viral RNA splicing might provide essential information early enough for initiation of antiviral therapy. A rapid shutoff of viral transcription and RNA splicing could also be the first sign of a block in viral replication, visible even before a change in viral load measured by genome copy number. In the case of ubiquitous and common viruses, such as members of the herpesvirus or parvovirus families, which establish latent infection in the host, detection of RNA splicing of a viral early gene would help distinguish latent infection from active lytic infection. Such a diagnosis is critical for recipients of transplanted organs, in whom reactivation of latent viruses often leads to transplant rejection.
In addition, interpretation of the detection of RNA splicing results is straightforward without concern for carry-over DNA contamination, because spliced RNA is smaller than its corresponding DNA template. As described above, there are many techniques currently available for RNA splicing assay. These techniques are not only easy to set up with a low cost compared to virus isolations and immunological methods, but can be quickly applied to detect new emerging viruses for which the cultivation of the virus is difficult or impossible and/or no immunological method is available. This is particularly true for the combination with RNA-seq, which can rapidly provide sequence information about a viral transcriptome and RNA splicing of the viral messages.
RNA Splicing in RNA Viruses
Influenza virus infection affects millions of people every year. Influenza viruses, including influenza virus A, B, and C, are the members of the Orthomyxoviridae family. Influenza viruses are enveloped RNA viruses with a segmented, single-stranded RNA (ssRNA) genome of negative polarity. The number of segments may vary between virus species, with influenza viruses A and B genome having eight segments and influenza C seven segments. In contrast to the majority of RNA viruses, the influenza viruses replicate in the nucleus of host cells because of their dependence on cellular expression machinery . During replication, the viral RNAs are produced by viral RNA-dependent RNA polymerase. However, viral RNA genomes use short sequences with a cap structure generated by host RNA polymerase II for priming to initiate viral transcription. During infection viral polymerase produces two types of RNAs: one for protein synthesis and the other serving as a template for viral genome replication (see review ).
The influenza A segment 7, which encodes the M1 and M2 proteins, produces three RNA species by alternative RNA splicing events. The unspliced RNA, which is collinear with the genome, encodes the M1 matrix protein composed of 252 amino acid residues. The two alternatively spliced RNAs, M2 and mRNA3, share the same 3’ splice site at nt 740 but use different 5’ splice sites for alternative RNA splicing (Fig. 7b). M2 RNA uses a 5’ splice site at nt 51, whereas mRNA3 employs another 5’ splice site at nt 11 from the beginning of the viral-specific sequences. The M2 protein has ion channel activity and shares 8 amino acid residues with the M1 N-terminus; it also overlaps with 14 amino acids of the M1 C-terminus. The mRNA3 contains a short open reading frame in its exon 2 with the potential to encode a short peptide of 9 amino acid residues. However, the expression of this peptide has never been experimentally confirmed, and the role of this transcript during virus replication remains unknown.
While M2 and mRNA3 transcripts are detectable in cells infected with all influenza A viruses, some strains, such as A/WSN/33, produce an additional spliced transcript named mRNA4 . The spliced mRNA4 transcript is generated by usage of additional 5’ splice site at position nt 146 and shares the same 3’splice site with M2 and mRNA3 at position nt 740 (Fig. 7b). The mRNA4 has the potential to encode a peptide with 54 amino acid residues, and its first 37 amino acid residues are identical with M1 protein. Sequence analysis of more than 6000 influenza strains revealed that about 20 influenza A strains have this conserved mRNA4 splice site . Sequence information of all influenza viruses can be found at https://www.fludb.org/.
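The splice-site positions quoted above are enough to estimate how much shorter each spliced isoform is than the unspliced M1 mRNA. The sketch below assumes a coordinate convention that the text does not state: the 5’ splice-site position is taken as the last nucleotide of exon 1 and the 3’ splice-site position as the first nucleotide of exon 2. Under a different convention each value would shift by a nucleotide or so.

```python
# Splice-site positions on influenza A segment 7, as given in the text.
DONORS = {"mRNA3": 11, "M2": 51, "mRNA4": 146}   # 5' splice sites
ACCEPTOR = 740                                   # shared 3' splice site


def intron_length(donor, acceptor):
    """Nucleotides removed by splicing.

    Assumed convention (not stated in the source): `donor` is the last
    nucleotide of exon 1 and `acceptor` the first nucleotide of exon 2.
    """
    return acceptor - donor - 1


removed = {name: intron_length(d, ACCEPTOR) for name, d in DONORS.items()}
for name, n in sorted(removed.items(), key=lambda kv: kv[1]):
    print(f"{name}: {n} nt shorter than the unspliced M1 mRNA")
```

Under this assumption the three isoforms differ from one another by tens of nucleotides. How that translates into RT-PCR product sizes depends on primer placement, which is not specified here.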
Unlike in influenza A, the primary RNA transcript of segment 7 in influenza virus B does not undergo alternative splicing to produce the M2 protein. In influenza C, RNA splicing of the M transcript takes place in segment 6. The two transcripts generated from segment 6 in infected cells are the full-length primary transcript and a single-spliced transcript created by removal of an intron located at the 3’ end of the primary transcript. The primary transcript contains a 374-aa-long ORF (P42), but the spliced message contains a shorter ORF encoding 242 aa residues (CM1) because RNA splicing from a 5’ splice site at nt 751 to a 3’ splice site at nt 982 generates a stop codon (Fig. 7b). The P42 protein is subsequently processed by internal cleavage, resulting in production of a predominant CM2 protein containing the C-terminal 115 aa residues of the P42 protein [47, 48].
In summary, there are two instances of viral RNA splicing in influenza infection: one conserved between all three species (NS1) and the second being highly variable in each species (M). Thus, the combination of NS1 and M RNA splicing assays would reveal not only active influenza virus infection but also be able to specify the infection with a specific influenza virus species.
HIV-1 and HIV-2
Human immunodeficiency virus (HIV) is a member of Lentivirus genus in retrovirus family and causes the acquired immunodeficiency syndrome (AIDS). HIV infects cells of the immune system and consequently causes the failure of immunity that is associated with occurrence of opportunistic infections that may result in death. HIV infection is considered to be a pandemic with about 0.6% of the world population being infected. Two types of HIV viruses have been characterized. Although closely related, HIV-1 differs from HIV-2 in infectivity and geographical distribution, with HIV-2 being much less pathogenic and predominantly occurring in several West African countries.
HIV is an enveloped virus and carries two copies of a single-stranded RNA genome of 9 kb having positive polarity (ssRNA+). After initial infection, the viral genomic RNA is converted by virus-encoded reverse transcriptase into DNA, which then can integrate into the host genome where this integrated viral genome subsequently resides as a provirus. Later, the integrated provirus serves as a template for continued transcription of viral transcripts.
Three groups of HIV transcripts can be observed by size in Northern blot analysis. The first group represents an unspliced 9-kb transcript, which serves as a template for expression of gag and gag/pol as well as serving as genomic RNA for newly formed virions. The second group represents single-spliced RNA transcripts of ~4 kb, which encode env, vif, vpr, and vpu proteins. The third group of transcripts of ~2 kb consists of multiple-spliced RNA transcripts that encode accessory proteins tat, rev, nef, and vpr. During virus infection, HIV generates a wide variety of RNA transcripts by using at least five alternative 5' splice sites as well as eight to nine alternative 3’ splice sites [50, 51] (Fig. 8b). In addition, several antisense transcripts originating from the 3’ long terminal repeat (3’ LTR) have been detected in HIV-1-infected cells.
Recent studies have demonstrated that HIV alternative RNA splicing is largely regulated by viral RNA cis-elements as well as cellular splicing factors and is orchestrated for completion of the HIV life cycle during virus infection. Multiple-spliced transcripts of the 2-kb family are expressed in the early stage of virus infection and express tat, rev, and nef. This group of spliced RNAs is produced by using the 3’ splice sites A3–A5 located in the central part of the viral genome, with the A3 site used for expression of tat, A4a-c for rev, and A5 for nef. During the late stage of HIV infection, nuclear import and accumulation of tat together with rev allow the rev protein to bind to a rev-responsive element (RRE), located in the tat/rev intron between the D4 and A7 splice sites, in partially spliced 4-kb and unspliced 9-kb RNA transcripts; rev then mediates the export of these late transcripts into the cytoplasm for translation (Fig. 8c). Sites A1A and D1A are involved in pre-mRNA stability. Strains from the IIIB family of HIV viruses use additional A6 and D5 splice sites to generate a small exon in the env region; transcripts containing this exon express the tripartite tat-env-rev fusion protein, tev [55, 56].
Regulation of HIV RNA splicing depends on the selection of 3’ splice sites, which are, in general, weak in contrast to the stronger and highly active 5’ splice sites. In addition, numerous positive and negative splicing regulatory cis-elements identified in the HIV RNA genome bind various cellular splicing factors and thus affect the selection of individual 3’ splice sites (see review ). Comparison of nucleotide sequences has demonstrated a high level of conservation of splice sites among the different clades of HIV-1 strains (except D4a, b, c).
HTLV-1 and HTLV-2
Human T-cell leukemia virus type 1 (HTLV-1) and type 2 (HTLV-2) were the first two retroviruses discovered in humans . HTLV-1 is etiologically linked to adult T-cell leukemia/lymphoma (ATLL), an aggressive malignancy of CD4+ T lymphocytes, as well as to a neurological disorder named HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP) [58, 59, 60]. HTLV-1 is endemic in Japan, Africa, Caribbean basin, and South America. HTLV-2 is linked to HAM/TSP but not to ATLL. HTLV-2 infection occurs predominantly in parts of Africa and Americas .
In contrast to HIV-1, the regulation of HTLV RNA splicing and the roles of cellular splicing factors in HTLV RNA splicing are poorly understood.
The human circovirus Torque teno virus (TTV) was originally discovered in the serum of a patient with posttransfusion non-A to G hepatitis. Later studies showed that TTV is present in body fluids of healthy individuals and is not associated with any pathological disorder. However, the reported prevalence of TTV infection in the general population is highly variable, ranging from about 2% to 90%. This large variation among reported studies is most likely attributable to primer selection and PCR performance. TTV isolates have considerable diversity (about 30%) and can be clustered into several genotypic groups without any particular geographical distribution, indicating that TTV is likely a ubiquitous virus. Virus replication, route of TTV infection, and association with pathological manifestations remain unclear.
The role of TTV RNA splicing and its regulation in infection by this virus, as well as in TTV replication, remains largely unknown. Another human circovirus, TTV-like mini virus (TLMV), also has been identified in human sera. TLMV shares the same genetic organization with TTV and other circoviruses, but its genome is only about 2.9 kb in size.
Hepatitis B virus (HBV) is a hepatotropic virus; chronic hepatic infection with HBV can result in the development of liver cirrhosis and hepatocellular carcinoma. Although an effective HBV vaccine is available in many countries, HBV infection remains epidemic in many parts of the world, particularly in Asia and sub-Saharan Africa. WHO estimates that HBV infects about 2 billion people worldwide, with chronic infection affecting 350 million and causing death in approximately 1 million persons each year. While the vaccine can prevent HBV infection, there is no cure for already infected individuals due to the persistence of HBV covalently closed circular DNA (cccDNA), the active transcriptional template of the virus.
All HBV transcripts from cccDNA are produced by cellular RNA polymerase II. A spliced 2.2-kb RNA transcript was first identified in transfected hepatoma cells and contains a single 1223-nt-long intron starting from the end of the core antigen ORF to the middle of the S antigen ORF. Subsequently, other single- and multiple-spliced forms of pgRNA have been discovered, with sizes of 2.1–2.6 kb, in both cell cultures and liver tissues of HBV patients [78, 79, 80]. So far, 13 spliced variants of pgRNA and 2 spliced isoforms of pre-S2/S RNA have been identified from HBV gene expression during infection; these spliced viral RNAs are produced by the use of six 5’ splice sites and seven 3’ splice sites (Fig. 11c). A viral cis-element PRE (posttranscriptional regulatory element) as well as cellular splicing factors such as PTB (polypyrimidine tract-binding protein) and SR proteins also may play roles in the regulation of HBV RNA splicing (see review).
Approximately 30–50% of HBV RNA produced during HBV infection of the human hepatoma cell lines Huh7 and HepG2, two popular cell lines for in vitro HBV replication studies, is spliced RNA. Huh7 cells seem to produce more spliced RNA than do HepG2 cells. The major spliced product is derived from 30% of pgRNA using nt 2447 5’ss and nt 489 3’ss in genotypes A, C, D, and E. Serum of infected patients or hepatocarcinoma tumor samples frequently contain HBV DNA originating from spliced variants [83, 84]. The level of spliced HBV RNAs in patients varies widely from no splicing to extensive splicing and is related to viral genotype. The role of HBV RNA splicing in the HBV life cycle or in HBV pathogenesis remains to be elucidated. HBV spliced RNAs express two new proteins [86, 87]. A spliced mRNA derived from pgRNA with removal of a 454-nt intron from nt 2447 to 2901 encodes a structural polymerase-surface fusion protein, p43, with a potential function in viral entry. Another single-spliced pgRNA with removal of an intron from nt 2447 to 489 translates a 93-aa fusion protein of 10.4 kDa, in which the first 46 aa residues are identical to the N-terminus of the viral polymerase protein and are followed by 47 aa residues generated by a frameshift in the second exon. This protein is referred to as hepatitis B splice-generated protein or HBSP and seems to be associated with chronic HBV infection, the HBV cytopathogenic effect, and HBV immune evasion (see review).
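The coordinate arithmetic behind these figures can be checked with a short Python sketch. It assumes the convention implied by the text, in which intron length is the difference between the two quoted splice-site coordinates, and it treats the genome as circular so that an intron running from a higher to a lower coordinate (such as nt 2447 to nt 489) wraps past the origin. The function name and the genome_length parameter are illustrative only; because HBV genome length varies with genotype, no value is assumed here for the wrapped case.

```python
def intron_length(five_ss, three_ss, genome_length=None):
    """Intron length as the difference between the quoted splice-site coordinates.

    If three_ss < five_ss, the intron is assumed to wrap around the origin of
    a circular genome, and genome_length (an assumption the caller must
    supply) is required.
    """
    if three_ss >= five_ss:
        return three_ss - five_ss
    if genome_length is None:
        raise ValueError("genome_length is required for an intron that wraps the origin")
    return genome_length - five_ss + three_ss


# Intron removed from the p43-encoding mRNA (nt 2447 to nt 2901 in the text).
p43_intron = intron_length(2447, 2901)

# HBSP: 46 residues shared with the polymerase N-terminus
# plus 47 residues read in the shifted frame of the second exon.
hbsp_length_aa = 46 + 47

print(p43_intron, hbsp_length_aa)  # prints: 454 93
```

For the wrapped intron, a call such as intron_length(2447, 489, genome_length=N) would be used, with N taken from the reference sequence of the genotype under study.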
Parvoviruses are a group of small non-enveloped viruses containing a single-stranded DNA genome (ssDNA) of ~6 kb. The palindromic inverted terminal repeats at the ends of the virus genome function as an origin for replication. Parvoviruses replicate via a double-stranded DNA intermediate that serves as a template for viral transcription . Replication of some parvoviruses relies on “helper” viruses such as adenovirus, herpesviruses, vaccinia virus, and human papillomaviruses [90, 91, 92].
Parvoviruses are ubiquitous viruses and infect a wide range of animals. As of today, there are at least four members of the Parvoviridae family that are infectious to humans: adeno-associated viruses (AAV), parvovirus B19 (B19V), human bocaviruses (HuBoV), and human Parv4 . Despite structural and genetic similarity, different parvoviruses use different replication and transcription strategies during viral infection and have different host tropisms for initiating a productive infection in the presence of a helper virus.
Adeno-associated viruses (AAV), currently classified as Dependoviruses, were the first human parvoviruses identified in the group. AAVs infect a wide range of species, with AAV-1, AAV-2, AAV-3, AAV-8, and AAV-9 being found in humans. Currently no disease or pathological condition is associated with AAV infection in humans. A correlation between AAV infection and fetal loss or male infertility has been proposed because of the high prevalence of AAV DNA in placental tissues and in genital tissues of men with abnormal semen [95, 96], but a causal relationship has not yet been shown. Because AAV lacks pathogenicity, induces a low immune response, and infects both dividing and nondividing cells with the capability of viral DNA integration into the host genome, AAV has gained attention as a vector for gene therapy (see review).
Human B19 virus, a member of the Erythrovirus genus, was first identified in the serum of a blood donor. Three genotypes of B19 viruses have been identified from different geographic regions. After acute infection, the virus persists in the host for life. Infection by B19 virus is generally asymptomatic, but several pathological conditions have been associated with B19 infection; these include erythema infectiosum (the “fifth disease”), polyarthropathy syndrome, transient aplastic crisis (TAC), and persistent anemia/pure red cell aplasia (PRCA). B19 infection during pregnancy may be associated with spontaneous miscarriage and development of nonimmune hydrops fetalis.
Like other parvoviruses, the B19 virus genome encodes two large open reading frames. The NS1 ORF on the left side of the genome translates a 77-kDa nonstructural protein, while the VP ORF on the right side of the genome produces two capsid proteins (84-kDa VP1 and 58-kDa VP2). At least nine virus-specific transcripts have been detected following B19 infection; all of them are transcribed from a single promoter, P6, located upstream of the NS1 gene, but are alternatively spliced and terminated at two alternative polyadenylation sites (Fig. 12c) located either in the middle or on the far right side of the genome. When the poly(A) site in the middle of the virus genome is used, the P6 transcript has an intron in the NS1 ORF, and splicing of this intron from the NS1 transcript may create a novel ORF encoding a small accessory 7.5-kDa protein. However, if the poly(A) site on the right side of the genome is used for RNA polyadenylation, the P6 transcript becomes a bicistronic (NS1 and VP) transcript with two introns. By splicing to remove intron 1 from the bicistronic RNA, the single-spliced P6 transcript is capable of encoding both the 7.5-kDa and VP1 proteins. Double RNA splicing to remove both intron 1 and intron 2 from the P6 bicistronic transcript disrupts both ORFs for NS1 and VP1, but creates either a VP2 ORF or a novel ORF for another accessory 11-kDa protein, depending on which alternative 3’ splice site is selected (Fig. 12c). Thus, all detected B19 transcripts are derived from a P6 pre-mRNA containing one or two introns with two possible alternative 3’ splice sites, depending on the selection of one of two possible alternative poly(A) sites. All detected B19 transcripts are alternatively spliced RNA transcripts, except the unspliced full-length NS1 RNA. The cis-elements in the central exon and intron 2 are regulatory elements and control the alternative P6 RNA splicing, with the double-spliced P6 RNAs being the predominant species in the infected cells.
In humans, adenoviruses most commonly infect the upper and lower respiratory tract, resulting in bronchitis and/or pneumonia. Adenovirus infection can also involve a wide range of other sites, resulting in conjunctivitis, ear infection, gastroenteritis, myocarditis, hemorrhagic cystitis, meningitis, and encephalitis. There are 56 adenovirus types belonging to seven species (human adenovirus A–G). Types belonging to species B and C are responsible for most respiratory infections, B and D for conjunctivitis, and F and G for gastroenteritis. Adenoviruses are also found in other vertebrates.
Even though the human adenoviruses are not etiologically linked to any human cancer, some adenoviruses (types 2, 5, 12, 18, and 31) can, under special circumstances, transform rodent cells in vitro and induce tumors in small animals. Transformation activities are linked to two oncogenes: (1) E1A, which binds tumor suppressor pRB, and (2) E1B, which binds tumor suppressor p53.
Almost all adenoviral early and late transcripts undergo RNA splicing in order to produce their corresponding viral products . Here, viral E1A and L1 transcripts are used as examples for the alternative RNA splicing seen in adenoviral infections.
The adenovirus E1A primary transcript contains three 5’ splice donor sites as well as two 3’ acceptor sites and is composed of three exons and two introns. The first intron, from nt 636 to nt 853, is a suboptimal, minor intron. The second intron is a major intron that uses two alternative donor sites, at nt 973 and nt 1111, as well as one acceptor site at nt 1226 for RNA splicing. Alternative splicing of E1A RNA through the utilization of various combinations of splice donor and acceptor sites leads to the formation of five different species of E1A mRNAs (13S, 12S, 11S, 10S, and 9S), named for their sedimentation coefficients (Fig. 13b), and to the expression of individual unique proteins.
The transcription of late genes starts predominantly from a major late promoter. The primary late transcript is then polyadenylated at one of five polyadenylation sites, forming five groups of late transcripts (L1–L5). Each late mRNA contains a 201-nt “leader” sequence derived from three noncoding exons that function as a translational enhancer. There are two variants of the leader sequence, with or without an i-leader exon. Besides the leader sequence region, L1 transcripts also are alternatively spliced by the utilization of a common 5’ splice site in combination with two alternative 3’ splice sites. Selection of the proximal 3’ splice site results in the formation of 52,55K RNA, while selection of the distal 3’ splice site produces IIIa mRNA (Fig. 13c).
The characteristic features of adenovirus splicing depend on the stage of virus infection. For example, E1A 13S and 12S mRNA are two major spliced products that occur during early virus infection. In contrast, 9S RNA largely accumulates in the late stage of infection. A similar phenomenon has been observed in the expression of late mRNAs. Inclusion of the i-leader exon is generally a signature of early transcripts, but most of the late transcripts contain a classical tripartite leader. While 52,55K L1 RNA is produced during both early and late infection, the IIIa splice site is used only in the late stage of viral infection. Both cellular splicing machinery and viral products have been found to regulate alternative splicing of adenoviral transcripts during the course of viral infection (see reviews [110, 115]).
Polyomaviruses are small non-enveloped viruses that contain a circular double-stranded DNA (dsDNA) genome of ~5000 bp. Polyomaviruses infect a wide range of mammalian and avian species, but each virus exhibits a limited host range with narrow tissue tropism. The Polyomaviridae family contains only one genus, Polyomavirus (PyV), which includes nine human polyomaviruses: (1) BKPyV, (2) JCPyV, (3) KI PyV, (4) WU PyV, (5) Merkel cell PyV (MCPyV), (6) HPyV6, (7) HPyV7, (8) trichodysplasia spinulosa-associated PyV (TSV), and (9) HPyV9. Simian vacuolating virus 40 (SV40), a prototype virus of the family, was introduced into the human population as a contaminant in early trials of poliovirus vaccine. Serological data indicate that polyomavirus infection is widespread in the general human population, with initial infection occurring in childhood. After the initial infection, polyomaviruses persist in the host for life. While initial infection is usually asymptomatic, several human polyomaviruses are associated with various pathological conditions in immunocompromised patients, including nephropathy and cystitis associated with BKPyV and progressive multifocal leukoencephalopathy associated with JCPyV, as well as trichodysplasia spinulosa presumably associated with TSV infection. Polyomaviruses express an oncoprotein, T antigen, and this T antigen may lead to the development of human cancer by an abortive infection, as recently confirmed in a rare but aggressive Merkel cell carcinoma [127, 128, 129].
In polyomavirus-infected cells, multiple isoforms of the T antigen are detectable because of alternative RNA splicing. The primary transcript of the T antigen contains two introns, and intron 1 has two alternative 5’ splice sites. During RNA splicing, retention of intron 2 is important for production of both large T and small t antigens. However, selection of the proximal 5’ splice site in intron 1 for RNA splicing leads to production of large T antigen, whereas selection of the distal 5’ splice site in intron 1 results in small t production. Because the sequence region between the proximal 5’ splice site and the distal 5’ splice site has a stop codon, retention of this region in small t RNA splicing makes the small t RNA larger than the large T RNA, but introduction of a premature stop codon in the small t RNA results in production of a smaller protein (Fig. 14b). In addition, a rare tiny t antigen of ~17 kDa has been attributed to double RNA splicing in SV40-infected cells. In this case, the transcript encoding the 17-kDa antigen shows splicing of both introns, with intron 1 spliced by selection of the proximal 5’ ss. Similar to SV40, multiple-spliced RNA species of early transcripts also were detected in other polyomaviruses, such as the truncated T antigen (trunc-T Ag) in BKPyV, T’135, T’136, and T’165 in JCPyV, and T3 and T4 early transcripts in MCPyV (Fig. 14b). Alternative splicing of polyomavirus early transcripts allows the expression of multiple T antigens with distinct functions during the viral life cycle. In addition to the cells with actively replicating virus, the early viral transcripts also are expressed in cells with nonproductive infection or in polyomavirus-transformed cells. These cells often do not express late gene products due to integration of the viral DNA into the host genome, resulting in dysregulated viral gene expression as well as cell transformation.
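The apparent paradox that a longer mRNA can encode a shorter protein can be illustrated with a toy Python model. The three sequences below are invented for illustration and are not polyomavirus sequences; the only property they share with the situation described above is that the region between the proximal and distal 5’ splice sites carries an in-frame stop codon. The model ignores intron 2 and all untranslated regions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def protein_length(mrna):
    """Count codons from the first ATG up to, but not including, the first in-frame stop."""
    start = mrna.find("ATG")
    if start == -1:
        return 0
    length = 0
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            break
        length += 1
    return length

# Toy sequences (hypothetical, for illustration only).
exon1_to_proximal_ss = "ATGGCTGCA"          # ATG GCT GCA
proximal_to_distal_ss = "GGTTAAGTC"         # GGT TAA GTC, contains an in-frame stop
exon2 = "GGAGCTGGAGCTTGA"                   # GGA GCT GGA GCT TGA

# Proximal 5' splice site: the stop-containing region is spliced out.
large_T_mrna = exon1_to_proximal_ss + exon2
# Distal 5' splice site: the stop-containing region is retained.
small_t_mrna = exon1_to_proximal_ss + proximal_to_distal_ss + exon2

print(len(large_T_mrna), protein_length(large_T_mrna))  # prints: 24 7
print(len(small_t_mrna), protein_length(small_t_mrna))  # prints: 33 4
```

In this toy case the small t mRNA is 9 nt longer than the large T mRNA, yet translation terminates at the retained stop codon and yields the shorter product.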
HPVs are the etiological agents of cervical cancer and presumably of other anogenital cancers. HPV is present in >95% of all cervical cancers and is required for initiation of cervical carcinogenesis and maintenance of the cervical cancer cells. Cervical cancer is a leading cause of death for women in the developing world, with about 493,000 new cases and nearly 273,000 deaths each year. More than 200 genotypes of HPVs have been identified to date and are grouped into two major groups based on their pathogenesis and association with cervical cancer. The reference genome sequences are available at https://pave.niaid.nih.gov/#home. The high-risk or oncogenic HPV types are present in cervical cancers, some anogenital cancers, as well as head-and-neck cancers. The low-risk or non-oncogenic HPVs are not associated with cancers. In general, women acquire HPV infection by sexual contact. A number of epidemiological studies have demonstrated that women with repeat exposure to oncogenic HPVs as well as women with persistent cervical infection by oncogenic HPVs are at high risk for developing cervical cancer [136, 137]. Infection with oncogenic HPV-16 and HPV-18, the two most common oncogenic HPV types, leads to the development of almost 70% of all cervical and other types of anogenital cancers. Viral E6 and E7 of the oncogenic HPVs are two viral oncoproteins that inactivate, respectively, cellular p53 and pRB, which are two tumor suppressor proteins essential for cell cycle control [138, 139]. In cervical cancer tissues and cervical cancer-derived cell lines, E6 and E7 oncogenes are highly expressed; the majority of the E6/E7 bicistronic RNAs are alternatively spliced, as diagrammed for HPV-16 and HPV-18. A major spliced RNA isoform of viral E6/E7 bicistronic RNA is E6*I, derived from splicing of the nt 226 5' splice site to the nt 409 3' splice site for HPV-16 and of the nt 233 5' splice site to the nt 416 3' splice site for HPV-18 (Fig. 15b).
It has been demonstrated that this RNA splicing is necessary for viral E7 translation  and can be easily detected by RNase protection assay (RPA) or by RT-PCR methods [140, 141].
The presence of the high-grade premalignant lesions (CIN, cervical intraepithelial neoplasia) caused by oncogenic HPV infection is a sign of increased risk for developing cervical cancer. These lesions can be detected by routine cervical examination and treated by surgery to prevent progression to cervical cancer. The Papanicolaou test (also called the Pap smear) is a screening test used in gynecology to detect premalignant and malignant cells in cervical swabs. A woman who has a Pap smear with abnormal cells may also be referred for HPV DNA testing by two FDA-approved assays: the Hybrid Capture 2 DNA test, which detects 13 high-risk HPVs (HPV-16, HPV-18, HPV-31, HPV-33, HPV-35, HPV-39, HPV-45, HPV-51, HPV-52, HPV-56, HPV-58, HPV-59, and HPV-68) and is available from Qiagen , or the Cobas 4800 System HPV test, which detects 14 high-risk HPVs (HPV-16, HPV-18, HPV-31, HPV-33, HPV-35, HPV-39, HPV-45, HPV-51, HPV-52, HPV-56, HPV-58, HPV-59, HPV-66, and HPV-68) and is available from Roche . A few of the HPV E6/HPV E7 RNA tests also have been introduced. The APTIMA HPV Assay from Hologic was designed to detect HPV E6/HPV E7 mRNA from 14 high-risk types (HPV-16, HPV-18, HPV-31, HPV-33, HPV-35, HPV-39, HPV-45, HPV-51, HPV-52, HPV-56, HPV-58, HPV-59, HPV-66, and HPV-68)  with a sensitivity and specificity similar to or better than the Hybrid Capture 2 DNA test [145, 146]. The PreTect HPV-Proofer from PreTect was designed to detect E6 and E7 RNA from HPV types 16, 18, 31, 33, and 45 [147, 148] and is more specific than the HC2 for identifying women with CIN 2+ but has a lower sensitivity . By using the primers detailed in Fig. 15c for RT-PCR assays, the spliced E6/E7 RNAs of HPV-16 and HPV-18 can be easily detected due to an amplicon size smaller than E6/E7 DNA, without worry of any carry-over viral DNA contamination commonly encountered with HPV DNA tests.
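The size difference exploited by such an assay can be estimated in advance from the primer and splice-site coordinates. The Python sketch below is a minimal illustration under two stated assumptions: the quoted 5' splice site is taken as the last exonic nucleotide before the intron and the quoted 3' splice site as the first exonic nucleotide after it, and the primer positions (nt 100 and nt 600) are hypothetical placeholders, not the primers of Fig. 15c.

```python
def amplicon_sizes(fwd_start, rev_end, five_ss, three_ss):
    """Return (unspliced_or_dna_bp, spliced_bp) for a primer pair flanking one intron.

    Coordinates are 1-based and inclusive. Assumes five_ss is the last exonic
    nucleotide upstream of the intron and three_ss is the first exonic
    nucleotide downstream of it (an assumed convention).
    """
    if not (fwd_start <= five_ss < three_ss <= rev_end):
        raise ValueError("primers must flank the intron")
    unspliced = rev_end - fwd_start + 1
    intron = three_ss - five_ss - 1
    return unspliced, unspliced - intron

# HPV-16 E6*I splice sites from the text; primer positions are hypothetical.
hpv16 = amplicon_sizes(fwd_start=100, rev_end=600, five_ss=226, three_ss=409)
# HPV-18 E6*I splice sites from the text; same hypothetical primer positions.
hpv18 = amplicon_sizes(fwd_start=100, rev_end=600, five_ss=233, three_ss=416)

print(hpv16)  # prints: (501, 319)
print(hpv18)  # prints: (501, 319)
```

Under these assumptions the spliced E6*I product is 182 bp shorter than the product from unspliced RNA or viral DNA for both virus types, a difference that is readily resolved on an agarose gel.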
Herpesviruses are large DNA viruses with a complex life cycle. Their relatively large linear double-stranded DNA (dsDNA) genome (~100–200 kb) is encapsulated in a capsid with icosahedral architecture. The capsid is covered with a heterogeneous layer of viral proteins and RNAs called tegument. Outside of this tegument is a lipid bilayer membrane (envelope) containing several virus-encoded glycoproteins. A hallmark of herpesvirus infection is the establishment of a lifelong “latent” infection in their host following initial infection. Latent virus is often reactivated by various stimuli and causes recurrent infections, which is a typical feature of all herpesviruses.
Currently there are more than 100 known herpesviruses infecting a wide range of animal species. All human herpesviruses belong to the Herpesviridae family, which is divided into three subfamilies (Alphaherpesvirinae, Betaherpesvirinae, and Gammaherpesvirinae) plus a group of unassigned viruses. Currently, eight herpesvirus species have been isolated from humans and assigned to these three subfamilies. These include (1) herpes simplex virus type 1 [HSV-1, also referred to as human herpesvirus 1 (HHV1)], (2) herpes simplex virus type 2 (HSV-2 or HHV2), (3) varicella-zoster virus (VZV or HHV3), (4) Epstein-Barr virus (EBV or HHV4), (5) human cytomegalovirus (CMV or HCMV or HHV5), (6) human herpesvirus 6 (HHV6), (7) human herpesvirus 7 (HHV7), and (8) Kaposi sarcoma-associated herpesvirus (KSHV or HHV8).
After viral entry, the viral genome is translocated to the nucleus of the infected cell, where the expression of viral genes and viral genome replication occur. All herpesviruses have two types of viral life cycle, latent and lytic, each with a distinctive transcriptional profile. Latent infection is characterized by the expression of a few viral genes (latent transcripts) that maintain the viral genome in latently infected cells. Lytic infection is associated with viral genome replication and production of infectious virions; this generally leads to destruction of the infected cell. In contrast to latent infection, almost all viral lytic genes in the lytic infection are expressed in a temporally regulated fashion and are divided, based on their dependence on viral protein expression and viral genome replication, into three kinetic classes: immediate early, early, and late. In some circumstances, the virus in latently infected cells may be reactivated and proceed to lytic infection. The mechanisms controlling the establishment of latency and reactivation of herpesviruses are not yet fully understood. The human herpesvirus genome encodes up to 100 different genes, including a variable number of noncoding genes expressing noncoding RNAs or viral miRNAs [150, 151, 152].
Most human herpesviruses are highly prevalent in the general population. Initial infection generally occurs in childhood or early adolescence through body contact and is followed by the establishment of latent infection. Some herpesviruses are sexually transmitted. Blood transfusion, tissue transplantation, and/or congenital transmission are additional mechanisms for acquiring the virus. The primary infection often occurs in epithelia, i.e., the point of entry, followed by establishment of latent infection, which generally occurs in a specialized cell type (neurons or lymphocytes) and serves as a virus reservoir. Recurrence of infection is caused by virus reactivation from its latent state, with the virus escaping from host immunological surveillance. Overall, symptoms of herpesvirus infections in healthy individuals are generally mild, but may be life threatening in immunocompromised patients. While it is clear that infections by some herpesviruses such as EBV and KSHV are etiologically linked to the development of several types of cancer, the role of other human herpesviruses in cell transformation remains unknown. Several antiviral compounds are used to treat acute herpesvirus infections. The only vaccine against herpesviruses currently approved for use in the clinic is the varicella/chickenpox vaccine against VZV.
The infections by herpesviruses are most commonly diagnosed by the presence of specific antibodies and antigens or by detection of viral DNA by PCR. However, without quantification at multiple time points, these techniques are unable to distinguish virus carriers from patients with active virus replication. Detection of viral transcripts associated with virus lytic phase by RT-PCR provides indication of active virus replication, but often leads to a false-positive result due to viral DNA contamination. Such DNA contamination problems could be avoided by selection of an amplicon over the intron in spliced viral transcripts; a specific product of the spliced RNA could be distinguished from its corresponding DNA based on its size. The number of spliced viral transcripts varies from one herpesvirus to another, ranging from only a handful of split genes in HSV-1 to about 30% in KSHV . Both latent and lytic genes may have an intron and sometimes these are alternatively spliced.
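The size-based interpretation described above can be written down as a small decision rule. The Python sketch below is one possible formulation, not an established protocol: the function name, the 10-bp tolerance, and the use of a minus-RT (no reverse transcriptase) control flag are assumptions introduced for illustration, and the example sizes are arbitrary.

```python
def interpret_band(observed_bp, spliced_bp, unspliced_bp,
                   minus_rt_positive=False, tolerance_bp=10):
    """Classify an RT-PCR product amplified with primers flanking an intron.

    spliced_bp and unspliced_bp are the predicted product sizes;
    minus_rt_positive indicates that the same band also appears when
    reverse transcriptase is omitted. tolerance_bp is an arbitrary
    allowance for gel sizing error (an assumption).
    """
    if abs(observed_bp - spliced_bp) <= tolerance_bp:
        # The exon-exon junction exists only in spliced RNA.
        return "spliced RNA"
    if abs(observed_bp - unspliced_bp) <= tolerance_bp:
        if minus_rt_positive:
            return "viral DNA (unspliced RNA not excluded)"
        return "unspliced RNA"
    return "unexpected product"

# Arbitrary example sizes: 320 bp predicted for spliced RNA, 500 bp for unspliced RNA or DNA.
print(interpret_band(318, 320, 500))                          # prints: spliced RNA
print(interpret_band(502, 320, 500))                          # prints: unspliced RNA
print(interpret_band(502, 320, 500, minus_rt_positive=True))  # prints: viral DNA (unspliced RNA not excluded)
```

The rule is only meaningful when the predicted spliced and unspliced sizes differ by clearly more than the sizing tolerance, which is the reason for choosing an amplicon that spans the intron.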
Herpes Simplex Virus Type 1
During latency, HSV-1 expresses LAT (latency-associated transcript) RNA using a repeat region of the viral genome called LAT-DNA [157, 158]. Two forms of LAT RNAs are detectable in latently infected neurons. A major 2.0-kb RNA is produced by splicing of a capped and polyadenylated 8.3-kb primary transcript and represents a unique stable intron while the spliced exonic RNA is unstable and quickly degraded . A minor 1.5-kb RNA is generated by further splicing of the 2.0-kb RNA by removal of an internal intron of 559 or 556-bp, depending on the virus strain  (Fig. 6a). Both LAT RNAs are uncapped without a poly(A) tail and accumulate in the nucleus of infected cells. HSV-1 LAT RNA is a noncoding regulatory RNA for establishment and maintenance of viral latency by inhibiting the expression of viral lytic genes and thus interfering with the cellular apoptosis pathway . Recent studies have demonstrated that LAT transcript functions as a precursor for the generation of virus-encoded miRNAs . The expression of LAT-DNA also has been observed during lytic infection. Lytic LAT transcripts differ from latent LAT RNA by the presence of a poly(A) tail .
ICP0 (IE110) is encoded by a gene located in the viral genome repeat region and partially overlaps with LAT transcripts. Antisense expression of LAT transcripts inhibits the expression of ICP0 during latency. ICP0 is an immediate-early gene expressed in the early stage of lytic infection. ICP0 functions as a non-specific transactivator and a cofactor of another viral transactivator, ICP4. ICP0 initiates lytic replication both in newly infected cells and after reactivation in cells with latent infection. ICP0 is transcribed in reverse orientation from the viral genome, and its pre-mRNA contains three exons separated by two introns (Fig. 16a). After splicing, the mature mRNA encodes the ICP0 protein of 775 aa residues. An alternatively spliced ICP0 transcript retaining intron 2 is detectable in the infected cells and encodes a truncated protein, ICP0R, of 262 aa residues due to the presence of a stop codon in intron 2. Thus, ICP0 and ICP0R have the same aa sequence in the N-terminal part. ICP0R functions as a repressor of viral expression.
HSV-2 represents another important human pathogen belonging to the alphaherpesvirus subfamily. Genital infection with HSV-2 causes genital herpes, which is considered to be a sexually transmitted disease. HSV-2 is also neurotropic and establishes latent infection in sacral ganglia. HSV-1 and HSV-2 are two closely related viruses with similar genomes and gene structures, including their LAT and ICP0 regions .
In general, infection with HSV-1 and HSV-2 is controlled by the host immune system. Thus, initial or recurrent infections are usually associated with only mild symptoms. Infection in immunocompromised patients can cause several severe diseases, including encephalitis. Genital infections or reactivation of HSV-2 during pregnancy can lead to congenital infection. Detection of viral DNA may not provide sufficient information about virus replication status due to the permanent presence of viral DNA in the infected cells. Detection of viral lytic products, such as spliced ICP0 RNA, may be a better predictor of virus reactivation and may be seen even before the occurrence of clinical symptoms, allowing early diagnosis and enabling early treatment. Disappearance of the detectable lytic products could be a sign of treatment efficacy, since the viral transcripts disappear earlier than viral DNA.
Human cytomegalovirus (HCMV), together with HHV-6 and HHV-7, belongs to the Betaherpesvirinae. CMV infection is highly prevalent, affecting 50–80% of the human population. In a majority of healthy individuals, the primary CMV infection occurs asymptomatically but, in some cases, can be associated with sore throat, prolonged fever, or a syndrome similar to infectious mononucleosis. After initial infection, the virus usually remains latent in T cells for the rest of the host's life without apparent symptoms. In contrast, CMV infections in immunocompromised individuals, such as newborns, transplant recipients, persons with AIDS, or cancer patients, can lead to severe disease and even death. The symptoms include hepatitis, retinitis, colitis, pneumonia, encephalitis, and others.
CMV has a large genome of about 220 kb capable of encoding approximately 200 genes (reviewed in). While the majority of CMV transcripts do not have introns, several split genes have been identified in all kinetic classes of the viral genes. A major immediate-early region (MIE) located within a unique long (UL) region of the CMV genome contains several genes that are highly expressed during the early stage of viral lytic infection. These include UL123 (IE1), UL122 (IE2), and UL119-115. MIE transcripts contain multiple introns and undergo complex alternative RNA splicing. MIE transcripts IE1 and IE2 are expressed from the same promoter but are alternatively polyadenylated. These transcripts have five major exons and can be alternatively spliced to express additional isoforms of IE1 and IE2 proteins (Fig. 16b). Splicing also was detected in transcripts from other CMV genes such as TRL4, UL89, US3, R160461, and R27080. Gene UL21.5 (previously named R27080) is one of the known CMV split late genes (SLG). The UL21.5 transcript, which encodes a viral glycoprotein, is expressed from the UL region positioned from nt 27080 to nt 27574 of the CMV genome and has a short intron of 83 nt. Removal of this intron leads to production of a mature mRNA of ~0.4 kb. Both spliced and unspliced UL21.5 RNAs are easily detectable by RT-PCR from infected cells.
Allogeneic bone marrow transplant recipients are at high risk for developing CMV diseases. Historically, viremia has been used as an indicator of CMV disease as well as to guide preemptive treatment. Multiple approaches have been developed to detect CMV viremia in circulating lymphocytes by direct virus isolation with cultivation, by detection of viral antigens in polymorphonuclear cells, or by quantitative viral DNA [175, 176]. However, the detection of viremia is not sufficient for disease prediction since many viremic patients never develop symptoms. Active CMV replication in peripheral blood lymphocytes can instead be verified by analyzing viral mRNAs. Amplification of spliced viral transcripts has some advantages in comparison to intronless transcripts and is not affected by DNA contamination. Detection of spliced immediate-early transcripts has been reported to have a good correlation with the detection of viral DNA or viral antigen [178, 179, 180]. Detection of late gene UL21.5 has a better predictive value and has a significant correlation with disease progression [174, 181, 182].
Epstein-Barr virus (EBV), a well-characterized member of the Gammaherpesvirinae subfamily, is an important human pathogen. EBV infection is highly prevalent, with more than 95% of the human population becoming seropositive in early life. While primary infection during childhood is usually unremarkable, the virus acquisition in adolescence and adulthood is often associated with the development of the infectious mononucleosis syndrome. In healthy individuals, the EBV infection is well controlled by the immune system. However, EBV remains in long-living memory B cells where it establishes a latent infection. EBV is an oncogenic virus capable of transforming the infected B cells . EBV infections have been associated with the development of a number of human malignancies, including nasopharyngeal carcinoma, Burkitt’s lymphoma, Hodgkin’s lymphoma, gastric carcinoma, and others (see review ). Active EBV replication due to immunosuppression may cause posttransplant lymphoproliferative disease . During latent infection, EBV expresses six nuclear antigens (EBNA-1, EBNA-2, EBNA-3A, EBNA-3B, EBNA-3C, and EBNA-LP), three latent membrane proteins (LMP-1, LMP-2A, and LMP-2B), and several noncoding transcripts (EBER-1 and EBER-2 and BARTs). Many EBV latent products are defined as oncogenes and are responsible for EBV-mediated cell transformation . Several types of EBV latency have been defined by variable expression of latent genes in a number of malignancies .
EBNA-1 is a multifunctional viral protein critical for establishing and maintaining EBV latency and for regulating viral promoter activities. In infected cells, EBNA-1 is expressed from a spliced mRNA derived from a primary transcript of ~100 kb. This transcript originates from one of two alternative promoters, Cp or Wp, which are named for their location in different BamHI fragments of the viral genome (Fig. 17b). At the early stage of latent infection, Wp is used first, but EBNA-1 and EBNA-2 expressed from Wp transactivate the Cp promoter and cause a switch of transcription from Wp to Cp. Usage of the Cp promoter is associated with EBV “latency type III.” In Burkitt’s lymphoma and Burkitt’s lymphoma-derived cell lines, EBNA-1 expression is initiated from the distal Qp promoter rather than from Cp or Wp and is associated with “latency type I.” EBNA-1 is also expressed in the lytic phase from an additional promoter, Fp, located just upstream of the Qp promoter.
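The promoter usage described above can be summarized as a small lookup table. The sketch below only restates the associations given in this paragraph; the dictionary name, the function name, and the idea of passing in promoter labels inferred from promoter-specific transcripts are illustrative assumptions rather than an established diagnostic scheme.

```python
# Promoter driving EBNA-1 transcription -> infection context, as stated in the text.
EBNA1_PROMOTER_CONTEXT = {
    "Wp": "early stage of latent infection",
    "Cp": "latency type III",
    "Qp": "latency type I",
    "Fp": "lytic phase",
}


def infection_contexts(detected_promoters):
    """Map detected EBNA-1 promoter usage to the contexts named in the text.

    detected_promoters: iterable of promoter labels (hypothetical input,
    e.g. inferred from promoter-specific spliced transcripts).
    Labels not covered by the text are reported as 'not described'.
    """
    return {p: EBNA1_PROMOTER_CONTEXT.get(p, "not described")
            for p in detected_promoters}


print(infection_contexts(["Qp"]))
```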
The establishment of active EBV replication after virus reactivation from latency depends on the expression of two immediate-early genes, BZLF1 and BRLF1, which encode the viral transactivators ZEBRA (BZLF1) and RTA (BRLF1). Although BRLF1 and BZLF1 are transcribed separately from different promoters, Rp for BRLF1 and Zp for BZLF1, both transcripts use the same polyadenylation site (Fig. 17c). The Zp promoter-derived transcript is a monocistronic ZEBRA RNA containing two constitutive introns; splicing of these two introns produces a 0.9-kb mRNA that encodes the ZEBRA protein. Transcription from the Rp promoter produces a 3.8-kb bicistronic transcript, ZEBRA/RTA, which contains two additional introns and the ZEBRA RNA. Splicing of intron 1 in the 5’ noncoding region of the ZEBRA/RTA transcript produces a 2.9-kb RNA as the major RNA isoform. Both isoforms of ZEBRA/RTA RNA have the potential to encode the ZEBRA and RTA proteins. A third, minor isoform of the ZEBRA/RTA transcript is derived from the splicing of an additional internal intron spanning from the BRLF1 ORF to the BZLF1 ORF; this splicing produces a RAZ transcript of ~0.9 kb encoding an RTA-ZEBRA fusion protein, RAZ. RAZ may function as an inhibitor of ZEBRA during EBV infection.
Transcripts for EBNA-1 are believed to be expressed in all forms of EBV latent infection, except latently infected nondividing B cells having “latency type 0.” This makes detection of EBNA-1 expression a good marker for the presence of EBV in tumors. The expression of ZEBRA during lytic infection could be used to monitor productive EBV infection as well as EBV reactivation.
Kaposi Sarcoma-Associated Herpesvirus
Kaposi sarcoma-associated herpesvirus (KSHV) is the latest human herpesvirus to be discovered . After primary infection, KSHV establishes latent infection in endothelial cells as well as B cells . In healthy individuals, both primary and latent KSHV infections are generally asymptomatic. Suppression of the immune system in KSHV-positive individuals, such as in AIDS patients or tissue transplant recipients, is associated with the development of several cancers, including all forms of Kaposi sarcoma (a solid tumor of endothelial origin) or B-cell lymphomas [primary effusion lymphoma (PEL) and multicentric Castleman’s disease (MCD)] (see review ). The presence of the viral genome as well as expression of viral-encoded products in all cancer cells strongly suggests the active role of KSHV in cell transformation.
During latency, the KSHV genome expresses latency-associated nuclear antigen-1 (LANA-1) from ORF73. ORF73 is positioned along with ORF72 and K13 in a larger latent locus of the viral genome. The latter two genes encode viral homologues of cellular proteins, vCyclin (ORF72) and vFLICE (K13). ORF73/72/K13 are transcribed from a single promoter (P127880) as a tricistronic RNA containing an intron with two alternative 3’ splice sites. Alternative RNA splicing and alternative RNA polyadenylation of the tricistronic pre-mRNA result in the production of three mature mRNAs (5.4, 3.3, and 1.7 kb) (Fig. 18b). The 5.4-kb transcript, which is most likely responsible for LANA-1 expression, is produced by usage of the proximal 3’ splice site, whereas usage of the distal 3’ splice site leads to expression of the 1.7-kb transcript for vCyclin and vFLICE. Both transcripts are polyadenylated at the same distal polyadenylation site. The minor 3.3-kb transcript uses the proximal splice site for RNA splicing but is polyadenylated at a proximal noncanonical polyadenylation signal (Fig. 18b).
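The way two alternative 3’ splice sites and two polyadenylation sites combine into three mature mRNAs can be laid out as a lookup keyed on both choices. The sketch below only encodes the combinations described in the paragraph above; the names `LATENT_TRANSCRIPTS` and `latent_transcript` are hypothetical, and the fourth possible combination is returned as `None` because the text does not describe it.

```python
# (3' splice site used, polyadenylation site used) -> (size in kb, note from the text)
LATENT_TRANSCRIPTS = {
    ("proximal", "distal"): (5.4, "most likely responsible for LANA-1 expression"),
    ("distal", "distal"): (1.7, "vCyclin and vFLICE"),
    ("proximal", "proximal"): (3.3, "minor transcript, noncanonical poly(A) signal"),
}


def latent_transcript(splice_site, polya_site):
    """Return (size_kb, note) for a splice-site/poly(A)-site combination.

    Returns None for a combination that the text does not describe.
    """
    for choice in (splice_site, polya_site):
        if choice not in ("proximal", "distal"):
            raise ValueError(f"expected 'proximal' or 'distal', got {choice!r}")
    return LATENT_TRANSCRIPTS.get((splice_site, polya_site))


print(latent_transcript("proximal", "distal"))
```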
KSHV lytic replication is controlled by a major viral transactivator, ORF50 (also referred to as Rta) [203, 204]. Like LANA-1, ORF50 is positioned along with K8 and K8.1 in a larger gene locus (ORF50/K8/K8.1 cluster) (Fig. 18c) and is expressed as an immediate-early transcript during lytic virus replication. K8, which encodes the viral k-bZIP protein, is an early gene, and K8.1, which encodes a glycoprotein, is a late gene. Although each of the three genes has its own promoter, all of their RNA transcripts use a single polyadenylation site located downstream of the K8.1 gene and undergo alternative RNA splicing (see review ). Thus, the 3’ portion of the ORF50 transcript is homologous to K8 and K8.1 and has the same intron and exon structures as the K8 and K8.1 transcripts. The ORF50 transcript is tricistronic, the K8 transcript is bicistronic, and the K8.1 transcript is monocistronic. The bicistronic K8 transcript is composed of four exons separated by three introns (Fig. 18c). A functional K8α protein is expressed from the fully spliced mRNA, whereas retention of intron 2 in the K8β mRNA results in the expression of a minor form, the K8β protein. An unspliced K8 RNA, K8γ, is also detectable but is rare in lytically infected cells.
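The relationship between intron retention and the K8 isoforms named above can be expressed as a short classifier. This is a sketch under stated assumptions: the introns are numbered 1 to 3 as in the text, the function name `classify_k8_isoform` is hypothetical, and any retention pattern other than the three described is reported as not described rather than guessed.

```python
K8_INTRONS = frozenset({1, 2, 3})  # the K8 transcript has four exons and three introns


def classify_k8_isoform(retained_introns):
    """Name the K8 RNA isoform from the set of introns left unspliced."""
    retained = frozenset(retained_introns)
    if not retained <= K8_INTRONS:
        raise ValueError(f"unknown intron numbers: {sorted(retained - K8_INTRONS)}")
    if not retained:
        return "K8α"  # fully spliced mRNA, encodes the functional K8α protein
    if retained == {2}:
        return "K8β"  # intron 2 retained, encodes the minor K8β protein
    if retained == K8_INTRONS:
        return "K8γ"  # unspliced RNA, rare in lytically infected cells
    return "not described"


print(classify_k8_isoform([]))
print(classify_k8_isoform([2]))
```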
In summary, LANA-1 expression is a hallmark of KSHV latent infection. Transcripts originating from the ORF73/72/K13 gene cluster are expressed in latently infected, KSHV-transformed cells and are detectable by RT-PCR. Active virus replication is associated with the expression of viral lytic genes. Amplification of the spliced K8 region, which detects the expression of both the ORF50 tricistronic and the K8 bicistronic transcripts, could be used to monitor viral lytic replication.
The major aim of this chapter is to provide readers with knowledge of viral RNA splicing during viral infection and of how the detection of spliced viral RNA transcripts can be used as a new approach in diagnostic virology. The first part provides basic information about the mechanisms of RNA splicing and the methodological approaches for specific detection of spliced RNA molecules. At their core, these techniques rely on amplification and detection of nucleic acids. An advantage of nucleic acid-based techniques is that the same platform can be applied to detect various viral pathogens, often at the same time, by multiplexing. The rapid setup of these methods is especially important for responding to emerging viruses, as shown during the severe acute respiratory syndrome (SARS), avian influenza, Zika virus, and Ebola virus outbreaks, where nucleic acid amplification was rapidly deployed to detect and confirm infections. Their low material requirements and simplicity make these detection methods suitable for low-resource settings, such as laboratories at the point of first patient contact and field laboratories. Because the genomic sequences of many viruses are already detected by nucleic acid amplification as a routine procedure in many diagnostic laboratories, the detection of spliced viral transcripts could be performed simultaneously using existing methods.
The second part of this chapter summarizes the current knowledge of viral RNA splicing events for the majority of known human viruses. Some unique viral agents, such as human circoviruses and adeno-associated viruses, for which a direct link between infection and pathological manifestation remains to be determined, have been included. In addition, for each virus, examples are given where the detection of spliced viral RNA could add to current techniques by improving disease prognosis or the monitoring of the efficacy of therapeutic intervention. Systematic study of RNA splicing events during viral infection is likely to lead to better viral diagnostics and better management of viral therapy and will eventually lead to a better understanding of the pathogenesis of these human viral pathogens.
- 1. Wiedbrauk DL. Nucleic acid amplification and detection methods. In: Specter S, Hodinka RL, Young SA, Wiedbrauk DL, editors. Clinical virology manual. 4th ed. Washington, DC: ASM Press; 2009. p. 156–84.
- 26. Knoll JH, Lichter P, Bakdounes K, Eltoum IE. In situ hybridization and detection using nonisotopic probes. Curr Protoc Mol Biol. 2007;Chapter 14:Unit 14.7.
- 142. Khan MJ, Castle PE, Lorincz AT, et al. The elevated 10-year risk of cervical precancer and cancer in women with human papillomavirus (HPV) type 16 or 18 and the possible utility of type-specific HPV testing in clinical practice. J Natl Cancer Inst. 2005;97(14):1072–9.
- 160. Spivack JG, Woods GM, Fraser NW. Identification of a novel latency-specific splice donor signal within the herpes simplex virus type 1 2.0-kilobase latency-associated transcript (LAT): translation inhibition of LAT open reading frames by the intron within the 2.0-kilobase LAT. J Virol. 1991;65(12):6800–10.
- 182. Andre E, Imbert-Marcille BM, Cantarovich D, Besse B, Ferre-Aubineau V, Billaudel S. Use of reverse transcription polymerase chain reaction with colorimetric plate hybridization to detect a cytomegalovirus late spliced mRNA in polymorphonuclear leukocytes from renal transplant patients. Diagn Microbiol Infect Dis. 1999;34(4):287–91.
- 186. Young LS, Arrand JR, Murray PG. EBV gene expression and regulation. In: Arvin A, Campadelli-Fiume G, Mocarski E, Moore PS, Roizman B, Whitley R, Yamanishi K, editors. Human herpesviruses: biology, therapy, and immunoprophylaxis. Cambridge, MA: Cambridge University Press; 2007. Chapter 27.
- 206. Majerciak V, Pripuzova N, McCoy JP, Gao SJ, Zheng ZM. Targeted disruption of Kaposi’s sarcoma-associated herpesvirus ORF57 in the viral genome is detrimental for the expression of ORF59, K8alpha, and K8.1 and the production of infectious virus. J Virol. 2007;81(3):1062–71.
- 215. Majerciak V, Yamanegi K, Allemand E, Kruhlak M, Krainer AR, Zheng ZM. Kaposi’s sarcoma-associated herpesvirus ORF57 functions as a viral splicing factor and promotes expression of intron-containing viral lytic genes in spliceosome-mediated RNA splicing. J Virol. 2008;82(6):2792–801.