Functional similarity analysis of human virus-encoded miRNAs
miRNAs are a class of small RNAs that regulate gene expression via RNA silencing machinery. Some viruses also encode miRNAs, contributing to the complex virus-host interactions. A better understanding of viral miRNA functions would be useful in designing new preventive strategies for treating diseases induced by viruses. To meet the challenge for how viruses module host gene expression by their encoded miRNAs, we measured the functional similarities among human viral miRNAs by using a method we reported previously. Higher order functions regulated by viral miRNAs were also identified by KEGG pathway analysis on their targets. Our study demonstrated the biological processes involved in virus-host interactions via viral miRNAs. Phylogenetic analysis suggested that viral miRNAs have distinct evolution rates compared with their corresponding genome.
KeywordsGene Ontology Semantic Similarity Functional Similarity Viral miRNAs Bart miRNAs
miRNAs, about 22 nucleotides in length, constitute a large family of non-coding RNAs that regulate gene expression posttranscriptionally, leading their target mRNAs to direct destructive cleavage or translational repression by base pairing with the 3' untranslated regions (3' UTRs). miRNA-mediated regulation plays crucial roles in a wide spectrum of biological processes, including proliferation , apoptosis , development , immune system regulation , and oncogenesis .
Recent discoveries on viral miRNAs, mostly in herpesvirus family , threw lights on a new level of cross-talk between virus and host in viral infections and pathogenesis . Viral miRNAs have been reported to participate in immune evasion by directly down-regulating host immune defence genes, and even to cooperate with viral proteins to target the same process . The combination of protein-mediated and miRNA-mediated regulations forms an intricate strategy for viruses to resist host defence system and thus increase the opportunities of their survival.
The research on viral miRNAs is still far from exhausted, with many unknown miRNA functions yet to be discovered. miRNA identification using computational tools is the most widely used method. In contrast to most eukaryotic miRNAs, virus-encoded miRNAs do not have homologs in other viral genomes or in the genome of the human host , and thus are difficult to be identified using existing miRNA gene prediction tools. Cloning and sequencing small RNA libraries to identify and characterize miRNAs is the basic method for miRNA discovery, since computationally predicted miRNAs should also be confirmed by experimental methods. Reverse ligation-mediated RT-PCR  is a widely used method in the identification of mature miRNAs and has been used to detect maturely processed MuHV-4 miRNAs . Experimental validation is still a barrier in miRNA identification, especially in host cells infected by viruses. Currently, only a small fraction of viral miRNAs has been identified, and the functions of most of these viral miRNAs remain unknown. To bridge the gap in understanding the targets regulated by these virus-encoded miRNAs, we used computational method to predict host targets of viral miRNAs and measured their functional similarities to reveal the interspecies cross-talk between virus and host by viral miRNAs.
Materials and methods
Host target gene prediction of viral miRNAs
In order to determine how viruses reshape the physiological states of human cells by their encoded miRNAs, we first predicted host genes targeted by viral miRNAs. We collected viral miRNAs encoded by BK polyomavirus (BKV), Epstein-Barr virus (EBV), human cytomegalovirus (HCMV), human immunodeficiency virus 1 (HIV1), human herpesvirus 1 (HSV1), human herpesvirus 2 (HSV2), and Kaposi's sarcoma-associated herpesvirus (KSHV). Viral miRNA sequences were retrieved from miRBase  release 16 (Sep 2010). We extracted 3' UTR sequences in a single FASTA format file from human genome (version 18) that was downloaded from UCSC .
Host target genes of virus-encoded miRNAs were predicted by the algorithm of Probability of Interaction by Target Accessibility (PITA) that computes the difference between the free energy gained from the formation of the miRNA-mRNA duplex and the energetic cost of unpairing the mRNA to make it accessible to the miRNA . We chose PITA for viral miRNA target prediction because it had been demonstrated to reach high accuracy, and more importantly, it takes advantage of the target accessibility but not conservation information to reduce false positive. Conservation information, which was used by most of other methods, is not suitable for predicting target genes of the less evolutionarily conserved viral miRNAs . We used a flank of 3 upstream and 15 downstream nucleotides when performing prediction, since miRNA-mRNA interaction requires unpairing of bases flanking the targets. To reduce false positive, the prediction results were narrowed down by the criteria of 7-8 bases seed length, with no G:U wobble or loops, no mismatch, and ∆∆G < -20 kcal/mol.
Functional similarity measurement of viral miRNAs
We have previously proposed a method for systematic study of functional similarities among miRNAs by using their target genes Gene Ontology (GO) semantic similarities . As described in our previous study, the functional similarity of human miRNAs, obtained by our method, showed positive correlation with expression similarity, and the clustering results derived from the functional similarity were coherent with biological knowledge in many aspects including disease association, genome conservation, and the cross-talk between hosts and viruses . The method is reliable to calculate functional similarities and sensible to cluster miRNAs, and thus can be used to predict novel miRNA functions.
Here, we applied our method to measure functional similarities among viral miRNAs. As suggested in our previous study , the measurement was fundamentally based on host target genes of viral miRNAs. Biological process ontology was used to annotate target genes, and Wang's method  was used to calculate semantic similarity. Semantic similarity calculation was implemented by our in-house developed R package GOSemSim .
Similarity scores were then analyzed by R package pvclust , which used multi-scale bootstrap re-sampling to evaluate the uncertainty of cluster analysis. The agglomerative method, average linkage, was used, and 10,000 bootstrap replications were run. All clusters were extracted with approximately unbiased (AU) p-value > 0.95, meaning that the hypothesis with "the cluster does not exist" is rejected with significance level of 0.05.
GO enrichment analysis of significant clusters
The common biological processes regulated by these significant miRNA clusters were evaluated by GOstats  with p < 0.001. GOstats using hypergeometric model to assess whether the number of selected genes associated with the GO term is larger than expected. This method had been used to predict the functions of miRNAs  and can be used to provide biological insights of viral strategies.
KEGG enrichment analysis of genes targeted by viral miRNAs
In order to uncover higher order functions of how viruses transform cellular states by their encoded miRNAs, we adopted KEGG (Kyoto Encyclopedia of Genes and Genomes) enrichment analysis to identify pathways regulated by viral miRNAs to provide biological insights. KEGG pathway is a collection of manually drawn pathway maps representing molecular interactions and reaction networks, and has been widely used for biological interpretation of higher level systemic functions . KEGG enrichment analysis is calculated by R package SubpathwayMiner , which implements hypergeometric test to measure p-value for evaluating enrichment significance of pathways. SubpathwayMiner also provides the FDR-corrected q-values to reduce the false positive discovery rate .
Comparing viral miRNA regulated pathways
Significant KEGG pathways regulated by different viruses were compared and visualized using our in-house developed R package clusterProfiler http://bioconductor.org/packages/2.8/bioc/html/clusterProfiler.html. ClusterProfiler, which was implemented based on R and its plotting system ggplot2 , is released under the Artistic-2.0 license within Bioconductor project . ClusterProfiler was designed to provide statistical analysis of GO and KEGG and visualization tools for comparing functional profiles among gene clusters. More details on the use of clusterProfiler are available in the package vignette.
We built phylogenetic trees of human viruses based on the functions their miRNAs encoded. Phylogenetic trees were constructed by R package phangorn  using the popular neighbour-joining (NJ) method. For validating our phylogenetic analysis, we compared our results with phylogenetic trees obtained from whole genome sequence alignment. Complete genome sequences of viruses were obtained from NCBI nucleotide database. Multiple sequence alignment and phylogenetic tree construction were done by ClustalX (version 2.0.12)  using NJ algorithm. Robinson-Foulds (RF) metric , the most widely used method in comparing phylogenetic trees, was adopted to compute the topological distance between phylogenetic trees. RF rate was obtained by normalizing the RF distances by the number of total edges for representing the relationship among trees . RF rate measures the dissimilarity between two trees.
Results and Discussion
GO enrichment analysis was performed across these three significant clusters to discover their biological themes. As a result, Cluster 1 suggests the down-regulation of xylosyltransferase activity, involved in O-glycan processing. O-glycans had been described to play roles in cell polarity , which involves in the formation of immunological synapse , indicating that viruses prevent the formation of immunological synapse by inhibiting the xylosyltransferase activity. Cluster 2 represses a wide range of binding activities, including protein binding, DNA binding, receptor binding, and enzyme binding. Especially, the inhibition of MHC protein binding and CD40 receptor binding suggests that viruses use miRNAs to interfere the activation of antigen presenting cells. This may be the strategy for viruses to extend the life of the infected cells and to establish a favourable environment for their replication. Cluster 3 down-regulates transcription factor activity to favour viral latency. EBV BART miRNAs were expressed in latent infection . Hsv1-miR-H6-3p had been reported to promote latency by inhibiting the expression of HSV-1-encoded transcription factor, ICP4, that is required for the expression of most HSV-1 genes during productive infection [30, 31]. It has been reported that viruses encode proteins to interfere with transcription factors, and that miRNAs are more versatile to reshape the cellular status to escape host immune system and to hijack cellular machinery for their replication [32, 33].
The average similarity among 29 viral miRNAs is only 0.434 and most of the miRNAs cannot be clustered with AU p-value > 0.95, indicating that a majority of these viral miRNAs have distinct functions, with the versatilities and flexibilities of viral regulations.
As shown in Figure 3, different viruses have distinct strategies to reshape cellular status. It seems that viral miRNAs were designed to against many important pathways to favour their pathogenesis. KSHV-encoded miRNAs had been described to directly down regulate a major regulator of cell adhesion, THBS1 , that is involved in the recruitment of monocytes and T cells to the sites of infection . Down regulation of THBS1 by KSHV miRNAs may aid KSHV-infected cells in avoiding detection by the host immune system . HIV1-encoded miRNAs play critical roles in oncogenic transformation , and three miRNAs encoded by EBV are crucial for efficient B cell transformation . These biological findings are consistent with our analyses. In addition, many pathways in our analyses have not been reported yet, and thus can serve as putative functions played by viral miRNAs for further investigations.
We evaluated the similarity between these two trees. The topological distance between them was calculated by RF metric to be 8, and the corresponding RF rate is 0.727, and thus the similarity between the two trees is 0.273. Surprisingly, viral miRNAs have distinct evolution rates compared to their corresponding genome based on our functional analysis. We thus measured the evolutionary distance among viruses by their encoded miRNA sequences. RF distance between phylogenetic trees obtained from genome sequences and miRNA sequences is 6, and the corresponding RF rate is 0.545, and thus the similarity between the two trees is 0.455.
Viral miRNAs have different properties compared with viral proteins, such as small and non-immunogenic, and thus they may serve as ideal tools to interpolate cellular environment in the ways that benefit virus replication. This would mean an evolutionary reward for rapid adjustment to the host and environmental statuses. Viral miRNAs do not share a high level of homology even within the members of the same family [6, 39]. Phylogenetic analysis of all previously known virus miRNA genes showed that most of the known viral miRNAs have long distant relationships and could be classified into specific miRNA families . These findings are consistent with our phylogenetic analysis, suggesting that viral miRNAs may evolve more rapidly than their genome. Especially, the functions of viral miRNAs evolve even more rapidly than their sequences.
Obviously, miRNAs are ideal for the tight space constraints characteristic of viral genomes and the evolution of a miRNA down-regulating a new target gene can presumably be achieved more easily than the evolution of a new protein . It must be pointed out, however, our current method only provides a perception of viral miRNA perspective and may contain some biases, as it did not consider the fact that the activation of viral miRNAs depends on the viral life cycle in various latent or at lytic stages, and the specific infected cell types.
Intimately connected with various kinds of diseases, viruses pose a crucial health problem on host. Host cellular expression profiles altered by virus-encoded miRNAs form a new regulatory layer. Though studies into pathogenesis by viral miRNAs are still in its infancy, the interspecies regulation at the miRNA level fuels the spark of the investigation into the repertoire of virus-host interactions. Here, we applied our method to assess the functional similarity among viral miRNAs. Our analyses showed that viral miRNAs have diverse functions. We then summarized cellular pathways regulated by viral miRNAs by the GO and KEGG enrichment analyses. Phylogenetic trees were reconstructed to reveal the evolutionary distance at the perspective of viral miRNAs.
Experimental validation of computational results is still a challenge, a hindrance towards understanding the functions of viral miRNAs. We believe that the integration of bioinformatics with microarray and proteomic data would be a promising way to elucidate the whole picture of virus-host interaction mediated by viral miRNAs. In addition, the identification of roles played by viral miRNAs in pathogenesis would help in designing new preventive and therapeutic approaches. This has also been described as new therapeutics to correct the aberrant activity of miRNA-mRNA interaction by using anti-miRNA oligonucleotides (AMOs) . We hope that this work can provide a better understanding of basic biological processes involved in latency and oncogenic transformation mediated by viral miRNAs.
Acknowledgements and Funding
This work was partially supported by the 2007 Chang-Jiang Scholars Program, "211" Projects, National "973" Projects of China (2011CB910700), National Natural Science Foundation of China (20871057), Guangdong Natural Science Research Grant (32209003), and the Fundamental Research Funds for the Central Universities (21611303 to G Yu and 11610101 to QY He).
- 6.Pfeffer S, Sewer A, Lagos-Quintana M, Sheridan R, Sander C, Grasser FA, van Dyk LF, Ho CK, Shuman S, Chien M, Russo JJ, Ju J, Randall G, Lindenbach BD, Rice CM, Simon V, Ho DD, Zavolan M, Tuschl T: Identification of microRNAs of the herpesvirus family. Nat Meth. 2005, 2: 269-276. 10.1038/nmeth746.CrossRefGoogle Scholar
- 8.Stern-Ginossar N, Elefant N, Zimmermann A, Wolf DG, Saleh N, Biton M, Horwitz E, Prokocimer Z, Prichard M, Hahn G, Goldman-Wohl D, Greenfield C, Yagel S, Hengel H, Altuvia Y, Margalit H, Mandelboim O: Host Immune System Gene Targeting by a Viral miRNA. Science. 2007, 317: 376-381. 10.1126/science.1140956.PubMedCentralCrossRefPubMedGoogle Scholar
- 12.Pheasant M, Meyer L, Hsu F, Hinrichs AS, Harte RA, Giardine B, Fujita P, Diekhans M, Dreszer T, Clawson H, Kuhn RM, Barber GP, Haussler D, Kent WJ, Karolchik D, Zweig AS, Wang T, Smith KE, Rosenbloom KR, Rhead B, Raney BJ, Pohl A: The UCSC Genome Browser Database: update 2009. Nucleic Acids Res. 2009, 37: D755-761. 10.1093/nar/gkn875.PubMedCentralCrossRefPubMedGoogle Scholar
- 22.Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5: R80-10.1186/gb-2004-5-10-r80.PubMedCentralCrossRefPubMedGoogle Scholar
- 28.Ludford-Menting MJ, Crimeen-Irwin B, Oliaro J, Pasam A, Williamson D, Pedersen N, Guillaumot P, Christansen D, Manie S, Gaus K, Russell SM: The Reorientation of T-Cell Polarity and Inhibition of Immunological Synapse Formation by CD46 Involves Its Recruitment to Lipid Rafts. Journal of Lipids. 2011, 2011: 1-10.CrossRefGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.