Evolutionary maintenance of filovirus-like genes in bat genomes
Little is known of the biological significance and evolutionary maintenance of integrated non-retroviral RNA virus genes in eukaryotic host genomes. Here, we isolated novel filovirus-like genes from bat genomes and tested for evolutionary maintenance. We also estimated the age of filovirus VP35-like gene integrations and tested the phylogenetic hypotheses that there is a eutherian mammal clade and a marsupial/ebolavirus/Marburgvirus dichotomy for filoviruses.
We detected homologous copies of VP35-like and NP-like gene integrations in both Old World and New World species of Myotis (bats). We also detected previously unknown VP35-like genes in rodents that are positionally homologous. Comprehensive phylogenetic estimates for filovirus NP-like and VP35-like loci support two main clades with a marsupial and a rodent grouping within the ebolavirus/Lloviu virus/Marburgvirus clade. The concordance of VP35-like, NP-like and mitochondrial gene trees with the expected species tree supports the notion that the copies we examined are orthologs that predate the global spread and radiation of the genus Myotis. Parametric simulations were consistent with selective maintenance for the open reading frame (ORF) of VP35-like genes in Myotis. The ORF of the filovirus-like VP35 gene has been maintained in bat genomes for an estimated 13.4 My. ORFs were disrupted for the NP-like genes in Myotis. Likelihood ratio tests revealed that a model that accommodates positive selection is a significantly better fit to the data than a model that does not allow for positive selection for VP35-like sequences. Moreover, site-by-site analysis of selection using two methods indicated that at least 25 sites in the VP35-like alignment are under positive selection in Myotis.
Our results indicate that filovirus-like elements have significance beyond genomic imprints of prior infection. That is, there appears to be, or have been, functionally maintained copies of such genes in mammals. "Living fossils" of filoviruses appear to be selectively maintained in a diverse mammalian genus (Myotis).
Keywords: Genome Assembly, Tammar Wallaby, Marburgvirus, Positional Homology, Myotis lucifugus
While genomic transfers from retroviruses to eukaryotic hosts are well known and expected, transfers from non-retroviral RNA viruses to eukaryotes are unexpected. Non-retroviral RNA viruses lack the coding for reverse transcriptase and the integration machinery needed for successful transfer to DNA genomes. However, several recent studies have provided evidence for widespread viral transfer to fungi [2, 3, 4], animals [3, 5, 6, 7, 8, 9, 10], and plants. These transfers have been termed NIRVs (non-retroviral integrated RNA viruses) because the integrated elements differ from endogenous viruses in their requirement for co-option of integration machinery and in their normally subgenic architecture [3, 4, 10, 11]. A NIRV is a subclass of paleovirus or endogenous viral element (EVE). Unlike other paleoviruses, there is no evidence that a NIRV has ever coded for an active endogenous viral genome or is capable of copying itself as with endogenous retroviruses. Several studies have implicated and identified the signatures of retrotransposon activity in association with NIRV formation [3, 4, 6, 7, 10].
Because NIRVs are a form of "fossil" viral element, their recognition permits, for the first time, study of the deeper evolution of non-retroviral RNA viruses and an understanding of the timescale of genomic interactions. The ages of NIRVs have turned out to be much older than age estimates from molecular clocks based on nucleotide substitution rates. Horie et al., for example, assigned a date of >40 My for the integration of bornaviruses into primate genomes based on phylogenetic clades of NIRVs from genome assemblies. The determination of orthology can be complicated by gene duplications and horizontal transfers of NIRVs among hosts. Homology is unambiguous when the dated NIRVs are monophyletic and share integration locations (i.e., synteny or positional homology). Taylor et al. provided a minimum date of about 10 million years for filovirus-like NP gene NIRVs based on the shared integration location and monophyly in rodents that have dated fossil records. Still, the timescale of host interactions for non-retroviral RNA viruses (including those viruses with NIRVs) remains poorly studied.
The biological significance of NIRVs beyond that of a viral fossil record remains controversial. Have NIRVs been co-opted by eukaryotic hosts for a novel or an antiviral function? Thus far, evolutionary evidence for maintenance of NIRVs among host species has been ambiguous. Although the vast majority of known NIRVs have disrupted open reading frames (ORFs), extended ORFs and RNA transcripts of NIRVs have been identified in mosquitoes, yeast, primates [5, 7] and plants. But these cases of NIRV RNA expression involved only one or two species per group and, with the exception of the Arabidopsis IAA-leucine-resistant protein 2 (ILR2; [3, 12]), could be the result of transcriptional noise. Evidence of significant selective maintenance among species in the yeast-totivirus NIRV is complicated by the dearth of known orthologs. There is some evidence for evolutionary maintenance in the mammal-filovirus-like NP NIRV system [4, 10] and in the primate-Borna virus systems. But in each case, a substantial proportion of the species in the analyses of NIRV maintenance had disrupted ORFs. As such, the existing evolutionary analyses of NIRVs in mammals indicate more a slowed rate of erosion in codon structure than selection for maintenance of an ORF. Presently, it is difficult to determine when or if selective maintenance has occurred in known NIRVs because the comparisons normally involve sparse taxonomic sampling and distantly related host genome assemblies. An improved test of the selective maintenance of NIRVs would involve a comparison of regions that lack ORF disruptions and have demonstrated positional homology within a closely related group of animals such as a genus.
Filovirus-like NIRVs in mammals are candidates for a more detailed study of selective maintenance. Taylor et al. isolated copies of the NP-like gene from specimens of the big brown bat (Eptesicus fuscus), the little brown bat (Myotis lucifugus) and the tammar wallaby (Macropus eugenii), but each of these NIRVs appeared to be pseudogenized. Belyi et al. detected an extended open reading frame of a VP35-like gene from a BLAST query of the NCBI genome assembly of the little brown bat (Myotis lucifugus). As filoviral VP35 has been shown to interfere with host defences, the acquisition of a mammalian NIRV has been proposed as a putative co-option to interfere with viral infection. The presence of an open reading frame of a NIRV in a single genome assembly could be associated with function or merely indicate a recent integration where there has been insufficient time to accrue ORF disruptions. Pseudogenization for bats could also be a slow process compared to similar-sized mammals, requiring an estimated 2.02 My of neutral evolution just to reach a 50% probability of ORF disruption. Several mammals including mouse and rat have been identified as possessing the filovirus NP-like NIRVs, but only the tarsier, tammar wallaby, and little brown bat have been identified as possessing the VP35-like NIRVs. None of the BLAST-matched VP35 NIRVs have been verified by independent DNA sequencing. One limitation of the BLAST approach for identifying NIRVs is the tendency for underestimation of NIRVs when only known viral genes are used as queries. When clades of NIRVs are detected that are divergent from known viruses, their membership might be underrepresented. One way around this problem is to carry out secondary BLAST searches with the divergent NIRVs as queries.
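To give a feel for what a 2.02 My "half-life" of an ORF implies over longer timescales, the following is a minimal sketch in Python. It assumes, purely for illustration, that ORF disruptions arrive at a constant rate along a single lineage (an exponential waiting time); that model, and the function name `prob_orf_intact`, are assumptions of this sketch and are not taken from the study, which uses parametric simulations on the observed tree.

```python
import math


def prob_orf_intact(elapsed_my: float, half_life_my: float = 2.02) -> float:
    """Probability that an ORF is still undisrupted after `elapsed_my` My.

    Assumption (not from the study): disruptions occur at a constant rate,
    so the time to first disruption is exponentially distributed and
    `half_life_my` is the time at which the disruption probability is 50%.
    """
    if elapsed_my < 0 or half_life_my <= 0:
        raise ValueError("elapsed_my must be >= 0 and half_life_my must be > 0")
    rate = math.log(2) / half_life_my  # disruptions per My under this model
    return math.exp(-rate * elapsed_my)


# By construction the probability is 0.5 at the half-life itself.
print(prob_orf_intact(2.02))
# Over a Myotis-scale interval of 13.4 My the value is far smaller.
print(prob_orf_intact(13.4))
```

Under this simplified single-lineage model, an ORF that has evolved neutrally for about 13 My would be intact only around 1% of the time, which is why an open reading frame shared across a whole genus is informative.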
More NIRV sequences from VP35 would be particularly important in testing the hypothesis that known filoviruses form two divergent clades (Marburgviruses, ebolaviruses with marsupial NIRVs) and (placental mammal NIRVs with an unidentified filovirus clade). It is presently unclear if the VP35-like NIRVs form the same phylogenetic associations with known filoviruses [5, 15] as found with the other filovirus-like NIRVs.
Here we test for the evolutionary maintenance of filovirus VP35-like and NP-like NIRVs in the bat genus Myotis. Myotis is a diverse genus of mammals (> 100 species) and is thought to have radiated from Asia to every continent save Antarctica over the past 13.4 My [16, 17]. There are several congruent studies of both nuclear and mitochondrial genes [16, 17, 18] that find three main geographic clades of Myotis (North America, South America and Old World). We test positional homology by PCR amplifying between the VP35 NIRV and a neighboring gene. We also compare the evolution of NP and VP35-like NIRVs in Myotis. The results provide evidence that filovirus-like integrations are more widespread in mammals than previously thought and that these transferred genes have been exposed to positive selection and selection for open reading frames in bats.
Results and Discussion
tBLASTn searches of the WGS database using the filovirus-like VP35 NIRV of Myotis lucifugus (i.e., the ORF of the genome project: AAPE02000262.1, 88641-89459) as a query sequence yielded three previously unknown mammal species matches with expect values <10^-5 (Additional file 1, Table S1). The Chinese hamster (Cricetulus griseus), mouse and rat had significant matches using this bat query, with hamster having the best match for these rodents. The strongest match in rats (mapped to chromosome 9) had a highly significant BLAST match to the mouse genome (CAAA01163972.1; 1e-34; Chromosome 1). We note that the known marsupial VP35-like NIRV did not appear as a match when the Myotis lucifugus sequence was used as a query, presumably a result of a BLAST analysis where subjects are from divergent clades. When the VP35 of Marburgvirus was used as a query, different contigs of the Chinese hamster (Cricetulus griseus) had the best matching sequences in the WGS. Likewise, when NP of Marburgvirus was used as a query, the Chinese hamster again had the best match. We also detected a new NP-like NIRV from the naked mole rat (that is, in addition to those detected by Taylor et al.).
The determination of orthology and positional homology permits estimation of minimum ages of viral-host associations. For Myotis, the VP35-like gene is estimated to have had an ORF for at least 13.4 My (11-18 My with error bars). The timescale estimates are based on the dating of the divergence of Myotis with multigenic molecular analyses and at least two internal fossil calibrations [16, 17]. The insertion of the NP NIRV predates the common ancestor of Eptesicus and Myotis, which has been estimated at 25 My (19-30 My range) using multiple fossil calibrations and loci. The proposed minimum age of the Rattus/Mus common ancestor based on dating of the oldest fossil record of the genus Progonomys (the presumed genus of the Rattus/Mus common ancestor) is 12.3 My. Although knowledge of the timescale of mammalian radiations is in a state of flux, it is clear that NP-like and VP35-like genes are ancient and independently integrated in rodents and in bats.
The results indicate that the largest species radiation in mammals could be associated with the maintenance of a "living" fossil copy of the VP35-like gene co-opted from filoviruses. The results reveal a rare example of non-retroviral viral genes that have been successfully co-opted by mammals.
Nucleic Acid Extractions
Total nucleic acids were extracted from wing punches (Rabies Laboratory, New York State Health Dept.), or from preserved tissue using the Qiagen blood and tissues nucleic acids extraction kit (Qiagen) or the DNA Quickextract solution (Epicentre®). Thus, no live bats were harmed by the extraction of nucleic acids during this study. Sample information with voucher numbers from the Field Museum of Natural History and the American Museum of Natural History are provided in Additional file 8, Table S3.
PCR, RT-PCR, and DNA Sequencing
50 μl PCR reactions were assembled according to the protocol for Takara Primestar HS DNA polymerase with 5 μl of extracted DNA template. Primers for sequencing and PCR were: LINE1 to VP35-like intergenic region (5'-GCCTCCTAAAATGAGTTTGTGAGTGTTCCCTGGTC-3'; 5'-GAGTGGATGTTGCAGGTCCTGACATTACAGGC-3', with an amplicon size of 2365 bp in Myotis lucifugus); VP35-like region (5'-CTTCTGTCTACGTCTTCTAAGGTTAATC-3'; 5'-CCCGAGGCTTCCTTCAGGAGTTAG-3', with an amplicon size of 660 bp in Myotis lucifugus). A third primer combination was used to fill in gaps in sequences (5'-CTCGTCAGATCAGCATGTCCCTGGAGC-3' and 5'-GAGTGGATGTTGCAGGTCCTGACATTACAGGC-3'). We used the primers of Taylor et al. for the NP-like region and the universal primers of Folmer et al. for the mitochondrial gene. For the NP region, the new genome assembly appears to have introduced a deletion (AAPE02007767) for Myotis lucifugus. We used the latest assembly in the present paper. The PCR temperature profiles were: 10 cycles of 94°C for 30 s, 59°C for 30 s and 72°C for 2 min, with a touchdown to an annealing temperature of 48°C over 30 additional cycles and a final extension at 72°C for 5 min. A constant annealing temperature of 45°C was used for the mitochondrial primers. PCR products were purified and sequenced by the University of Washington High Throughput Genomics Facility. Geneious 4.8 was used to assemble and edit electropherograms. For RT-PCR, total nucleic acids were extracted from a frozen specimen of Myotis lucifugus. The internal organs were ground in liquid nitrogen and a subsample was used for extraction. RNA templates were treated with DNase. The Qiagen One step RT-PCR kit was used with VP35-like primers and primers for the actin gene (5'-ACAGGTCCTTACGGATGTCG-3'; 5'-TATACGCTTCTGGCCGTACC-3') specific to Myotis lucifugus. New sequences from this study have the following GenBank accession numbers: JN847695-JN847723.
We searched for sequence similarity to filoviruses using protein sequences based on the VP35 and the NP regions of Marburgvirus (NC_001608.3) as a query with tBLASTn in the WGS database of NCBI. Additional searches (tBLASTn) in each of the available NCBI databases used the sequence of Myotis lucifugus as a query. Nonviral subject sequences with expect values of E < 10^-5 and matches greater than 100 amino acid residues were retained for phylogenetic analyses. VP35 sequences from available filoviruses that differed at the amino acid level were also added to the alignment. Taxonomy of filoviruses followed Kuhn et al. Filovirus-like bat sequences from Taylor et al. were added to the NP-like analysis from the present study. Mitochondrial sequences available at NCBI for the genus Myotis were retained for the COI gene alignment.
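The retention rule above (E < 10^-5 and more than 100 aligned residues) can be sketched as a small filter over BLAST tabular output. This is an illustrative sketch, not the authors' pipeline: it assumes the hits were saved in the standard 12-column BLAST+ tabular format (`-outfmt 6`), it uses the alignment-length column as a stand-in for "matches greater than 100 amino acid residues", and it does not attempt the separate step of excluding viral subject sequences. The function name and the example rows are invented for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(handle, max_evalue=1e-5, min_length=100):
    """Return hits with evalue < max_evalue and alignment length > min_length.

    `handle` is any iterable of tab-separated lines. For tBLASTn the
    alignment length is counted in amino acid positions.
    """
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST6_COLUMNS, row))
        if float(hit["evalue"]) < max_evalue and int(hit["length"]) > min_length:
            kept.append(hit)
    return kept


# Invented example rows (hypothetical identifiers and scores).
EXAMPLE = (
    "queryA\tcontig1\t35.2\t250\t150\t4\t10\t259\t1000\t1749\t1e-34\t140\n"
    "queryA\tcontig2\t40.0\t60\t36\t0\t20\t79\t500\t679\t2e-07\t55.1\n"
    "queryA\tcontig3\t28.0\t180\t120\t6\t5\t184\t300\t839\t0.003\t38.0\n"
)

for hit in filter_hits(io.StringIO(EXAMPLE)):
    print(hit["sseqid"], hit["evalue"], hit["length"])
```

In the invented example only the first row passes both thresholds; the second is too short and the third has too large an expect value.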
For genome assembly sequences, the sequence boundaries and translations identified by tBLASTn were used to retrieve nucleotide sequences and assemble amino acid sequences. MAFFT was used to align the protein sequences for the VP35 and NP analyses using the JTT100 model. Other alignments were unambiguous, requiring no or few indels.
Phylogenetic estimates were obtained with a maximum likelihood optimality criterion in PhyML 3.0. Models were chosen according to the best available optimal model from Modeltest or Prottest (ML). Reliability was assessed by approximate likelihood ratio tests (aLRT: SH-like tests) and/or posterior probabilities. For PhyML, SPR search algorithms were used with five random starting trees.
Tests of selection were carried out using both gene-wide Bayesian methods and site-specific tests. Gene-wide significance was assessed by comparing the fit of a codon-based substitution model that permits sites with positive selection (M8) to the fit of a null model that does not allow for positive selection (M8a). This comparison was carried out with likelihood ratio tests (LRTs) where DF = 1. The test statistic for the LRT is calculated as twice the difference between the log-likelihood scores of the alternative model and the null model, and a chi-square table is used to obtain the significance. Unlike standard methods based on overall estimates of Ka/Ks, codon-based models can account for among-site variation in Ka/Ks by assigning sites to discrete rate categories. Significant selection at individual sites in the alignment was assessed by two methods: by confidence intervals around the Ka/Ks estimates in Selecton and by the REL method implemented in HyPhy [39, 40]. Sites with a CI lower bound of Ka/Ks that exceeds 1 in Selecton were considered to be under positive selection. Sites with a Bayes factor of > 50 were considered as reliably under selection in HyPhy. As Selecton requires continuous ORFs, disrupted codons were replaced with gaps. For the Bayesian estimate of Ka/Ks, an ML tree was input after estimation with PhyML. Selection for ORF maintenance was estimated using a parametric simulation approach modified from Katzourakis and Gifford. A centre of tree (COT) sequence was estimated using DIVA [41, 42]. The COT sequence, which had an open reading frame, was used as a starting sequence for simulated neutral evolution with substitution and branch length parameters input from the observed data. Seq-gen was used to carry out 1000 evolutionary simulations from the COT sequence. The simulated alignments were translated and visualized in Geneious and the number of ORF disruptions per alignment was tallied.
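The tallying step can also be expressed programmatically. The sketch below is an illustration only (the study tallied disruptions by visual inspection of translations in Geneious). It assumes that simulated sequences are in frame and ungapped, and that a "disruption" is an in-frame stop codon before the final codon; counting disrupted sequences per alignment is one possible reading of "ORF disruptions per alignment". All function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def count_premature_stops(seq: str) -> int:
    """Count in-frame stop codons, ignoring the final codon of the sequence."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return sum(1 for codon in codons[:-1] if codon in STOP_CODONS)


def disrupted_sequences(alignment) -> int:
    """Number of sequences in one simulated alignment with a disrupted ORF."""
    return sum(1 for seq in alignment if count_premature_stops(seq) > 0)


def fraction_fully_intact(alignments) -> float:
    """Fraction of simulated alignments in which every sequence keeps its ORF.

    With 1000 neutral simulations this fraction estimates the probability
    of observing complete ORFs in all taxa by chance.
    """
    if not alignments:
        raise ValueError("no alignments supplied")
    intact = sum(1 for aln in alignments if disrupted_sequences(aln) == 0)
    return intact / len(alignments)


# Toy input: two tiny "alignments" of two sequences each (invented data).
toy = [
    ["ATGAAATAA", "ATGCCCTGA"],  # stops only at the final codon: intact
    ["ATGTAAAAA", "ATGCCCTGA"],  # first sequence has a premature TAA
]
print([disrupted_sequences(aln) for aln in toy])
print(fraction_fully_intact(toy))
```

The per-alignment counts are what a histogram would summarize, and the final fraction is the chance probability of complete ORFs.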
A histogram of the ORF disruptions per simulated alignment was created in PASW statistics 18. The probability that an alignment would have complete ORFs by chance was determined from the frequency of alignments with complete ORFs in the parametric simulation.
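The gene-wide likelihood ratio test described in this section (M8 against M8a, DF = 1) reduces to a short calculation once the two log-likelihoods are known. The following is a minimal sketch using only the Python standard library; the function name is hypothetical and the log-likelihood values in the usage lines are invented, not results from the study.

```python
import math


def lrt_df1(lnl_null: float, lnl_alt: float):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p_value), where the statistic is
    2 * (lnL_alternative - lnL_null) and the p-value is the upper tail of a
    chi-square distribution with 1 degree of freedom. For 1 d.f. that tail
    equals erfc(sqrt(statistic / 2)).
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    statistic = max(statistic, 0.0)  # guard against tiny negative values
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Invented log-likelihoods for a null (M8a-like) and alternative (M8-like) fit.
stat, p = lrt_df1(lnl_null=-5230.4, lnl_alt=-5224.1)
print(f"LRT statistic = {stat:.2f}, p = {p:.4g}")
```

Using the closed form for one degree of freedom avoids a dependency on a statistics package; for other degrees of freedom a chi-square routine from a numerical library would be needed.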
The orthology of filovirus-like VP35 genes in rat and mouse was assessed by genomic BLAST searches and visualized on the NCBI chromosome maps. We used the Cinteny server and Roundup database for whole chromosome comparisons of larger orthologous blocks.
We thank the Field Museum of Natural History, Texas Tech University Museum and the American Museum of Natural History for preserved samples of bats. We also thank Robert J. Rudd and Patrick Fitzgerald (Rabies Laboratory, New York State Health Dept.) for fresh specimens and wing punches of bats and Liliana M. Dávalos (SUNY Stonybrook) for DNA extractions. This paper was supported by a National Science Foundation grant (DEB 1050793) awarded to K.D.
- 15. Barrette RW, Xu L, Rowland JM, McIntosh MT: Current perspectives on the phylogeny of Filoviridae. Infect Genet Evol. 2011.
- 26. Kuhn JH, Becker S, Ebihara H, Geisbert TW, Johnson KM, Kawaoka Y, Lipkin WI, Negredo AI, Netesov SV, Nichol ST, et al: Proposal for a revised taxonomy of the family Filoviridae: classification, names of taxa and viruses, and virus abbreviations. Arch Virol. 2010, 155 (12): 2083-2103.
- 33. Posada D: ModelTest Server: a web-based tool for the statistical selection of models of nucleotide substitution online. Nucleic Acids Res. 2006, 34 (Web Server issue): W700-703.
- 35. Stern A, Doron-Faigenboim A, Erez E, Martz E, Bacharach E, Pupko T: Selecton 2007: advanced models for detecting positive and purifying selection using a Bayesian inference approach. Nucleic Acids Res. 2007, 35 (Web Server issue): W506-511.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.