Filoviruses are ancient and integrated into mammalian genomes
Hemorrhagic diseases from Ebolavirus and Marburgvirus (Filoviridae) infections can be dangerous to humans because of high fatality rates and a lack of effective treatments or vaccine. Although there is evidence that wild mammals are infected by filoviruses, the biology of host-filovirus systems is notoriously poorly understood. Specifically, identifying potential reservoir species with the expected long-term coevolutionary history of filovirus infections has been intractable. Integrated elements of filoviruses could indicate a coevolutionary history with a mammalian reservoir, but integration of nonretroviral RNA viruses is thought to be nonexistent or rare for mammalian viruses (such as filoviruses) that lack reverse transcriptase and replication inside the nucleus. Here, we provide direct evidence of integrated filovirus-like elements in mammalian genomes by sequencing across host-virus gene boundaries and carrying out phylogenetic analyses. Further we test for an association between candidate reservoir status and the integration of filoviral elements and assess the previous age estimate for filoviruses of less than 10,000 years.
Phylogenetic and sequencing evidence from gene boundaries was consistent with integration of filoviruses in mammalian genomes. We detected integrated filovirus-like elements in the genomes of bats, rodents, shrews, tenrecs and marsupials. Moreover, some filovirus-like elements were transcribed and the detected mammalian elements were homologous to a fragment of the filovirus genome whose expression is known to interfere with the assembly of Ebolavirus. The phylogenetic evidence strongly indicated that the direction of transfer was from virus to mammal. Eutherians other than bats, rodents, and insectivores (i.e., the candidate reservoir taxa for filoviruses) were significantly underrepresented in the taxa with detected integrated filovirus-like elements. The existence of orthologous filovirus-like elements shared among mammalian genera whose divergence dates have been estimated suggests that filoviruses are at least tens of millions of years old.
Our findings indicate that filovirus infections have been recorded as paleoviral elements in the genomes of small mammals despite extranuclear replication and a requirement for cooption of reverse transcriptase. Our results show that the mammal-filovirus association is ancient and has resulted in candidates for functional gene products (RNA or protein).
Keywords: Tammar Wallaby; Blast Match; Monodelphis Domestica; Marburgvirus; Rinderpest Virus
The ongoing threat of emerging hemorrhagic diseases has made the search for reservoir species with a history of coevolution with filoviruses a priority [1, 2]. Outbreaks of filovirus infections are known from Africa and the Philippines [3, 4, 5] and, in some cases, the mortality of primates is so severe as to raise concerns of extinction. Bats are considered a candidate for a reservoir based on the detection of filovirus-specific RNA, antibodies, and viral particles [1, 6, 7, 8, 9, 10]. Still, the average seroprevalence in tested bats is much smaller than expected (usually < 5%) for large colonies of a main reservoir, and the ability of bats to maintain a persistent hypovirulent infection is unknown. Rodents and insectivores (shrews) have further been proposed as the leading candidates for filovirus reservoirs by modeling, the detection of filovirus RNA, and in one specimen, the potential detection of a DNA copy [2, 11]. Rodents (mice and guinea pigs) share one expected feature of coevolution: asymptomatic infections from wild-type filoviruses. However, a reservoir role for rodents and shrews has been questioned because only one study has detected filovirus RNA fragments in these small mammals, and many more outbreaks than observed are expected from a rodent reservoir that is commensal with humans. Moreover, no live viruses, filovirus particles or antibodies to filoviruses have been found in rodents or shrews. Distinguishing principal reservoir species from "spillover" infections remains a challenge.
There are now several cases in eukaryotes where non-retroviral integrated RNA viruses (NIRVs) have been detected [17, 18]. Still, this type of transfer is believed to be extremely rare in mammals [17, 19] because the process requires the cooption of reverse transcriptase and perhaps replication within the nucleus. The sole mammalian example is bornavirus, which is unique among RNA viruses of animals in developing persistent infections within the nucleus. The study of NIRVs requires an evolutionary approach where the direction of transfer is tested. Evolutionary comparisons among NIRVs have been carried out for the Totiviridae in yeast and the Bornaviridae in mammals. In the Totivirus system there is strong support for the direction of transfer from virus to fungus, and a role for the expression of NIRVs in viral interference has been proposed. We proposed that NIRVs are more common than presently known and might be detected in other systems with persistent infections of non-retroviral RNA viruses. As part of a search for NIRVs in NCBI databases, we found strong BLAST matches of NP sequences from filoviruses to translated genomic sequences from small mammals. We aimed to test whether these sequence similarities might indicate NIRVs of filoviruses.
Results and Discussion
We tested for integrated DNA based copies of the filovirus-like sequences in the two mammals with the most copies, the tammar wallaby and the little brown bat. We designed PCR primers from mammalian genomic sequence flanking the longer BLAST matches and carried out PCR amplification of DNA extractions from different specimens than used for existing genome projects. Our sequence of the tammar wallaby had only a single transition difference from the genome project sequence. The sequence of the little brown bat from Minnesota (FMNH 172384) had a similarity of 96% with four indels compared to contig (AAPE01196249) from the existing genome. To test for the presence of a filovirus-like DNA sequence in an additional insectivorous bat, we extracted DNA from a specimen of big brown bat (Eptesicus fuscus). Using primers designed from the little brown bat, we again obtained PCR product and sequence. In this case, the identity between the sequences of the two genera of bats was 87% with 11 indels. In each case the similarity of the new sequences obtained from DNA to genomic sequence is consistent with an integrated filovirus-like DNA copy in these mammalian genomes.
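The comparisons above are reported as a percent similarity together with a count of indels. The text does not say how those figures were computed, so the following is only a minimal sketch of one common convention: identity is taken over columns where both sequences have a base, and each contiguous run of gaps in one sequence counts as a single indel. The function name and the toy sequences are hypothetical.

```python
def identity_and_indels(aligned_a: str, aligned_b: str, gap: str = "-"):
    """Return (percent identity over ungapped columns, number of indel events).

    Assumption: both inputs come from the same pairwise alignment, so they
    have equal length. A run of consecutive gaps in one sequence is counted
    as one indel event.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = compared = indels = 0
    gap_side = None  # which sequence currently carries an open gap run

    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        if a == gap or b == gap:
            side = "a" if a == gap else "b"
            if side != gap_side:
                indels += 1  # a new gap run starts here
            gap_side = side
            continue
        gap_side = None
        compared += 1
        if a == b:
            matches += 1

    percent = 100.0 * matches / compared if compared else 0.0
    return percent, indels


# Toy aligned pair, not real data from the study.
pct, n_indels = identity_and_indels("ACGT--ACGTA", "ACGTTTAC-TT")
print(f"{pct:.1f}% identity, {n_indels} indels")
```

Other conventions (for example, counting gapped columns as mismatches) give lower identities, so the convention should be stated whenever such figures are compared.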
The observation that most of the mammalian sequences have ORF disruptions and possess only truncated NP-like genes (Fig. 1) is also inconsistent with a transfer from mammals to virus. Only Monodelphis has more than one different filovirus-like gene (Additional file 3: Fig. S3) and these (the NP and L protein-like sequences) are on separate chromosomes. The apparent genic bias of NIRVs for the NP gene could have a biological explanation. Because of the transcription gradient in the Mononegavirales, the most common primary transcript is NP. We also note that experimental expression of an N-terminal portion of the Ebolavirus NP gene (from residue 1-450 in wildtype NP) that is positionally homologous to the region of NP spanned by mammalian NIRVs (from residue 18-405 in wildtype NP, NP_066243) is sufficient to inhibit the formation of Ebolavirus minigenomes in a dosage-specific fashion. A background transcription bias could account for overrepresentation in NIRVs of NP, but such a bias fails to explain the N-terminal bias within the NIRVs of NP. The bias is consistent with the experimental filoviral interference mechanism involving the N-terminal of NP.
The eutherian orders with NIRVs of filoviruses closely match the proposed candidate reservoir groups of bats, rodents, and insectivores [1, 2] (Fig. 6). This pattern is not a sampling artifact that we can attribute to the available genome assemblies. Seven of the ten genomes (including the Big Brown bat) sampled from predicted reservoir orders had integrated filoviruses, while only 1 of 27 from non-candidate eutherian orders had detected integrated filovirus-like elements (Fisher's exact test, two-tailed p value = 0.00003). The sole eutherian species from a non-candidate group to have a potential NIRV was the pygmy hedgehog tenrec, which is the Afrotherian small insectivore analog on the island of Madagascar. The three assemblies of genomes from candidate orders that lacked apparent NIRVs were the ground squirrel (Spermophilus tridecemlineatus), the European hedgehog (Erinaceus europaeus) and the fruit bat (Pteropus vampyrus). At present it is unclear why some small mammal groups (bats, rodents, insectivores and marsupials) appear to have an association with filoviruses. Still, the study of filovirus-like NIRVs could have predictive value for identifying filovirus reservoirs, ancestral proteins, outbreak modeling, undetected lineages of filoviruses and virulence in mammalian species. For example, the close relationship of South American and expressed Australian marsupial filovirus-like NIRVs with rapidly evolving African filoviruses now makes it more likely that the New World harbors undetected filoviruses or has acted as a source region for extant filoviruses.
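The association test above is a two-tailed Fisher's exact test on a 2×2 table. As a minimal sketch of how such a test works, the code below sums hypergeometric probabilities of all tables that are no more likely than the observed one. The tabulation used here (7 of 10 candidate-order genomes with elements, 1 of 27 non-candidate genomes) is one reading of the counts in the text and is an assumption; the value it produces depends on that tabulation and on the two-tailed convention, so it need not equal the reported figure.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Uses the "sum of probabilities not exceeding the observed probability"
    definition of the two-tailed p value.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k "hits" in row 1 with fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    tol = 1 + 1e-9  # guard against floating-point ties
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * tol)


# Rows: candidate reservoir orders, non-candidate eutherian orders.
# Columns: genomes with detected elements, genomes without.
p_value = fisher_exact_two_tailed(7, 3, 1, 26)
print(f"two-tailed p = {p_value:.2e}")
```

Under this reading of the counts the p value is well below 0.001, which agrees with the qualitative conclusion that non-candidate eutherians are underrepresented.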
Our findings indicate that filovirus infections are recorded as paleoviral elements in the genomes of small mammals. These elements are candidates for functional gene products (RNA or protein). The integration is unexpected because filoviruses lack reverse transcriptase and the ability to replicate within the nucleus. Our results indicate that the association of mammals with filoviruses is likely tens of millions of years older than previously thought.
Nucleic Acid Extractions
DNA was extracted from freshly collected wallaby fur, toe clips of a big brown bat, and DMSO-preserved tissue from a little brown bat using the DNA Quickextract kit (Epicentre Technologies) modified to have a two-hour incubation step at 65°C.
PCR, RTPCR, and DNA Sequencing
50 μL PCR reactions contained 5 μL of extracted DNA template, 25 μL of 2× GoTaq PCR reagent mix (Promega), and each primer. Primers for sequencing and PCR were: 5'-GCCTTGTCGACGTTCATCCTGTG-3' and 5'-GAGCCATTGGTTGCTCGGAAGC-3' for Myotis; 5'-GGAGACCTCGAGCAAATGGAGC-3' and 5'-GAGCCATTGGTTGCTCGGAAGC-3' for Eptesicus; and 5'-TGAGTTTTGGGGTGAATTAGC-3' and 5'-GGGTGACATAGGGAAGCACA-3' for Macropus. The PCR temperature profile was 30 cycles of 94°C for 30 s, 50°C for 30 s and 72°C for 2 min, followed by a final extension at 72°C for 5 min. PCR products were purified and sequenced by the University of Washington High Throughput Genomics Facility. Geneious 4.8 was used to assemble and edit electropherograms. New sequences from this study have been named as endogenous filovirus-like NP elements (EFLNP) and assigned the following GenBank accession numbers: HM545133-HM545135.
Initial searches for sequence similarity to filoviruses used protein sequences from genes of Marburgvirus (NC_001608.3) as a query with tBLASTn in the WGS database and the EST database and BLASTp in the protein database of NCBI. A second tBLASTn in the same databases used the best scoring non-viral sequence of placental mammals as a query. A third search used the EST nucleotide sequences of Trichosurus as a query for the nucleotide and WGS databases. Nonviral subject sequences with expect values of E < 10⁻⁵ and two different sequences from each of the five known species in the Filoviridae were retained for alignment. A search constrained to Mononegavirales NCBI Genomic Reference Sequences with Marburgvirus (NC_001608.3) as the query found that two species of Morbillivirus (Rinderpest virus and Measles virus) had expect values below 10⁻⁵; these were retained for alignment. L protein sequence searches used a similar strategy, but many more Paramyxoviruses had a significant match to Marburgvirus. We retained 19 different Paramyxoviruses for alignment with filovirus and the mammal sequence using BLAST explorer.
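The retention rule described above (keep subject sequences whose expect value is below 10⁻⁵) can be sketched as a filter over BLAST tabular output. This assumes results were exported in the standard 12-column tabular layout, in which the expect value is the eleventh field and the bit score the twelfth; the searches described here may have been run through the NCBI web interface instead. The identifiers in the example rows are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse 12-column tab-separated BLAST output, skipping blanks and comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        yield Hit(fields[0], fields[1], float(fields[10]), float(fields[11]))


def retain_hits(hits: Iterable[Hit], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep hits with an expect value strictly below the threshold."""
    return [h for h in hits if h.evalue < max_evalue]


# Hypothetical rows for illustration only; not results from the study.
example = [
    "# hypothetical tBLASTn output",
    "query_NP\tcontig_A\t38.5\t310\t170\t6\t20\t320\t1500\t2400\t2e-30\t125",
    "query_NP\tcontig_B\t27.0\t90\t60\t3\t40\t128\t300\t560\t0.004\t38.1",
]
kept = retain_hits(parse_tabular(example))
print([h.subject for h in kept])
```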
For genome assembly sequences, the sequence boundaries and translations identified by tBLASTn were used to retrieve nucleotide sequences and assemble amino acid sequences. MAFFT was used to align the protein sequences for all analyses using the default parameters. The NP alignment was trimmed to the range of the mammalian filovirus-like sequences, and the L protein alignment, which had a mosaic of conserved and length-variable regions, was trimmed by Gblocks (with gaps allowed).
Phylogenetic estimates were obtained with a maximum likelihood optimality criterion (PhyML and RAxML) and Bayesian MCMC methods. Models were chosen according to the best available optimal model from Prottest (ML) or using a mixed model prior for amino acids (MrBayes). Reliability was assessed by non-parametric bootstrapping (ML), approximate likelihood ratio tests (aLRT: SH-like tests), and posterior probabilities. Prottest determined that the LG+G+F model was the best fit with the AIC criterion for the L protein alignment and the JTT+G model was the best fit for the NP alignment. We therefore carried out maximum likelihood analysis using these models. However, as RAxML does not accommodate the LG model, we used the next best fit model of RtREV+G+F for the RAxML analysis of L protein. For bootstrapping, RAxML estimated the number of pseudoreplicates. For PhyML, both SPR and NNI search algorithms were used with five random starting trees. For Bayesian analysis, a million Markov chain Monte Carlo generations were initially carried out and convergence metrics were assessed. If the average standard deviation of split frequencies was < 0.01 and a plot of log-likelihood scores versus generation time was consistent with convergence, then we culled the burn-in set of half of the trees and calculated the posterior probabilities. Otherwise, we added 500,000 MCMC generations at a time until convergence metrics were satisfied.
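The Bayesian run schedule described above (an initial million generations, extensions of 500,000 generations until the split-frequency criterion is met, then discarding half of the trees as burn-in) can be sketched as a simple loop. This is not MrBayes code: `asdsf_after` is a hypothetical callback standing in for whatever reports the average standard deviation of split frequencies after a given number of generations, the `max_generations` cap is an added safety assumption, and the visual check of the log-likelihood plot is not automated here.

```python
from typing import Callable


def run_until_converged(
    asdsf_after: Callable[[int], float],
    initial: int = 1_000_000,
    step: int = 500_000,
    threshold: float = 0.01,
    max_generations: int = 20_000_000,  # safety cap, not from the text
) -> int:
    """Return the total generations at which the ASDSF first drops below threshold."""
    generations = initial
    while asdsf_after(generations) >= threshold:
        if generations + step > max_generations:
            raise RuntimeError("convergence criterion not met within the cap")
        generations += step  # extend the chain and re-check
    return generations


def burn_in_count(n_sampled_trees: int) -> int:
    """Number of sampled trees discarded as burn-in (the first half)."""
    return n_sampled_trees // 2


# Stand-in for a real run: pretend the ASDSF falls below 0.01 at 2,000,000 generations.
def fake_asdsf(generations: int) -> float:
    return 0.02 if generations < 2_000_000 else 0.005


total = run_until_converged(fake_asdsf)
print(total, burn_in_count(2001))
```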
Tests of neutral evolution were carried out using both approximate methods (Codon-based Z test with Kumar model that accommodates transition-transversion ratio bias) and Bayesian methods of estimating site-specific Ka/Ks. For input, codon alignments were estimated using PAL2NAL from a subset of sequences from the amino acid sequence alignment. We used only one sequence per species in the alignment. As both MEGA and Selecton require continuous ORFs, disrupted codons were replaced with gaps. For the Bayesian estimate of Ka/Ks, an ML tree was input after estimating with PhyML and a GTR+G model. Site-specific Ka/Ks values were culled from the Macropus sequence sites, which reduced the influence of alignment end gaps on the estimates. A histogram of the Ka/Ks values was created in PASW Statistics 18.
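Replacing disrupted codons with gaps, as described above, can be sketched as a pass over a codon-aligned nucleotide sequence. The text does not define "disrupted", so the rule below is an assumption: a codon is masked if it is an in-frame stop, if it is only partially gapped (as after a frameshift), or if it is an incomplete trailing codon. The function name is hypothetical and the standard nuclear stop codons are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def mask_disrupted_codons(codon_aligned_seq: str) -> str:
    """Replace disrupted codons with '---' so the sequence reads as a continuous ORF.

    Assumption: the input is already in frame with the codon alignment,
    i.e. position 0 is the first base of a codon.
    """
    seq = codon_aligned_seq.upper()
    masked = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        incomplete = len(codon) < 3
        partially_gapped = "-" in codon and codon != "---"
        if incomplete or partially_gapped or codon in STOP_CODONS:
            masked.append("---")
        else:
            masked.append(codon)
    return "".join(masked)


# Toy sequence: ATG | TAA (stop) | GC- (partial gap) | CCC | GG (incomplete)
print(mask_disrupted_codons("ATGTAAGC-CCCGG"))
```

Note that this rule also masks a genuine terminal stop codon, which is usually what codon-based selection tools expect.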
To evaluate orthology between rat and mouse NIRVs, we used genomic BLAST searches and visualized the matches and annotations on the NCBI chromosome maps. Whole chromosome comparisons of larger orthologous blocks were assessed using the Cinteny server and Roundup database.
We thank Gerald Aquilina and Kurt Volle at the Buffalo Zoo for fur samples from Macropus eugenii, Katharina Dittmar (University at Buffalo) for tissue samples from Eptesicus fuscus, the Field Museum of Natural History for tissue from Myotis lucifugus, and the administration team of the Center for Computational Research (University at Buffalo) for set up, monitoring, and use of the U2 cluster.
- 7. Strong JE, Wong G, Jones SE, Grolla A, Theriault S, Kobinger GP, Feldmann H: Stimulation of Ebola virus production from persistent infection through activation of the Ras/MAPK pathway. Proc Natl Acad Sci USA. 2008, 105 (46): 17982-17987. 10.1073/pnas.0809698105.
- 8. Leroy EM, Epelboin A, Mondonge V, Pourrut X, Gonzalez JP, Muyembe-Tamfum JJ, Formenty P: Human Ebola outbreak resulting from direct exposure to fruit bats in Luebo, Democratic Republic of Congo, 2007. Vector Borne Zoonotic Dis. 2009, 9 (6): 723-728. 10.1089/vbz.2008.0167.
- 9. Towner JS, Amman BR, Sealy TK, Carroll SA, Comer JA, Kemp A, Swanepoel R, Paddock CD, Balinandi S, Khristova ML, et al: Isolation of genetically diverse Marburg viruses from Egyptian fruit bats. PLoS Pathog. 2009, 5 (7): e1000536. 10.1371/journal.ppat.1000536.
- 11. Morvan JM, Deubel V, Gounon P, Nakoune E, Barriere P, Murri S, Perpete O, Selekon B, Coudrier D, Gautier-Hion A, et al: Identification of Ebola virus sequences present as RNA or DNA in organs of terrestrial small mammals of the Central African Republic. Microbes Infect. 1999, 1 (14): 1193-1201. 10.1016/S1286-4579(99)00242-7.
- 13. Holmes EC: The evolution and emergence of RNA viruses. 2009, New York: Oxford University Press.
- 16. Shi W, Huang Y, Sutton-Smith M, Tissot B, Panico M, Morris HR, Dell A, Haslam SM, Boyington J, Graham BS, et al: A filovirus-unique region of Ebola virus nucleoprotein confers aberrant migration and mediates its incorporation into virions. J Virol. 2008, 82 (13): 6190-6199. 10.1128/JVI.02731-07.
- 25. Ebisuya M, Yamamoto T, Nakajima M, Nishida E: Ripples from neighbouring transcription. Nat Cell Biol. 2008.
- 28. Stern A, Doron-Faigenboim A, Erez E, Martz E, Bacharach E, Pupko T: Selecton 2007: advanced models for detecting positive and purifying selection using a Bayesian inference approach. Nucleic Acids Res. 2007, 35 (Web Server issue): W506-511. 10.1093/nar/gkm382.
- 41. Suyama M, Torrents D, Bork P: PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006, 34 (Web Server issue): W609-612. 10.1093/nar/gkl315.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.