Findings

Introduction

Equine infectious anemia virus (EIAV) is an important nonprimate enveloped virus of the retrovirus family and lentivirus genus, which also includes human immunodeficiency virus (HIV) and simian immunodeficiency virus (SIV) [1]. EIAV is the least complex lentivirus, with only six genes. In addition to the gag, pol and env genes, which code for the structural and enzymatic proteins, EIAV contains three accessory genes: tat, rev and S2 [2]. The host range of EIAV is reported to include all members of the Equidae. Although donkeys are susceptible to infection, they do not develop clinical EIA and carry lower amounts of plasma-associated virus [3].

It is well known that the redundancy of the genetic code allows multiple codons to encode a single amino acid, resulting in codon usage biases in genes [4]. The non-random usage of synonymous codons is crucial for efficient protein translation and correct folding. Mutation pressure and natural selection are thought to be the two major forces that drive codon usage away from equal usage among genes in different organisms [5]. Understanding the extent and causes of codon usage bias is important for understanding pathogen evolution and the relationship between pathogens and the immune response [6].

Recent efforts to understand codon usage biases in viruses have primarily focused on the hepatitis A virus [7, 8], West Nile virus [9], foot-and-mouth disease virus [10], influenza virus [11], and HIV [12-14]. Although the remarkable adenine (A)-richness of the EIAV genome was discovered several decades ago [15], few codon usage analyses have been performed on the EIAV genome to date. To gain insight into the characteristics of the viral genome, we investigated the synonymous codon usage pattern of EIAV and the correlation between the codon usage patterns of EIAV and its hosts.

Methods

Sequence data

The complete genome sequences of 29 EIAV strains were obtained from the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/Genbank/). The detailed information about the viruses is listed in Additional file 1: Table S1.

Codon usage analysis

The content of each nucleotide, both overall and at the third codon position, in the EIAV coding sequences was calculated using MEGA4 software. Dinucleotide frequencies in the EIAV genome were analyzed with DAMBE software. The relative synonymous codon usage (RSCU) values for EIAV were calculated as previously described [16]. The effective number of codons (ENC) was used to quantify deviations from the expected random codon usage of EIAV ORFs [17]. ENC values range from 20 (only one codon used per amino acid) to 61 (all synonymous codons used equally), and a low ENC value indicates a strong codon usage bias.
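The RSCU calculation can be sketched as follows. This is a minimal illustration, assuming the usual definition (the observed count of a codon divided by the mean count of all codons for the same amino acid) and the standard genetic code; the function name `rscu` is hypothetical and this is not the software used in the study.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for the degenerate sense codons of one coding sequence."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        # Skip stop codons and the single-codon amino acids (Met, Trp).
        if aa == "*" or len(family) == 1:
            continue
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values
```

An RSCU of 1 means the codon is used as often as expected under equal usage; values above 1 indicate a preferred codon. For a sequence in which every degenerate amino acid occurs, the result has 59 entries.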

The codon adaptation index (CAI) was used to estimate the adaptation of EIAV to host codons. The closer the CAI value is to 1, the higher the predicted gene expression level. The CAI compares a given codon usage to a predefined reference set and was calculated using the CAIcal approach (available at: http://genomes.urv.es/CAIcal). The synonymous codon usage data for the viral hosts were obtained from the codon usage database (http://www.kazusa.or.jp/codon/) [18].
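For illustration, the CAI can be sketched as the geometric mean of the relative adaptiveness of each codon in a gene, where relative adaptiveness is the reference-set count of a codon divided by the count of the most used synonymous codon. The sketch below assumes this standard definition; the function names and the 0.5 substitute for codons absent from the reference set are assumptions, and CAIcal may handle such cases differently.

```python
import math
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def relative_adaptiveness(ref_counts):
    """Map each degenerate sense codon to w = count / max count in its family."""
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    w = {}
    for aa, family in families.items():
        # Stop codons, Met and Trp carry no synonymous-choice information.
        if aa == "*" or len(family) == 1:
            continue
        best = max(ref_counts.get(c, 0) for c in family)
        if best == 0:
            continue  # amino acid absent from the reference set
        for c in family:
            # Assumption: zero counts are replaced by 0.5 to avoid log(0).
            w[c] = max(ref_counts.get(c, 0), 0.5) / best
    return w


def cai(cds, w):
    """Geometric mean of w over the scored codons of one coding sequence."""
    cds = cds.upper().replace("U", "T")
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    logs = [math.log(w[c]) for c in codons if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

Here `ref_counts` would hold host codon counts (for example, derived from a codon usage table for horse or donkey), keyed by DNA codon.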

Principal component analysis

Principal component analysis (PCA) was performed to identify the major trends in codon usage among the different EIAV strains. Each ORF was represented as a 59-dimensional vector, with each dimension corresponding to the RSCU value of one sense codon; AUG, UGG and the termination codons were excluded. The major trend within a dataset can be determined using measures of relative inertia, with genes ordered according to their position along the axis of major inertia [19].
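As a rough sketch of this step, a PCA of an ORF-by-codon RSCU matrix can be computed from the singular value decomposition of the column-centered matrix. This is plain PCA on centered data, which is an assumption; the procedure in [19] may scale or weight the data differently. The function name `pca` is hypothetical.

```python
import numpy as np


def pca(rscu_matrix, n_components=2):
    """PCA of an (n_orfs x 59) RSCU matrix via SVD.

    Returns the ORF coordinates on the leading axes and the fraction of
    total variance explained by each of those axes.
    """
    x = np.asarray(rscu_matrix, dtype=float)
    centered = x - x.mean(axis=0)  # center each codon column
    u, s, _vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # assumes the data are not all identical
    scores = u[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]
```

The first two columns of `scores` would give the coordinates for a plot of the first axis against the second, and `explained[0]` corresponds to the share of total variation attributed to the first axis.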

Results and discussion

Synonymous codon usage in EIAV

The overall base composition of the different EIAV strains was nonrandom, and the U% and A% were higher than the C% and G% (Table 1). The A-richness of the EIAV genome may reflect a viral tactic for escaping the antiviral activity of apolipoprotein B mRNA-editing enzyme-catalytic polypeptide 3 (APOBEC3) [20, 21]. To investigate whether these 29 EIAV strains display similar codon usage biases, the ENC values were calculated. The ENC values ranged from 38.10 to 49.71, with a mean of 43.61 ± 3.28. One possible explanation for the weak synonymous codon usage bias of EIAV is that a weak bias is essential for increasing translational accuracy and efficiency. To further analyze the extent of codon usage bias in EIAV, the overall RSCU values for the 59 sense codons were calculated. Almost all of the most highly preferred codons among the strains were A-ended or U-ended (Table 2). In addition, AA was the most common dinucleotide in EIAV, whereas CG dinucleotides were significantly suppressed in the EIAV genome. This result suggests that CpG suppression may be a factor that shapes the synonymous codon usage of the EIAV genome.
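One common way to quantify dinucleotide suppression is the observed/expected ratio, that is, the frequency of a dinucleotide divided by the product of the frequencies of its two bases. The sketch below assumes this measure and a sequence containing only A, C, G and T (or U); it is illustrative, the function name is hypothetical, and it is not necessarily the statistic reported by DAMBE.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Return {dinucleotide: observed/expected frequency ratio} for one sequence."""
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    mono = Counter(seq)
    # Overlapping dinucleotides: positions (0,1), (1,2), ...
    di = Counter(seq[i:i + 2] for i in range(n - 1))

    ratios = {}
    for pair, count in di.items():
        observed = count / (n - 1)
        expected = (mono[pair[0]] / n) * (mono[pair[1]] / n)
        ratios[pair] = observed / expected
    return ratios
```

A ratio well below 1 for `"CG"` would indicate CpG under-representation relative to base composition; dinucleotides that never occur are simply absent from the result.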

Table 1 The overall nucleotide contents and nucleotide contents at the synonymous third position of sense codons in EIAV genome
Table 2 Codon usage in EIAV genomes and its hosts

The effect of mutation pressure on the codon usage of EIAV

The ENC-plot analysis (ENC value plotted against the GC3s content) was performed to further investigate the patterns of synonymous codon usage. All of the points lie slightly below the expected curve, indicating that mutational pressure was the main factor shaping the codon usage bias of EIAV (Figure 1A). To further distinguish the role of mutation pressure from that of natural selection in shaping the codon usage pattern of EIAV, a correlation analysis was performed between the G + C content at the first and second codon positions (GC12%) and that at the synonymous third codon positions (GC3%). A highly significant correlation was observed (r = 0.761, P < 0.001), indicating that mutation pressure dominated over natural selection in shaping the composition of the coding sequences (Figure 1B).
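The expected curve in an ENC-plot is commonly taken from Wright's approximation, ENC = 2 + s + 29 / (s² + (1 − s)²), where s is the GC3s content expressed as a fraction. A minimal sketch of that curve and of the GC12 and GC3 calculation follows. The function names are hypothetical, and for simplicity GC3 is computed here over all third positions rather than only synonymous ones, which is an approximation.

```python
def expected_enc(gc3s):
    """Expected ENC under compositional constraint alone (gc3s as a fraction, 0 to 1)."""
    return 2.0 + gc3s + 29.0 / (gc3s**2 + (1.0 - gc3s) ** 2)


def gc_fraction(bases):
    """Fraction of G and C in a string of bases."""
    return sum(b in "GC" for b in bases) / len(bases)


def gc12_gc3(cds):
    """Return (GC12, GC3) as fractions for one non-empty coding sequence."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    first_second = "".join(c[:2] for c in codons)
    third = "".join(c[2] for c in codons)
    return gc_fraction(first_second), gc_fraction(third)
```

A point whose observed ENC falls below `expected_enc` at its GC3s value is using fewer codons than base composition alone would predict. The per-ORF `(GC12, GC3)` pairs are the inputs to the correlation analysis.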

Figure 1

The main forces that affect the codon usage of EIAV. (A) The relationship between the effective number of codons (ENC) and the GC content of the third codon position (GC3). The continuous curve represents the expected relationship between the ENC value and GC3% in the absence of selection. All of the points lie below the expected curve. The (ENC, GC3%) values of gag, pol and env are indicated by blue, green, and red points, respectively. (B) Correlation between the G + C content at the first and second codon positions (GC12%) and that at the synonymous third codon positions (GC3%). The line represents the correlation curve generated by the correlation analysis.

Genetic relationship based on synonymous codon usage in EIAV

The first principal component (f'1) accounted for 43.39% of the total variation and therefore had a substantial impact on the codon usage pattern. In addition, a plot of f'1 against f'2 for the gag, pol, and env ORFs of EIAV was drawn (Figure 2A). The points for the different structural proteins were generally separated from each other. This implies that the functions of the viral proteins are likely related to the codon usage pattern.

Figure 2

Genetic relationship based on synonymous codon usage in EIAV. (A) A plot of the values of the first axis and the second axis of gag, pol, and env in the principal component analysis. The EIAV field strain Miyazaki2011-A is indicated by a black arrow. (B) A plot of the values of the first axis and the second axis of EIAV strains isolated from China (green), Japan (blue), and America (red) in the principal component analysis. The EIAV field strain Miyazaki2011-A is indicated by a black arrow, and EIAV strains V70 and V26 are indicated by blue arrows.

It has been reported that EIAV shows a strong pattern of geographic clustering, with a significant correlation between phylogroups of isolates and major geographic regions [22]. Because geographical factors may influence EIAV evolution, f'1 was plotted against f'2 according to geographic distribution. The points for EIAV isolated from China, Japan, and America generally fell into three groups, implying that the EIAV strains isolated from the three countries evolved independently after diverging from a common ancestor (Figure 2B). In addition, the points for EIAV strains V70 and V26 clustered together with the strains isolated from America. The origin of these strains remains controversial [23, 24]. Our data demonstrated that these EIAV strains have an American ancestry. Notably, the point for EIAV Miyazaki2011-A was far from those of the other strains. Recent reports showed that this EIAV strain was unlikely to have been derived from genomic recombination events and constituted a separate monophyletic group [24]. It would be interesting to identify the potential origin of this novel EIAV isolate.

Comparative analysis of the codon usage between EIAV and host cells

The synonymous codon usage pattern of EIAV tended to differ from that of horse and donkey (Table 2 and Additional file 2: Figure S1). To further investigate whether the relationship between the codon usage of EIAV and that of its hosts might be linked to the expression levels of the viral proteins, CAI values were calculated using the horse and donkey codon usage as reference sets [25]. A mean CAI of 0.655 ± 0.020 was obtained for the EIAV ORFs in relation to the horse codon usage reference set, and a mean CAI of 0.593 ± 0.021 in relation to the donkey codon usage reference set. The CAI for EIAV thus tended to be lower in relation to donkey, which would imply a lower efficiency of viral protein synthesis in donkey. This suggests that the interplay of codon usage between EIAV and its hosts may influence viral fitness, survival and evolution.

In conclusion, our comprehensive analysis of the codon usage patterns in EIAV has provided a basic understanding of some aspects of EIAV evolution. However, this study has some limitations. The sample size was relatively small and may not be fully representative of EIAV. More studies should be carried out to confirm these conjectures.