Introduction

Trionyx sinensis hemorrhagic syndrome virus (TSHSV) was first discovered and isolated by Liu in our laboratory in 2013. This virus has recently become an important factor limiting the sustainable development of T. sinensis farming [5, 13]. Compared with previously described T. sinensis viruses, TSHSV appears to be more virulent, causing high mortality in both wild and captive turtles [5]. TSHSV-infected turtles display typical signs, namely breathing difficulty, interstitial pneumonia and hyperemia of the laryngeal mucosa [13, 14]. An experimental infection study has demonstrated that TSHSV-infected turtles develop severe hemorrhage in multiple organs, including the liver, kidney and intestine [13]. These clinicopathological features induced by TSHSV are consistent with those typically caused by arteriviruses, including porcine reproductive and respiratory syndrome virus (PRRSV), equine arteritis virus (EAV) and simian hemorrhagic fever virus (SHFV) [1, 4, 7, 11, 18]. In addition, Liu et al. cloned a 435-bp amplicon whose deduced amino acid sequence showed similarity to those of the arteriviruses mentioned above [13]. Given the high infectivity and pathogenicity of TSHSV, further characterization of this newly discovered virus is essential. In this study, we cloned and analyzed the complete genome of TSHSV and predicted the biological functions of its encoded hypothetical proteins.

Complete genome sequencing and classification of TSHSV

T. sinensis turtles that began to show signs of disease in early September 2013 were collected from outdoor ponds of turtle farms in Zhejiang Province [15]. Total RNA was extracted from the lungs using TRIzol Reagent (Invitrogen) following the manufacturer’s protocol. Intact RNA with an OD260/OD280 ratio between 1.8 and 2.2 was used for cDNA synthesis. One µg of total RNA was reverse transcribed into 5’- and 3’-RACE-Ready cDNA using a SMARTer® RACE 5’/3’ Kit (cat. no. 634858, Clontech, USA). To obtain the complete genome sequence of TSHSV, the TSHSV-specific RACE primers TSHSV-5’ (5’-GATTACGCCAAGCTTGCTCCCTCTCAACAACCAGCCAAAC-3’) and TSHSV-3’ (5’-GATTACGCCAAGCTTTTCAGCCACTTGAGCCTGGTCCTTT-3’) were designed based on the previously reported partial sequence of TSHSV (GenBank accession no. MH447986). Each 50.0-μL reaction mixture contained 2.5 μL of 5’- or 3’-RACE-Ready cDNA, 15.5 μL of PCR-grade water, 25.0 μL of 2× SeqAmp buffer, 1.0 μL of SeqAmp DNA polymerase, 5.0 μL of 10× universal primer mix (UPM) and 1.0 μL of TSHSV-specific primer. The amplification conditions were as follows: 94 °C for 5 min, followed by 25 cycles of denaturation (94 °C for 30 s), annealing (68 °C for 30 s) and extension (72 °C for 8 min). The PCR products were purified using a Cycle Pure Kit (Omega) according to the manufacturer’s instructions and sent to Sangon Biotech Co. (Shanghai, China) for sequencing. The complete genome sequence of TSHSV was deposited in the GenBank database under accession number MH447987. The TSHSV genome was found to be a positive-sense, single-stranded RNA with a 3’ poly(A) tail. The complete genome is 17,875 nucleotides (nt) long, with a 1323-nt 5’-untranslated region (UTR) and an 828-nt 3’-UTR. Gene prediction with GeneMark (http://topaz.gatech.edu/GeneMark/) identified eight open reading frames (ORFs) encoding eight hypothetical proteins (HPs) (Table 1).
HP1 to HP8 are predicted to be 79, 308, 1552, 1692, 169, 194, 329 and 714 amino acids (aa) long, respectively.
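As a quick arithmetic check of the RACE recipe in the methods, the six components sum to the stated 50.0-μL reaction volume. The minimal Python sketch below scales the per-reaction volumes to a master mix for several reactions; the function name `master_mix` and the 10% pipetting overage are illustrative assumptions, not part of the original protocol.

```python
# Per-reaction volumes (uL) for one 50.0-uL RACE PCR, as listed in the methods.
REACTION_UL = {
    "5'- or 3'-RACE-Ready cDNA": 2.5,
    "PCR-grade water": 15.5,
    "2x SeqAmp buffer": 25.0,
    "SeqAmp DNA polymerase": 1.0,
    "10x UPM": 5.0,
    "TSHSV-specific primer": 1.0,
}


def master_mix(n_reactions: int, overage: float = 0.10) -> dict[str, float]:
    """Scale each component to n reactions plus a fractional pipetting overage.

    The 10% default overage is an illustrative assumption, not from the protocol.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in REACTION_UL.items()}


if __name__ == "__main__":
    total = sum(REACTION_UL.values())
    print(f"Per-reaction total: {total:.1f} uL")  # components sum to 50.0 uL
    for name, vol in master_mix(4).items():
        print(f"{name}: {vol} uL")
```

In practice, template cDNA is often pipetted into each tube separately rather than included in a master mix; the sketch scales every component only for simplicity.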

Table 1 Gene sequence characteristics and predicted functions of TSHSV proteins
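The genome features reported above can be cross-checked with simple arithmetic. The sketch below assumes that each ORF ends in a single stop codon and that the reported genome and UTR lengths are directly comparable; it computes the span between the two UTRs and the total length required by the eight ORFs. Because the ORF coordinates in Table 1 are not reproduced here, the sketch cannot determine how the ORFs overlap; it only yields a lower bound on the sequence between the UTRs that lies outside every ORF. The helper names are hypothetical.

```python
# Reported TSHSV genome features (nt), taken from the text.
GENOME_NT = 17_875
UTR5_NT = 1_323
UTR3_NT = 828
# Predicted lengths of HP1 to HP8 (aa), taken from the text.
HP_AA = [79, 308, 1552, 1692, 169, 194, 329, 714]


def orf_nt(aa_len: int, include_stop: bool = True) -> int:
    """Nucleotide length of an ORF encoding aa_len residues.

    Assumes exactly one stop codon per ORF when include_stop is True.
    """
    return 3 * (aa_len + (1 if include_stop else 0))


def genome_summary() -> dict[str, int]:
    inter_utr = GENOME_NT - UTR5_NT - UTR3_NT    # span between the two UTRs
    orf_total = sum(orf_nt(aa) for aa in HP_AA)  # summed ORF lengths, stops included
    return {
        "inter_utr_nt": inter_utr,
        "orf_sum_nt": orf_total,
        # Overlapping ORFs cover less than orf_sum_nt, so this is a lower bound
        # on the sequence between the UTRs that lies outside every ORF.
        "min_non_orf_nt": inter_utr - orf_total,
    }


if __name__ == "__main__":
    for key, value in genome_summary().items():
        print(f"{key}: {value}")
```

With the reported values, the span between the UTRs is 15,724 nt and the eight ORFs together account for 15,135 nt, so under these assumptions at least 589 nt of that span lies outside all ORFs; any ORF overlap would increase this figure.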

Combined analysis of a multiple amino acid sequence alignment and NCBI BLAST searches revealed that TSHSV-HP2, TSHSV-HP3 and TSHSV-HP4 share partial sequence identity with several arteriviruses, including Chinese broad-headed pond turtle arterivirus, Pebjah virus, PRRSV, Guangdong greater green snake arterivirus, Wuhan Japanese halfbeak arterivirus and EAV [10]. The deduced amino acid sequences were most similar to those of Chinese broad-headed pond turtle arterivirus, although the identities were only 26%, 27% and 37% for HP2, HP3 and HP4, respectively (Fig. 1A). In a phylogenetic analysis based on the complete genome sequence, TSHSV and Chinese broad-headed pond turtle arterivirus clustered on the same branch of the tree (Fig. 1B). In agreement with a previous analysis [13], these distinct genome and deduced amino acid sequences indicate that TSHSV is a new arterivirus.
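Percent identity values such as those quoted above depend on which alignment columns are counted. The minimal sketch below assumes two sequences that have already been aligned, with '-' as the gap character, and supports two common conventions: ignoring columns that contain a gap, or counting them in the denominator (closer to the identity figure BLAST reports over the alignment length). The function name and toy sequences are hypothetical, not TSHSV data, and the sketch does not reproduce the settings used for Fig. 1A.

```python
def percent_identity(seq_a: str, seq_b: str, count_gaps: bool = False) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    count_gaps=False: denominator = columns where neither sequence is gapped.
    count_gaps=True: denominator also includes columns with a gap in one sequence.
    Columns gapped in both sequences are always skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    columns = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # all-gap column: no residue to compare
        if a == "-" or b == "-":
            if count_gaps:
                columns += 1  # gap counted as a non-identical column
            continue
        columns += 1
        if a == b:
            identical += 1
    return 100.0 * identical / columns if columns else 0.0


if __name__ == "__main__":
    # Hypothetical toy alignment, not TSHSV sequence data.
    query = "MKT-LV"
    subject = "MRTALV"
    print(percent_identity(query, subject))                  # 80.0
    print(round(percent_identity(query, subject, True), 1))  # 66.7
```

The same pair of sequences thus yields 80.0% or 66.7% depending on the gap convention, which is why identity values are best compared only when computed the same way.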

Fig. 1

Multiple sequence alignment and phylogenetic analysis. A. Multiple amino acid sequence alignment of TSHSV-HP2, HP3 and HP4 with selected proteins of other viruses. The aligned sequences are as follows: 1ab protein (Chinese broad-headed pond turtle arterivirus; AVM87331.1); putative 1b protein (Pebjah virus; AKI29956.1); ORF1a polyprotein (porcine reproductive and respiratory syndrome virus; ADX07034.1); polyprotein (Kibale red colobus virus 2; AHH53949.1); ORF1ab polyprotein (equine arteritis virus; AAR14192.1); 1ab protein (Guangdong greater green snake arterivirus; AVM87321.1); ORF1b polyprotein, partial (porcine reproductive and respiratory syndrome virus; BAP16275.1); putative 1b protein (southwest baboon virus 1; YP_009067063.1). Identical residues are shown in red, and similar residues are boxed. B. Phylogenetic relationships among TSHSV, arteriviruses and coronaviruses based on complete genome sequences. The tree was generated using the neighbor-joining method.
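The neighbor-joining method named in the caption builds a tree from a matrix of pairwise distances by repeatedly joining the pair of clusters that minimizes the Q criterion of Saitou and Nei. The minimal sketch below implements that textbook procedure on a precomputed distance matrix and returns a Newick string. It does not reproduce Fig. 1B: the software and distance model used for the figure are not stated here, and the five-taxon matrix used to exercise it is a standard textbook example, not TSHSV data.

```python
from itertools import combinations


def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Return an unrooted neighbor-joining tree in Newick format.

    names: taxon labels; dist: symmetric matrix of pairwise distances.
    """
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    nodes = list(names)                          # Newick strings of current clusters
    d = [list(map(float, row)) for row in dist]  # working copy of the matrix
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # total distance from each cluster
        # Pick the pair (i, j) with the smallest Q = (n-2)d(i,j) - r(i) - r(j).
        _, i, j = min(
            ((n - 2) * d[i][j] - r[i] - r[j], i, j)
            for i, j in combinations(range(n), 2)
        )
        # Branch lengths from the new internal node to clusters i and j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:.4g},{nodes[j]}:{lj:.4g})"
        # Distances from the new node to every remaining cluster.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Resolve the last three clusters as a trifurcation at the root.
    a, b, c = nodes
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lc = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({a}:{la:.4g},{b}:{lb:.4g},{c}:{lc:.4g});"


if __name__ == "__main__":
    # Textbook five-taxon example, not TSHSV data.
    matrix = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(neighbor_joining(list("abcde"), matrix))
```

Production phylogenetics packages add distance models, tie handling and bootstrap support on top of this core loop; the sketch only shows how the joining criterion and branch lengths are computed.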

Functional annotation of predicted proteins

The predicted protein sequences were functionally annotated using SwissProt (http://uniprot.org), GO (http://www.geneontology.org/) and three-dimensional structure analysis (http://www.expasy.ch/swissmod/SWISS-MODEL.html). TSHSV-HP2, TSHSV-HP3 and TSHSV-HP4 were found to contain domains involved in specific biological processes (Table 1), whereas no specific functional domains were identified in the remaining five HPs (HP1 and HP5 to HP8). TSHSV-HP2 shares 31% amino acid sequence identity with papain-like protease (PLP) 2. Coronavirus PLPs process the large virus-encoded replicase polyproteins and also function as deubiquitinating enzymes [19]. PLPs are encoded by many coronaviruses, including human coronavirus NL63 (HCoV-NL63) [6], murine hepatitis virus A59 (MHV-A59) [3] and severe acute respiratory syndrome coronavirus (SARS-CoV) [2], and the number of PLPs differs among these viruses: HCoV-NL63 and MHV-A59 encode two (PLP1 and PLP2), whereas SARS-CoV encodes one. The genome of the representative arterivirus PRRSV also contains a region encoding a papain-like cysteine protease alpha (PCPalpha) domain, which further supports the potential classification of TSHSV as an arterivirus [17].

TSHSV-HP3 appears to belong to the same protein family as EAV peptidase S32 (IPR008760). The molecular function terms serine-type endopeptidase activity (GO:0004252) and catalytic activity (GO:0003824) were also assigned to this protein. Serine-type endopeptidases are often involved in viral protein processing. Proteolytic enzymes that require a serine for their catalytic activity are ubiquitous and are found in viruses, bacteria and eukaryotes [9]. They encompass a wide range of peptidase activities, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activities [8]. In the arteriviruses EAV, PRRSV-I, PRRSV-II, lactate dehydrogenase-elevating virus (LDV) and SHFV, a serine protease domain is present in non-structural protein 4 (nsp4), a 21-kDa cleavage product from the central region of the ORF1a-encoded polyprotein [15, 16]. This nsp4 protease has been shown to be the main proteinase responsible for releasing most non-structural proteins (nsps) from the replicase polyproteins pp1a and pp1ab, and it is also responsible for the expression of replicase proteins [12]. These findings suggest that TSHSV-HP3 has a similar function.

TSHSV-HP4 is predicted to have replicase activity, because a P-loop-containing nucleoside triphosphate hydrolase domain (IPR027417) and an EndoU-like endoribonuclease domain (IPR037227) were detected at residues 612-689 and 798-1042 of this protein, respectively (Table 1). Both activities have been confirmed for the replicase polyprotein 1ab of PRRSV [20].
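When the domain positions above are used to extract the corresponding subsequences, the usual pitfall is converting between 1-based, inclusive biological coordinates and Python's 0-based, end-exclusive slices. The sketch below assumes the positions 612-689 and 798-1042 are 1-based and inclusive (the convention InterPro uses); the helper names are hypothetical, and a placeholder string stands in for the HP4 sequence, which is not reproduced here.

```python
# TSHSV-HP4 domain positions as reported in the text (assumed 1-based, inclusive).
HP4_LENGTH_AA = 1692
HP4_DOMAINS = {
    "P-loop containing NTP hydrolase (IPR027417)": (612, 689),
    "EndoU-like endoribonuclease (IPR037227)": (798, 1042),
}


def domain_length(start: int, end: int) -> int:
    """Number of residues in a 1-based, inclusive interval."""
    return end - start + 1


def domain_slice(seq: str, start: int, end: int) -> str:
    """Extract residues start..end (1-based, inclusive) from a protein sequence."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"interval {start}-{end} is outside a {len(seq)}-aa sequence")
    return seq[start - 1:end]  # shift start to 0-based; slice end is already exclusive


if __name__ == "__main__":
    placeholder = "X" * HP4_LENGTH_AA  # hypothetical stand-in for the HP4 sequence
    for name, (start, end) in HP4_DOMAINS.items():
        region = domain_slice(placeholder, start, end)
        print(f"{name}: residues {start}-{end}, {len(region)} aa")
    # Residues lying between the two domains (690 to 797).
    print(domain_length(690, 797))  # 108
```

Under these assumptions, the two domains span 78 and 245 residues and are separated by 108 residues, all within the 1692-aa HP4.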

In summary, the full-length genome of TSHSV, a new arterivirus, was cloned and shown to be a positive-sense, single-stranded RNA with a poly(A) tail at its 3’ end. The encoded proteins TSHSV-HP2, TSHSV-HP3 and TSHSV-HP4 were predicted to have replicase-associated functions, whereas the functions of the other five HPs (HP1 and HP5 to HP8) are unknown. Further functional analyses and validation experiments are needed to determine the functions of these proteins.