Comparative studies of Toxoplasma gondii transcriptomes: insights into stage conversion based on gene expression profiling and alternative splicing
Toxoplasma gondii is one of the most important apicomplexan parasites and infects one-third of the human population worldwide. Transformation between the tachyzoite and bradyzoite stages in the intermediate host is central to chronic infection and life-long risk. There have been some transcriptome studies on T. gondii; however, we are still early in our understanding of the kinds and levels of gene expression that occur during the conversion between stages.
We used high-throughput RNA-sequencing data to assemble transcripts using genome-based and de novo strategies. The expression-level analysis of 6996 T. gondii genes showed that over half (3986) were significantly differentially expressed during stage conversion; 2205 genes were upregulated and 1778 genes were downregulated in tachyzoites compared with bradyzoites. Several important gene families were expressed at relatively high levels. Comprehensive functional annotation and gene ontology analysis revealed that stress response-related genes are important for survival of bradyzoites in immune-competent hosts. We compared Trinity-based de novo and genome-based strategies, and found that the de novo assembly strategy compensated for the defects of the genome-based strategy by filtering out several transcripts with low expression or those unannotated on the genome. We also found some inaccuracies in the ToxoDB gene models. In addition, our analysis revealed that alternative splicing can be differentially regulated in response to life-cycle change. An in-depth motif search of alt_acceptor events in tachyzoites revealed a 20-nt, AG-rich sequence at the alternative splicing locus.
This study represents the first large-scale effort to sequence the transcriptome of bradyzoites from T. gondii tissue cysts. Our data provide a comparative view of the tachyzoite and bradyzoite transcriptomes to allow a more complete dissection of all the molecular regulation mechanisms during stage conversions. A better understanding of the processes regulating stage conversion may guide targeted interventions to disrupt the transmission of T. gondii.
Keywords: Gene expression, RNA-seq, Transcriptome, mRNAs, Alternative splicing, Toxoplasma gondii
AMA: Apical membrane antigen
RPKM: Reads per kilobase of exon model per million mapped reads
GRA: Dense granule protein
IGV: Integrative Genomics Viewer
Toxoplasma gondii belongs to the phylum Apicomplexa, which includes many other deadly pathogens, such as Plasmodium, responsible for malaria, and Cryptosporidium, the cause of cryptosporidiosis. Because of its remarkable capacity for invasion, transmission, and persistence, it is estimated that one-third of the human population is chronically infected with T. gondii. The reported prevalence rates range from a few percent to nearly 80%, depending on the population [2, 3, 4]. Toxoplasma gondii is one of the most important opportunistic pathogens in immune-compromised individuals, including patients with acquired immune deficiency syndrome (AIDS)/human immunodeficiency virus (HIV) and those receiving cancer treatments or organ transplants [3, 5]. It can also cross the placental barrier and cause abortion or congenital birth defects. In utero infection may elevate the risk of ocular toxoplasmosis owing to spontaneous reactivation of the disease.
While the complex life-cycle of T. gondii includes sexual and asexual stages in felines and intermediate hosts, respectively, it has an unusual capability to clonally propagate in intermediate hosts. The asexual life-cycle of T. gondii occurs within all intermediate hosts and involves a conversion between two distinct life forms: an initial phase with rapid multiplication of tachyzoites that are responsible for the acute stage of the disease, and a slowly dividing or resting phase with bradyzoites that are contained in thick-walled tissue cysts to evade the host immune system [8, 9, 10]. The latent bradyzoite cysts usually form in brain, skeletal muscles, or visceral organs, and are responsible for chronic disease because of their ability to evade the immune system, to resist commonly used drug treatments, and to reactivate into virulent tachyzoites [11, 12, 13]. As a consequence, it is essential to evaluate the regulation of gene expression during the critical interconversion between these two stages.
Toxoplasma gondii maintained in cell culture are capable of stage interconversion and have been used to explore the differences in the transcripts between tachyzoites and bradyzoites (tissue cysts). However, the percentage of cells forming cysts is low and gene expression is asynchronous. Previous studies of the parasite’s life-cycle stages have indicated a complex pattern of expression associated with each of its forms. Serial analysis of gene expression (SAGE) has demonstrated that 18% of the tags corresponded to unique stage-specific mRNAs during the life-cycle. Some of the genes specifically regulated during the tachyzoite-bradyzoite transition are expressed throughout mitosis, cytokinesis, and early G1 phases, in concordance with the cell cycle arrest observed during bradyzoite differentiation. The recent development of next-generation sequencing (NGS) technology has opened a new era for Toxoplasma genomics and transcriptome studies. Several databases of T. gondii have been constructed on ToxoDB for use in research, representing the predominant strains in Europe and North and South America. By using NGS to sequence full-length cDNAs (FL-cDNAs) reverse-transcribed from RNAs, RNA sequencing (RNA-Seq) technology provides a new approach for studying transcriptomes [18, 19]. RNA-Seq can be used not only to measure gene expression levels at higher resolution than microarrays, but also to reveal unknown transcripts and splicing isoforms and to provide a quantitative measurement of alternatively spliced (AS) isoforms. Such new possibilities are expanding the transcriptome studies of T. gondii from conventional gene expression studies to all aspects of transcriptomes, including AS, gene fusion, and various kinds of non-coding RNAs. However, to the best of our knowledge, there have been no reports on the comparative analysis of transcriptomes from different stages of PRU strain T. gondii, especially from the tissue cyst bradyzoites.
In this study, we tried to delineate the transcriptomes of bradyzoites in vivo and compare them with those of tachyzoites from the same strain of T. gondii. The genome-based assembly method was utilized for relative comparisons of gene expression in the different stages, while Trinity de novo assembly was used for genome annotation and AS exploration. Our findings demonstrated major changes in gene expression between tachyzoites and bradyzoites in T. gondii, and provided insight into the pathways regulating these processes. A better understanding of the processes regulating stage conversion may help guide targeted interventions to disrupt transmission of this deadly parasite.
The type II T. gondii Prugniaud (PRU) strain was kindly provided by the Department of Parasitology, Xinxiang Medical College, Henan, China. Tachyzoites of T. gondii were collected from invaded HFF cells in vitro and purified through a 3-μm membrane filter. Bradyzoites (tissue cysts) were harvested from the brains of infected mice and purified by Ficoll (Sigma Co. Ltd) density gradient centrifugation as previously described.
RNA preparation and sequencing
Tachyzoites were washed three times with 1× phosphate-buffered saline (Invitrogen Co. Ltd) to remove the medium and collected by centrifugation at 1500 rpm for 5 min after induction. Bradyzoites were released from tissue cysts by digestion with 0.25% pancreatic enzyme (Invitrogen Co. Ltd). Total RNA was extracted from the purified tachyzoites and bradyzoites of the T. gondii PRU strain by using Trizol reagent according to the manufacturer’s protocol (Invitrogen Co. Ltd). The RNA concentration for each sample was measured using a Nanodrop (Thermo Scientific Co. Ltd) and the Agilent 2100 Bioanalyzer (Agilent Technologies Co. Ltd).
Illumina sequencing and filtering
The RNA samples were processed by high-throughput cDNA sequencing (RNA-Seq) using the Illumina HiSeq™ 2000 at the Beijing Genomics Institute (BGI), Shenzhen, China. Each sequenced fragment can yield 2 × 90 base pair (bp) independent reads, one from either end. Sequencing errors can create difficulties for short-read assembly algorithms. Therefore, we conducted a stringent filtering process on the raw sequencing reads before the transcriptome assembly. We removed reads containing adaptor sequence and low-quality reads (more than 5% ambiguous bases “N” or more than 10% of bases with a quality score of Q < 20). Then, the high-quality transcriptome sequence data were mapped to the mouse (mm9) and human (hg18) genomes using Bowtie2 (v 2.1.0) and TopHat2 (v 2.0.9) with default parameters to filter out host reads. The remaining clean reads were considered to be T. gondii reads and were used for the downstream informatic analyses.
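The read-level quality thresholds described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the function names, the Phred+33 quality encoding, and the policy of discarding a pair when either mate fails are assumptions, and adaptor detection is not shown.

```python
def passes_quality_filter(seq, qual, max_n_frac=0.05, max_lowq_frac=0.10,
                          min_q=20, phred_offset=33):
    """Return True if a read passes both thresholds given in the text.

    Assumptions (not stated in the source): Phred+33 quality encoding and
    strict comparisons ("more than 5%" N, "more than 10%" bases with Q < 20).
    """
    if not seq or len(seq) != len(qual):
        return False
    # Fraction of ambiguous bases in the read
    n_frac = seq.upper().count("N") / len(seq)
    # Fraction of bases whose decoded quality score is below min_q
    lowq = sum(1 for ch in qual if ord(ch) - phred_offset < min_q)
    lowq_frac = lowq / len(qual)
    return n_frac <= max_n_frac and lowq_frac <= max_lowq_frac


def keep_pair(read1, read2):
    """Keep a read pair only if both mates pass (an assumed policy).

    Each read is a (sequence, quality_string) tuple.
    """
    return passes_quality_filter(*read1) and passes_quality_filter(*read2)
```

A read with one "N" in 20 bases (exactly 5%) is kept under this sketch, because the stated rule removes only reads with more than 5% ambiguous bases.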
Transcriptome assembly and annotation
The two libraries of the T. gondii PRU strain were assembled with two different strategies: (i) a de novo assembly pipeline, utilizing the Trinity assembly suite; and (ii) a genome-guided assembly pipeline, performed according to the Cufflinks protocol.
In the de novo pipeline, transcriptome assembly was performed on the clean reads using the short-read assembly program Trinity (release-20130225). Trinity, consisting of three distinct software modules (Inchworm, Chrysalis, and Butterfly), has shown consistently better integrated performance than most other single k-mer assemblers. Because we needed more complete transcripts for further annotation and analysis, we used the eukaryotic genome annotation tool Program to Assemble Spliced Alignments (PASA) (PASA_r20140417) to reconstruct the Trinity-assembled contigs on the T. gondii Me49 genome (ToxoDB, version 11.0), as performed in previous research. To guard against potential errors in sequencing and Trinity assembly, PASA first aligned the contigs to the Me49 genome using the Genome Mapping and Alignment Program (GMAP) (version 2014-04-20); the valid contigs were then reconstructed into complete transcripts.
For the genome-guided transcriptome assembly pipeline, the reads were first aligned to the Me49 genome with Bowtie2 and TopHat2. Then, Cufflinks assembled the aligned RNA-Seq reads into transcripts, taking the binary sequence alignment (BAM) files as input, and measured transcript abundances in fragments per kilobase of exon per million fragments mapped (FPKM). Finally, we used the Cuffmerge program to merge the two assembled transcriptomes into a final transcriptome assembly.
To complete the annotation, the transcriptome assemblies in Gene Transfer Format (GTF) from the de novo and genome-guided strategies were compared with the annotated Me49 genes in ToxoDB using the Cuffcompare program in the Cufflinks package.
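For readers unfamiliar with the FPKM unit, the basic definition can be written as a short function. This is a simplified sketch of the unit only: Cufflinks itself estimates abundances with a statistical model and corrections that are not reproduced here, and the function and parameter names are hypothetical.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped.

    Simplified definition:
        FPKM = fragments / (exon_length_bp / 1e3) / (total_mapped_fragments / 1e6)
    """
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


# Illustrative numbers only (not taken from the study): 500 fragments on a
# 2-kb transcript in a library of 50 million mapped fragments.
example_value = fpkm(500, 2000, 50_000_000)
```

Normalizing by both transcript length and library size is what allows expression values to be compared between the tachyzoite and bradyzoite libraries despite their different sequencing depths.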
Identification and characterization of differentially expressed genes
On the basis of the preceding results, we ran Cuffdiff to identify differentially expressed genes, using the negative binomial (NB) distribution for differential expression analysis (fold change ≥ 2, P < 0.05). The Bioconductor package CummeRbund was then used to produce differential expression statistics and plots. The differentially expressed genes were searched against the Me49 annotated protein database (e-value < 1e-10). For further characterization, the differentially expressed genes were also searched against the nonredundant protein (Nr) database using Blastx with an e-value cut-off of 10e-10. With the Nr annotation, we used the Blast2GO program to obtain gene ontology (GO) annotation of genes. The WEGO software was used to perform GO functional classification to understand the distribution of gene functions.
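The two thresholds stated above (fold change ≥ 2, P < 0.05) can be applied to per-gene expression values as in the sketch below. It only illustrates the cut-offs; it does not reproduce Cuffdiff's negative binomial test, and the function names, labels, and the handling of zero expression values are assumptions.

```python
import math


def log2_fold_change(reference, sample):
    """log2(sample / reference), with explicit handling of zeros (assumed policy)."""
    if reference == 0 and sample == 0:
        return 0.0
    if reference == 0:
        return math.inf
    if sample == 0:
        return -math.inf
    return math.log2(sample / reference)


def classify(tachy_fpkm, brady_fpkm, p_value, min_fold=2.0, alpha=0.05):
    """Label a gene relative to bradyzoites using the thresholds in the text."""
    lfc = log2_fold_change(brady_fpkm, tachy_fpkm)  # log2(tachyzoite / bradyzoite)
    if p_value >= alpha or abs(lfc) < math.log2(min_fold):
        return "unchanged"
    return "up_in_tachyzoites" if lfc > 0 else "down_in_tachyzoites"
```

For instance, a gene with values of 40 in tachyzoites and 10 in bradyzoites and P = 0.001 would be labelled `up_in_tachyzoites`, whereas the same values with P = 0.2 would be labelled `unchanged`.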
Structure annotation and assessing alternative splicing
For gene structure annotation and analysis, we used BEDTools (v 2.17.0) to search for overlaps between the PASA transcripts and the Me49 gene coordinates. The overlapping transcripts were searched against the annotated Me49 protein database using Blastx with an e-value cut-off of 1e-10.
PASA also identified and classified all splicing variations supported by incompatible transcript alignments. The non-overlapping transcripts were searched against the NCBI Nr database (e-value < 1e-10). The matching transcripts obtained from the two preceding Blastx searches were re-submitted together to PASA with the parameter “--ALT_SPLICE” to identify and classify potential AS events in the type II T. gondii strains. The MEME suite provides a motif discovery algorithm, which has been widely used for DNA and protein sequence motif discovery. Because AS sites usually have fixed patterns at both ends, we used MEME to assess the predicted AS sequences.
The structure and AS models were further confirmed by viewing the PASA and ToxoDB gene annotation files in the Integrative Genomics Viewer (IGV).
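The overlap search described above can be illustrated with a small interval comparison. This sketch shows the concept only and is not how BEDTools is implemented; it assumes BED-style half-open coordinates, ignores strand, and uses hypothetical identifiers such as `asmbl_1` and `gene_A`.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base.

    Coordinates are assumed to be 0-based and half-open, as in BED files.
    """
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    return chrom_a == chrom_b and start_a < end_b and start_b < end_a


def split_by_overlap(transcripts, genes):
    """Partition transcript names by whether they overlap any annotated gene.

    Both arguments map a name to a (chrom, start, end) tuple. The pairwise
    scan is quadratic, which is fine for an illustration but not for
    genome-scale data.
    """
    overlapping, non_overlapping = [], []
    gene_intervals = list(genes.values())
    for name, interval in transcripts.items():
        if any(overlaps(interval, g) for g in gene_intervals):
            overlapping.append(name)
        else:
            non_overlapping.append(name)
    return overlapping, non_overlapping
```

Transcripts in the first group would go on to the comparison with annotated proteins, while those in the second group are candidates for unannotated transcripts.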
Results and discussion
Genome-based assembly and analysis
We sequenced the transcriptomes of T. gondii tachyzoites and bradyzoites in order to understand changes in gene regulation during stage conversion in intermediate hosts. We used tachyzoites purified from in vitro culture and bradyzoites (tissue cysts) harvested from the brains of orally infected mice for RNA analysis. Illumina sequencing generated a total of 8 G of 90-bp short reads from the tachyzoites and bradyzoites of the T. gondii PRU strain. We initially used TopHat to filter out the mouse and human reads. Using TopHat, a total of 50 and 47 M clean reads for tachyzoites and bradyzoites, respectively, were then aligned to the genome assembly of T. gondii Me49, another type II strain. The mapping statistics showed that the proportion of sequences that aligned with the reference genome was 87.15% and 94.1% in tachyzoites and bradyzoites, respectively, which was comparable to published data. The unmapped sequences in each library (< 13%) were partially accounted for by tRNA or rDNA genes, which were not represented in the genome assembly.
Gene expression changes during stage conversion
At the top of the tachyzoite list was a transcript corresponding to TGME49_212200 (magnesium (Mg2+) transporter NIPA), a member of a family of Mg2+ transporters distributed widely in eukaryotes, with four proteins known as NIPA 1–4. Organisms must maintain physiological levels of Mg2+ because this divalent cation is critical for the stabilization of membranes and ribosomes, for the neutralization of nucleic acids, and as a cofactor in a variety of enzymatic reactions. Bacteria can assess Mg2+ levels in their surroundings as well as in their own cytoplasm by means of multiple Mg2+ transporters. In bacterial pathogens, Mg2+ sensing and Mg2+ transport in host tissues have shown some correlation with virulence. The ion channels in the unicellular T. gondii are poorly documented; thus, this finding was unexpected, and how the Mg2+ transporters are regulated and subsequently affect the organism’s function warrants further exploration. Other abundant transcripts included those encoding the MIC proteins, apical membrane antigen AMA1, cyclophilin, and protease inhibitor (PI) 2 in tachyzoites, and bradyzoite antigen (BAG1), the cold/heat-shock protein family (cold-shock DNA-binding domain-containing protein, HSP90, and HSP70), elongation factor 1-alpha (EF-1-ALPHA), and facilitative glucose transporter (TgGT1) in bradyzoites. The initial discharge of the MIC proteins leads to the gliding motion necessary for invasion and attachment to the host cell. According to the list, MIC10, MIC1, MIC11, MIC2-associated protein (M2AP), MIC2 and MIC6 might play a core role in this process. Research on the facilitative glucose transporter (TgGT1, TGME49_214320), which is the major hexose transporter on the parasite’s plasma membrane, has shown that it is not essential for the in vitro survival and in vivo virulence of tachyzoites.
The significantly increased level of TgGT1 in bradyzoites suggests that this glucose transporter might be involved in resistance to host stress in bradyzoites and in the formation of tissue cyst walls, similar to a nucleotide-sugar transporter (NST1) that is required for cyst wall glycosylation.
There were additional transcripts encoding hypothetical proteins that were expected to be abundant. For example, CUFF.5515 codes for a hypothetical protein (of 284 amino acids), annotated as TGME49_258470, which was also identified in previous studies in other strains of Toxoplasma. On the other hand, three transcripts on the list (CUFF.66, CUFF.62 and CUFF.2458), which were among the most abundant transcripts, did not contain any previously annotated genes. More studies are clearly needed to characterize these hypothetical protein genes and new transcripts that are expressed at such a high level.
Annotation of GO terms
In total, 2573 transcripts in bradyzoites and 2352 in tachyzoites were assigned to immune response-related GO classes (Additional file 5: Table S5). Among the GO terms related to the stress response, the most common stresses were heat, cold, hypoxia, oxidative stress, and wounding. The number of stress-related transcripts in bradyzoites was twice that in tachyzoites. These transcripts might provide some indication of the importance of these processes to the bradyzoite’s survival in an immune-competent host.
De novo full-length transcript assembly
[Table: An alignment comparison of the genome-based and de novo assembly strategies for the T. gondii transcriptome. Reported metrics include AVG length (bp) and reference genes detected.]
Gene model annotations and alternative splicing predictions
AS has long been recognized as an integral part of transcriptome complexity and proteomic diversity. Alternative splicing plays a major role in the diversity and functions of proteins in cells, and aberrant splicing has been observed to be associated with many diseases. We found that AS events were markedly repressed in the dormant stage (bradyzoites) and largely occurred in the rapidly proliferating stage (tachyzoites), indicating a potential correlation between AS events and stage conversion.
Our comparison of the tachyzoite and bradyzoite transcriptomes presented here considerably expands what is known about T. gondii. This study sheds light on the levels of differential gene expression, genome annotation, and AS. In addition, we found some inaccuracies in the ToxoDB gene models. A better understanding of the processes regulating stage conversion may guide targeted interventions to disrupt transmission of T. gondii.
We are grateful to Professor Haizhu Zhang (Xinxiang Medical University) for providing the PRU parasite strain.
This work was supported by the National Natural Science Foundation of China (No. 31030066 and No. 81601779), the Guangdong Natural Science Foundation (No. 2014A030310210), and the Medical Science and Technology Research Project of Guangdong Province (No. A2016029).
Availability of data and materials
The RNA-seq data generated from the tachyzoite and bradyzoite of Toxoplasma gondii are available from NCBI under BioProject ID PRJNA475428.
X-G C, L-F C and X-L H conceived the study. X-G C, L-F C, X-L H and M L wrote the manuscript. Y-Y Y, F-X L, X-C L and K W prepared and processed the RNA and sequencing data. L-F C, X-L H, F-X L, J-P F and X-J L carried out the data analysis and prepared the figures. All authors have read and approved the final manuscript.
Approval for the animal experiment was obtained from the Ethical Committee of Southern Medical University (Resolution No. L2015092).
All animals were housed and handled in strict accordance with the guidelines of the institutional and national Committees of Animal Use and Protection.
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 18. Hassan MA, Melo MB, Haas B, Jensen KD, Saeij JP. De novo reconstruction of the Toxoplasma gondii transcriptome improves on the current genome annotation and reveals alternatively spliced transcripts and putative long non-coding RNAs. BMC Genomics. 2012;13:696.
- 20. Masihi KN, Jira J. Density gradient centrifugation for separation of different stages of Toxoplasma gondii. Folia Parasitol (Praha). 1979;26:9–13.
- 25. Zhao QY, Wang Y, Kong YM, Luo D, Li X, Hao P. Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study. BMC Bioinformatics. 2011;12:S2.
- 34. Ehrenkaufer GM, Weedall GD, Williams D, Lorenzi HA, Caler E, Hall N, et al. The genome and transcriptome of the enteric parasite Entamoeba invadens, a model for encystation. Genome Biol. 2013;14:R77.
- 35. Ben Mamoun C, Gluzman IY, Hott C, MacMillan SK, Amarakone AS, Anderson DL, et al. Co-ordinated programme of gene expression during asexual intraerythrocytic development of the human malaria parasite Plasmodium falciparum revealed by microarray analysis. Mol Microbiol. 2001;39:26–36.
- 37. Morf L, Spycher C, Rehrauer H, Fournier CA, Morrison HG, Hehl AB. The transcriptional response to encystation stimuli in Giardia lamblia is restricted to a small set of genes. Eukaryot Cell. 2010;9:1566–76.
- 45. Blume M, Rodriguez-Contreras D, Landfear S, Fleige T, Soldati-Favre D, Lucius R, et al. Host-derived glucose and its transporter in the obligate intracellular pathogen Toxoplasma gondii are dispensable by glutaminolysis. Proc Natl Acad Sci USA. 2009;106:12998–3003.
- 52. Yeakley JM, Morfin JP, Rosenfeld MG, Fu XD. A complex of nuclear proteins mediates SR protein binding to a purine-rich splicing enhancer. Proc Natl Acad Sci USA. 1996;93:7582–7.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.