High-resolution analysis of multi-copy variant surface glycoprotein gene expression sites in African trypanosomes
African trypanosomes cause lethal diseases in humans and animals and escape host immune attack by switching the expression of Variant Surface Glycoprotein (VSG) genes. The expressed VSGs are located at the ends of telomeric, polycistronic transcription units known as VSG expression sites (VSG-ESs). Each cell has many VSG-ESs but only one is transcribed in bloodstream-form parasites and all of them are inactive upon transmission to the insect vector mid-gut; a subset of monocistronic metacyclic VSG-ESs are then activated in the insect salivary gland. Deep-sequence analyses have been informative but assigning sequences to individual VSG-ESs has been challenging because they each contain closely related expression-site associated genes, or ESAGs, thought to contribute to virulence.
We utilised ART, an in silico short read simulator to demonstrate the feasibility of accurately aligning reads to VSG-ESs. Then, using high-resolution transcriptomes from isogenic bloodstream and insect-stage Lister 427 Trypanosoma brucei, we uncover increased abundance in the insect mid-gut stage of mRNAs from metacyclic VSG-ESs and of mRNAs from the unusual ESAG, ESAG10. Further, we show that the silencing associated with allelic exclusion involves repression focussed at the ends of the VSG-ESs. We also use the approach to report relative fitness costs following ESAG RNAi from a genome-scale screen.
By assigning sequences to individual VSG-ESs we provide new insights into VSG-ES transcription control, allelic exclusion and impacts on fitness. Thus, deeper insights into the expression and function of regulated multi-gene families are more accessible than previously anticipated.
Keywords: Allelic exclusion; Antigenic variation; Gene expression; RNA-seq; Trypanosoma brucei; VSG
BES: Bloodstream expression site
ChIP-seq: Chromatin immunoprecipitation sequencing
DOT1B: Disruptor of telomeric silencing 1B
DTM: Differentiation trypanosome medium
ESAG: Expression-site associated gene
GRESAG: Gene related to ESAG
PCF: Procyclic (insect) form
qRT-PCR: Quantitative reverse transcription polymerase chain reaction
RIT-seq: RNA interference target sequencing
RNA Pol: RNA polymerase
RPKM: Reads per kilobase of transcript per million mapped reads
RPM: Reads per million mapped
TREU: Trypanosomiasis research Edinburgh University
VEX1: VSG exclusion 1
VSG: Variant surface glycoprotein
VSG-ES: VSG expression site
African trypanosomes are protozoan parasites that cause devastating diseases: human African trypanosomiasis and the livestock disease nagana. These parasites are transmitted by the bite of an infected tsetse fly, the distribution of which restricts the geographic spread of the disease. The parasite exists extracellularly and is continually exposed to immune attack in the mammalian host. To persist in the host bloodstream, the parasite has evolved a sophisticated strategy of antigenic variation and immune evasion. The trypanosome surface is coated in a dense layer of 10⁷ copies of a single variant surface glycoprotein (VSG). Switching of this VSG coat is central to adaptive immune evasion, and operates at a rate of approximately 10⁻⁶ per parasite cell division in culture. In vivo, this leads to the recrudescent parasitaemia characteristic of T. brucei infection, where un-switched parasites are removed by antibody-mediated killing.
VSG expression sites (VSG-ESs) are the key subtelomeric polycistronic units involved in antigenic variation in bloodstream African trypanosomes. Understanding the expression and function of these units is critical to understanding virulence. VSG-ES transcription, mediated by RNA polymerase I, initiates at multiple VSG-ES promoters but is attenuated in all but one to prevent multi-VSG expression in individual cells. The polycistronic VSG-ESs contain a number of Expression Site Associated Genes (ESAGs), several of which are of unknown function; those that have been characterised are involved in nutrient acquisition (ESAG6 and ESAG7) and innate immune evasion (ESAG4 and SRA) [7, 8]. In trypanosomes, maturation of mRNA from nascent transcripts occurs via the linked processes of trans-splicing, the addition of a 39-nt capped leader sequence, and poly-adenylation [9, 10]. RNA Pol-II transcribes the spliced leader from a repetitive array as a primary 135-b transcript that is processed and 5′ capped before association with the spliceosome, which mediates trans-splicing to nascent transcripts.
Antigenic variation is specifically required for immune evasion in the bloodstream and, consistent with this, VSG-ESs are subject to developmental regulation. Upon parasite differentiation in the tsetse mid-gut, VSG transcription stops and the VSG coat is shed in the fly mid-gut, where recent evidence shows it interferes with fly innate immunity. Procyclins, a family of repetitive proteins containing either EP or GPEET amino acid repeats, replace the VSG coat in the mid-gut. Following migration to the fly salivary gland, a distinct sub-set of VSGs are expressed on the surface of metacyclic cells from monocistronic VSG-ESs, and are required for re-infection of the mammalian host [15, 16].
Next-generation sequencing (NGS), and RNA-seq approaches in particular, have been used in African trypanosomes to examine a range of features of genome organisation and gene expression, including developmentally regulated transcript expression, alternative splicing, control by RNA-binding proteins and translation control [20, 21]. The approach has also been used to analyse relative expression levels for transcripts mapping to the active VSG-ES, revealing that most ESAG transcripts are present at 1–0.01 % the level of the active VSG transcript [17, 18].
NGS analysis of VSG-ESs presents several unique challenges. In particular, VSG-ESs are closely related and, although increased mapping stringency can improve the alignment, the accuracy of assigning sequence-reads to the correct and specific sites has not been assessed in detail. Genes related to ESAGs (GRESAGs) are also found at non-telomeric locations; copies of GRESAG4 are particularly prevalent and copies of GRESAG2 are present at procyclin loci [24, 25]. In addition, VSG-ESs are under-represented in reference genome-sequence assemblies. Fortunately, the full set of VSG-ESs has been isolated and sequenced, and the subset of VSG-ESs expressed in the metacyclic stage has also been identified [15, 26], in the widely studied Lister 427 strain. However, developmental control of VSG-ESs has not yet been analysed in any detail in this strain.
We generated transcriptome data from sub-cloned populations of Lister 427 cells expressing a defined VSG (VSG-2) and from differentiated insect mid-gut stage cultures directly derived from those sub-clones. We then developed computational approaches to determine how accurately short reads derived from NGS can be aligned to VSG-ESs. We find that the differences between VSG-ESs are sufficient to allow 100-b reads to be accurately aligned to specific loci. Subsequent high-stringency mapping revealed a number of unanticipated features regarding VSG-ESs and their developmental control. High-stringency mapping was also applied to published NGS datasets. This revealed specific perturbations to VSG-ES transcriptomes following knockdown or over-expression of the allelic exclusion regulator VEX1, and relative fitness costs following knockdown of individual ESAGs.
Transcriptomes from isogenic bloodstream and insect-stage T. brucei
We next analysed reads from bloodstream and insect-stage cells that aligned to the active VSG-ES. A single base resolution BES1 plot (Fig. 1b) revealed a strikingly compact transcription-unit, incorporating little inter-transcript DNA sequence. Reads associated with a trans-spliced leader sequence, found associated with all trypanosomatid mRNAs, revealed trans-splicing at discrete points for each gene (Fig. 1b), as expected. We observed multiple trans-splicing events within the VSG gene, but the dominant splice-site was used >1000-fold more frequently than other sites. As also expected, we see bloodstream-specific over-representation (266-fold on average) of transcripts for every ESAG present in the active VSG-ES (Fig. 1b), consistent with transcription attenuation following differentiation. In the bloodstream-form, the VSG transcript itself is 141-fold more abundant than the mean value of the other VSG-ES-derived transcripts. We do see some isolated ESAGs that display higher expression relative to upstream ESAGs following differentiation to the insect stage but, rather than VSG-ES internal transcription initiation, this likely reflects incorrect assignment of reads from GRESAGs that are transcribed by RNA Pol-II [24, 25]. Analysis of procyclin loci (also transcribed by RNA Pol-I) and the PGK locus (transcribed by RNA Pol-II) revealed similarly compact transcription units and the expected developmental controls (Fig. 1b). Thus, our RNA-seq datasets from isogenic bloodstream and insect-stage cultures are suitable for more detailed VSG-ES transcriptome analysis.
‘Short’ reads can be accurately assigned to VSG-ESs
We next considered the challenge of accurately assigning 100-b sequence reads from RNA-seq datasets to individual VSG-ESs. Analysis of ESAG7 genes from the Lister 427 strain highlighted the challenge in terms of distinguishing among individual ESAGs (Additional file 3). In this case, a high level of identity was observed throughout the coding-sequence. There are differences however, which can be exploited.
The analysis indicated that Bowtie2 aligns 75.5 % of in silico generated reads to the correct bloodstream VSG-ES with a MapQ ≥ 0 (98.7 % to metacyclic VSG-ESs), such that mis-aligned reads, as expected, can have a significant negative impact on transcriptome analysis. A MapQ value > 1 removes 99.9 % of inappropriately aligned reads and retains 81 % of the signal, while a MapQ > 12 eradicates all noise from the data and retains 65.7 % of the signal (Fig. 2b). We selected a MapQ cutoff > 1 as optimal for accurately assigning short sequence reads to individual VSG-ESs; the vast majority of VSG-ES associated genes retain short reads using this approach (Additional file 1, ‘BSF v PCF mapq > 1’ tab).
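The trade-off between retained signal and removed noise at a given MapQ cutoff can be illustrated with a minimal sketch. It assumes that each simulated read carries the contig it was generated from, the contig it was aligned to, and its MapQ; the `SimRead` record, the `evaluate_cutoff` helper and the toy values are hypothetical and are not taken from the study.

```python
from typing import Iterable, NamedTuple, Tuple


class SimRead(NamedTuple):
    source: str   # contig the read was simulated from (known for in silico reads)
    aligned: str  # contig the aligner assigned the read to
    mapq: int     # mapping quality reported by the aligner


def evaluate_cutoff(reads: Iterable[SimRead], cutoff: int) -> Tuple[float, float]:
    """Return (signal retained, noise removed) as fractions for MapQ > cutoff."""
    reads = list(reads)
    correct = [r for r in reads if r.source == r.aligned]
    wrong = [r for r in reads if r.source != r.aligned]
    kept_correct = sum(r.mapq > cutoff for r in correct)
    kept_wrong = sum(r.mapq > cutoff for r in wrong)
    signal = kept_correct / len(correct) if correct else 0.0
    noise_removed = 1 - kept_wrong / len(wrong) if wrong else 1.0
    return signal, noise_removed


# Toy data only: six reads simulated from two hypothetical contigs.
TOY_READS = [
    SimRead("BES1", "BES1", 42),
    SimRead("BES1", "BES1", 1),
    SimRead("BES1", "BES2", 0),
    SimRead("BES2", "BES2", 23),
    SimRead("BES2", "BES1", 1),
    SimRead("BES2", "BES2", 0),
]

for cutoff in (0, 1):
    signal, noise_removed = evaluate_cutoff(TOY_READS, cutoff)
    print(f"MapQ > {cutoff}: signal retained {signal:.2f}, noise removed {noise_removed:.2f}")
```

Sweeping `cutoff` over a range of values on real simulated alignments would give the kind of signal and noise curves summarised above.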
During this analysis, we also observed a distinct trend in the distribution of in silico-generated reads. Specifically, reads were more effectively retained closer to the telomere as we increased the uniqueness of alignment (Additional file 4A), indicating that sequences closer to telomeres are more divergent/unique. The alignment map for promoter-proximal ESAG7 genes from Additional file 3 is also compared to an alignment map for the telomere-proximal ESAG1 to further illustrate this point (Additional file 4B). Although a contested hypothesis, error-prone VSG gene-conversion was previously suggested as a mechanism contributing to antigenic variation [37, 38]. Our observation is consistent with this hypothesis when taken together with the inherent fragility of sub-telomeres and subsequent telomere-directed gene-conversion events.
Differential controls affecting specific ESAGs and VSGs
Reads mapping to ‘silent’ VSG-ESs are 2.4 × 10⁴-fold lower on average relative to those mapping to the active VSG-ES and are further reduced when cells differentiate to the insect stage (Fig. 3a). Again, we see some isolated ESAGs that display higher expression relative to upstream genes, likely reflecting reads from RNA Pol-II transcribed GRESAGs [24, 25]. VSG-ESs share a generic structure, with similar ESAGs in similar positions. When grouped and represented according to their position, VSG-ES associated genes closer to telomeres display greater down-regulation in the insect-stage (Fig. 3b). For instance, average VSG expression level decreases 21-fold upon differentiation to insect stage cells, while ESAG7 expression decreases only 2-fold. In contrast to other ESAGs, four of six ESAG10 genes were significantly (p < 0.05) upregulated (average 3.9-fold) in the insect-stage (Fig. 3a-b). This was unexpected since ESAG expression has been considered bloodstream stage-specific. Thus, ESAG10 may be an unconventional ESAG in terms of developmental expression-control.
Another unexpected observation was that, while VSGs in polycistronic VSG-ESs were down-regulated, three of the five VSGs located within the monocistronic metacyclic VSG-ESs were expressed at a significantly higher level (8.4-fold average, p < 10⁻²²) in insect-stage cells (Fig. 3c-d, Additional file 5). VSG expression has been considered to be specific to the tsetse salivary gland stage and the bloodstream-stages, due to promoter control (the VSG-ES promoters active in bloodstream-form cells are distinct from the VSG promoters active in metacyclic cells) and stage-specific stabilisation of transcripts driven by a conserved element in the VSG mRNA 3′ untranslated sequence. We speculate that the unexpected increase in expression of monocistronic metacyclic VSGs reflects progression to a metacyclic (−like) stage by small numbers of cells present in differentiated cultures. Indeed, increased expression of a single RNA binding-protein, RBP6, can trigger this progression through the life cycle. Alternatively, sub-telomeric silencing may be less pronounced in insect-stage cells.
Our analysis of silent VSG-ESs allowed us to identify the trans-splicing sites for twelve VSG-ES linked VSGs and all five metacyclic VSGs from our RNA-seq data (Additional file 6A-B). We counted polypyrimidine tract lengths for VSG splice sites and compared these to the genes in the RNA Pol-I transcribed procyclin loci, counting the number of consecutive pyrimidines and allowing for a single purine interruption. We found that the VSG genes are associated with significantly shorter polypyrimidine tracts (11.5 b, n = 17) compared to genes in the procyclin loci (19.0 b, n = 13, p < 4 × 10⁻⁴) or the 20 most abundant RNA Pol-II transcripts in our dataset (20.4 b, n = 20, p < 5 × 10⁻⁴) (Additional file 6C). Notably, ESAG7 genes also possess shorter polypyrimidine tracts, suggesting that VSG-ES associated genes do not require extensive polypyrimidine tracts to form abundant mature messenger RNAs. Identification of splice-sites also allowed us to predict 5′-untranslated sequences and we note that there does not appear to be a consensus here; these sequences range in size from 15 to 91 b.
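One way to count a polypyrimidine tract while tolerating a single purine interruption is sketched below. This is an illustration under stated assumptions rather than the script used in the study: the function name is hypothetical, the input is taken to be the sequence immediately upstream of a splice acceptor site, only pyrimidines contribute to the reported length, and any non-pyrimidine character (including N) is treated as an interruption.

```python
PYRIMIDINES = set("CTU")


def polypyrimidine_tract_length(upstream: str, max_purines: int = 1) -> int:
    """Longest pyrimidine count in any window holding at most `max_purines` interruptions."""
    seq = upstream.upper()
    best = 0
    left = 0
    interruptions = 0
    for right, base in enumerate(seq):
        if base not in PYRIMIDINES:
            interruptions += 1
        # Shrink the window from the left until it holds an allowed number of interruptions.
        while interruptions > max_purines:
            if seq[left] not in PYRIMIDINES:
                interruptions -= 1
            left += 1
        window_length = right - left + 1
        best = max(best, window_length - interruptions)
    return best


# Toy sequences only.
print(polypyrimidine_tract_length("TTTATTT"))   # one interruption is bridged
print(polypyrimidine_tract_length("TTTAGTTT"))  # two adjacent purines split the tract
```

Because interruptions are excluded from the count, a purine at the edge of a window does not inflate the reported length.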
Regulation of VSG-ES transcripts by VEX1
We next analysed the RNA-seq data derived following VEX1 RNAi (Fig. 4b, Additional file 7). In this case, our analysis shows minimal impact at the active VSG-ES (Fig. 4b, red bars) but differential behaviour of the ‘silent’ VSGs and the ESAGs at either end of VSG-ESs relative to the centrally located ESAGs. Specifically, the promoter-adjacent ESAG6 and 7 were increased 10-fold; 75 % of genes tested increasing significantly (> 3 fold, p < 0.05), while the telomere-proximal VSG and ESAG1 increased 10 to 15-fold; 50 % of genes tested increasing significantly (> 3 fold, p < 0.05). This is in contrast to several centrally located ESAGs, which increased only 1.5 fold on average, with just 2 % of genes tested increasing significantly (> 3 fold, p < 0.05).
Loss-of-fitness associated with ESAG knockdown
Massively parallel NGS approaches have revolutionised the study of gene regulation and expression, producing data with an unrivalled depth, and have been applied with great success in African trypanosomes; see [17, 18, 33] for just a few examples. There are, however, a number of challenges associated with the analysis of subtelomeric sequences in a range of eukaryotes. These regions often incorporate highly repetitive and plastic components of the genome. This is particularly true of parasites in which subtelomeric genes function in the processes of antigenic variation and immune evasion, such as malaria parasites and African trypanosomes.
Analysis of these repetitive loci has proven challenging, a fact exemplified by our in silico analyses. Indeed, we note the use of orthogonal methods such as qRT-PCR and genetic tagging of VSG-ESs [46, 47], despite the availability of NGS approaches and datasets. We now report high-coverage transcriptomes from isogenic T. brucei cultures and from two major life cycle stages, namely the mammal-infective bloodstream form and the tsetse fly mid-gut stage, with a focus on the regulation of VSG-ES transcription. Simulated Illumina sequence data allowed us to gauge an appropriate filter that maximises the signal-to-noise ratio of sequence alignments at these loci. We find the expected extreme developmental regulation of VSG and EP/GPEET surface antigen genes and VSG-ES attenuation in insect stage cells but also, using high-stringency mapping, uncover additional and unexpected features.
Previous reports indicate that alignment of short reads to VSG-ESs can be problematic due to the similarity between these loci [22, 46, 48]. Our analyses show that an average of 24.5 % of VSG-ES derived reads are typically incorrectly aligned to VSG-ESs, and filtering reads with a MapQ value > 1 greatly reduces mis-mapping. Further analyses of VSG-ESs suggest that this is particularly useful for the closely related ESAG sequences. Improvements in sequencing technologies, such as quality (our sequence data has mean per-base quality scores > 34) and read-length now facilitate accurate high-stringency mapping. Specifically, we believe that 100–150 b reads that are commonly produced by current Illumina technologies incorporate sufficient SNPs to allow specific assignment to individual VSG-ESs.
In bloodstream cells, monoallelic expression ensures that a single subtelomere is productively transcribed. However, in mid-gut stage cells, VSG-ESs are silenced. In our populations, we find that transcripts for twelve silent VSG-ES linked and all five metacyclic VSGs are detectable, although at a level approximately 26,000 times lower than the active VSG, illustrating the impressive dynamic range of RNA-seq. In addition, our analysis reveals that ESAG10 mRNAs, encoding putative folate transporters, are more abundant in the mid-gut stage cultures than the bloodstream form. This surprising finding is suggestive of less effective silencing in the insect-stage of ESAG10-associated VSG-ES promoters relative to the almost identical VSG-ES promoters located downstream. Alternatively, ESAG10 transcripts may display increased stability in insect-stage cells. Notably, additional RNA Pol-II transcribed genes on chromosome 8 (Tb927.8.3620, 3630 and 3650) encode folate transporters and whether the ESAG10-associated promoters or transcripts are ‘activated’ at any point in the life cycle remains unknown.
High-stringency mapping using transcriptomic datasets derived following knockdown of the allelic exclusion regulator, VEX1, revealed derepression of promoter and telomere proximal VSG-ES genes. In another study, ectopic overexpression of a second VSG gene resulted in VSG-ES silencing spreading from the telomere towards the promoter in a disruptor of telomeric silencing 1B (DOT1B) dependent manner. Our analysis indicates that VEX1-mediated silencing is directed at the telomeric VSG and ESAG1 genes, and at the VSG-ES promoter-adjacent ESAG6 and ESAG7 genes. This is in contrast to VEX1 overexpression, which upregulates all silent ESAGs and VSGs. Thus, our current data indicate that VEX1-mediated silencing primarily affects the ends of silent VSG-ESs, suggesting that subtelomere conformation may be important in the control of these genes. Finally, high-stringency phenotyping data confirm (GR)ESAG2 as the ESAG associated with the greatest fitness cost when knocked-down in in vitro culture.
By distinguishing between closely related transcription units, we have been able to enhance our understanding of the behaviour of VSG-ESs in terms of VSG silencing, developmental regulation and contributions to fitness in culture. NGS approaches, coupled to high-stringency mapping, such as RNA-seq, ChIP-seq, RIT-seq and the growing list of ‘seq’ technologies will undoubtedly improve our understanding of the organisation and expression of these virulence gene loci and indeed closely related gene families in a range of other organisms.
Two subclones of wild type bloodstream-form Lister 427 strain T. brucei expressing VSG-2 (VSG-221, Mitat1.2) were differentiated to insect mid-gut stage cells as previously described. Briefly, cells were collected by centrifugation and resuspended in differentiation medium (DTM) supplemented with 3 mM citrate and 3 mM cis-aconitate and maintained for 10 days at 27 °C, 0 % CO2 (ambient).
For RNA extraction, 5 × 10⁷ cells were collected and RNA prepared using the Qiagen RNeasy kit, according to the manufacturer’s instructions. Poly-A+ RNA was enriched using oligo-dT beads, and reverse transcribed. Second strand synthesis was randomly primed. Sequencing was performed on the HiSeq platform (Illumina) at the University of Dundee generating 100-b paired-end reads. This yielded insect-stage RNA-seq data, using identical processing, that were only 10 days removed from our bloodstream-form RNA-seq data.
In order to align reads we generated a hybrid genome assembly consisting of the 11 megabase chromosomes from the T. brucei 927 reference genome, the non-redundant set of 14 bloodstream expression sites and the 5 metacyclic expression sites from our Lister 427 strain [15, 26]. Read alignment was performed using Bowtie2 as previously described using the parameters --very-sensitive --no-discordant. Approximately 25 million bloodstream-form and 50 million insect stage reads were aligned for each clone. Alignment files were manipulated using SAMtools, and visualized in the Artemis genome browser. Single base resolution plots were generated using the pysam API (https://github.com/pysam-developers/pysam) in an in-house script that filters reads based on alignment quality (MapQ) and corrects for library size (available on request). Trans-spliced reads were extracted using a previously published script, searching for the partial spliced leader sequence ‘TCTGTACTATATTG’ and its reverse complement. This is the shortest sequence that returns only spliced leader sequences following BLAST search of the TREU-927 genome sequence on TriTrypDB. Differential expression analysis was performed with edgeR as previously described. When analysing VEX1 perturbation, we excluded genes with <10 reads averaged across replicates under both uninduced and induced conditions.
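The spliced-leader search and the library-size correction described above can be sketched in plain Python. This is not the published extraction script or the in-house plotting script: the function names are hypothetical, the sketch assumes the partial leader sequence sits at the 3′ end of the leader so that the bases following it belong to the transcript, and the library-size correction is assumed to be a simple reads-per-million scaling.

```python
from typing import Optional

SL_TAG = "TCTGTACTATATTG"  # partial spliced leader sequence quoted in the text
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def downstream_of_spliced_leader(read: str) -> Optional[str]:
    """Return the sequence following the leader tag, or None if the tag is absent.

    Both orientations are searched; a reverse-strand hit is flipped so the
    returned sequence is always in the same sense as the tag.
    """
    read = read.upper()
    for candidate in (read, reverse_complement(read)):
        position = candidate.find(SL_TAG)
        if position != -1:
            return candidate[position + len(SL_TAG):]
    return None


def reads_per_million(count: int, total_mapped_reads: int) -> float:
    """Scale a raw read count to reads per million mapped (RPM)."""
    return count * 1_000_000 / total_mapped_reads


# Toy read only: two leader-flanking bases, the tag, then six transcript bases.
toy_read = "AG" + SL_TAG + "ACGTAC"
print(downstream_of_spliced_leader(toy_read))
print(downstream_of_spliced_leader(reverse_complement(toy_read)))
print(reads_per_million(50, 25_000_000))
```

Reads for which the function returns a sequence can then be aligned or tallied by splice position, and per-base counts scaled with `reads_per_million` before libraries of different depth are compared.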
ART is a software package that simulates next-generation sequencing runs using empirical error models utilized by the 1000 genomes project. Illumina sequencing runs were simulated for all 19 of the Lister 427 VSG-ES contigs in the hybrid genome using the parameters art_illumina -i contigX.fa -len 100 -ss MS -c 100000. This produced 10⁵ single-end reads for each contig; as the longest contig is 59,781 bp, this provided coverage of every base in each VSG-ES. These in silico reads were then aligned back to the complete genome with Bowtie2 using the parameters --very-sensitive. Alignment files were manipulated using SAMtools. Read counts were generated using the Artemis genome browser.
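The coverage statement can be checked with simple arithmetic: mean depth is the number of reads multiplied by read length, divided by contig length. The short sketch below uses only figures quoted in the text; the helper name is hypothetical, and the result is an expected mean depth rather than a guarantee for any individual base.

```python
def mean_coverage(n_reads: int, read_length: int, contig_length: int) -> float:
    """Expected mean sequencing depth across a contig."""
    return n_reads * read_length / contig_length


# 10^5 simulated reads of 100 b on the longest VSG-ES contig (59,781 bp).
depth = mean_coverage(100_000, 100, 59_781)
print(f"Mean depth on the longest contig: {depth:.1f}x")
```

Shorter contigs receive the same number of reads, so their mean depth is higher than this value.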
Sequence and data analysis
Clustal alignment analysis and visualisation of ESAG7 sequences was performed using CLC workbench using settings: gap open cost = 0.0, gap extension cost = 0.0, end gap cost = free, alignment mode = very accurate, redo alignments = no, use fixedpoints = yes. A non-redundant gene list was from  and VSG-ES sequences  were retrieved from TriTrypDB. ‘Generic’ ESAG lists are derived from ; ESAGs from each VSG-ES were compiled based on relative position within each VSG-ES.
The work was supported by The Wellcome Trust (Investigator Award 100320/Z/12/Z to D.H. and Strategic Award 100476/Z/12/Z supporting Biological Chemistry & Drug Discovery).
Availability of data and materials
The RNA-seq sequence data reported in this paper have been deposited in the European Nucleotide Archive, www.ebi.ac.uk/ena (accession no. PRJEB8747). Genome sequences assembled from publicly available data (tritrypdb.org) for this paper, and any scripts used are provided on request without condition.
SH and DH designed the study and analysed the data, LG generated samples for RNA-seq. SH, LG and DH wrote the paper. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
- 18. Nilsson D, Gunasekera K, Mani J, Osteras M, Farinelli L, Baerlocher L, Roditi I, Ochsenreiter T. Spliced leader trapping reveals widespread alternative splicing patterns in the highly dynamic transcriptome of Trypanosoma brucei. PLoS Pathog. 2010;6(8):e1001037.
- 25. Paindavoine P, Rolin S, Van Assel S, Geuskens M, Jauniaux JC, Dinsart C, Huet G, Pays E. A gene from the variant surface glycoprotein expression site encodes one of several transmembrane adenylate cyclases located on the flagellum of Trypanosoma brucei. Mol Cell Biol. 1992;12(3):1218–25.
- 47. Pena AC, Pimentel MR, Manso H, Vaz-Drago R, Pinto-Neves D, Aresta-Branco F, Rijo-Ferreira F, Guegan F, Pedro Coelho L, Carmo-Fonseca M, et al. Trypanosoma brucei histone H1 inhibits RNA polymerase I transcription and is important for parasite fitness in vivo. Mol Microbiol. 2014;93(4):645–63.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.