An integrative genomic analysis of transcriptional profiles identifies characteristic genes and patterns in HIV-infected long-term non-progressors and elite controllers
Despite that most HIV-infected individuals experience progressive CD4+ T cell loss and develop AIDS, a minority of HIV-infected individuals remain asymptomatic and maintain high level CD4+ T cell counts several years after seroconversion. Efforts have been made to understand the determinants of the nonprogressive status, exemplified by the clinical course of elite controllers (ECs) who maintain an undetectable viremia and viremic nonprogressors (VNPs) who have a normal CD4+ count in spite of circulating viral load. However, the intrinsic mechanism underlying nonprogression remained elusive. In this study, we performed an integrative analysis of transcriptional profiles to pinpoint the underlying mechanism for a naturally occurring viral control.
Three microarray datasets, reporting mRNA expression of the LTNPs or ECs in HIV-infected patients, were retrieved from Gene Expression Ominbus (GEO) or Arrayexpress databases. These datasets, profiled on the same type of microarray chip, were selected and merged by a bioinformatic approach to build a meta-analysis derived transcriptome (MADNT). In addition, we investigated the different transcriptional pathways and potential biomarkers in CD4+ and CD8+ cells in ECs and whole blood in VNPs compared to HIV progressors. The combined transcriptome and each subgroup was subject to gene set enrichment analysis and weighted co-expression network analysis to search potential transcription patterns related to the non-progressive status.
30 up-regulated genes and 83 down-regulated genes were identified in lymphocytes from integrative meta-analysis of expression data. The interferon response and innate immune activation was reduced in both CD4+ and CD8+ T cells from ECs. Several characteristic genes including CMPK1, CBX7, EIF3L, EIF4A and ZNF395 were indicated to be highly correlated with viremic control. Besides that, we indicated that the reduction of ribosome components and blockade of translation facilitated AIDS disease progression. Most interestingly, among VNPs who have a relatively high viral load, we detected a two gene-interaction networks which showed a strong correlation to immune control even with a rigorous statistical threshold (p value = 2−e4 and p value = 0.004, respectively) by WGCNA.
We have identified differentially expressed genes and transcriptional patterns in ECs and VNPs compared to normal chronic HIV-infected individuals. Our study provides new insights into the pathogenesis of HIV and AIDS and clues for the therapeutic strategies for anti-retroviral administration.
KeywordsElite controller Long-term nonprogressor Viremic nonprogressor Meta-analysis GSEA WGCNA
human immunodeficiency virus
acquired immunodeficiency syndrome
Gene Expression Ominbus
differentially expressed genes
combination anti-retroviral therapy
single nucleotide polymorphism
Robust Multi-Array Analysis
false discovery rate
Kyoto Encyclopedia of Genes and Genomes
Database for Annotation, Visualization and Integrated Discovery
meta-analysis derived nonprogressor transcriptome
gene set enrichment analysis
weighted gene coexpression network analysis
peripheral blood mononuclear cell
IFN regulatory factor 7
Dulbecco’s modified Eagle medium
Despite the great progress in combination anti-retroviral therapy (cART), AIDS is still a non-curative disease that causes a high death rate every year. It is interesting that there are a small proportion of HIV-infected individuals who maintain a stable CD4+ T cell count within the normal reference range over several years from the diagnosis of HIV infection in the absence of ART and clinical symptoms. These patients with spontaneous and sustained control of HIV disease progression were identified as long term nonprogressors (LTNPs) [1, 2]. Similarly, there are approximately 1% of HIV-infected patients who maintain undetectable viral load for a prolonged period (generally less than 50 copies/ml), who are identified as elite controllers. These two groups of patients classified by the immunologic parameters or viral parameters represent useful models of natural protection against disease progression and may have important implications for prophylactic and therapeutic strategies . Thus far it is still in debate whether viral factor, host or environmental factor contributes to the LTNP status [2, 4]. Association of augmented control with attenuated viruses due to deletions or mutations of regulatory proteins or other special viral polymorphism have been reported by several researches [5, 6, 7, 8]. For instance, a F72L mutation in HIV-1 Vpr in a LTNP was reported to confer a significant reduction in Vpr nuclear import and virion incorporation implying a link between efficient Vpr nuclear import and HIV disease progression . Moreover, viruses isolated from some LTNPs were found to have gross deletions in nef genes, suggesting the importance for nef gene in AIDS disease progression [10, 11]. However, studies of nef genes in elite controller yield contradicting results [12, 13]. A phylogenetic analysis revealed that nef sequences from patients with different rates of progression did not form distinct cluster between LTNP and progressors, suggesting the degree of variation in nef is unlikely to be indicative of disease progression [14, 15].
Cellular immune responses have also been reported to contribute to viral control [16, 17]. For instance, a tight association was observed between Gag specific CD8+ T cells and viral control [17, 18, 19, 20]. Some elite suppression was linked with higher level of cytolytic granules within HIV-specific CD8+ T cells . Additionally, CD8+ T cells isolated from ECs exhibited more polyfunctional capability in response to HIV specific antigens [22, 23, 24, 25]. On the other hand, CD4+ T cells from ECs retained an ability to proliferate and produce interleukin 2 (IL-2) in response to HIV compared with normal patients under ART . In addition to CD4+ and CD8+ T lymphocytes, a correlation of Interleukin 17 secreting T (Th17) cell level and HIV disease progression was observed in LTNPs compared to TPs, supporting a role of this cell subset in HIV pathogenesis . Besides, humoral immune response also plays a role in the context of natural viral control. Some researchers have reported that ECs maintained HIV-1 specific memory B cell response which contributed to neutralizing responses in contrast to treated patients .
Moreover, various approaches have been undertaken to uncover the host genetic factors or specific genes involved in virologic control of HIV infection in ECs [29, 30]. A study showed a role of p21 in ECs via indirectly blocking reverse transcription by inhibiting CDK2-dependent phosphorylation . The HLA class I allele HLA-B*57 [32, 33] and HLA-B*27 alleles [34, 35] were overrepresented among elite controllers and viremic controllers (VCs) compared with normal progressors, which underscored the important role of CD8+ T cells in naturally viral control. . Most notably, HLA class I molecules might affect HIV-1 immune control by interacting with their receptors on innate immune cells, such as the killer cell immunoglobulin-like receptors (KIR) on natural killer (NK) cells. Two studies reported that activating KIR3DS1 allele in combination with Bw4-80I and KIR3DL1*004 in the presence of HLA-Bw4 showed a pronounced protection against AIDS progression [36, 37]. Genome-wide association studies have been carried out to decipher the association between naturally occurring single nucleotide polymorphism (SNPs) and viremic control in ECs and LTNPs [38, 39]. An international HIV controllers study identified over 300 SNPs on chromosome 6 involved in viral control . However, only approx. 20% of the protective effects can be attributable to the SNPs discovered, indicating other unknown mechanisms were accountable for the observed control. Further studies are necessary to pinpoint more novel pathways and intrinsic host factors responsible for virological control.
In most studies investigating the host intrinsic factors controlling disease progression, only a few ECs or LTNPs were involved and the conclusions are often controversial. To overcome this limitation of individual studies, a large-scale transcriptional study was necessary to reduce random error and increase statistical power. In this study, we combined relevant microarray data to increase statistical power to uncover the biological differences between LTNP or ECs and chronic progressors. To decrease the heterogeneity and increase the consistency between different datasets, we only select microarray data from Human Genome U133A or Human Genome U133 plus 2 Array (Affemetrix Company). Furthermore, a new illumina data series GSE87620 was used as a validating set. To be most informative, we conducted our analysis in two steps. First, we combined the three datasets into meta-analysis derived transcriptome, which provided a comprehensive comparison of nonprogressors (NP) with progressors (PP). Second, we split these study subjects into three subgroups, namely CD4+ T cell samples from ECs, CD8+ T cell samples from ECs and whole blood samples from VNPs, thereby providing an elaborate comparison between aviremic controllers or viremic controllers and progressors. Differential expression analysis, gene set enrichment analysis and WGCNA approach were carried out in each step. Our study have revealed some characteristic biomarkers and transcriptional patterns and highlight several key genes in nonpathogenic individuals, Most strikingly, we identified key transcriptional modules in VNPs which have never been reported before. These findings may better the understanding of HIV-1 viremic and immune control and AIDS progression.
Selection of studies and datasets
General information of each dataset
Sample size (PP/NP)
GPL96; Affymetrix Human Genome U133A Array
CD4+ T cell
CD8+ T cell
PP = 10; NP = 10
*3 NPs are ECs and 2 with extremely low viral load
W. Nicholas Haining
GPL3921; Affymetrix HT Human Genome U133A Array
CD8+ T cell
PP = 18;NP = 24
*All NPs are ECs
Gregory K Tharp
GPL570; Affymetrix Human Genome U133 Plus 2.0 Array
PP = 7; NP = 5
*All NPs are VNPs
Microarray meta-analysis were carried out according to the guidelines described in . Each datasets were log2 transformated and normalized by Agilent GeneSpring software (Version 11.5, Agilent, USA). Then, gene matching was done for all probes. When multiple probes matched the same gene symbol, the probe presented the greatest inner-quartile range (IQR) was selected to represent the target gene symbol. After matching all the probes to a common gene symbol, “MetaDE” R package  was exploited to merge the common gene symbols across multiple studies by p value combination using Fisher methods. Differentially expressed genes were selected with adjusted p value < 0.05, based on false discovery rate (FDR) by the Benjamini–Hochberg procedure and moderated t test.
Enrichment analysis for KEGG pathway and Gene Ontology terms were carried out by David online tool (https://david.ncifcrf.gov). Gene set enrichment analysis (GSEA)  was carried out using GSEA version 3.0, downloaded from the Broad Institute (http://www.broadinstitute.org/gsea/downloads.jsp). Expression data sets and phenotype labels were created according to GSEA specifications. Gene set permutations were set to be done 1000 times for each analysis using the weighted enrichment statistic and signal to noise metric. Gene sets with FDR lower than 0.05 were considered significant.
Weighted gene coexpression network analysis (WGCNA) is a gene coexpression network-based approach [43, 44]. A gene co-expression network is defined as undirected, weighted gene network, in which the nodes represent expression profiles while edges represent pairwise correlation between gene expressions. Briefly, correlation coefficient Smn between characteristic gene m and gene n is calculated by their expression values between different samples using the formulation: Smn = |cor(mn). The correlation matrix was then transformed into an undirected network by raising the absolute value of each entry to a power β using 6 as correlation coefficient threshold. Genes were clustered into different modules using dynamic tree cutting method.
Protein–protein interactions (PPI) networks in each module were visualized by Cytoscape 3.6.0. The Network Analyzer examined the network for topological parameters, including degree, connectivity, betweenness and closeness.
HIV infection assay and western blotting
To analyze the antiviral activity of several up-regulated genes, HIV infection assay was performed as previously described [45, 46]. Briefly, 200 ng pNL4-3-R-E-Luc vector and 150 ng pCMV-VSV-G packaging vector were transfected into the 293T cells together with 200 ng empty pCMV6-Entry vector, control gene (SLFN11, GADD45G) or pCMV6-Entry-CMPK1, ZNF395, METTL9, GADD45A, PKKAR2B, OAT, CHPT1, SPOCK2 and CBX7 in 12-well plate using Lipofectamine 2000 (Thermo Scientific) according to the manufacturer’s instructions. The old culture medium was replaced with fresh culture medium at 6 h post-transfection. Subsequently, the supernatants were collected at 48 h post-transfection and cleared by filtration with the 0.45 μm filter. Cell lysis were harvested for SDS-PAGE and western blotting with α-myc, p24 (Gag) and β-actin antibody, respectively. The amount of (infectious) virus particles in the supernatants was determined via one-cycle infection assay. For infection assays, 200 μl of 1 × 105 SupT1 or Jurkat cells per well in 96 well plate were infected with supernatants collected from 293T cells. At 48 h post-infection, the SupT1 cells were lysed and luciferase activity was determined using a firefly Luciferase Assay System (Promega).
293T cells were maintained at 37 °C in high glucose Dulbecco’s modified Eagle medium (DMEM) supplemented with 10% fatal bovine serum (Gibco). SupT1 and Jurkat cells were maintained at RPMI 1640 Medium with 10% fetal bovine serum.
Description of datasets
We searched two public repositories for mRNA transcriptional profile related to viremic control using the search term” “elite controller” [OR] “long-term nonprogressors”. The inclusion criteria were set as (1) the microarray chips used in the study was from Human Genome U133A or Human Genome U133 plus 2 Array (Affemetrix Company); (2) sample source was CD4+, CD8+ T cells or whole blood. Three studies met the inclusion-criteria were selected for the meta-analysis study. Details about these datasets were outlined as in Table 1. The three series included 35 PPs, 34 ECs (except for two patients with extremely low viremia) and 5 VNPs. The following information was extracted from each data series including GEO number, author name, sample source, platform and the number of progressors and controllers.
A pilot study reported that serum from LTNPs and ECs have distinct neutralizing capabilities and antibody-dependent cellular cytotoxicity (ADCC) activity . To uncover the divergent transcription pattern between aviremic controllers and VNPs from different cell types, study subjects were grouped into three subsets aforementioned. Data of CD8+ T cells from two datasets were merged into a combined transcriptome by MetaDE package. 115 DEGs were identified in CD8+ T cells (Additional file 3: Table S3), which highly overlap with those identified in MADNT. Only 4 DEGs (OAS3, IFI44, IFI44L and EIF2AK2) were identified in CD4+ T cells of ECs (Fig. 2e), and no DEGs met the criteria (FDR < 0.05) in VNPs.
Taken together, statistical combination of the relevant datasets has created a meta-analysis derived nonprogressor transcriptome (MADNT) and different subgroups, which can be subject to downstream GSEA and WGCNA analysis.
Gene set enrichment analysis
In the aspect of stratification subgroup, 6 pathways were enriched with FDR less than 0.25 in CD4+ T cells of ECs which included “multi organism metabolic process”, “translational initiation”, “Nuclear transcribed mRNA catabolic process nonsense-mediated decay”, “protein targeting to membrane”; Similarly, 32 pathways were enriched with FDR less than 0.01 in CD8+ T cells of ECs, most of which were associated with pathways related to translation and protein-targeting to membrane (Additional file 4: Table S4).
Weighted gene co-expression network analysis
WGCNA provides insights to disease pathogenesis by studying co-expression genes between clinical samples based on gene expression correlation coefficients. The reliability of this algorithm depends on a large sample size, which we are able to apply the MADNT and two subgroups (CD8+ T cells (EC) and VNP) expression set into WGCNA analysis.
Increasing transcriptional profiles allowed us to investigate the significant genes and functional pathways relevant to viral control in ECs and VNPs in a large scale level. In this study, we adopted an integrative approach to uncover the transcriptional patterns in LTNPs and ECs and identified several key genes and functional pathways which are significantly connected to viral control and disease progression.
We found several common genes up-regulated in all ECs and VNPs, some of which have never been reported before, including ZBTB10, ZNF395, GPX4, CHPT1, METTL9, SPOCK2, EIF4A3, GADD45A, SLC75A, TGIF1, CMPK1, LMNA, PLXDC1 and BACH2. GADD45A is a stress-induced protein and participate in various cellular pathways. A recent study reported that overexpression of GADD45 proteins reduces HIV-1 production through suppressing transcription from the HIV-1 LTR promoter . We have experimentally examined several up-regulated genes for their anti-HIV activity in cell lines, most of which were demonstrated to suppress the replication of VSVG-pseudotyped HIV-luc viruses. The biological relevance of these up-regulated genes in the context of HIV pathogenesis needs further investigation.
On the other hand, an arsenal of ISGs were down-regulated in NPs compared to chronic progressors, which further debates the role of interferon in chronic HIV infection. Several researches by other groups observed similar phenomenon. Margalida Rotger showed several ISGs (IF44, MX1, EIF2AK2, IFI6, LY6E, TRIM22) were up-regulated in RPs who got a progressive immunosuppression soon after seroconversion while a lower expression of ISGs was observed in CD8+ T cells in viremic nonprogressor (VNP) profile . Sankaran et al. reported a significant increase in gene expression regulating immune activation and inflammatory response in intestinal mucosa in HIV-infected patients with high viral load compared to LTNPs . In accordance with this, CD38 and LAG3 which were found to be elevated in RPs have also been identified in our analysis. Consistent with this knowledge, a research reported a significant increase of ISG expression with increased viral load, including genes of intrinsic antiretroviral defense . Currently, the contribution of IFN-Is to the control of viral infection and to the immunopathogenesis of AIDS is still under debate [53, 54]. Supportively, Liang  implied that type I interferon contributed to aberrant immune activation, T cell depletion and dysfunction during chronic HIV-1 infection. They found that persistent HIV-1 infection in humanized-mice led to induction of IFNs and ISGs including MX2, IFITM3, Trim22, ISG15, OAS1, and IFN regulatory factor 7 (IRF7) both in peripheral blood mononuclear cells (PBMCs) and in the spleen. They also observed the enhanced expression of CD38 and HLA-DR, similar to our data. Persistent immune activation plays a central role in CD4+ T cell depletion and progression to AIDS and may be considered to be a predictor of disease progression in ART-naïve patients. Taken together, our analysis supports the concept that IFNs and ISGs are a double-edged sword during chronic infection and highlighted the caution of IFNs usage in HIV-1 chronically infected patients.
Given the limited power of classical function annotation, GSEA and WGCNA algorithm was performed on our combined transcriptional profile individually. Pathways including “translational initiation”, “3′UTR mediated translational regulation” and “nuclear transcribed-mRNA nonsense-mediated decay” were shown to be positively correlated with viremic control. In contrast, pathways related to interferon and immune response, complement activation and cell cycle seemed to be highly correlated with AIDS disease progression. Notably, ribosomal substitutes including RPS20, RPS28, RPS15A, RPS25, RPS6, RPS21, RPS3, RPL36, RPL9, RPL31, RPL23, RPL27, RPL30, RPL29, RPL35 and translational initiation factors including EIF2S3, EIF1, EIF2C2, EIF4G2, EIF4A3 were elevated in NPs compared with chronic progressors by GSEA analysis. This indicates that defects in ribosome components and blockade of translational initiation probably play a significant deleterious role in HIV progression. In agreement with our findings, a recent study reported several ribosomal formation genes including RPL27, PRS7, RPL24, RPS13 and RPL10L was down regulated in NK cells from HIV infected individuals . Our most interesting findings were the identification of gene networks which were highly associated with nonprogression status in VNPs. These patients who retained a functional immune control of circulating viruses represented a very perplexing question pertinent to the LTNP status. Our findings hopefully provide a clue to resolve the mystery.
Nevertheless, we can not rule out the possibility that the changed transcription level of some gene found herein may reflect a consequence rather than a cause of low viral replication in LTNPs. The role of these candidate genes reported herein in controlling HIV-1 infection awaits further investigation.
In summary, our integrative analysis of microarray data relevant to ECs and VNPs provides an overview of the biomarkers and transcriptional patterns in nonpathogen individuals compared with chronic progressors. We observed that HIV controllers had reduced immune activation and interferon response, which collaborates the concept that interferons might play a deleterious role in AIDS disease progression. Moreover, we have identified several key genes responsible for the non-progressive status, some of which were validated in cell lines. We also detected several pathways that may be related to the exacerbation of immunosuppression in AIDS patients, i.e., blockade of translation and dysfunction of T cell homeostasis. Altogether, our integrative genome-wide analysis has provided new knowledge for HIV-1 pathogenesis and immune intervention of the disease progression of AIDS.
DJW performed the majority of work described here. DJW and ML wrote and edited the manuscript. ZJY and XYL collected and examined the data. SC and XYL supervised the project. All authors read and approved the final manuscript.
We are grateful to Charles MA (University of Pittsburgh) for helpful advices on MetaDE Package. We thank National Infrastructure of Microbial Resources (NIMR-2014-3) for providing valuable reagents.
The authors declare that they have no competing interests.
Availability of data and materials
All data in the current study are based on public data available in the Gene Expression Omnibus (GEO) datasets (http://www.ncbi.nlm.nih.gov/geo/).
Consent for publication
All the authors have read and approved the paper and declare no potential conflicts of interest in the paper. If their paper is accepted, all the authors will observe the terms of the license to publish.
Ethics approval and consent to participate
This work was supported by The National Mega-Project for Infectious Disease (2018ZX10301408 SC), The National Key Research and Development program of China (2016YFD0500307 SC), The National Natural Science Foundation of China (81772205 SC), CAMS innovation fund for Medical Sciences (2018-I2M-3-004, 2016-I2M-2-002 XYL and 2017-I2M-1-012 YXZ) and Xiehe Scholar to SC.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 6.Wang B, Dyer WB, Zaunders JJ, Mikhail M, Sullivan JS, Williams L, et al. Comprehensive analyses of a unique HIV-1-infected nonprogressor reveal a complex association of immunobiological mechanisms in the context of replication-incompetent infection. Virology. 2002;304(2):246–64.CrossRefGoogle Scholar
- 8.Yamada T, Iwamoto A. Comparison of proviral accessory genes between long-term nonprogressors and progressors of human immunodeficiency virus type 1 infection. Adv Virol. 2000;145(5):1021–7.Google Scholar
- 26.Emu B, Sinclair E, Favre D, Moretto WJ, Hsue P, Hoh R, et al. Phenotypic, functional, and kinetic parameters associated with apparent T-cell control of human immunodeficiency virus replication in individuals with and without antiretroviral treatment. J Virol. 2005;79(22):14169–78.CrossRefGoogle Scholar
- 38.Guergnon J, Dalmasso C, Broet P, Meyer L, Westrop SJ, Imami N, et al. Single-nucleotide polymorphism-defined class I and class III major histocompatibility complex genetic subregions contribute to natural long-term nonprogression in HIV infection. J Infect Dis. 2012;205(5):718–24.CrossRefGoogle Scholar
- 43.Zhang B, Horvath S. A general framework for weighted gene co-expression network analysis. Stati Appl Genet Mol Biol. 2005;4:17.Google Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.