Background

Renal cell carcinoma (RCC) is a heterogenous disease, which is composed of ccRCC and nccRCC [1]. Over the past few years, targeted therapies have significantly improved overall survival (OS) and relapse free survival (RFS) of patients with ccRCC [2]. However, ascribing to relatively low incidence (25–30%) and rare clinical trails of nccRCC, the optimal targeted therapies for nccRCC patients still remain uncertain [3]. ChRCC, taking up 4–5% of RCC, is the second common subtype of nccRCC. Although the tumor grade or stage of ChRCC is relatively low, there is no significant difference between patients with localized ChRCC and ccRCC in 5-year cancer-specific survival rates (P = 0.98) [4]. Due to the poor outcomes of ChRCC, it’s urgent to identify novel molecular biomarkers to evaluate the prognosis of ChRCC patients, which might help to assess the malignancy and provide therapeutic potential for this disease.

WGCNA is a method commonly used to explore the complex relationships between genes and phenotypes. This method is able to transform gene expression data into co-expression modules and provide insights into signaling networks that may be responsible for phenotypic traits of interests [5,6,7]. WGCNA is widely used in various biological processes, such as cancer, neuroscience and genetic data analysis, which is quite helpful for the identification of potential biomarkers or therapeutic targets [8,9,10,11]. Not only can it analyses mRNA level of tumor samples, but also work on microRNA or lncRNA datasets of neoplasms to find candidate biomarkers for prognosis and treatment [12, 13].

In this study, WGCNA method was firstly used to analyze clinic traits and gene expression data of ChRCC samples provided by TCGA database to identify key genes associated with tumor prognosis and progression. Our findings may be very beneficial to assess malignant potential of ChRCC and offer therapeutic methods to this neoplasm.

Methods

Data sources and data preprocessing

Gene expression data and patient clinic traits of ChRCC were downloaded from The Cancer Genome Atlas (TCGA) database (https://cancergenome.nih.gov/). Annotation information of microarray was used to match probes with corresponding genes. The average value was calculated out for those genes corresponding to more than one probes, while probes matched with more than one gene were eliminated.

Screening for differentially expressed genes

The “DEseq2” R package was used to screen DEGs between ChRCC samples and paired normal tissues in the expression data. The DEGs threshold was set at a |log2FoldChange| > 0.585 and adj.P.value < 0.05. After DEGs were screen out, flashClust tool package in R language was used to perform cluster analysis of ChRCC samples.

Co-expression network construction and module analysis

Firstly, expression data profile of DEGs was tested by to check if they were the good samples and good genes. Afterwards, power value was screened out by WGCNA [14] algorithm which is implemented in the R software package (http://www.r-project.org/). Scale independence and average connectivity degree of modules with different power value was tested by gradient method (the power value ranging from 1 to 20). The appropriate power value was determined when the degree of independence was above 0.85 and average connectivity degree is relatively higher [8]. Once the power value was determined, the scale-free gene co-expression networks were constructed by WGCNA algorithm. Besides, the corresponding genes information of each module was extracted out. The minimum number of genes in each module was set as 50 for the high reliability of the results.

Interaction analysis of co-expression modules of ChRCC

After co-expression modules were identified by WGCNA algorithm. Heatmap was painted to describe the strength of the interactions between different modules by heatmap function embedded in R language.

Construct module-trait relationships of ChRCC and key module identification

The correlation between module eigengenes (MEs) and phenotype (clinic traits) was used to evaluated module-trait associations. MEs were considered as the major component in the principal component analysis for each gene module. We calculated the correlation between MEs and clinical trait to identify the relevant module. Gene significance (GS), which was defined as the log10 transformation of the P value (GS = lgP) in the linear regression between gene expression and clinical information, was calculated to evaluate correlation strength. Modules with the highest correlation coefficients among all modules were usually considered as the key module and selected for further analysis.

Functional enrichment analysis of the key module

The information of genes in key modules was upload to Enrichr online database to perform GO and KEGG pathway analysis [15, 16]. Analysis results were extracted out under the condition of P < 0.05 after correction [17]. The top 10 GO terms were visualized if there were more than ten terms, so as to KEGG pathways.

Selection of candidate hub genes

Hub genes, with highly intramodular connectivity, have been shown to be functionally significant. In this study, candidate hub genes were defined by module connectivity, measured by module membership (MM) > 0.8 and clinical trait relationship, measured by significance of the Pearson’s correlation (GS. Pathologic stage > 0.2) [18]. Furthermore, we uploaded all genes in the hub module to the STRING database, choosing confidence > 0.4 to construct protein–protein interaction (PPI). After setting degree cut-off = 5, node score cut-off = 0.2, k-core = 2, and max. depth = 100, the most significant sub-module was selected by using plug-in MCODE [9]. Genes both in co-expression network and MCODE sub-module were regarded as candidate hub genes for further analysis.

Hub genes identification and validation

Another independent dataset GSE15641 containing six ChRCC samples and 23 normal tissues was extracted and performed on GEO2R analysis (https://www.ncbi.nlm.nih.gov/geo/). |log2FoldChange| > 1 and P.Value < 0.05 as criterion, DEGs were identified. Candidate hub genes were mapped to DEGs in GSE15641 to find the real hub genes. For further validation, real hub genes were upload to online TCGA database to perform survival analysis (http://gepia.cancer-pku.cn/). ChRCC patients were divided into two groups according to median expression of each hub gene (high vs. low) and Kaplan–Meier survival analysis was conducted.

Statistical analyses

Two-tailed Student’s t-test was used for significance of differences between groups. Statistical analyses were performed with GraphPad Prism 6.02. Statistical significance was set at probability values of P < 0.05.

Results

DEGs screening

The gene expression profile of 66 ChRCC samples were download from TCGA database. A total of 2215 DEGs (1741 up-regulated and 384 down-regulated) between ChRCC samples and normal tissues were screened out under the threshold of |log2FoldChange| > 0.585 and adj.P.Value < 0.05. These 2215 DEGs were then selected for subsequent analysis.

Construction of co-expression modules of ChRCC

All DEGs were included for constructing co-expression modules by WGCNA algorithm. The “flashClust tool” package was used to perform the cluster and trait analysis and results were shown in (Fig. 1). First of all, the appropriate power value was screened out (Additional file 1: Fig S1). When the power value was equal to 5, scale free networks were constructed with independence degree up to 0.85 and the relatively higher average connectivity (mean connectivity = 6.04). As a result, eight distinct gene co-expression modules were identified by the appropriate power value (5) in ChRCC after excluding the grey module. Constructed modules painted with different colors and cluster trees of DEGs were shown in (Fig. 2). Gene numbers and module names were shown in Table 1. Interactions between eight co-expression modules were subsequently analyzed. The heatmap demonstrates the Topological Overlap Matrix (TOM) among all genes in the analysis. Light color represents low overlap and darker red color represents higher overlap. As a result, each module showed independent validation to each other (Fig. 3).

Fig. 1
figure 1

Clustering dendrogram of 66 tumor samples and heatmaps of clinical traits. The clustering was based on the expression data of DEGs between ChRCC samples and non-tumor samples. The color intensity in heatmaps was proportional to older age, higher pathological stage. In neoplasm and survival status, white means patient neoplasm free or live, while red means patient with tumor or dead

Fig. 2
figure 2

Clustering dendrogram of DEGs, with dissimilarity based on topological overlap, together with assigned module colors

Table 1 The number of genes in eight constructed modules
Fig. 3
figure 3

Visualizing the gene network using a heatmap plot. The heatmap depicts the Topological Overlap Matrix (TOM) among all genes in the analysis. Light color represents low overlap and progressively darker red color represents higher overlap. The gene dendrogram and module assignment are also shown along the left side and the top

Identification of key modules corresponding to clinic traits

The correlations between module eigengene and clinic traits were shown in (Fig. 4). The brown module was selected as key module for taking up top three highest correlations with progression and survival of ChRCC (R2 = 0.48 and P = 5e−05 with pathologic stage, R2 = 0.41 and P = 6e−04 with neoplasm status, R2 = 0.52 and P = 8e−06 with living status). Afterwards, we plotted scatter plots of GS vs. MM (module membership) in the brown modules with clinic traits respectively (Additional file 2: Fig S2).

Fig. 4
figure 4

Module-trait heatmap. Each row corresponds to a module eigengene, column to a trait. Each cell contains the corresponding correlation and P-value

Functional enrichment analysis of genes in the key module

To obtain further insight into the function of genes in the hub module, all genes in the brown module were uploaded to Enrichr online database to conduct GO and KEGG pathway analysis. According to P-value of each term, top 10 biological process and KEGG pathways were extracted out (Additional file 3: Table S1 and Additional file 4: Table S2) and visualized. GO analysis results showed genes in brown module significantly enriched in mitotic cell cycle transition, mitotic spindle assembly, mitotic spindle organization, regulation of cell cycle process, etc. KEGG pathway analysis revealed cell cycle, oocyte meiosis, progesterone-mediated oocyte maturation pathways were significantly enriched (Fig. 5). All these results implied that dysfunctional mitotic cell cycle may contribute to tumorigeneses of ChRCC.

Fig. 5
figure 5

Functional analysis. a Gene ontology analysis of genes in brown module. b KEGG pathway enrichment analysis of genes in brown module. The x-axis shows the − log10 (P-value) of each term and the y-axis shows the GO and KEGG pathway terms

Selection of candidate hub genes

Under the condition of MM > 0.8 and GS. Pathologic stage > 0.2, 39 genes in brown module were taken out (Fig. 6a). PPI network were constructed under the cutoff of confidence > 0.4, 32 genes were filtered out after MCODE process (Fig. 6b). 29 candidate genes identified both in co-expression network and MCODE sub-module were show in Table 2.

Fig. 6
figure 6

Candidate hub gene identification. a Scatter plot of module eigengenes in brown module. The vertical line represents cutoff of module membership = 0.8, the horizontal line represents cutoff of gene significances in pathologic stage = 0.2. Genes on upper right were selected out. b PPI network of genes in MCODE sub-module

Table 2 Candidate hub genes identified by co-expression and MCODE methods

Identification and validation of hub genes

1794 DEGs (884 up-regulated, 910 down-regulate) were identified in GSE15641. All candidate genes were mapped to DEGs in GSE15641, four genes (SKA1, ERCC6L, POLQ, GTSE1) were found to be differentially expressed both in TCGA and GSE15641 datasets. While expressions of POLQ in two datasets were opposite (up-regulated in TCGA, down-regulated in GSE15641). Thus, SKA1, ERCC6L and GTSE1 were regarded as the real hub genes in ChRCC by WGCNA analysis. Expressions of real hub genes between ChRCC samples and normal tissues were shown in (Fig. 7). Survival analysis of these genes showed over-expressions of SKA1 and ERCC6L are significantly related to shorter overall survival time (Fig. 8).

Fig. 7
figure 7

Expression levels of real hub genes. a Expression levels of real hub genes between tumor samples and normal tissues in TCGA database. b Expression levels of real hub genes between tumor samples and normal tissues in GSE15641 dataset

Fig. 8
figure 8

Validation of real hub genes by survival analysis between high expression group (red line) and low expression group (blue line). a SKA1, b ERCC6L, c GTSE1

Discussion

ChRCC is an uncommon, but not rare, malignant disease representing ~ 5% of histologic spectrum of cancer arising from kidney [19]. However, patients with ChRCC have a relatively low risk of tumor progression and cancer-specific death [20], the clinical outcomes of target therapies between ChRCC and ccRCC were not significantly different in metastatic disease [21]. To the best of our knowledge, there are no effective treatments for patients with metastatic ChRCC or any preventions for tumor recurrence. Thus, it’s important to get a better understanding of molecular mechanism of ChRCC and identify potential biomarkers to evaluate the biological behavior of this malignancy.

WGCNA has many prominent advantages over other methods since the analysis explores associations between co-expression modules and clinic traits and the results had much higher reliability and biological significance [22]. In this study, a total of eight co-expression modules were constructed by the 2215 DEGs from the 66 human ChRCC samples by WGCNA method, which was used to detect the relationship between ChRCC transcriptome and clinic traits. We calculated the correlations between co-expression modules and pathologic stage, neoplasm status and living status. Brown module with the highest correlation with clinic traits were regarded as the key module to explore primary cause for disease progression.

For further analysis, genes in brown module with high module membership (> 0.8) and significance of correlations with pathologic stage (> 0.2) were filtered out. Meanwhile, MCODE analysis was performed on brown module to find a panel of genes with high connective degrees [9]. 29 candidate hub genes were identified by overlapping results of co-expression analysis and MCODE analysis. GO and KEGG analysis showed that dysregulation of cell cycle may be the underlying mechanism of tumorigeneses of ChRCC.

29 candidate hub genes were subsequently validated in GSE15641 dataset, four candidate genes (SKA1, ERCC6L, POLQ, GTSE1) differentially expressed both in TCGA and GSE15641 database were considered as real hub genes. After excluding POLQ for conflicting expression level in two databases, two of three hub genes (SKA1, ERCC6L) were found to be significantly correlated with shorter overall survival time under their over expressions.

SKA1 was a microtubule-binding subcomplex of the outer kinetochore which is essential for proper chromosome segregation. It played an important role on tumorigenesis in multiple malignancies [23, 24]. SKA1 over-expression led to cancerization in human prostate epithelial cells via the induction of centriole over-duplication [25]. Depletion of SKA1 inhibited cell proliferation in gastric cancer by blocking cell cycle in S phase [26]. What’s more, SKA1 was proved to be an oncogene in ccRCC which could be down-regulated by antitumor miR-10a-5p transfection [27]. ERCC6L gene, also named PLK1-interacting checkpoint helicase (PICH), was a member of the SNF2 protein family (SWI/SNF catalytic subunit SNF2). It was a mitotic target and substrate of polo-like kinases (PLKs) which regulates multiple processes in mammalian cell mitosis [28]. Downregulation of ERCC6L decreased cell viability in RCC cell lines by blocking mitogen-activated protein kinase (MAPK) signaling pathway and interactions with protein PLK1 [29]. Protein encoded by GTSE-1 modulated cell migrations in an EB1-dependent manner. Up-regulation of GTSE1 expression could be associated with increased invasive potential in breast cancer [30]. Depletion of GTSE-1 enhanced mitotic centromere-associated kinesin (MCAK) activity in mitotic cells, leading to chromosomal instability (CIN) which was presented in most solid tumors [31]. Silencing GTSE-1 expression inhibits proliferation and invasion of hepatocellular carcinoma cells [32]. Based on many studies about three hub genes, we could find their important roles in tumorigenesis and metastasis. Targeting these biomarkers and other candidate hub genes participating in cell cycles could provide therapeutic potentials for ChRCC.

Some limitations exiting in our studies should be mentioned. The most vital genes out of 29 candidate hub genes couldn’t been filtered out under restrictions of the bioinformatics methods. Multi-central and large sample studies on ChRCC expression profiles were needed for validation of our findings. However, this couldn’t be fulfilled due to limited online databases for ChRCC.

Conclusions

Our study used weighted gene co-expression analysis to construct a gene co-expression network, identify and validate the key module and hub genes associated with the progression and prognosis of ChRCC. Functional analysis of hub genes reveal mitotic cell cycle dysregulation could be the underlying mechanism of ChRCC. Three real hub genes including SKA1, ERCC6L and GTSE-1 were identified. SKA1 and ERCC6L were validated in association with poor prognosis of ChRCC. However, further molecular biological experiments are needed to confirm the function of these biomarkers in ChRCC.