Epithelial-mesenchymal transition markers screened in a cell-based model and validated in lung adenocarcinoma
Re-capture in cell cultures and animal models of the differences between tumor and normal tissues observed at the patient level is critical for applications of these cancer-related differences. The epithelial-mesenchymal transition (EMT) process is essential for tumor migratory and invasive capabilities. Although many EMT markers have been identified, the molecular features of the early stages of EMT are poorly understood.
A cell-based model to induce lung cell (A549) EMT using conditioned medium of in vitro cancer activated fibroblast (WI38) was established. High-throughput sequencing methods, including RNA-seq and miRNA-seq, and advanced bioinformatics methods were used to explore the transcriptome profile transitions accompanying the progression of EMT. We validated our findings with experimental techniques including transwell and immunofluorescence assay, as well as the TCGA data.
We have constructed an in vitro cell model to mimic the EMT in patients. We discovered that several new transcription factors were among the early genes (3 h) to respond to cancer micro-environmental cues, and these could play critical roles in triggering further EMT signals. The early EMT markers also included genes encoding membrane transporters and blood coagulation functions. Three of the nine examined early EMT hallmark genes, GALNT6, SPARC and HES7, were up-regulated specifically in the early stages of lung adenocarcinoma (LUAD), as confirmed by TCGA patient transcriptome data. In addition, we showed that miR-3613, a regulator of EGFR pathway genes, was constantly repressed during EMT progression, suggesting that it is an epithelial miRNA marker.
The CAF-stimulated EMT cell model may recapture some of the molecular changes during EMT progression in clinical patients. The identified early EMT hallmark genes GALNT6, SPARC and HES7, together with miR-3613, provide new markers and therapeutic targets for LUAD for further clinical diagnosis and drug screening.
Keywords: EMT, Lung adenocarcinoma, RNA-seq, miRNA-seq, WGCNA
Differentially expressed genes
Differentially expressed miRNA
Weighted gene coexpression network analysis
Tumor growth is determined not only by cancer cell proliferation but also by the tumor environment, which has recently been considered a target for new anti-metastatic therapies. A subpopulation of cancer-adjacent fibroblasts can be activated by a diverse set of growth factors secreted from cancer cells [2, 3, 4]. The activated fibroblasts, termed cancer-associated fibroblasts (CAFs), are the most abundant stromal cells in the tumor microenvironment and can secrete a wide spectrum of chemokines and cytokines into the invasive margins of desmoplastic cancers to promote tumor growth and progression [5, 6, 7, 8, 9, 10, 11]. Epithelial-mesenchymal transition (EMT) is a reversible biological process indispensable for development. EMT is reactivated during cancer progression [12, 13, 14, 15, 16], which includes initiation, primary tumor growth, invasion, dissemination and metastasis to colonization, as well as acquisition of therapeutic resistance [17, 18, 19]. CAFs have been reported to stimulate cancer EMT by activating cellular signaling pathways that increase the invasive features of cancer cells [20, 21, 22].
Cell-based models are widely used for EMT studies [23, 24]. Cytokines such as transforming growth factor (TGF)-β are frequently applied to induce EMT in various epithelial cell types [25, 26]. In addition, conditioned medium of CAFs cultured from patient cancer tissues has been collected to induce EMT states in epithelial cells [27, 28, 29]. In recent years, the rapid accumulation of genome-wide data enabling direct comparisons between disease and control samples, such as the TCGA database, has created an unprecedented opportunity for identification of potential biomarkers and therapeutic targets for cancers [30, 31]. A combination of TCGA- and cell-based screening should expedite the translational medicine process.
To explore this possibility, we co-cultured A549 and WI-38 cells, and then collected the medium of the WI-38 fibroblasts for A549 EMT induction, mimicking the condition of CAF-induced EMT as previously reported. We found that the prototypical EMT markers, the induction of vimentin and repression of E-cadherin, were both present in the induced A549 cells. We then gained a comprehensive view of the transcriptomic changes of lung cancer cells during EMT by applying RNA-seq and microRNA-seq (miRNA-seq). Two co-expression modules of genes were specifically upregulated and one miRNA was constantly downregulated at early EMT stages, providing potential biomarkers and therapeutic targets. By analyzing the LUAD dataset from TCGA, we found that three EMT markers (GALNT6, SPARC and HES7) among the nine in vitro identified genes with known function are also up-regulated specifically in early-stage lung adenocarcinoma patients. These results support the biological relevance of our cell-based screening model for future study of early EMT mechanisms and biomarkers, and possibly for drug screening.
Human LUAD A549 cells (CRM-CCL-185) and human lung fibroblast WI38 cells (CCL-75) were obtained from the American Type Culture Collection (Manassas, VA, USA) in 2013. These cell lines have been authenticated by short-tandem repeat analyses and are free of mycoplasma contamination. A549 cells were cultured in RPMI-1640 medium (Gibco, Grand Island, NY), while WI38 cells were cultured in IMDM (Gibco) at 37 °C in a humidified atmosphere of 5% CO2. These cell culture media were also supplemented with 10% fetal bovine serum (Hyclone, Logan, UT, USA), penicillin (100 U/mL), and streptomycin (100 μg/mL).
Tumor cell Transwell invasion assay
Matrigel (Corning) was used to pre-coat the filters (8-μm pore size) between the upper and bottom chambers of the Transwell apparatus (Corning). After the Matrigel had solidified at 37 °C overnight, A549 cells were seeded into the upper chambers, control medium or CAF-conditioned medium was added to the bottom chamber, and cells were incubated at 37 °C for different lengths of time. Cells on the upper chambers were fixed with 100% methanol for 20 min, stained with DAPI (Sigma) for 10 min and washed with PBS. Cells remaining on the surface of the filter were removed with a cotton swab. The number of cells that had invaded to the lower surface of the polycarbonate filter was counted at 100× magnification under a light microscope.
For immunofluorescence staining, cells were grown on a Glass Bottom Cell Culture Dish (Nest, Wuxi, China) until 50–60% confluence, fixed with 4% paraformaldehyde and permeabilized with 0.3% Triton X-100. After washing three times with cold PBS, cells were incubated with anti-E-cadherin (Invitrogen, Carlsbad, USA) and anti-Vimentin antibodies (Abcam, Cambridge, UK) at 4 °C for one hour, followed by Alexa Fluor 488-labeled and 594-labeled secondary antibody (Proteintech, Wuhan, China) for one hour, and counterstained with DAPI (Sigma, St Louis, USA). Images were subsequently captured using a confocal microscope (Leica TCS SP5, Mannheim, Germany).
MiR-3613 mimic experiment
Human LUAD A549 cells were obtained from the American Type Culture Collection (Manassas, VA, USA) and cultured in RPMI-1640 medium (Gibco, Grand Island, NY). MiR-3613-3p mimic and the corresponding negative control (random sequences) were purchased from GenePharma (Suzhou, China). Cells were transfected with the miR-3613 mimic or negative control (NC) using Lipofectamine 2000 transfection reagent (Invitrogen, Carlsbad, CA, USA). Opti-MEM I Reduced Serum Medium (Gibco, Grand Island, NY, USA) was used to dilute Lipofectamine 2000 and nucleic acids. The detailed sequence information is presented in Additional file 6: Table S1.
Transcriptome sequencing of 12 RNA samples from A549 cells collected at different time points was carried out. Libraries were prepared using the RNA-seq Library Preparation Kit for Whole Transcriptome Discovery (Gnomegen) and the Balancer NGS Library Preparation Kit for small/microRNA (Gnomegen) following the manufacturer's instructions. The libraries were applied to the Illumina NextSeq 500 system for 151-nt paired-end sequencing by ABlife Inc. (Wuhan, China).
Clean reads were aligned to the human hg19 genome using TopHat2. Reads mapping to only one genomic location were retained for RPKM (reads per kilobase of exon model per million mapped reads) calculation. Differentially expressed genes (DEGs) were analyzed with edgeR. For each gene, the p-value was computed and the significance threshold to control FDR at a given value was calculated.
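The RPKM normalization named above can be illustrated with a minimal sketch. The function name and the example numbers are hypothetical and are not taken from the study; the formula is the standard definition of RPKM.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    read_count: reads uniquely mapped to the gene's exon model
    exon_length_bp: summed exon length of the gene, in base pairs
    total_mapped_reads: uniquely mapped reads in the whole library
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and library size must be positive")
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million mapped reads)
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2,000-bp exon model in a library of
# 10 million uniquely mapped reads.
example_value = rpkm(500, 2000, 10_000_000)
print(example_value)
```

Dividing by both gene length and library size makes values comparable between genes and between samples sequenced to different depths.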
Weighted gene correlation network analysis (WGCNA)
To obtain expression modules and group genes from the union set by expression pattern, we used weighted gene correlation network analysis (WGCNA). RPKM values of the genes differentially expressed in any pairwise comparison were used as the input. The output is a set of gene modules defined by expression pattern. For each gene module, the eigengene was chosen to represent the expression pattern.
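As an illustration of what a module eigengene represents, the sketch below computes the first principal component of a module's standardized expression matrix by power iteration, using only the Python standard library. It is a simplified stand-in for the eigengene step of the WGCNA R package, not the authors' pipeline; the function name and the toy expression values are hypothetical.

```python
import math


def module_eigengene(expr, n_iter=200):
    """expr: list of genes, each a list of expression values across samples.

    Returns a unit-length vector with one value per sample (the first
    right singular vector of the standardized gene-by-sample matrix).
    """
    scaled = []
    for row in expr:
        mean = sum(row) / len(row)
        sd = math.sqrt(sum((x - mean) ** 2 for x in row) / (len(row) - 1))
        if sd == 0:
            continue  # genes with constant expression carry no pattern
        scaled.append([(x - mean) / sd for x in row])
    n_samples = len(scaled[0])
    # Start from the first standardized gene: a constant start vector would
    # be orthogonal to every centered row and the iteration would collapse.
    v = scaled[0][:]
    for _ in range(n_iter):
        # u = Z v (one value per gene), then w = Z^T u (one value per sample)
        u = [sum(z * x for z, x in zip(row, v)) for row in scaled]
        w = [sum(scaled[g][s] * u[g] for g in range(len(scaled)))
             for s in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Orient the sign so the eigengene follows the average expression trend.
    avg = [sum(row[s] for row in scaled) / len(scaled) for s in range(n_samples)]
    if sum(a * b for a, b in zip(avg, v)) < 0:
        v = [-x for x in v]
    return v


# Hypothetical module of three genes that all rise across four time points.
toy_module = [[1, 2, 3, 4], [2, 4, 6, 8], [10, 11, 12, 13]]
eigengene = module_eigengene(toy_module)
print(eigengene)
```

Plotting the eigengene against time gives a single summary curve per module, which is how a peak at a given time point can be read off for each module.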
Functional enrichment analysis
Gene Ontology (GO) and KEGG enrichment analyses were performed with KOBAS 2.0. A hypergeometric test was performed with robust FDR correction to obtain an adjusted P-value between certain tested gene groups and genes annotated in the reference genome.
Total RNA was prepared from A549 cells with TRIzol Reagent (Life Technology) according to the manufacturer’s instructions. DNA was eliminated by DNase I treatment, and RNA was purified by sequential phenol-chloroform extraction and isopropanol precipitation and dissolved in sterile RNase-free water. Complementary DNA was synthesized from 4 μg total RNA with random hexamers, and quantitative real-time PCR analysis was performed with SYBR green real-time PCR mix (Toyobo) in a real-time detection system (Bio-Rad). GAPDH and U6 genes were used as the internal control genes for mRNA and miRNA, respectively. The primers used in this study are listed in Additional file 6: Table S1.
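The hypergeometric test behind such an enrichment analysis can be sketched as follows. This is a generic upper-tail calculation written with the Python standard library, not a reproduction of KOBAS internals; the function name and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background, annotated, tested, overlap):
    """Probability of seeing at least `overlap` annotated genes by chance.

    background: number of annotated genes in the reference genome
    annotated:  number of background genes carrying the term
    tested:     number of genes in the tested group (e.g. one module)
    overlap:    number of tested genes carrying the term
    """
    upper = min(annotated, tested)
    total = comb(background, tested)
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, tested - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


# Hypothetical counts: 20,000 background genes, 400 with the term,
# a 100-gene module of which 8 carry the term.
p_value = hypergeom_enrichment_p(20000, 400, 100, 8)
print(p_value)
```

The resulting raw p-values, one per term, would then be corrected for multiple testing before any term is called enriched.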
MiRNA targets prediction
For each miRNA, we predicted the target mRNAs using two software tools, TargetScan (version 7.1) and miRanda (version 3.3), with default parameters. The results from the two methods were combined to generate a complete list of miRNA target genes.
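One way to read "combined to generate a complete list" is as the union of the two prediction sets per miRNA; that reading is an assumption, since an intersection would instead give a stricter list. The sketch below takes that union. The function name is hypothetical, and the assignment of genes to each tool in the example is invented for illustration.

```python
def combine_targets(targetscan_hits, miranda_hits):
    """Union of predicted targets per miRNA.

    Each argument maps a miRNA name to an iterable of target gene symbols.
    """
    combined = {}
    for mirna in set(targetscan_hits) | set(miranda_hits):
        genes = set(targetscan_hits.get(mirna, ())) | set(miranda_hits.get(mirna, ()))
        combined[mirna] = sorted(genes)
    return combined


# Invented example: which tool predicted which gene is not from the study.
targetscan = {"miR-3613": ["ERBB4", "GRB7"]}
miranda = {"miR-3613": ["GRB7", "FGFR4"]}
print(combine_targets(targetscan, miranda))
```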
TCGA data analysis
Transcriptome profiling and clinical data of LUAD patients were downloaded from The Cancer Genome Atlas database (https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga). Expression differences between normal and cancer tissues were analyzed using edgeR. For each gene, the p-value was computed and the significance threshold to control FDR at a given value was calculated.
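The text does not name the FDR procedure. A common choice, and the default adjustment in edgeR's result tables, is the Benjamini-Hochberg method, sketched below as an assumption rather than as the authors' exact setting. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: each is the minimum of p * m / rank over equal or higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values; genes with an adjusted value below the
# chosen FDR level would be reported as differentially expressed.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```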
Activated WI-38 CM induced A549 cell epithelial-mesenchymal transition
The expression of other mesenchymal markers (MAML3, NOTCH3, SMAD2, TGFB1, TWIST1 and ZEB1/ZEB2) showed a similar trend (Fig. 2b). Except for SMAD2, ZEB1 and ZEB2, we observed up-regulation of mesenchymal genes as early as 3 h of treatment. Except for ZEB2, the other six EMT markers showed a significant up-regulation at 24 h, but not at 72 h (Fig. 2b), which indicated a desynchronization between EMT phenotypes and marker gene expression. ZEB2 exhibited upregulation at 72 h (Fig. 2b). The fact that the systematically induced expression of EMT-related genes begins before phenotypical changes compelled us to investigate further the molecular features of cancer cells at early EMT stages. These results also supported that the in vitro CAF-CM system is effective in inducing A549 EMT and suitable for the analysis of early EMT markers.
The temporal patterns of differentially expressed genes in CAF-induced A549 EMT
Top terms by Gene Ontology (GO) enrichment analysis of the DEGs contain EMT-related pathways, such as cell-cell signaling, inflammatory response, signal transduction and cell adhesion (Fig. 3b). A couple of terms associated with neuronal functions (synaptic transmission and negative regulation of neuron apoptotic process) are also high on the list (Fig. 3b), which could be related to the morphological changes that A549 cells undergo during EMT.
The results confirmed the conclusion that CAF treatment induces A549 cell EMT, and further suggested that CAF-promoted EMT is characterized by transcriptional changes at early stages prior to the appearance of EMT phenotypes, which is consistent with the expression patterns of EMT markers shown in Fig. 2. All six EMT markers were up-regulated upon CAF induction, although only one of them was a DEG (Additional file 1: Figure S1A-F). Interestingly, the epithelial marker gene CDH1, encoding E-cadherin, was reduced only at 3 h of CAF induction but increased afterward (Additional file 1: Figure S1G), indicating the presence of post-transcriptional controls on production of E-cadherin protein.
To further analyze the temporal pattern of gene regulation during CAF induced EMT, we analyzed co-regulated genes between adjacent time points from 3 to 24 h. The overlap of down-regulated genes was very low (Additional file 2: Figure S2A), indicating that during the early hours of EMT progression distinct groups of genes were down-regulated at different time points. Up-regulated genes showed a higher level of overlap, especially towards the later time points (Additional file 2: Figure S2B).
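The overlap comparison between adjacent time points can be sketched with plain set operations. The function name and the gene identifiers are placeholders, and the Jaccard index is added here only as one convenient way to express "level of overlap"; the study itself reports the overlaps in Additional file 2.

```python
def adjacent_overlaps(degs_by_time):
    """Compare DEG sets of consecutive time points.

    degs_by_time: list of (time_label, set_of_gene_symbols) in time order.
    Returns a list of (earlier, later, shared_genes, jaccard_index).
    """
    results = []
    for (t1, g1), (t2, g2) in zip(degs_by_time, degs_by_time[1:]):
        shared = g1 & g2
        union = g1 | g2
        jaccard = len(shared) / len(union) if union else 0.0
        results.append((t1, t2, sorted(shared), jaccard))
    return results


# Placeholder gene names; real input would be the up- or down-regulated
# DEG lists at each time point.
up_regulated = [
    ("3 h", {"GENE_A", "GENE_B"}),
    ("6 h", {"GENE_B", "GENE_C"}),
    ("12 h", {"GENE_B", "GENE_C", "GENE_D"}),
]
for row in adjacent_overlaps(up_regulated):
    print(row)
```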
WGCNA analysis revealed four CAF-induced and two CAF-repressed expression modules
WGCNA was applied to identify module eigengenes (MEs) responding to CAF-CM treatment. The 1346 DEGs were clustered into seven modules (Fig. 3c). The eigengene bar plots showed that the four modules in the same branch (turquoise, red, yellow and blue) exhibit time-dependent gene upregulation under the CAF-induced condition (Fig. 3d-g). First, the turquoise module contains the largest number of genes (932), with a trend of gene upregulation in CAF-treated cells from 6 to 72 h. The most drastic upregulation occurs between 24 and 48 h (Fig. 3d). This timing is consistent with the morphological changes and expression of EMT marker genes. GO analysis also showed that these genes are associated with classical EMT pathways, including inflammatory response, cell adhesion and cell-cell signaling (Additional file 7: Table S2). The red module contains genes that are deregulated only at 72 h (Fig. 3e). Judging from the cell morphology and EMT marker expression at this time point, these genes may be responsible for the maintenance of the mesenchymal state of the cells, rather than promoting EMT.
The yellow and blue modules, on the other hand, have shown the trend of gene upregulation at earlier time points. Genes in the yellow module are upregulated in CAF treated cells mainly between 3 and 24 h, and peaked at 12 h (Fig. 3f). The blue module showed a similar trend which peaked at 24 h and reverted at 48 h (Fig. 3g). Genes in these two modules are significantly deregulated at earlier time points and within a smaller window of time, thus we hypothesize that they may contain early EMT markers, which is further analyzed and described below.
The green and brown modules showed a CAF-repressed pattern. Brown module showed a time-dependent repression from 3 to 24 h, whereas green module showed the most pronounced repression at 3 h of CAF-treatment (Additional file 3: Figure S3A). The CAF-downregulated genes were not enriched in any EMT related pathways as expected. The green module was enriched in genes involved in positive regulation of apoptotic process and signal transduction, and the brown module was enriched in genes in DNA-dependent transcription (Additional file 3: Figure S3B).
Biological pathway analysis of blue and yellow modules reveals early EMT markers
We next explored in more detail the two modules that highlight early gene deregulation during EMT. Functional analysis of the 109 blue module genes resulted in the enrichment of membrane located and transmembrane transport related genes (Additional file 8: Table S3). We noticed that only 10 of them were annotated with a GO biological process term, and six of them enriched in transmembrane transport (Additional file 8: Table S3). A total of 26 genes were enriched in integral to membrane (Additional file 9: Table S4).
These data highlighted several classes of early EMT markers: the transmembrane transporters and blood coagulation genes in the blue module, and the transcription factors in the yellow module. Considering that expression of the latter peaked 12 h earlier than the former, it is possible that these three classes of markers are separately regulated and underlie separable and interconnected EMT-promoting mechanisms. For example, the early EMT transcription factors could drive the induction of the transmembrane transporters and blood coagulation genes.
miR-3613 regulates EGFR pathway during early EMT
We then performed functional analysis on the predicted target genes of miR-3613 (Fig. 6c). In the biological process, the second highest term was the EGFR pathway, which has been shown to be activated to promote EMT . Given the target gene expression changes, we plotted out the potential regulatory network between miR-3613 and its EGFR pathway targets at different time points (Fig. 6d). Starting as early as 3 h of CAF treatment, miR-3613 is steadily down-regulated during the early progression of A549 cell EMT, which potentially regulates a number of known EMT regulator genes throughout the process.
To further explore the impact of miR-3613 on the EGFR pathway, we transfected the miR-3613 mimic into A549 cells to elevate the cellular level of miR-3613 (Additional file 4: Figure S4C). Seven miR-3613 target genes in the EGFR signaling pathway were selected to check their expression after miR-3613 mimic transfection. Five of the seven genes, CDKN1B, ERBB4, FGFR4, GRB7, and PIGR, showed significant down-regulation after miR-3613 overexpression (Fig. 6e), implying that miR-3613 plays important roles in EGFR signaling regulation. We propose that miR-3613 may serve as an early miRNA marker for EMT, although further studies are required to characterize miR-3613 regulation.
The expression pattern of early EMT hallmark genes GALNT6, SPARC and HES7 in vitro is recaptured in LUAD TCGA samples
In this study, we described an in vitro cell-culture model that mimics EMT of A549 cells by induction with CAF-CM produced by human fetal lung fibroblast WI-38 cells activated by A549 cell culture medium. This model is easy to operate and can be robustly repeated, and has been repeated in another lung cancer cell line (PC9). High-throughput sequencing revealed a time-dependent upregulation of genes from two WGCNA modules; one peaked at 12 h and the other at 24 h of induction. In both cases, the induced expression dropped after the peak stages. These two modules of genes are highly enriched in transmembrane transport, blood coagulation, and transcription regulation. We then analyzed the expression of nine of the annotated genes in LUAD patient samples obtained from TCGA, and found that three of them were specifically upregulated in cancer at early but not advanced LUAD stages.
Molecular profiling of clinical specimens has provided abundant information on diagnostic biomarkers and therapeutic targets of cancer in EMT research [42, 43, 44, 45]. However, due to the limitations of sample collection and time points, current EMT markers mainly reflect late stages, when EMT has already been accomplished. In addition, clinical findings cannot readily be followed up in the same system, whereas cell models are easy to operate and allow such follow-up. Different from previous successful EMT models [25, 27, 28], our cell-based EMT model, which uses CM of CAFs derived from in vitro cultured cells, is a reliable system allowing assessment of similarities and differences between the cell line and primary human lung cancer.
In the blue module, two transmembrane proteins that together play a central role in the regulatory network are involved in cancer metastasis. GALNT6 has been implicated in the metastasis of multiple cancers [46, 47]. It is upregulated in pancreatic cancer cells, and its silencing reduces the level of EGFR2 and cell viability. In addition, SPARC is a secreted matricellular protein governing cell adhesion, proliferation and differentiation, and driving pathological responses in non-small cell lung cancer. SPARC may also serve as an unfavorable prognostic marker in pancreatic cancer, as its overexpression may enhance cell invasion.
The yellow module revealed transcription regulators involved in EMT initiation. CITED1 is best studied in melanoma progression, where it can be activated by the TGFβ-SMAD2 pathway and promotes amoeboid migration of melanoma cells. Intriguingly, this metastatic behavior is distinct from EMT, which raises the question of whether the same or a different pathway is involved in CITED1-regulated EMT. Another transcription factor, the bHLH protein HES7, which is regulated by the Notch signaling pathway, has been shown to be expressed in cervical cancer, but its role in cancer progression is unclear.
Many key EMT transcription factors are under miRNA regulation, as exemplified by the metastasis-suppressive function of the miR-200 family in targeting the TGFβ/ZEB pathway [53, 54]. Our data have revealed another potential EMT regulatory pathway, in which miR-3613 regulation of the EGFR pathway may contribute to the promotion of EMT by CAF-conditioned medium. EGFR activation has been proven to promote cancer cell proliferation, EMT and drug resistance [40, 55, 56, 57], and multiple EGFR pathway genes have been proposed as anti-metastatic drug targets. MiR-3613 was recently found to be overexpressed in ovarian cancers and to down-regulate PTEN, a regulator of PI3K-Akt signaling downstream of EGFR. The regulation between miR-3613 and its potential targets may present novel therapeutic targets to overcome drug resistance caused by EGFR mutation (Additional file 5: Figure S5).
Overall, this study presents a new, reliable in vitro model for CAF-induced EMT, which is supported not only by the cell morphology and EMT markers, but also by identification of classic EMT-related functional pathways. Moreover, this model allows time-dependent monitoring of EMT progress, which led to the identification of the early EMT hallmark genes. Strikingly, three early EMT hallmark genes, GALNT6, SPARC and HES7, show a similar stage-specific expression pattern in LUAD TCGA samples, and can be further studied for diagnosis of early stages of lung cancer and for developing anticancer drugs (Additional file 5: Figure S5). Furthermore, the cell model could facilitate future studies screening for additional biomarkers and cell-based drugs. This study proves that a combination of the cell-based study and the available patient genome-wide data can greatly expedite the translational medicine process.
This study presented a reliable cell-based EMT model and several classes of novel early EMT markers identified by this model. Three of the early EMT markers were confirmed by TCGA LUAD transcriptome data. Results from the combination of cell-based screening and patient data validation introduce new prognostic markers and therapeutic targets for LUAD, as well as a cell-based model ready for studying their mechanisms of action and for drug screening.
JS designed and conducted experiments. WW and JZ performed analysis of RNA-seq and miRNA-seq data. YingyanW, YingziW and YQ performed cellular experiments. JS, WW, XW and YZ wrote the manuscript. YZ and QW designed the original research and made critical revisions. All authors have read and approved the final manuscript.
This work was supported by grants from the National Natural Science Foundation of China (#91129733, #81330060 and #81502702), the National High Technology Research and Development Program (863 Program Projects) of China (#2015AA020409), Science and Technology Plan Foundation of Liaoning Province (2014225003), and Special Grant for Translational Medicine, Dalian Medical University (#2015001) to QW, and grants from ABLife, Inc. (#ABL2014–12006) to YZ. The funder did not participate in the design of the study and collection, analysis, interpretation of data or in writing the manuscript.
Ethics approval and consent to participate
Not applicable. The cell lines used in our experiments do not require ethical approval.
Consent for publication
Competing interests
The authors declare that they have no competing interests.
- 20. Elkabets M, Gifford AM, Scheel C, Nilsson B, Reinhardt F, Bray MA, Carpenter AE, Jirstrom K, Magnusson K, Ebert BL, et al. Human tumors instigate granulin-expressing hematopoietic cells that promote malignancy by activating stromal fibroblasts in mice. J Clin Invest. 2011;121(2):784–99.
- 23. Spinner NB, Shapiro IM, Cheng AW, Flytzanis NC, Balsamo M, Condeelis JS, Oktay MH, Burge CB, Gertler FB. An EMT-driven alternative splicing program occurs in human breast cancer and modulates cellular phenotype. PLoS Genet. 2011;7(8):e1002218.
- 28. Nishioka M, Venkatesan N, Dessalle K, Mogas A, Kyoh S, Lin TY, Nair P, Baglole CJ, Eidelman DH, Ludwig MS, et al. Fibroblast-epithelial cell interactions drive epithelial-mesenchymal transition differently in cells from normal and COPD patients. Respir Res. 2015;16:72.
- 30. Cancer Genome Atlas Research N. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008;455(7216):1061–8.
- 31. Cancer Genome Atlas Research N. Comprehensive genomic characterization of squamous cell lung cancers. Nature. 2012;489(7417):519–25.
- 33. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14(4):295–311.
- 38. Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005;120(1):15–20.
- 39. Enright AJ, Bino J, Ulrike G, Thomas T, Chris S, Marks DS. MicroRNA targets in Drosophila. Genome Biol. 2004;5(1):R1.
- 40. Sherbet GV. EGFR Signalling in EMT; 2013.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.