A computationally constructed ceRNA interaction network based on a comparison of the SHEE and SHEEC cell lines
Long non-coding RNAs (lncRNAs) play critical and complicated roles in the regulation of various biological processes, including chromatin modification, transcription and post-transcriptional processing. Interestingly, some lncRNAs serve as miRNA “sponges” that inhibit the interaction of miRNAs with their targets in post-transcriptional regulation. We constructed a putative competing endogenous RNA (ceRNA) network by integrating lncRNA, miRNA and mRNA expression based on high-throughput RNA sequencing and microarray data to enable a comparison of the SHEE and SHEEC cell lines. Using the Targetscan and miRanda bioinformatics algorithms and the miRTarbase microRNA–target interactions database, we established that 51 miRNAs sharing 13,623 MREs with 2260 genes and 82 lncRNAs were involved in this ceRNA network. Through a biological function analysis, the ceRNA network appeared to be primarily involved in cell proliferation, apoptosis, the cell cycle, invasion and metastasis. Functional pathway analyses demonstrated that the ceRNA network potentially modulated multiple signaling pathways, such as the MAPK, Ras, HIF-1, Rap1, and PI3K/Akt signaling pathways. These results might provide new clues to better understand the regulation of the ceRNA network in cancer.
Keywords: SHEE, SHEEC, miRNA, lncRNA, ceRNA
Abbreviations
ceRNAs: Competing endogenous RNAs
CNCI: Coding non-coding index
CPC: Coding potential calculator
ESCC: Esophageal squamous cell carcinoma
FPKM: Fragments per kilobase of exon per million fragments mapped
lncRNAs: Long non-coding RNAs
Human esophageal cancer is the sixth leading cause of cancer-related death worldwide. The most frequent subtype, esophageal squamous cell carcinoma (ESCC), shows marked geographic variation in its distribution, with eastern Asia being a high-incidence area [1, 2]. Most ESCC patients are diagnosed at an advanced stage, past the best opportunity for surgical treatment, and their overall 5-year survival rate is less than 15 %. By contrast, patients diagnosed at an early stage can have 10-year overall post-operative survival rates of up to 95 %.
The immortalized human esophageal epithelial cell line SHEE and the malignantly transformed esophageal carcinoma cell line SHEEC were established and maintained through consecutive passages by Zeng Yi of the National Institute for Viral Disease Control and Prevention and Shen Zhongying of the Medical College of Shantou University [4, 5]. From a biological perspective, SHEE is similar to a primary cell line and retains its proliferation ability and differentiation potential. It retains the phenotype of primary epithelial cells, such as growth in a monolayer and anchorage-dependent cell aggregation without colony formation in soft agar or tumor formation after transplantation. SHEEC was established by inducing the SHEE cell line with 12-O-tetradecanoylphorbol-13-acetate (TPA). These two cell lines therefore provide an appropriate paired model for studying ESCC tumorigenesis.
To date, differential expression of miRNAs has been observed in various types of human cancers, such as lung cancer, gastric cancer, breast cancer, colorectal cancer and ESCC [6, 7, 8, 9]. miRNA expression has been used as a biomarker for tumor prognosis and clinical treatment. miRNAs can also regulate gene expression at the epigenetic level. They are involved in most cellular processes, including development, differentiation, growth and apoptosis [10, 11].
In addition to small ncRNAs, lncRNAs are increasingly recognized to play critical and complicated roles in the regulation of various biological processes, including chromatin modification, transcription and post-transcriptional processing [12, 13, 14, 15]. Interestingly, additional studies have revealed that some lncRNAs can serve as miRNA “sponges” that inhibit the interaction of miRNAs with their targets in post-transcriptional regulation. Such lncRNAs can be considered competing endogenous RNAs (ceRNAs). Through gene expression data analysis, Sumazin et al. found >7000 transcripts acting as ceRNAs in glioblastoma tissues [16]. Cesana et al. revealed the ceRNA function of linc-MD1, which liberates the differentiation factors MEF2C and MAML1 from repression by acting as a decoy for miR-135 and miR-133, thereby controlling muscle differentiation. Another lncRNA, linc-RoR, acts as an endogenous sponge to mediate miR-145 regulation of self-renewal of human embryonic stem cells [18, 19]. Similarly, the lncRNA CHRF (cardiac hypertrophy-related factor) was found to act as an endogenous sponge of miR-489, directly binding to miR-489 and downregulating its expression and, in turn, regulating the expression of its target Myd88 and hypertrophy.
The above studies identified a new mechanism of post-transcriptional regulation: ceRNA networks. In a ceRNA network, RNA transcripts of all types (lncRNAs, pseudogenes and circular RNAs) can communicate with each other by competing for binding to shared miRNA-binding sites, known as miRNA response elements (MREs).
In this study, we constructed a putative competing endogenous RNA (ceRNA) network by integrating lncRNA, miRNA and mRNA expression based on high-throughput RNA sequencing and microarray data to enable a comparison of the SHEE and SHEEC cell lines. Using Targetscan and miRanda bioinformatics algorithms and the miRTarbase microRNA–target interactions database, we established that 51 miRNAs sharing 13,623 MREs with 2260 genes and 82 lncRNAs were involved in this ceRNA network. Based on a biological function analysis, this ceRNA network may participate in the PI3K/Akt pathway and, consistent with previous reports, may play a modulating role in the regulation of stem-like cells in primary ESCC. These results might provide new clues to better understand the regulation of the ceRNA network in cancer.
Materials and methods
SHEE and SHEEC cells were cultured in Gibco MEM medium supplemented with 100 ml/l fetal bovine serum (containing 100 μg/ml penicillin and 100 μg/ml streptomycin) and incubated at 37 °C in a humidified atmosphere of 50 ml/l CO2. Cells were harvested after growth into a full monolayer and were kept at −70 °C until use.
Total RNA isolation
Total RNA from each sample was extracted using a TRK-1001 Total RNA Purification Kit (LC Sciences) according to the manufacturer’s protocol. Total RNA was quantified on a NanoDrop ND-2000 (Thermo Scientific), and the RNA integrity was assessed using an Agilent Bioanalyzer 2100 (Agilent Technologies).
μParaflo microRNA microarray assay
The microarray assay was performed by a service provider (LC Sciences). The assay started with a 2 to 5 μg total RNA sample that was 3′-extended with a poly(A) tail using poly(A) polymerase. An oligonucleotide tag was then ligated to the poly(A) tail for later fluorescent dye staining. Hybridization was performed overnight on a μParaflo microfluidic chip using a microcirculation pump (Atactic Technologies) [23, 24]. On the microfluidic chip, each detection probe consisted of a chemically modified nucleotide-coding segment complementary to the target microRNA (from miRBase 20.0, http://www.mirbase.org/) or other RNA (control or customer-defined sequences) and a spacer segment of polyethylene glycol to extend the coding segment away from the substrate.
The detection probes were made via in situ synthesis using PGR (photogenerated reagent) chemistry. The hybridization melting temperatures were balanced by chemical modifications of the detection probes. Hybridization used 100 μl of 6× SSPE buffer consisting of 0.90 M NaCl, 60 mM Na2HPO4, 6 mM EDTA (pH 6.8) and 25 % formamide at 34 °C. After RNA hybridization, tag-conjugating Cy3 dye was circulated through the microfluidic chip for dye staining. Fluorescence images were collected using a laser scanner (GenePix 4000B, Molecular Devices) and digitized using Array-Pro image analysis software (Media Cybernetics).
Data were analyzed by first subtracting the background, and then normalizing the signals using a LOWESS filter (locally weighted regression). The miRNA differential expression based on the normalized signal was analyzed via selective application of Student’s t test. The significance threshold was p < 0.05 and fold-change >2.
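The two-part significance threshold above can be illustrated with a minimal sketch. It assumes the normalized signals and the t-test p-values have already been computed elsewhere, and it treats fold-change symmetrically (up or down), which the text does not state explicitly. The class name, field names and miRNA labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MirnaResult:
    name: str         # hypothetical miRNA label
    signal_a: float   # normalized signal in the first cell line
    signal_b: float   # normalized signal in the second cell line
    p_value: float    # p-value from a t test computed elsewhere


def is_differential(r, p_cutoff=0.05, fc_cutoff=2.0):
    """Apply the thresholds p < 0.05 and fold-change > 2.

    Assumption: fold-change is taken as the larger of b/a and a/b,
    so both up- and downregulated miRNAs can pass.
    """
    if r.signal_a <= 0 or r.signal_b <= 0:
        return False  # a ratio is undefined for non-positive signals
    ratio = r.signal_b / r.signal_a
    fold = max(ratio, 1.0 / ratio)
    return r.p_value < p_cutoff and fold > fc_cutoff


results = [
    MirnaResult("miR-A", 100.0, 450.0, 0.01),   # 4.5-fold up, significant
    MirnaResult("miR-B", 200.0, 260.0, 0.001),  # only 1.3-fold
    MirnaResult("miR-C", 800.0, 150.0, 0.03),   # about 5.3-fold down
    MirnaResult("miR-D", 50.0, 300.0, 0.20),    # p-value too high
]
selected = [r.name for r in results if is_differential(r)]
print(selected)
```

Requiring both conditions at once keeps miRNAs with large but noisy changes, or with tiny but consistent changes, out of the aberrantly expressed set.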
Construction of cDNA libraries and high-throughput sequencing
To construct the next-generation sequencing libraries, approximately 3 μg of total RNA was depleted of ribosomal RNA according to the manufacturer’s protocol for the Human/Mouse/Rat Ribo-Zero rRNA Removal Kit (Epicentre/Illumina). Following purification, the poly(A)− or poly(A)+ RNA fractions were fragmented into small pieces using divalent cations at elevated temperature. The cleaved RNA fragments were reverse transcribed to construct a cDNA library using the dUTP method as described previously. The average insert size for the paired-end libraries was 300 bp (±50 bp). RNA libraries were then sequenced on the Illumina HiSeq 2500 platform using 125-bp paired-end reads at LC Biotech in Hangzhou, China.
The RNAseq data were aligned to hg19 using TopHat v2.0.9 with the default parameters. The mapped reads were assembled using Cufflinks v2.11 [28, 29]. All multiple assembled transcript files (GTF format) were then merged to produce a unique transcriptome set using the Cuffmerge utility provided in the Cufflinks package.
We filtered the assembled novel transcripts from the two pooled cell lines to obtain putative lncRNAs. First, identical and overlapping transcripts were merged to remove redundancy. Then, transcripts overlapping with known gene exons were removed. Only transcripts with a length >200 nt were retained. To identify and eliminate potential known lncRNA transcripts, we compared the merged transcriptome with lncRNA and protein-coding genes in the public authoritative database GENCODE.
To obtain a reliable dataset of putative lncRNAs, single exon models were filtered out. Next, we removed transcripts that were likely to be assembly artifacts or PCR run-on fragments according to their class code (annotated by Cuffmerge). Among the different classes defined by Cufflinks, only those annotated by “u”, “i”, “x”, “o” and “j” were retained. Extremely low gene expression is generally considered to be transcriptional noise. On average, 79 % of the initial reads with a quality score >30 could be aligned to the hg19 assembly of the human genome sequence. We used CPC (http://cpc.cbi.pku.edu.cn) and the coding non-coding index (CNCI) to assess the protein-coding potential of each novel transcript. Those putative transcripts with a CPC score < −1 and a CNCI score < −1 were retained as candidate lncRNAs for further analysis.
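The per-transcript criteria described above can be sketched as a single filter function. This is an illustration only, not the pipeline actually used: it assumes the length, exon count, Cuffmerge class code, CPC score and CNCI score have already been collected for each transcript, and it omits the redundancy-merging, GENCODE comparison and expression-level steps. The record layout and transcript identifiers are hypothetical.

```python
from dataclasses import dataclass

# Class codes retained in the text: "u", "i", "x", "o" and "j".
RETAINED_CLASS_CODES = {"u", "i", "x", "o", "j"}


@dataclass
class Transcript:
    transcript_id: str   # hypothetical identifier
    length_nt: int       # transcript length in nucleotides
    exon_count: int      # number of exons in the assembled model
    class_code: str      # class code annotated by Cuffmerge
    cpc_score: float     # coding potential score from CPC
    cnci_score: float    # coding potential score from CNCI


def is_candidate_lncrna(t):
    """Return True if the transcript passes every per-transcript filter."""
    return (
        t.length_nt > 200                       # longer than 200 nt
        and t.exon_count > 1                    # single-exon models are dropped
        and t.class_code in RETAINED_CLASS_CODES
        and t.cpc_score < -1                    # non-coding according to CPC
        and t.cnci_score < -1                   # non-coding according to CNCI
    )


transcripts = [
    Transcript("T1", 1500, 3, "u", -1.4, -2.0),  # passes all filters
    Transcript("T2", 180, 2, "u", -1.4, -2.0),   # too short
    Transcript("T3", 900, 1, "i", -1.4, -2.0),   # single exon
    Transcript("T4", 900, 2, "=", -1.4, -2.0),   # class code not retained
    Transcript("T5", 900, 2, "x", 0.5, -2.0),    # CPC suggests coding potential
]
candidates = [t.transcript_id for t in transcripts if is_candidate_lncrna(t)]
print(candidates)
```

Requiring both CPC and CNCI to call a transcript non-coding is the conservative choice: a transcript flagged as potentially coding by either tool is excluded.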
Transcript differential expression analysis
Expression levels of all of the transcripts, including putative lncRNAs and mRNAs, were quantified as fragments per kilobase of exon per million fragments mapped (FPKM) using the Cuffdiff program from the Cufflinks package. Differential gene expression was determined using Cuffdiff with a p-value of <0.05 and q-value of <0.05.
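For reference, the FPKM unit itself follows directly from its name, as in the short sketch below. This shows only the definitional arithmetic; Cuffdiff estimates fragment assignments with its own statistical model, which is not reproduced here. The function and parameter names are hypothetical.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped.

    fragments: fragments assigned to the transcript
    transcript_length_bp: total exonic length of the transcript in bp
    total_mapped_fragments: all mapped fragments in the library
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # Divide by length in kilobases (bp / 1e3) and by library size in
    # millions (fragments / 1e6); the two factors combine to 1e9.
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# 500 fragments on a 2 kb transcript in a library of 20 million fragments
example = fpkm(500, 2000, 20_000_000)
print(example)  # 12.5
```

Normalizing by both transcript length and library size is what makes values comparable between transcripts and between the two cell-line libraries.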
Construction of ceRNA network
The putative miRNA–lncRNA interactions were evaluated using the algorithms of Targetscan version 6.2 (http://www.targetscan.org/) and miRanda version 3.3a (http://www.microrna.org/microrna/home.do). Because lncRNAs are non-coding, miRNA binding sites were predicted over their full-length sequences. High-confidence miRNA–lncRNA pairs had a Targetscan context+ score percentile >50 and a miRanda max energy < −20. To reduce false positives, at least two miRNA binding sites were required for each lncRNA. The mRNAs targeted by miRNAs with experimental support were taken from miRTarbase (http://mirtarbase.mbc.nctu.edu.tw/). The ceRNA relationships were integrated using an in-house Perl script. All of the above interactions were imported into Cytoscape software version 3.3.0 (http://www.cytoscape.org) to construct a regulatory network.
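The integration step can be sketched as follows. The authors used an in-house Perl script that is not shown in the text, so this Python sketch is only one plausible reading of the description: it assumes site-level predictions are available as tuples, interprets the two-site rule as at least two passing sites per lncRNA, and links each retained miRNA–lncRNA pair to the validated mRNA targets of the same miRNA. All function names and identifiers are hypothetical.

```python
from collections import Counter


def build_cerna_triples(site_predictions, validated_targets,
                        min_percentile=50, max_energy=-20, min_sites=2):
    """Return sorted (lncRNA, miRNA, mRNA) triples sharing a miRNA.

    site_predictions: iterable of (mirna, lncrna, context_percentile,
        miranda_energy), one tuple per predicted binding site.
    validated_targets: dict mapping miRNA to experimentally supported
        mRNA targets (for example, exported from a target database).
    """
    # Keep sites passing both thresholds: percentile > 50, energy < -20.
    passing = [
        (mirna, lncrna)
        for mirna, lncrna, percentile, energy in site_predictions
        if percentile > min_percentile and energy < max_energy
    ]
    # Assumption: the two-site rule is counted per lncRNA.
    sites_per_lncrna = Counter(lncrna for _, lncrna in passing)
    pairs = {
        (mirna, lncrna)
        for mirna, lncrna in passing
        if sites_per_lncrna[lncrna] >= min_sites
    }
    # An lncRNA and an mRNA are linked when they share the same miRNA.
    triples = set()
    for mirna, lncrna in pairs:
        for gene in validated_targets.get(mirna, ()):
            triples.add((lncrna, mirna, gene))
    return sorted(triples)


sites = [
    ("miR-a", "lncA", 80, -25.0),
    ("miR-b", "lncA", 60, -22.0),
    ("miR-a", "lncB", 90, -30.0),  # lncB ends up with one passing site
    ("miR-b", "lncB", 40, -21.0),  # fails the percentile threshold
    ("miR-c", "lncA", 70, -15.0),  # fails the energy threshold
]
targets = {"miR-a": ["GENE1"], "miR-b": ["GENE1", "GENE2"]}
triples = build_cerna_triples(sites, targets)
for lncrna, mirna, gene in triples:
    print(lncrna, mirna, gene)
```

Each triple corresponds to two edges (lncRNA to miRNA, miRNA to mRNA), so the result can be written out as a simple edge table for import into a network viewer.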
Aberrantly expressed miRNAs
Aberrantly expressed lncRNAs and protein-coding genes
To systematically characterize the transcriptional changes between the SHEE and SHEEC cell lines, we performed high-throughput RNA sequencing according to previously published methods [36, 37, 38]. Briefly, we generated >17 million reads from strand-specific, paired-end, 125-bp RNAseq reads of total RNA depleted of rRNA from both cell populations. Using TopHat, an average of 95 % clean reads were mapped to the human GRCh38 genome (http://www.gencodegenes.org/releases/21.html). Using the lncRNA discovery pipeline described in the Methods section above, we detected 11,557 unique protein-coding genes and 15,932 unique lncRNAs that met the criteria of FPKM > 1 and length >200 nucleotides.
The expression levels of all transcripts were then quantified as fragments per kilobase of exon per million fragments mapped (FPKM) using the Cuffdiff program from the Cufflinks package. To examine the aberrantly expressed protein-coding genes and lncRNAs, we performed differential analysis using the Cuffdiff program. By setting stringent criteria (FPKM > 3, fold-change >2, and p < 0.05), 5593 protein-coding genes and 6294 lncRNAs were found to be aberrantly expressed within the group comparison (Additional files 2, 3 and 4). Interestingly, the majority of the aberrantly expressed novel lncRNAs are downregulated in our dataset. The potential mechanism of dysregulation remains to be further explored.
miRNA-binding site prediction
To establish the lncRNA–miRNA–mRNA (ceRNA) network, Targetscan 6.2 and miRanda 3.3a were used to search the lncRNAs for miRNA binding sites. Using the high-confidence thresholds (see the Materials and methods section), the predicted MREs showed that 54 miRNAs may interact with 83 lncRNAs (Additional file 5). Based on those 54 miRNAs, we used the experimentally validated microRNA–target interactions database (miRTarbase) to search for the miRNAs’ mRNA targets. The results showed that 51 miRNAs can interact with 2260 targets (Additional file 6). Most of these targets are cancer-associated genes such as PTEN, STAT3, VEGFA, KRAS, TP53, CCND1, CDK6, E2F1, FGFR1 and EGFR, with roles in cell proliferation, apoptosis, cell cycle, invasion and metastasis.
lncRNA–miRNA–mRNA ceRNA network construction
The ceRNA network was visualized by importing the above interactions into the Cytoscape software to assemble the regulation network. The genes within the above networks were further processed by gene function (GO; http://geneontology.org/) and KEGG (http://www.genome.jp/kegg/) pathway analysis.
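The text does not state which statistical test was used for the GO and KEGG analysis. As an illustration only, a common choice for this kind of over-representation analysis is the one-sided hypergeometric test, sketched below with the Python standard library. The function name and the example numbers are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(population, pathway_genes, sample, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    population: number of annotated background genes
    pathway_genes: background genes belonging to the pathway
    sample: number of genes in the list being tested
    overlap: genes in the list that belong to the pathway
    """
    upper = min(pathway_genes, sample)
    total = comb(population, sample)
    # Sum the probability of seeing `overlap` or more pathway genes.
    tail = sum(
        comb(pathway_genes, i) * comb(population - pathway_genes, sample - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, 5 in the pathway, and a tested list
# of 5 genes that all fall in the pathway.
p = hypergeom_enrichment_p(10, 5, 5, 5)
print(p)
```

In practice the resulting p-values would be corrected for multiple testing across all terms or pathways before ranking them.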
Gene function and pathway analysis
Accumulated studies have revealed that ceRNAs could serve as post-transcriptional regulators of protein-coding gene expression by decoying miRNAs from other target transcripts, such as lncRNAs, mRNA, pseudogenes and circular RNAs (circRNAs) [41, 42, 43]. Some studies have confirmed that ceRNAs play an important role in the regulation of gene expression in cancers such as head and neck squamous cell carcinoma, prostate cancer, papillary thyroid carcinoma, pituitary gonadotrope tumors, ovarian cancer, and chronic lymphocytic leukemia [44, 45, 46].
In this study, based on the high-throughput RNA sequencing and microarray data, we constructed a putative ceRNA network by integrating lncRNA, miRNA and mRNA expression. In our ceRNA network, lncRNA UCA1 was reported to be an independent prognostic factor associated with tumor differentiation and location.
Subsequent studies reported that UCA1 could upregulate three target genes of miR-204-5p (BCL2, RAB22A and CREB1) through competitive sponging of miR-204-5p, thereby promoting proliferation and chemoresistance in CRC cells. Yang et al. reported that overexpressed UCA1 was correlated with metastasis of epithelial ovarian cancer (EOC) and functioned as a ceRNA to suppress the expression of matrix metallopeptidase 14 (MMP14) via competition for miR-485-5p. Chen et al. reported on nuclear enriched abundant transcript 1 (NEAT1) lncRNA as a novel prognostic indicator for patients with ESCC, finding that it contributes to the malignant character of ESCC. NEAT1 was also reported to act as a sponge for several miRNAs, such as miR-548 in the regulation of breast cancer cell apoptosis, miR-204 in the regulation of epithelial-to-mesenchymal transition (EMT) and the radioresistance of NPC cells, and hsa-miR-98-5p in the regulation of EGCG-induced CTR1 and cDDP sensitivity enhancement in NSCLC. These specific features could potentially be used to classify lncRNAs and identify those that participate in ceRNA networks.
We found that biological function was significantly enriched with signaling pathways, small molecule metabolic processes, apoptotic processes, small GTPase-mediated signal transduction, negative regulation of the apoptotic process, mitotic cell cycle, protein binding, ATP binding, DNA binding, protein kinase binding, protein kinase activity and GTP binding, among other things (Fig. 3; Additional file 8). These signaling pathways were often altered, especially in the cancer cells, resulting in phenotypes of uncontrolled growth and increased capability to invade the surrounding tissue. These crucial molecules involved in signaling pathways represent attractive targets for cancer therapy [54, 55, 56, 57]. Agents targeting epidermal growth factor receptor (EGFR), PI3K, and mTOR have been developed for interfering with their signaling functions.
Previous studies demonstrated that the PTEN/PI3K/Akt pathway was essential to side population (SP) cells thanks to its involvement in the regulation of ABCG2 transporter function in primary ESCCs. The PTEN/PI3K/Akt pathway plays a modulating role in regulating stem-like cells in primary ESCCs and may provide essential clues for the development of novel therapeutic strategies and efficient drugs.
Consistent with these studies, we found that the PI3K/Akt signaling pathway was the most significantly enriched pathway, based on KEGG pathway analysis (Fig. 5; Additional file 9). PTEN, a well-studied gene in cancer, encodes a plasma-membrane lipid phosphatase that functions as a negative regulator of the PI3K/Akt signaling pathway [58, 59, 60].
There is increasing evidence that both non-coding and protein-coding transcripts regulate PTEN levels by acting as PTEN ceRNAs and thereby antagonize downstream PI3K/Akt signaling [61, 62, 63, 64, 16]. Our analysis suggested that lncRNAs such as NEAT1 and TCONS_00287673 may upregulate PTEN through competitive sponging of miR-26a-5p and miR-182-5p. The regulatory mechanism of these ceRNAs needs to be further evaluated.
We would like to acknowledge anonymous reviewers and our academic editor for their constructive suggestions on the manuscript. We would like to sincerely thank Dr. Jianning Liu of LC Biotech for his help with the bioinformatics analysis.
This research was financially supported by a National Natural Science Foundation of China (NSFC) grant funded by the government of China (U1304809).
Availability of data and materials
The microarray and RNAseq data sets were submitted to the Gene Expression Omnibus (GEO) database under accession numbers GSE72138 and GSE72273, respectively.
JCS and SGG contributed to the design of the experiment. JCS and JQY performed the experiments and drafted the manuscript. XZY, RNY and TYD contributed to the analysis and interpretation of the data. XSW and GQK contributed to the analysis of the data. JCS, JQY and SGG contributed to revising the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
- 4. Shen ZY, Cen S, CW, et al. Immortalization of human fetal esophageal epithelial cells induced by E6 and E7 genes of human papilloma virus 18. Chinese J Exp Clin Virol. 1999;13:121–4.
- 16. Sumazin P, Yang XR, Chiu HS, Chung WJ, Iyer A, Llobet-Navas D, Rajbhandari P, Bansal M, Guarnieri P, Silva J, Califano A. An extensive MicroRNA-mediated network of RNA-RNA interactions regulates established oncogenic pathways in glioblastoma. Cell. 2011. doi: 10.1016/j.cell.2011.09.041.
- 28. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010. doi: 10.1038/Nbt.1621.
- 31. Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, Barnes I, Bignell A, Boychenko V, Hunt T, Kay M, Mukherjee G, Rajan J, Despacio-Reyes G, Saunders G, Steward C, Harte R, Lin M, Howald C, Tanzer A, Derrien T, Chrast J, Walters N, Balasubramanian S, Pei BK, Tress M, Rodriguez JM, Ezkurdia I, van Baren J, Brent M, Haussler D, Kellis M, Valencia A, Reymond A, Gerstein M, Guigo R, Hubbard TJ. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 2012. doi: 10.1101/gr.135350.111.
- 53. Jiang P, Wu X, Wang X, Huang W, Feng Q. NEAT1 upregulates EGCG-induced CTR1 to enhance cisplatin sensitivity in lung cancer cells. Oncotarget. 2016. doi: 10.18632/oncotarget.9712.
- 63. Karreth FA, Tay Y, Perna D, Ala U, Tan SM, Rust AG, DeNicola G, Webster KA, Weiss D, Perez-Mancera PA, Krauthammer M, Halaban R, Provero P, Adams DJ, Tuveson DA, Pandolfi PP. In vivo identification of tumor-suppressive PTEN ceRNAs in an oncogenic BRAF-induced mouse model of melanoma. Cell. 2011. doi: 10.1016/j.cell.2011.09.032.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.