Integrative analysis reveals functional and regulatory roles of H3K79me2 in mediating alternative splicing
Accumulating evidence suggests alternative splicing (AS) is a co-transcriptional splicing process not only controlled by RNA-binding splicing factors, but also mediated by epigenetic regulators, such as chromatin structure, nucleosome density, and histone modification. Aberrant AS plays an important role in various diseases, including cancers.
In this study, we integrated AS events derived from RNA-seq with H3K79me2 ChIP-seq data across 34 different normal and cancer cell types and found higher enrichment of H3K79me2 in two AS types, skipping exon (SE) and alternative 3′ splice site (A3SS).
Interestingly, by applying self-organizing map (SOM) clustering, we unveiled two clusters mainly comprised of blood cancer cell types with a strong correlation between H3K79me2 and SE. Remarkably, the expression of transcripts associated with SE was not significantly different from that of those not associated with SE, indicating the involvement of H3K79me2 in splicing has little impact on full mRNA transcription. We further showed that the depletion of DOT1L, the sole H3K79 methyltransferase, impeded leukemia cell proliferation and switched exon skipping to the inclusion isoform in two MLL-rearranged acute myeloid leukemia cell lines. Our data demonstrate that H3K79me2 is involved in mediating SE processing, which might in turn influence transformation and disease progression in leukemias.
Collectively, our work for the first time reveals that H3K79me2 plays functional and regulatory roles through a co-transcriptional splicing mechanism.
Keywords: Alternative splicing, H3K79me2, DOT1L, AML
A3SS: Alternative 3′ splice site
A5SS: Alternative 5′-end splice site
AML: Acute myeloid leukemia
DOT1L: DOT1-like histone lysine methyltransferase
MISO: Mixture of isoforms
MXE: Mutually exclusive exon
SFMs: Splicing factor motifs
Alternative splicing (AS) is a pre-mRNA processing step, mainly controlled by post-transcriptional regulation, that affects 90% of human multi-exonic coding genes in a variety of tissues and cell types [1, 2, 3]. Many studies have highlighted the key role of AS in regulating cellular development and differentiation, and aberrant AS events lead to disease states such as muscular dystrophies and cancers [4, 5, 6]. Accumulating evidence further supports a new paradigm that AS is a co-transcriptional splicing process mutually coordinated by transcription and splicing [7, 8, 9]. Recent studies further illustrate that splicing is also regulated by epigenetic regulators, including chromatin structure, histone modifications, and CTCF [10, 11]. Dysregulation of some epigenetic components may alter the splicing process, resulting in various types of human diseases [12, 13, 14, 15]. For instance, a recent study reported that a mutation of the histone methyltransferase SETD2 alters AS of several key WNT signaling regulatory genes, resulting in colorectal cancer.
Recent genome-wide studies revealed histone marks such as H3K36me3 and H3K79me2 as well as nucleosome positioning were highly enriched within intragenic regions, implicating their regulatory roles in the RNA polymerase II elongation process and exon definition [17, 18, 19, 20]. Further studies demonstrated the enrichment levels of histone modifications were correlated not only with transcriptional activity, but also with AS [21, 22, 23]. Despite these de novo genome-wide findings, knowledge on the causal and functional roles of histone modifications in AS is limited. In addition, little work has been done on aberrant AS processing in diseases caused by epigenetic defects.
H3K79, located in the globular domain of histone H3, is exposed on the nucleosome surface and is methylated by the sole enzyme DOT1-like histone lysine methyltransferase (DOT1L), a member of the lysine methyltransferase family. This histone methylation typically functions in transcriptional regulation [25, 26], telomeric silencing [27, 28], cell-cycle regulation, and DNA damage repair [30, 31, 32]. Recent studies revealed a new role for it in regulating AS [33, 34, 35]. For example, H3K79me2 is able to recruit chromodomain-containing protein MRG15 and splicing factor PTB1 to influence AS outcomes [36, 37]. In particular, new findings demonstrated its crucial role in transformation as well as disease progression in leukemias [38, 39, 40]. DOT1L is frequently involved in chromosomal translocations, with numerous genes creating fusion genes that interfere with its interaction with the elongation complexes, resulting in a loss of function. This is common in the mixed-lineage leukemia (MLL) gene, resulting in aggressive leukemia, including 5–10% of adult acute leukemias and 60–80% of infant acute leukemias. These findings have established a foundation for disease-specific epigenetic therapies against acute leukemias.
In a previous study, we found a correlation between H3K79me2 enrichment level and an exon skipping event in GM12878 and K562 cells. However, the common and cell type-specific genomic patterns and correlations between H3K79me2 and various types of splicing events across diverse cell types have not been fully explored. In this study, we integrated AS events derived from RNA-seq with H3K79me2 ChIP-seq data across 34 different normal and cancer cell types, and examined the enrichment of H3K79me2 in five major types of AS events: skipping exon (SE), mutually exclusive exon (MXE), retained intron (RI), alternative 5′-end splice site (A5SS), and alternative 3′-end splice site (A3SS). We aimed to elucidate functional and regulatory roles of H3K79me2 in mediating AS, particularly in MLL-rearranged (MLL-r) acute myeloid leukemia (AML) cells.
Raw data processing
H3K79me2 ChIP-seq and RNA-seq data for a total of 34 normal and cancer cell lines were collected from the Gene Expression Omnibus (GEO) repository and the ENCODE Consortium (Additional file 1: Table S1). Raw sequence reads were aligned against the human genomic sequence (GRCh37) using bowtie2 for ChIP-seq data and TopHat (version 2.0.14) for RNA-seq data. Only uniquely mapped reads were used for further downstream analysis.
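The text does not state how "uniquely mapped" was defined, so the following is only a minimal sketch of one common heuristic applied to SAM-format text records. It assumes that TopHat output carries an `NH:i` tag (number of reported alignments) and that bowtie2 adds an `XS:i` tag when a second alignment was found. The function name `is_uniquely_mapped` and the `min_mapq` threshold of 10 are illustrative choices, not values taken from the study.

```python
def is_uniquely_mapped(sam_line, min_mapq=10):
    """Heuristic check that one SAM record is a unique alignment.

    Assumptions (not from the source): NH:i:1 marks a unique TopHat hit,
    an XS:i tag marks a bowtie2 read with a second alignment, and
    min_mapq is an arbitrary illustrative cutoff.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    if flag & 0x4:      # read is unmapped
        return False
    if flag & 0x100:    # secondary alignment
        return False
    tags = fields[11:]  # optional fields follow the 11 mandatory columns
    for tag in tags:
        if tag.startswith("NH:i:"):
            return int(tag[5:]) == 1
    if any(tag.startswith("XS:i:") for tag in tags):
        return False
    return mapq >= min_mapq


def filter_unique(sam_lines, min_mapq=10):
    """Yield header lines unchanged and alignment records that pass the check."""
    for line in sam_lines:
        if line.startswith("@") or is_uniquely_mapped(line, min_mapq):
            yield line
```

In practice the same filter is usually applied to BAM files with a dedicated toolkit; the plain-text version here only makes the decision rule explicit.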
Identification of AS events and H3K79me2 enrichment and peaks
Unique reads from RNA-seq data in BAM format were used as input for MISO (Mixture of Isoforms), which detects AS events based on Bayes factors, filtering criteria, Psi (Ψ) values, and confidence intervals. Sashimi plots were generated to visualize all five types of AS events. The enrichment of H3K79me2/kb was calculated as the number of reads from H3K79me2 ChIP-seq data in exon skipping gene regions (the exon part of an exon skipping gene plus 50 bp upstream and downstream of each exon) per kilobase pair (the length of the exon skipping gene region), normalized by the total number of reads of each dataset. The H3K79me2 peaks were identified by Model-based Analysis of ChIP-Seq version 2 (MACS2) with a q value (minimum false discovery rate (FDR)) cutoff of 0.01.
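The enrichment calculation described above can be sketched as follows. This is an illustrative reading of the definition, not the authors' code: the function names are hypothetical, the per-million scaling of the read total is an assumption (the text only says "normalized by the total number of reads"), and merging overlapping flanked exons is an added choice to avoid counting the same bases twice.

```python
def region_length_with_flanks(exons, flank=50):
    """Total length (bp) of exons extended by `flank` bp on each side.

    `exons` is a list of (start, end) half-open intervals. Overlapping
    extended intervals are merged (an assumption of this sketch).
    """
    extended = sorted((max(0, start - flank), end + flank) for start, end in exons)
    total, cur_start, cur_end = 0, None, None
    for start, end in extended:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def enrichment_per_kb(reads_in_region, region_length_bp, total_reads, scale=1e6):
    """Reads per kilobase of region, divided by the library size in units of `scale`.

    The per-million `scale` is an assumption; the source does not give one.
    """
    if region_length_bp <= 0 or total_reads <= 0:
        raise ValueError("region length and total reads must be positive")
    reads_per_kb = reads_in_region * 1000.0 / region_length_bp
    return reads_per_kb / (total_reads / scale)


# Made-up example values, for illustration only.
exons = [(1000, 1200), (1250, 1400), (5000, 5100)]
length_bp = region_length_with_flanks(exons)
score = enrichment_per_kb(350, length_bp, 20_000_000)
```

With this scaling the value is comparable across datasets of different sequencing depth, which is the stated purpose of the normalization.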
Self-organizing map clustering
We used self-organizing map (SOM) clustering for dimension reduction and feature extraction associated with exon skipping sites. SOM is a model of two-layer artificial neural networks that maps high-dimensional input datasets to a set of nodes arranged in a lattice. SOM has two steps: (i) determining a winner node and (ii) updating the weight vectors associated with the winner node and some of its neighboring nodes. According to the enrichment of H3K79me2 for each SE site, the SOM algorithm maps multi-dimensional input vectors to two-dimensional neurons, helping to understand the high-dimensional SE data; each cell type is then assigned to the cluster in which it is most enriched. The SOM training was performed using the R package “kohonen”. SOM training parameters and node number optimization were defined on the basis of Xie et al. Node grouping was based on a hierarchical clustering approach using the hclust function of the “stats” package of R. The number of clusters was chosen based on homogeneity analyses.
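The two SOM steps named above, finding the winner node and updating it together with its lattice neighbors, can be sketched in a few lines. This is a standalone Python illustration and not the "kohonen" R package used in the study; the linear decay of the learning rate and neighborhood width, the Gaussian neighborhood, and all parameter defaults are assumptions chosen for brevity.

```python
import math
import random


def winner(weights, x):
    """Step (i): return the lattice node whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)),
    )


def train_som(data, rows, cols, n_iter=500, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Initialise each node from a randomly chosen input vector.
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # decaying neighborhood width
        x = rng.choice(data)
        bmu = winner(weights, x)
        # Step (ii): pull the winner and its neighbors toward x.
        for node, w in weights.items():
            d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-d2 / (2.0 * sigma * sigma))
            for i in range(len(w)):
                w[i] += lr * h * (x[i] - w[i])
    return weights


# Toy input: two well-separated groups of enrichment-like vectors.
toy = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [9.0, 9.0], [9.5, 9.5], [10.0, 10.0]]
som = train_som(toy, rows=2, cols=2)
low_node = winner(som, [0.0, 0.0])
high_node = winner(som, [10.0, 10.0])
```

After training, inputs from the two groups map to different nodes. Grouping the nodes into clusters, as done with hclust in the study, would be a separate step applied to the trained weight vectors.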
Cell culture and reagents
Human cell lines MV-4-11, K562, and OCI-LY7 were cultured in Iscove’s modified Dulbecco's medium (Thermo Fisher Scientific) and GM12878, MM.1S, and MOLM-14 cell lines were cultured in RPMI-1640/10% fetal bovine serum (FBS; Invitrogen, Carlsbad, CA, USA) at 37 °C in 5% CO2. MOLM-14 and OCI-LY7 cells were purchased from the DSMZ (Deutsche Sammlung von Mikroorganismen und Zellkulturen, Braunschweig, Germany), and GM12878, K562, MM.1S, and MV-4-11 cells were purchased from ATCC (American Type Culture Collection).
Co-transfection and cell viability assay
siRNAs targeting DOT1L were purchased from Thermo Fisher Scientific (Silencer® Select siRNAs). For transfection of siRNA oligos, cells were seeded in six-well plates and transfected with Lipofectamine® RNAiMAX Transfection Reagent for 48 h.
The Cell Counting Kit-8 method was used to measure cell viability. Cells were seeded in 96-well plates at a density of 3 × 10³ cells/ml. The viability of cells was assessed using the CCK8 reagent (Dojindo Laboratories, Japan) according to the manufacturer’s protocols. The absorbance at 450 nm was recorded on a microplate reader.
RT-PCR and ChIP-qPCR
Total RNA was extracted from cells using the Quick-RNA™ MiniPrep kit (Zymo Research). Then cDNA was prepared using a RevertAid H Minus First Strand cDNA Synthesis Kit (Thermo Fisher Scientific). The PCR primers for amplifying cDNA fragments between the upstream exon and downstream exon of five exon-skipping event sites are described in Additional file 1: Table S2. PCR was performed with NEBNext® High-Fidelity 2X PCR Master Mix (New England Biolabs, UK), and the cycling conditions were 98 °C for 1 min, then 30 cycles of 98 °C for 10 s, 58 °C for 20 s, and 72 °C for 30 s. PCR products were visualized on 3% agarose gels.
ChIP-qPCR was performed as described in Zhu et al. Briefly, crosslinking was performed with 1% formalin and the cells were lysed in SDS buffer. DNA was fragmented by sonication with a Covaris S220. Chromatin immunoprecipitation (ChIP) was performed using an antibody to the H3K79me2 modification (Abcam, ab3594). ChIP-DNA was quantified with the LightCycler® 480 SYBR Green I Master on a LightCycler® 480 System (Roche Applied Science), using GAPDH for normalization with primers listed in Additional file 1: Table S3.
Identification of the AS events across 34 normal and cancer cell types
Characterization of H3K79me2 enrichment around splice sites
SOM clustering for SE sites across 34 different cell types
Gene expression, Gene Ontology, and motif analyses of SE-associated genes
The trans-acting RNA-binding proteins (RBPs), often called splicing factors (SFs), play central roles in promoting or suppressing the use of a particular splice site. Thus, we searched for SF or RBP motifs in the sequences spanning skipping sites, extending 50 bp into the exon and intron, using RBPmap, a tool designed to map SF binding sites in human genomic regions using the COS(WR) algorithm. We compared the frequency of predicted SF motifs (SFMs) in the four defined genomic regions immediately adjacent to skipping sites versus non-skipping locations. As shown in Fig. 4d, for the common genes in clusters A and F, the top 30 highly enriched SFMs showed a strong tendency to lie within 50 bp of a skipping exon start site, a region highly involved in the exon junction process. In contrast, for the unique genes identified in clusters A or F, the top enriched SFMs were located towards the end of the skipped exon. Interestingly, we found that two enriched motifs, SRSF2 and U2AF2, were previously reported to be highly involved in AML progression through aberrant splicing regulation, and another motif, PTBP1, was shown to play an important role in breast and colorectal cancers [57, 58]. Taken together, our in silico analyses unveil a potential mechanistic or functional link between H3K79me2-mediated skipping exon processing, splicing factors, and disease progression.
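To make the window definition concrete, the sketch below counts motif occurrences in four 50-bp windows around a skipped exon: the upstream intron flank, the exon start, the exon end, and the downstream intron flank. It uses exact string matching as a simple stand-in and does not reproduce RBPmap's scoring; the window names, function names, and the motif `TCTT` in the example are placeholders and not taken from the study.

```python
def flanking_windows(exon_start, exon_end, seq_len, flank=50):
    """Four half-open windows around a skipped exon [exon_start, exon_end)."""
    return {
        "upstream_intron": (max(0, exon_start - flank), exon_start),
        "exon_start": (exon_start, min(exon_end, exon_start + flank)),
        "exon_end": (max(exon_start, exon_end - flank), exon_end),
        "downstream_intron": (exon_end, min(seq_len, exon_end + flank)),
    }


def count_motif(seq, motif):
    """Count exact, possibly overlapping occurrences of motif in seq."""
    count = 0
    pos = seq.find(motif)
    while pos != -1:
        count += 1
        pos = seq.find(motif, pos + 1)
    return count


def motif_counts_by_window(seq, exon_start, exon_end, motif, flank=50):
    """Motif counts in each of the four windows around the skipped exon."""
    seq, motif = seq.upper(), motif.upper()
    windows = flanking_windows(exon_start, exon_end, len(seq), flank)
    return {name: count_motif(seq[s:e], motif) for name, (s, e) in windows.items()}


# Synthetic sequence: 100 bp intron, 160 bp exon, 100 bp intron.
seq = "A" * 60 + "TCTT" + "A" * 36 + "TCTTCTT" + "G" * 153 + "A" * 100
counts = motif_counts_by_window(seq, exon_start=100, exon_end=260, motif="TCTT")
```

Running the same counts over skipping sites and over non-skipping control locations would give the per-window frequencies that the comparison in the text relies on.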
Functional characterization of DOT1L-mediated SE in MLL-r AML cells
The current paradigm of pre-mRNA splicing centers on a post-transcriptional process mediated by the spliceosome machinery. However, accumulating evidence suggests that AS is a co-transcriptional splicing process not only controlled by RNA-binding splicing factors, but also mediated by epigenetic regulators, such as chromatin structure, nucleosome density, and histone modification. Many recent genome-wide studies, including ours, have revealed the regulatory roles of H3K36me3, H3K79me2, and nucleosome positioning in the RNA polymerase II elongation process and exon definition [18, 19, 20]. To further extend our previous study in which we observed a high enrichment of H3K79me2 at skipped exon sites in GM12878 and K562 cells, we conducted an integrative analysis of RNA-seq and H3K79me2 ChIP-seq data across 34 normal and cancer cell types. Intriguingly, we not only confirmed high enrichment of H3K79me2 in SE type splicing events, but also uncovered its enrichment in A3SS events (Fig. 2a). Further, a large proportion of SE sites are characterized by computationally defined H3K79me2 peaks (Fig. 2b), reflecting the high confidence of regulatory activity of H3K79me2 on the exon skipping process.
One novel finding in this study is the identification of six clusters of cell type-specific enrichment of H3K79me2 at skipping exons among 34 cell types. In particular, we discovered that a pattern of histone marks that promote exon skipping was a common feature in cell lines derived from hematological malignancies, in particular MLL-r AML cell types (Fig. 3b). Indeed, when closely examining this in each individual cell type, we observed a clear separation of enrichment of H3K79me2 in a majority of blood-related cell types (Additional file 2: Figure S3). Our data highlight the importance of H3K79me2 in AS events across different cell types, but most noticeably in blood cells. Previous studies showed that, for specific gene expression, inactivation of DOT1L led to the downregulation of direct MLL-AF9 targets and an MLL translocation-associated gene expression signature [59, 63]. However, because DOT1L inhibition data were not available for all 34 cell types, we instead examined the expression level of transcripts associated with SE sites against a random set of non-SE genes in each cell type. We did not observe any significant difference in gene expression between these two sets (Fig. 4a), indicating that the correlation of H3K79me2 and SE may be independent of gene expression and might arise through a co-transcriptional pre-mRNA splicing mechanism. To our knowledge, this is the first comprehensive study to integrate all available matched RNA-seq and H3K79me2 ChIP-seq data in the same cell type. In a broader aspect, such an integrative strategy may provide a general approach for dissecting the relationship of other histone marks or epigenetic factors with the splicing process and further uncover their novel functionalities associated with various diseases or cancer types, providing a rationale to further explore the underlying mechanism for AML patients without mutations or independent of gene expression.
The gene enrichment and pathway analyses further revealed that H3K79me2-mediated exon skipping-associated genes were highly involved in acute or chronic myeloid leukemia cell types, underscoring their functional relevance to blood cancer progression. Indeed, the various functional assays in this study confirmed that such exon skipping events were highly coordinated by the H3K79me2 or DOT1L activities with DOT1L siRNA treatment in two MLL-r AML cell lines (Fig. 5), providing a new line of evidence of a co-transcriptional splicing process involved in AML. Other studies have shown that higher levels of H3K79me2 are associated with poorer prognosis in MLL-r leukemias, and the fusion of DOT1L and MLL partners, AF4, AF9, ENL, and AF10, leads to misregulation of DOT1L targets, resulting in aberrant H3K79me2 activity followed by leukemic transformation [64, 65]. However, our results further unveiled new regulatory and functional roles of H3K79me2 in determining transcript isoforms, providing a mechanistic link between H3K79me2 or DOT1L and splicing events in this particular disease progression.
Our findings may provide a new avenue and opportunity to develop novel combinatorial therapeutic drugs targeting both epigenetic mechanisms and splicing processes. EPZ-5676, a small-molecule inhibitor of DOT1L, is currently under clinical investigation for acute leukemias harboring rearrangements of the MLL gene. Although the agent effectively targets the DOT1L molecule in vitro, the results of a phase 1 clinical trial were disappointing due to low bioavailability and frequent adverse events. In light of this finding, future studies may test a co-treatment model that adds the inhibition of a splicing factor as a second synergistic agent, which may enhance efficacy for treating this deadly disease.
Our study identifies for the first time at a genome-wide scale cell type-specific correlation between H3K79me2 enrichment and skipped exons. This correlation is further utilized to classify the diverse cell types into six distinct clusters. Experimental assays confirm H3K79me2’s functional and regulatory roles in AML disease progression. Our work provides more insights into underlying epigenetic regulatory mechanisms in the co-transcriptional AS process in normal or disease conditions.
We are grateful to all members of the Jin laboratories for valuable discussion.
This study is partially supported by NIH R01GM114142 and U54CA217297, as well as the Owens Foundation. Funding for open access charge: NIH. The authors gratefully acknowledge financial support from Scholarship of Jilin University.
Availability of data and materials
All data generated during this study are included in this published article and the additional files.
TL and VXJ conceived the project. TL and QL performed the computational analysis with help from NG and performed experiments. TL and VXJ wrote the manuscript with input from SK and all other authors. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 2. Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature. 2010;463(7280):457–63.
- 5. Santoro M, Masciullo M, Bonvissuto D, Bianchi ML, Michetti F, Silvestri G. Alternative splicing of human insulin receptor gene (INSR) in type I and type II skeletal muscle fibers of patients with myotonic dystrophy type 1 and type 2. Mol Cell Biochem. 2013;380(1-2):259–65.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.