Genome-wide RNA pol II initiation and pausing in neural progenitors of the rat
Global RNA sequencing technologies have revealed widespread RNA polymerase II (Pol II) transcription outside of gene promoters. Small 5′-capped RNA sequencing (Start-seq), originally developed for the detection of promoter-proximal Pol II pausing, has helped improve annotation of Transcription Start Sites (TSSs) of genes as well as identification of non-genic regulatory elements. However, apart from the most well-studied genomes of human and mouse, mammalian transcription has not been profiled with sufficiently high precision.
We prepared and sequenced Start-seq libraries from rat (Rattus norvegicus) primary neural progenitor cells. Over 48 million uniquely mappable reads from two independent biological replicates allowed us to define the TSSs of 7365 known genes in the rn6 genome, reannotating 2503 TSSs by more than 5 base pairs, characterize promoter-associated antisense transcription, and profile Pol II pausing. By combining TSS data with polyA-selected RNA sequencing, we also identified thousands of potential new genes producing stable RNA as well as non-genic transcripts representing possible regulatory elements.
Our study has produced the first Start-seq dataset for the rat. Apart from profiling transcription initiation, our data reaffirm the prevalence of Pol II pausing across the rat genome and indicate conservation of pausing mechanisms across metazoan genomes. We suggest that pausing location, at least in mammals, is constrained by a distance from initiation of transcription, whether it occurs at or outside of a gene promoter. Abundant antisense transcription initiation around protein coding genes indicates that Pol II recruited to the vicinity of a promoter is distributed to available start sites of transcription at either DNA strand. Transcriptome profiling of neural progenitors presented here will facilitate further studies of other rat cell types as well as other organisms.
Keywords: RNA pol II, Transcription, Small RNA, Promoters
bFGF: basic fibroblast growth factor
ChIP-seq: Chromatin Immunoprecipitation Sequencing
DSIF: DRB Sensitivity-Inducing Factor
EGF: Epidermal growth factor
NEB: New England Biolabs
NELF: Negative ELongation Factor
Pol II: RNA polymerase II
PRO-seq: Precision Run-On Sequencing
TSS: Transcription Start Site
Transcription of genes was thought to be regulated mainly through recruitment of the RNA polymerase to promoters. However, work over the last several years [1, 2, 3] has demonstrated that mRNA production requires additional inputs even after the RNA polymerase has engaged a promoter and initiated RNA synthesis. Promoter-proximal Pol II pausing takes place within the first 100 nucleotides of many genes and, following a number of seminal studies (reviewed in [4, 5]), is now accepted as a common step in metazoan Pol II transcription. Regulated release of paused polymerase into productive transcription elongation accompanies key biological events including organism development and cellular responses to stimuli [2, 6, 7, 8, 9, 10, 11, 12, 13]. Better understanding of transcription initiation, promoter-proximal pausing, and their contributions to transcription regulation is limited by the lack of high-resolution datasets, especially in commonly used model organisms like the rat. Discrepancies by a few nucleotides do not affect Chromatin Immunoprecipitation-sequencing (ChIP-seq) or mRNA-sequencing (RNA-seq) analyses because their effective resolution is relatively low. However, even single base pair inaccuracies impede analyses relying on the sequence context of promoters and other elements, including CRISPR/Cas9-based targeting. With new technologies being rapidly developed, the demand for nucleotide-level precision of transcriptome annotations is only expected to increase. In addition to refining the Transcription Start Sites (TSSs) of known genes, there is also increasing interest in mapping non-genic transcription that does not produce stable RNA, but delineates non-genic regulatory elements [15, 16, 17, 18].
Thus far, Pol II TSSs have been profiled with high depth and resolution only for relatively well-studied genomes of human, mouse, C. elegans, and Drosophila [18, 19, 20, 21, 22]. Here we turned to the rat, an important model organism that still has few available genome-wide datasets and, based on the experience with other genomes, is likely to have incomplete annotation of transcriptomic features. Small capped RNA sequencing, also referred to as Start-seq, captures short 5′-capped RNAs (TSS-RNAs) that are produced by Pol II during early transcription elongation [22, 23]. TSS-RNAs yield dual information: their 5′-ends precisely delineate the sites of transcription initiation, whereas their 3′-end positions indicate the locations of promoter-proximal pausing. We report Start-seq in primary neural progenitors alongside polyA-selected high-coverage RNA-sequencing from the same RNA. We define high-confidence, base-pair resolution TSSs for 7365 of the ~24,000 currently annotated genes in the rn6 genome using the RefSeq annotation database, report the relationship of pausing with gene expression, and identify transcription start sites of new genes and potential non-genic regulatory elements. We identify general features of antisense transcription around gene promoters and characterize properties of Pol II pausing. The work outlines a high-resolution landscape of transcription initiation and Pol II pausing in rat neural progenitors and provides a workflow for transcriptional profiling of other cell types in the rat as well as in other organisms.
Start-seq in rat neuronal progenitors
Two independent biological replicates produced 27.9 M and 19.4 M Start-seq reads uniquely mappable to the rn6 genome. Of these reads, 16,380,972 for replicate 1 and 7,395,642 for replicate 2 mapped within +/− 500 base pairs of annotated TSSs of known genes. Selectivity of scRNAs for TSSs is also illustrated by examining individual genes. Even on the highly active Actin B (Actb) gene, a majority of Start-seq reads map within the gene promoter (Fig. 1b, c). When the numbers of TSS-RNA hits within +/− 500 bp from the same set of TSSs were compared, Spearman correlation between the replicates was 0.97 overall (Fig. 1d) and profiles of transcription initiation were very similar on individual genes (Fig. 1d, Additional file 1: Figure S2, and data not shown), attesting to consistency of the Start-seq procedure.
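The replicate comparison described above can be illustrated with a short sketch. It assumes that per-gene TSS-RNA counts within the +/− 500 bp windows are already available as two equal-length lists; the function names and the toy counts are hypothetical and are not taken from the analysis scripts used in this study. Spearman correlation is computed here as the Pearson correlation of tie-averaged ranks.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation between two replicates' per-gene counts."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy per-gene counts (hypothetical), one entry per TSS window.
replicate_1 = [120, 15, 980, 40, 0]
replicate_2 = [60, 9, 510, 35, 2]
rho = spearman(replicate_1, replicate_2)
```

Because the statistic depends only on ranks, it is insensitive to the difference in sequencing depth between the two replicates.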
Refinement of gene transcription start sites in the rat
To validate the TSS reannotations, we compared the DNA sequence context around the observed Start-seq gene TSSs to that around existing TSSs in the rn6 genome. While RefSeq positions showed no DNA sequence enrichment, after reannotation (Fig. 2) a clear Pol II initiator (Inr) sequence motif centered around TSS-RNA defined locations was observed (Fig. 2c), similar to what was previously found in human and mouse datasets [20, 21]. Apart from validating the TSS annotations, the data also reaffirm conservation of Pol II initiation sequence context in mammals. To verify the sensitivity of Start-seq based TSS annotations, especially at lower TSS-RNA coverage loci, we divided the 7365 TSSs into quartiles based on the number of mapped TSS-RNA reads. Enrichment of the Inr motif persists even at low (between 10 and 300 reads per 1000 bp TSS window, Additional file 1: Figure S4) coverage, indicating that our Start-seq noise threshold is conservative. In contrast, the existing rn6 RefSeq annotations do not contain sequence motif information even for the most highly expressed gene quartile (Additional file 1: Figure S4). To avoid a potential bias of RNA-based readout datasets, we also utilized previously published RNA Pol II ChIP-seq data, obtained from mature rat neurons (GSM565202). We observed sharpening of Pol II ChIP-seq metagene signal (Fig. 2b, right panel), suggesting that the TSS-RNA method of reannotating start sites is not biased for a specific technique. Reannotated TSSs are listed in Additional file 2.
Antisense transcription around mRNA genes
Convergent transcription initiates downstream of the TSS and is directed head-on into the promoter. After applying the same 10-count TSS-RNA noise threshold, we detected convergent transcription on 2531 genes within a 500-bp window downstream of the main promoter (Fig. 3b). Because convergent transcription is even lower in intensity than divergent, this threshold under-reports transcription at current sequencing coverage. The site of convergent initiation was defined similarly to the divergent TSS, that is, using the base pair position with the highest signal downstream of the main TSS. The convergent signal was lower than that from divergent initiation, and convergent initiation sites were concentrated farther from the sense TSS than divergent sites, ~200–250 nt downstream.
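The definition of a convergent initiation site given above can be sketched as follows. This is a minimal illustration rather than the script used in this study: the function name, the input layout (antisense 5′-end counts keyed by offset in nt downstream of the sense TSS), and the choice to apply the 10-count threshold to the summed counts in the window are all assumptions.

```python
def call_convergent_tss(antisense_5p_counts, window=500, min_reads=10):
    """Return the offset of the strongest antisense 5'-end signal, or None.

    antisense_5p_counts maps offset (nt downstream of the sense TSS) to the
    number of antisense TSS-RNA 5' ends at that position (hypothetical layout).
    """
    # Keep only positions inside the downstream window.
    in_window = {off: n for off, n in antisense_5p_counts.items()
                 if 0 < off <= window}
    # Assumed noise filter: total antisense signal in the window.
    if not in_window or sum(in_window.values()) < min_reads:
        return None
    # Highest count wins; ties go to the position closest to the sense TSS.
    return max(in_window, key=lambda off: (in_window[off], -off))


# Toy example (hypothetical counts): positions outside 1..500 are ignored.
example = {210: 7, 230: 12, 600: 50, -40: 30}
site = call_convergent_tss(example)
```

The same function could be applied to a window upstream of the TSS, with offsets measured in the upstream direction, to call divergent sites.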
Just like the gene TSSs, divergent and convergent TSSs are enriched with an Inr-like sequence motif (Fig. 3d), indicating a common mechanism of Pol II initiation both at and outside of gene promoters. There is a modest yet positive correlation between the magnitude of gene TSS-RNA signal and its associated antisense, both divergent and convergent, transcription initiation signal (Spearman, 0.36–0.46; Fig. 3c). Taken together, these data reinforce the notion that antisense Pol II initiation is common throughout mammalian transcription [26, 27, 28, 29, 30, 31] and may be co-regulated with the main promoter.
Promoter-proximal pol II pausing is ubiquitous across the rat genome
As the 3′-ends of TSS-RNAs define the locations of Pol II pausing, we next determined the positions of TSS-RNA 3′-ends. Metagene analysis around the TSSs shows that TSS-RNA 3′-ends peak around +35 nt downstream of the TSS (Fig. 4b, Additional file 1: Figure S6). This is similar to our previously defined distribution of TSS-RNAs in other organisms, including Drosophila, human, and mouse [20, 21, 22], pointing to commonality of mechanisms that establish Pol II pausing across metazoans.
Ever since the original discovery of promoter-proximal Pol II pausing [36, 37, 38, 39], there remains a question about pervasiveness of pausing and, particularly, existence of non-paused genes (for example, [2, 7, 8, 12, 40]). In genome-wide datasets, paused genes are normally defined through threshold-based cutoffs in global PI distribution [1, 26, 35], which under-reports paused and over-represents non-paused genes. Even then, a majority of genes have Pol II accumulation at promoters indicative of pausing. Apart from completely inactive genes, low PI values should stem from active genes. However, these genes still show detectable Start-seq signal and have the same RNA size distribution as the rest of the genes, both overall (Fig. 4c) and on a representative gene with a high expression level based on RNA-seq signal (Fig. 4d). Examining individual genes, we failed to find an active gene without TSS-RNA signal (data not shown). While quantitative differences probably reflect genome-specified differential duration of premature transcription termination, presence of scRNA at the right location is detected on all active genes we examined. These observations indicate that Pol II pausing occurs on most if not all genes and that there would be few, if any, “nonpaused” active genes, at least in steady-state cells.
Pol II pausing is globally constrained by distance from transcription initiation
Analysis of RNA lengths relative to each RNA 5′-end rather than the gene TSS shows a distribution of lengths peaking around ~35 nucleotides, similar to TSS-centric analysis. However, we noted that upstream initiated RNAs are, on average, 2–3 nucleotides longer than RNAs initiating downstream of the TSS. The TSS itself appears to be the inflection point (Fig. 5c and Additional file 1: Figure S5). While the reasons for this difference remain to be investigated, we suggest that this may be due to different availability, or activity, of factors such as NELF or TFIIS for initiation at different locations. Notably, this extra length does not compensate for the additional distance upstream from the TSS, retaining the RNA length constraints. Analysis of the sequence context around the RNA 3′-ends (but not distance from the TSS) shows preference for G/C nucleotides (as also determined recently), indicating that generation of TSS-RNAs (either the initial pausing or subsequent Pol II backtracking) does to some extent depend on the sequence context (Fig. 5d). Initiation events outside of promoters, namely, divergent transcription, where sequences are not likely to have specifically evolved to control transcription, had similar distributions of RNA sizes (Additional file 1: Figure S6). Taken together, these data indicate that pausing is a common, likely requisite step of Pol II transcription regardless of whether it initiates at or outside of gene promoters. The data also suggest that the location of pausing, while to some extent sensitive to the sequence context, is ultimately constrained by a distance from transcription initiation.
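The distinction between TSS-centric and 5′-end-centric analysis can be made concrete with a small sketch. It assumes each TSS-RNA is represented by the offsets of its 5′ and 3′ ends relative to the gene TSS (position 0), already oriented on the sense strand; the function names and the toy reads are hypothetical and serve only to show the bookkeeping.

```python
from collections import Counter
from statistics import mean


def rna_lengths(reads):
    """Length of each RNA measured from its own 5' end, in nt."""
    return [three - five + 1 for five, three in reads]


def modal_length(reads):
    """Most frequent RNA length in the set of reads."""
    return Counter(rna_lengths(reads)).most_common(1)[0][0]


def mean_length_by_start(reads):
    """Mean length of RNAs initiating upstream vs. downstream of the TSS.

    Reads starting exactly at the TSS (offset 0) are left out of both groups.
    Returns (upstream_mean, downstream_mean); a group with no reads is None.
    """
    upstream = [t - f + 1 for f, t in reads if f < 0]
    downstream = [t - f + 1 for f, t in reads if f > 0]
    return (mean(upstream) if upstream else None,
            mean(downstream) if downstream else None)


# Toy reads (hypothetical): (5'-end offset, 3'-end offset) relative to the TSS.
toy_reads = [(0, 34), (0, 34), (-5, 32), (4, 37), (0, 29)]
peak = modal_length(toy_reads)
up_mean, down_mean = mean_length_by_start(toy_reads)
```

In the TSS-centric view only the 3′-end offset is tabulated, whereas here the length is measured from each RNA's own start, so an RNA initiating 5 nt upstream and ending at +32 counts as 38 nt long.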
New transcription initiation elements identified from transcriptome sequencing
To identify potential regulatory elements, we used Homer to find Start-seq peaks on both strands across the genome after excluding the known genic start sites (+/− 3 kb from the gene’s start site) found in either RefSeq or our assembly. This resulted in 29,481 Homer peaks. Because accessible genomic elements, including transcriptional enhancers, are characterized by bidirectional transcription [20, 48, 49, 50], we used these peaks to identify regions of bidirectional TSS signal enrichment (Additional file 5, see Methods). Figure 6a shows one of those regions approximately 7 kb upstream of the Sox2 gene. This region represents an active enhancer near a transcriptionally productive developmental gene in these neural progenitor cells. The identified new genes and non-genic TSSs are listed in Additional file 1.
Using small capped RNA sequencing (Start-seq), we profiled Pol II transcription start sites and pausing in neural progenitors of the rat. Compared with human and mouse, the rat genome appears to be even more misannotated for gene TSSs and likely other genomic elements as well. By refining TSSs of known genes and identifying thousands of new TSSs of potential genes and non-genic elements, the first Start-seq datasets in the rat reported here will facilitate transcriptome profiling in other cell types of the rat as well as other organisms. Our definitions of new genes and non-genic elements are likely conservative, so that additional datasets are expected to further improve the scope and confidence of rat transcriptome annotations in various cell types.
Nascent RNA analysis methods such as Global Run-On and Precision Run-On sequencing (GRO-seq and PRO-seq) are powerful tools for transcriptome profiling [19, 26]. Start-seq is not able to measure expression of genes, but unlike the Run-On methods, it can profile transcription initiation events in specimens with inactivated RNA polymerase. Broader adoption of Start-seq has been limited by technical complexity. We have streamlined the Start-seq method by reducing the bench time it takes to complete the protocol. Future iterations of Start-seq method development will increase the specificity of 5′-capped RNA recovery and reduce the requirement for starting RNA material. For example, ribosomal RNAs constituted 15.7 and 44.9% of reads, respectively, in each replicate; the difference in the abundance of TSS-RNAs across replicates is consistent with the relative abundance of ribosomal RNA reads in each sample, indicating that rRNA reads constitute a major variable in Start-seq libraries, presumably at the level of RNA size selection during library preparation (Additional file 1: Figure S1). Combining Start-seq with rRNA depletion, as was recently done in Drosophila, should help circumvent this issue. Given the growing affordability of sequencing, it may also be prudent to opt for a higher sequencing depth instead of extra steps in Start-seq library preparation.
Start-seq allowed us to visualize the overall Pol II initiation landscape across the rat genome. We reaffirmed the prevalence of convergent and divergent initiation around Pol II-transcribed genes in the rat. The distances of antisense initiation sites to main TSSs vary widely among genes and, therefore, rather than specific sequences, are likely defined by topological features of the genome such as chromatin looping and/or sequence features such as CpG islands. Convergent initiation in general is shifted further away from the main TSS than divergent initiation, probably because the former takes place next to the + 1 nucleosome [35, 51, 52]. The magnitudes of sense and antisense signals show modest but positive correlation with transcription of the gene, which is comparable to correlation between pausing and gene expression, indicating that these events are co-regulated. We suggest that the Pol II machinery is commonly brought to the vicinity of the promoter (or a transcription factory [53, 54]) and then distributed according to its affinity to each potential start site within the local environment.
Pol II pausing involves a complex interplay of processes that include RNA capping, initial pausing, backtracking, and premature termination (reviewed in [31, 33, 55]). The location of pausing on genes in relation to the start site appears to be highly conserved across metazoans and peaks around ~35 nt from a gene TSS. Using Start-seq, we did not detect the bimodal distribution of TSS-RNA lengths observed in PRO-seq based experiments, consistent with earlier TSS-RNA data in mammalian organisms [20, 21, 22], although individual genes such as Actb do show that (Figs. 1b,c and 5b). This may be because TSS-RNA and PRO-seq detect nonidentical populations of RNA generated at different stages of Pol II pausing including processing and backtracking. Because pausing appears to take place during transcription at and outside of promoters, likely through the same underlying mechanisms, pausing may be better termed as initiation-proximal rather than promoter-proximal pausing.
Contribution of sequence-dependent and sequence-independent mechanisms to the establishment of Pol II pausing and subsequent Pol II release remains to be fully understood. We suggest that positioning of Pol II from the TSS determines where promoter-proximal pausing would occur. Conservation of pausing among different organisms and at sequence contexts throughout the genome, at and outside of gene promoters, indicates sequence-independent, likely universal mechanisms. Pausing establishing factor NELF (Negative Elongation Factor) [56, 57] and DRB Sensitivity Inducing Factor (DSIF) govern pausing on most, if not all, transcription events [58, 59, 60]. For example, NELF, through its multiple RNA binding sites [61, 62] or DSIF, may serve as a “ruler” to measure the distance of initial pausing or to define the location of subsequent backtracking. Given that pausing is also constrained by the sequence context, at least within up to five nucleotides, multiple mechanisms are likely at play. We suggest that the length-based universal constraints define the upper limit for pausing whereas DNA sequence, or balance of promoter activity and pause release, can alter that within these limitations. Indeed, locations of 3′ ends vary on individual genes from 25 to 50+ nt (Figs. 4 and 5 and data not shown). Small RNAs reflect the complex processes during Pol II pausing and release, and their analysis in different systems and under different conditions will help shed light on these mechanisms.
By combining Start-seq and RNA-seq data from the same cells, we performed an initial profiling of genic and non-genic TSSs of the rat. This approach can be used for other systems, especially to map the noncoding transcription landscape. While our RNA-seq data detected 100% of known rn6 mRNAs and 100% of known exons, we are unlikely to have fully saturated the rat transcriptome by analyzing one cell type because some genes have low activity in these cells, especially for noncoding transcripts. The number of identified non-genic elements based on TSS-RNA in the rat is on the lower side of the numbers of enhancers reported based on histone marks [65, 66, 67]. Future analyses of RNA datasets will advance transcriptome annotations in various cell types of the rat as well as other, less studied organisms.
Applying an improved Start-seq procedure for rat neuronal progenitors and combining it with polyA RNA-sequencing from the same sample sets, we report the transcription initiation landscape in these cells that includes (i) refinement of known gene transcription start sites; (ii) profiling of antisense (divergent and convergent) transcription initiation; (iii) genome-wide profiling of Pol II pausing at and outside of gene promoters and (iv) identification of new genes and potential regulatory elements. The work presented here will help fine-tune DNA sequence-based approaches (e.g., CRISPR targeting) in rats and facilitate transcriptome profiling of other rat cell types as well as analyses of other organisms.
Animals and derivation of neuronal progenitors
All animal procedures were performed in accordance with the National Institute of Environmental Health Sciences (NIEHS) and the University of California Merced animal care committee’s regulations [NIEHS Institutional Animal Care and Use Committee (IACUC) approval: ASP#01–21; and University of California Merced IACUC approval: ASP#13–0007 and ASP#16–0004]. Timed-pregnant rats were obtained from a commercial resource (Charles River). Pregnant dams were deeply anesthetized by intraperitoneal injection of pentobarbital solution (to minimize pain sensation during decapitation) and then decapitated using a sharp guillotine. Embryonic animals were decapitated with a sharp pair of scissors, and cortices were isolated and subsequently digested in Accutase (Gibco) for 5 min at room temperature. Cultures of cortical neural progenitors were prepared from embryonic day 14 (E14) Sprague Dawley rats of either sex. Single cell suspension was achieved by triturating digested tissue through fire-polished Pasteur pipettes. Cells were washed with Hank’s Balanced Salt Solution (HBSS) without calcium and magnesium, and then plated onto dishes coated with CELLstart (Gibco) in Knockout DMEM/F-12 (Gibco) supplemented with 2% StemPro Neural Supplement (Gibco), 2 mM GlutaMAX (Gibco), 20 ng/ml bFGF (Gibco), and 20 ng/ml EGF (Gibco). Cells were passaged every 2–3 days and routinely collected after the second passage.
Total RNA preparation
PolyA selected RNA libraries were prepared from 50 ng of total RNA extracted from frozen cell pellets using Trizol reagent. In addition to the standard Trizol procedure, we included a chloroform extraction step of the aqueous phase after chloroform-induced separation of phases, to remove traces of phenol. RNA Integrity Numbers were calculated by Bioanalyzer and were always > 6.8, to meet the RNA quality guidelines for RNA sequencing service (Novogene). PolyA-selected RNA-sequencing libraries were prepared using NEBNext Ultra II library preparation kit with NEB beads, using 12 cycles of amplification. Libraries were quantified on Bioanalyzer prior to sequencing.
Start-seq library preparation
scRNAs were prepared based on our earlier procedure [21, 22], with modifications. In brief, 5 × 10^7 cells were used to extract nuclei by washing with hypotonic lysis buffer, followed by preparation of total RNA using Trizol reagent, size selection on 15% Urea-TBE gel (Novex), and crush and soak elution using cellulose acetate spin filters (Agilent cat# 5185-5990). After ethanol precipitation, size selected RNA was treated, successively, with T4 Polynucleotide Kinase 3′ phosphatase minus (New England Biolabs, NEB), 5′-polyphosphatase, and terminator exonuclease (both Epicentre), followed by ligation of the 3′-Illumina small RNA Tru-Seq adapter using T4 RNA Ligase 2, truncated K225Q (NEB). Reactions were then purified on 15% Urea-TBE gel (Novex) to select 45–100 nt RNA sizes, extracted from the gel as above, and treated with RppH (NEB) in Thermopol reaction buffer. After ligating the Illumina 5′-Tru-Seq small RNA adapter with WT T4 ssRNA Ligase 1 (NEB) in the presence of ATP, reverse transcription was done per the Tru-Seq Illumina Small RNA kit and libraries were amplified for 18 PCR cycles. Phenol-chloroform extraction, chloroform extraction, and ethanol precipitation were used between enzymatic treatments. PCR-amplified libraries were purified on a 6% TBE gel to remove linker dimers, extracted from the gel as above, and quantified using Bioanalyzer (Agilent) and droplet digital PCR (Bio-Rad) prior to sequencing.
Sequencing and initial data processing
Start-Seq libraries were sequenced on a MiSeq instrument for quality control locally and re-sequenced on HiSeq2500 using small RNA option (SE50) commercially (Novogene) to the depth of ~ 100 M raw reads per sample. Raw files for Start-seq were mapped to rn6 genome using Hisat2. To filter out highly abundant species with multiple genomic copies such as tRNAs, only uniquely mappable Start-seq reads (Hisat2 mapping score > 3) were considered for analysis. Mapped reads were assigned to annotated genes using rn6 RefSeq annotation.
PolyA-selective RNA sequencing was done to an average ~140 M raw reads per replicate using a commercial company (Novogene) from Trizol-extracted RNA. Reads were aligned to the rn6 genome using STAR and expression levels (FPKM) were obtained using the Rsubread and DESeq2 R packages. Transcripts were assembled using StringTie with default parameters.
Small RNA data analysis
Rn6 annotated gene TSS locations were obtained from UCSC and deduplicated to produce a list of unique start site coordinates for each gene. Contaminating RNAs (tRNA, rRNA, etc.) and micro RNA species were removed from consideration for this study. The deeptools package was used to convert alignment files to bigwig (bamCoverage) and to count reads +/− 500 base pairs around the TSS locations defined previously (computeMatrix). Strand information was preserved, and reads were counted in the sense direction for all genes for both the 5′ and 3′ ends of the reads. After fitting the location of the highest peak in each (annotated TSS-centered) gene window to a normal distribution, the range of 1 SD from the mean (146 nt for replicate 1 and 149 nt for replicate 2; Additional file 1: Figures S1 and S3) was used as the maximum distance to define genes on which we could reannotate TSSs. Custom R scripts were used to analyze transcription around these sites. Pausing Index (PI) was calculated as the ratio of scRNA signal within the TSS +/− 500 bp window in the sense direction and RNA-seq-derived expression level (FPKM) of the same gene. Metagene plots and heatmaps were made using MakeHeatmap or custom R scripts. Due to sequencing read length of 50, the maximum length insert we could identify by adapter trimming was 47, and therefore sequences longer than 48 nt are not represented in RNA length-based analyses, although lower-coverage paired end sequencing of the same Start-Seq libraries (Supplement) shows that the size distribution calculated from paired end read sequencing is the same. Individual Start-seq replicates were processed independently and, unless indicated otherwise, replicate 1, which contained higher coverage, is shown in the main figures. 2-D plots for Start-seq RNA were made with the R package ggplot using coordinates of individual genes relative to their TSS-RNA reannotated TSSs.
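The reannotation filter and the Pausing Index described above can be sketched in a few lines. This is an illustrative reading of the procedure, not the custom R scripts used in this study: the function names, the use of the population standard deviation, the tie-breaking rule, and the handling of genes with zero FPKM are assumptions.

```python
from statistics import mean, pstdev


def peak_offset(counts_by_offset):
    """Offset (bp from the annotated TSS) with the most TSS-RNA 5' ends.

    Ties are broken in favor of the position closest to the annotated TSS
    (an assumption; the source does not state a tie rule).
    """
    return max(counts_by_offset,
               key=lambda off: (counts_by_offset[off], -abs(off)))


def reannotatable_genes(peak_offsets):
    """Keep genes whose peak lies within 1 SD of the mean peak offset.

    peak_offsets maps gene name to the peak offset found in its window.
    """
    offsets = list(peak_offsets.values())
    mu, sd = mean(offsets), pstdev(offsets)
    return {gene: off for gene, off in peak_offsets.items()
            if abs(off - mu) <= sd}


def pausing_index(tss_rna_signal, fpkm):
    """Ratio of sense TSS-RNA signal in the TSS window to gene FPKM.

    Returns None when FPKM is not positive (assumed handling of
    non-expressed genes).
    """
    if fpkm <= 0:
        return None
    return tss_rna_signal / fpkm


# Toy values (hypothetical).
offsets = {"geneA": 0, "geneB": 10, "geneC": -10, "geneD": 400}
kept = reannotatable_genes(offsets)
pi = pausing_index(200, 50)
```

In this toy input the outlying peak of geneD, 400 bp from its annotated TSS, falls outside 1 SD of the mean and is excluded from reannotation.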
For identification of new TSS elements, peaks called by Homer using “factor” and “separate strand” flags were filtered to exclude peaks inside all gene promoter regions (+/− 3 kb from each promoter) using annotatePeaks from the same package. To identify bidirectional regions of TSS-RNA enrichment, peaks called by Homer were filtered to exclude those near gene start sites (+/− 3 kb from each TSS). Among the remaining peaks, those that were within 3000 bp from each other were merged using bedtools if at least two of the adjacent peaks were on opposite strands. This resulted in ~8600 bidirectional TSS regions.
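The merging rule for bidirectional regions can be illustrated in plain Python. This sketch is not the bedtools command used in this study; it assumes peaks are given as (chromosome, start, end, strand) tuples, measures distance as the gap between the end of the current cluster and the start of the next peak, and keeps a merged cluster only if it contains peaks on both strands. The function names are hypothetical.

```python
def _emit(cluster, regions):
    """Append the cluster's span to regions if it has peaks on both strands."""
    strands = {strand for _, _, _, strand in cluster}
    if {"+", "-"} <= strands:
        chrom = cluster[0][0]
        start = min(s for _, s, _, _ in cluster)
        end = max(e for _, _, e, _ in cluster)
        regions.append((chrom, start, end))


def bidirectional_regions(peaks, max_gap=3000):
    """Merge nearby peaks and return regions with signal on both strands."""
    regions = []
    cluster = []
    for peak in sorted(peaks, key=lambda p: (p[0], p[1])):
        if cluster:
            same_chrom = peak[0] == cluster[0][0]
            cluster_end = max(e for _, _, e, _ in cluster)
            if same_chrom and peak[1] - cluster_end <= max_gap:
                cluster.append(peak)
                continue
            # Gap too large or new chromosome: close the current cluster.
            _emit(cluster, regions)
        cluster = [peak]
    if cluster:
        _emit(cluster, regions)
    return regions


# Toy peaks (hypothetical coordinates).
toy_peaks = [
    ("chr1", 1000, 1100, "+"),
    ("chr1", 2500, 2600, "-"),
    ("chr1", 20000, 20100, "+"),
    ("chr1", 21000, 21100, "+"),
    ("chr2", 500, 600, "-"),
]
merged = bidirectional_regions(toy_peaks)
```

Here only the first two peaks form a bidirectional region; the two plus-strand peaks near 20 kb merge but are discarded because no opposite-strand peak is nearby.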
We thank Bony De Kumar and Archana Dhasarathy for critical reading of the manuscript.
AS1 designed the bioinformatics data analyses, conceived the manuscript structure based on analyzed data, and was a major contributor to writing of the manuscript. CJD collected rat tissues and cultured the neural progenitors. AS2 extracted RNA from neural progenitors and prepared TSS-RNA libraries. NK designed the analysis for determination of TSS-RNA sizes. DP performed bioinformatics data analysis. RNS designed the manuscript and was responsible for rat neural progenitors. SN designed and performed data analyses and wrote the manuscript together with AS1. All authors have read and approved the manuscript.
The work was supported by the University of California Cancer Research Coordinating Committee CRN-18-524978 and NIH R01ES028738 grants to RNS, as well as P20GM104360 CoBRE, R21CA217751, and NSF CAREER 1750379 grants to NS. CRN-18-524978 grant: supported animal acquisition and cell culture, and CJD salary. R01ES028738 grant: supported RNS salary. P20GM104360 grant: provided salary support for AS (first author), AS (third author) and DP, and provided compute infrastructure for the project. R21CA217751 grant: provided support for library preparation supplies and Illumina sequencing. NSF CAREER 1750379 grant: provided salary support for SN and NK and Illumina sequencing. The funders had no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
All animal procedures were performed in accordance with the National Institute of Environmental Health Sciences (NIEHS) and the University of California Merced animal care committee’s regulations [NIEHS Institutional Animal Care and Use Committee (IACUC) approval: ASP#01–21; and University of California Merced IACUC approval: ASP#13–0007 and ASP#16–0004].
Consent for publication
The authors declare that they have no competing interests.
17. Franco HL, Nagari A, Malladi VS, Li W, Xi Y, Richardson D, Allton KL, Tanaka K, Li J, Murakami S, Keyomarsi K, Bedford MT, Shi X, Li W, Barton MC, Dent SYR, Kraus WL. Enhancer transcription reveals subtype-specific gene expression programs controlling breast cancer pathogenesis. Genome Res. 2018;28:159–70.
21. Samarakkody A, Abbas A, Scheidegger A, Warns J, Nnoli O, Jokinen B, Zarns K, Kubat B, Dhasarathy A, Nechaev S. RNA polymerase II pausing can be retained or acquired during activation of genes involved in the epithelial to mesenchymal transition. Nucleic Acids Res. 2015;43:3938–49.
50. Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, Chen Y, Zhao X, Schmidl C, Suzuki T, Ntini E, Arner E, Valen E, Li K, Schwarzfischer L, Glatz D, Raithel J, Lilje B, Rapin N, Bagger FO, Jørgensen M, Andersen PR, Bertin N, Rackham O, Burroughs AM, Baillie JK, Ishizu Y, Shimizu Y, Furuhata E, Maeda S, Negishi Y, Mungall CJ, Meehan TF, Lassmann T, Itoh M, Kawaji H, Kondo N, Kawai J, Lennartsson A, Daub CO, Heutink P, Hume DA, Jensen TH, Suzuki H, Hayashizaki Y, Müller F, FANTOM C, Forrest AR, Carninci P, Rehli M, Sandelin A. An atlas of active enhancers across human cell types and tissues. Nature. 2014;507:455–61.
57. Wada T, Takagi T, Yamaguchi Y, Ferdous A, Imai T, Hirose S, Sugimoto S, Yano K, Hartzog GA, Winston F, Buratowski S, Handa H. DSIF, a novel transcription elongation factor that regulates RNA polymerase II processivity, is composed of human Spt4 and Spt5 homologs. Genes Dev. 1998;12:343–56.
62. Vos SM, Pöllmann D, Caizzi L, Hofmann KB, Rombaut P, Zimniak T, Herzog F, Cramer P. Architecture and RNA binding of the human negative elongation factor. eLife. 2016;5:e14981.
67. Fishilevich S, Nudel R, Rappaport N, Hadar R, Plaschkes I, Iny Stein T, Rosen N, Kohn A, Twik M, Safran M, Lancet D, Cohen D. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database (Oxford). 2017;2017.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.