Important cardiac transcription factor genes are accompanied by bidirectional long non-coding RNAs
Heart development is a relatively fragile process in which many transcription factor genes show dose-sensitive characteristics such as haploinsufficiency and lower penetrance. Despite efforts to unravel the genetic mechanism for overcoming the fragility under normal conditions, our understanding still remains in its infancy. Recent studies on the regulatory mechanisms governing gene expression in mammals have revealed that long non-coding RNAs (lncRNAs) are important modulators at the transcriptional and translational levels. Based on the hypothesis that lncRNAs also play important roles in mouse heart development, we attempted to comprehensively identify lncRNAs by comparing the embryonic and adult mouse heart and brain.
We have identified spliced lncRNAs that are expressed during development and found that lncRNAs that are expressed in the heart but not in the brain are located close to genes that are important for heart development. Furthermore, we found that many important cardiac transcription factor genes are located in close proximity to lncRNAs. Importantly, many of the lncRNAs are divergently transcribed from the promoter of these genes. Since the lncRNA divergently transcribed from Tbx5 is highly evolutionarily conserved, we focused on and analyzed the transcript. We found that this lncRNA exhibits a different expression pattern than that of Tbx5, and knockdown of this lncRNA leads to embryonic lethality.
These results suggest that spliced lncRNAs, particularly bidirectional lncRNAs, are essential regulators of mouse heart development, potentially through the regulation of neighboring transcription factor genes.
Keywords: Heart development, Long non-coding RNA, Haploinsufficiency, Bidirectional promoter
CHD: congenital heart disease
FPKM: fragments per kilobase of exon per million mapped fragments
lncRNA: long non-coding RNA
Tbx5ua: Tbx5 upstream antisense product
TSS: transcription start site
Morphogenesis is a complex process in which appropriate cell types are differentiated and positioned at the right place and at the proper time. The surprising reproducibility of developmental processes is underpinned by the robustness of the genetic program. However, in spite of this high robustness under normal genetic conditions, the program can easily collapse under genetic abnormalities; for example, some genes require both alleles for proper function (i.e., haploinsufficiency). This type of fragility is frequently observed in mammalian heart development. In the heart, even a slight alteration of the program leads to congenital heart diseases (CHDs), which is consistent with the high frequency of CHDs, around one in one hundred births. Genetic studies have shown that many of the transcription factor genes involved in heart development are regulated in a highly spatiotemporal manner. However, how such intricate control of gene expression is achieved is not well understood.
Comparative genomic studies have shown that the complexity of the body plan and the proportion of non-coding regions in the genome are positively correlated. While most non-coding regions were previously considered “junk”, it is now accepted that some of them are necessary for fine and complex regulation of genes. Many evo-devo studies support this view, suggesting that the evolution of multicellular organisms was largely driven by adjustments in transcriptional regulators, such as enhancer elements, rather than by functional evolution of protein-coding genes. Recent advances in genomics and transcriptomics have demonstrated that nearly half of the mammalian genome is actually transcribed into RNA. Long non-coding RNAs (lncRNAs) are an emerging class of RNAs that are generally defined as transcripts longer than 200 nucleotides that lack the ability to produce functional proteins. Many of these molecules have been demonstrated to work as transcriptional or translational regulators. Some lncRNAs are known to recruit epigenetic regulators to specific loci in the genome to modulate transcription. For example, a classical lncRNA, Xist, recruits Polycomb repressive complex 2 (PRC2) to the X chromosome in cis to inactivate one of the two X chromosomes and thereby achieve dosage compensation. Many lncRNAs studied thus far have been found to bind epigenetic factors and recruit them to defined genomic loci; however, a considerable proportion of the proposed lncRNA-PRC interactions have been suggested to be non-specific [11, 12]. Other lncRNAs function as post-transcriptional modulators of gene expression by forming duplexes with mRNA to inhibit translation by RNAi (i.e., antisense transcripts), by inhibiting miRNAs as so-called sponges, or by controlling splicing.
Although much attention has been paid to lncRNAs recently, the low conservation of sequences across species and the difficulty of determining their three-dimensional structures make it difficult to functionally and evolutionarily classify these molecules. Their biochemical characteristics (e.g., strong nonspecific binding to proteins) also make it difficult to dissect their precise molecular functions [12, 16]. Many lncRNAs show stage- and tissue-specific expression patterns, suggesting their roles in development.
Although several lncRNAs that function in mammalian heart development have been reported, the identification and characterization of lncRNAs in the mammalian heart are still insufficient [18, 19, 20, 21, 22, 23]. Considering the regulatory nature of lncRNAs, they are thought to be key components in solving the aforementioned problems regarding the developmental fragility in mammalian hearts.
Here, we report that key cardiac transcription factor genes are located in close proximity to genes encoding lncRNAs. Interestingly, there are transcription factor and lncRNA pairs that are bidirectionally transcribed from the same promoter. We focused on one lncRNA near Tbx5, which we call Tbx5ua, and showed that it is required for heart development. Tbx5ua-knockdown mice showed abnormally thin ventricular walls and were embryonic lethal.
Identification of lncRNAs that are expressed during mouse heart development
To identify novel lncRNAs that are specifically expressed during heart development in mice, we extracted total RNA from the ventricles of embryonic day (E) 10.5, E13.5 and 8-week-old mice and prepared cDNA libraries, which were subjected to paired-end 2 × 100 bp RNA-seq. Approximately 40 million reads were obtained for each sample. The reads were mapped to the mouse genome (mm10) with TopHat2, and the mapped reads were assembled using Cufflinks with and without UCSC transcript annotations. Because many of the currently known functional lncRNAs are spliced and because it is difficult to confirm the existence of non-spliced transcripts unless they are expressed at very high levels, we focused on spliced lncRNA candidates in our analysis. We set the lower limit of expression at 1 fragment per kilobase of exon per million mapped fragments (FPKM), because above that level the accuracy of the reconstruction of known transcripts without the transcript reference was sufficiently high (Additional file 1). We also checked whether exons of known genes had been mistaken for lncRNAs. Most lncRNAs located within 10,000 bp of a known gene were transcribed in the direction opposite to that gene (225 vs 86), suggesting that such mis-annotations are rare.
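The selection criteria described above (spliced transcripts, FPKM of at least 1, and the orientation check against nearby known genes) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the `Transcript` record, the field names, the example values, and the reading of the cutoff as "FPKM ≥ 1" are all assumptions made for illustration, and the sketch presumes that the nearest known gene within 10,000 bp has already been looked up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    """Hypothetical record for one assembled transcript."""
    name: str
    exon_count: int
    fpkm: float
    strand: str                     # "+" or "-"
    neighbor_strand: Optional[str]  # strand of nearest known gene within 10 kb, or None


def is_spliced_candidate(t, min_fpkm=1.0):
    """Keep transcripts with at least two exons and FPKM >= min_fpkm (assumed cutoff)."""
    return t.exon_count >= 2 and t.fpkm >= min_fpkm


def orientation_counts(transcripts):
    """Count candidates on the opposite vs. the same strand as their neighboring gene."""
    opposite = same = 0
    for t in transcripts:
        if t.neighbor_strand is None:
            continue  # no known gene within the window
        if t.strand != t.neighbor_strand:
            opposite += 1
        else:
            same += 1
    return opposite, same


# Made-up example records, not data from the study.
transcripts = [
    Transcript("cand_a", 3, 2.5, "-", "+"),
    Transcript("cand_b", 1, 8.0, "+", "+"),   # unspliced, filtered out
    Transcript("cand_c", 2, 0.4, "+", None),  # below the FPKM cutoff
    Transcript("cand_d", 2, 1.0, "+", "+"),
]
candidates = [t for t in transcripts if is_spliced_candidate(t)]
opposite, same = orientation_counts(candidates)
```

A strong excess of opposite-strand candidates, as in the 225 vs 86 split reported above, argues against the candidates being unannotated exons of the neighboring gene, since such exons would lie on the same strand.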
Many of the cardiac transcription factor genes have neighboring lncRNAs
Gene ontology analysis of the genes closest to all lncRNAs that are expressed in the heart:
- GO:0006355~regulation of transcription, DNA-templated
- GO:0045944~positive regulation of transcription from RNA polymerase II promoter
- GO:0000122~negative regulation of transcription from RNA polymerase II promoter
Gene ontology analysis of the genes closest to heart-selective lncRNAs:
- GO:0045944~positive regulation of transcription from RNA polymerase II promoter
- GO:0000122~negative regulation of transcription from RNA polymerase II promoter
- GO:0051891~positive regulation of cardioblast differentiation
- GO:0003151~outflow tract morphogenesis
- GO:0060413~atrial septum morphogenesis
- GO:0060347~heart trabecula formation
Next, in order to clarify the relationship between mRNAs and their bidirectional lncRNAs, we calculated Pearson correlation coefficients between the log2-transformed expression levels of the bidirectional promoter pairs over the course of development. The distribution of the correlation coefficients is plotted in Fig. 2b. Many gene pairs clearly show positive or negative correlation, and the positive correlation appears to be dominant (Fig. 2c).
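As a minimal sketch of the calculation described above, the Pearson correlation coefficient between the log2-transformed expression levels of an mRNA and its bidirectional lncRNA can be computed as follows. The function names, the pseudocount of 1 added before the log transform, and the example expression values are assumptions for illustration; the text does not state how zero values were handled.

```python
import math


def log2_expression(values, pseudocount=1.0):
    """log2-transform expression values; the pseudocount is an assumed choice."""
    return [math.log2(v + pseudocount) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Made-up FPKM values for one pair at three stages (e.g. E10.5, E13.5, adult).
mrna_fpkm = [1.0, 3.0, 7.0]
lncrna_fpkm = [7.0, 3.0, 1.0]
r = pearson(log2_expression(mrna_fpkm), log2_expression(lncrna_fpkm))
```

With only three developmental stages per pair, individual coefficients are coarse, so the distribution over all pairs is more informative than any single value.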
By searching the protein-coding genes that are close to lncRNAs, we found many transcription factor genes that have critical functions for heart development (i.e., Tbx5, Tbx20, Nkx2-5, Gata4, Gata6, Sall4, Hand1, Hand2, Wt1, Nr2f1, Irx3 and Irx5). Notably, many of these lncRNAs were bidirectional lncRNAs (i.e., those paired with Tbx5, Tbx20, Nkx2-5, Gata6, Sall4, Hand1, Hand2, Wt1, Nr2f1, Irx3 and Irx5). Some of these lncRNAs (e.g., those divergent to Irx5, Gata6 and Wt1) are expressed in the kidney or in the liver, and in such cases the divergent genes are also expressed, suggesting that the expression of bidirectional pairs is correlated not only temporally but also spatially. We examined the conservation of these lncRNAs near transcription factors by searching the RefSeq database and found that at least some lncRNAs were conserved in the human genome (Tbx5, Nkx2-5, Hand2, Gata6, Wt1 and Nr2f1) (Additional file 5) and that the bidirectional lncRNA to Tbx5 (Lnc125) was even conserved in chicken, which diverged from mammals 400 million years ago. Here, we judged bidirectional lncRNAs to be conserved solely based on the existence of transcripts at the corresponding loci, since the sequences of lncRNAs are known to evolve rapidly.
Because haploinsufficient transcription factor genes seem to be highly enriched among the genes that are in close proximity to divergent lncRNAs, we examined whether the enrichment was limited to the heart or whether it held more generally [27, 28]. Using the mouse RefSeq transcript database (GRCm38.p3) and a paper that comprehensively identified haploinsufficient genes, we determined the proportion of genes with bidirectional lncRNAs among all genes and among haploinsufficient genes (Additional file 6: Table S4). We indeed found that haploinsufficient genes were significantly enriched among genes with bidirectional lncRNAs (p = 3.4 × 10⁻⁵, based on the hypergeometric distribution) (Fig. 2d). To exclude the possibility that the tissue specificity of bidirectional lncRNAs and haploinsufficient genes generates spurious correlations, we calculated the proportion of housekeeping genes among all genes and among haploinsufficient genes and showed that the proportions were not significantly different (Additional file 7).
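The enrichment test mentioned above can be sketched as the upper tail of a hypergeometric distribution. The sketch below is a generic illustration using only the standard library; the function name, the parameter names, and the example counts are hypothetical and are not the gene counts used in the study.

```python
from math import comb


def hypergeom_enrichment_p(total, n_marked, n_drawn, n_overlap):
    """Upper-tail probability P(X >= n_overlap) for a hypergeometric variable.

    total     : all genes considered
    n_marked  : genes in the category of interest (e.g. haploinsufficient genes)
    n_drawn   : genes in the tested set (e.g. genes with a bidirectional lncRNA)
    n_overlap : genes belonging to both sets
    """
    upper = min(n_marked, n_drawn)
    numerator = sum(
        comb(n_marked, k) * comb(total - n_marked, n_drawn - k)
        for k in range(n_overlap, upper + 1)
    )
    return numerator / comb(total, n_drawn)


# Hypothetical counts chosen only to show the call.
p_value = hypergeom_enrichment_p(total=20000, n_marked=300, n_drawn=500, n_overlap=20)
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow in the individual terms; for routine use, a statistics library would normally be preferred.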
Generally, the conservation of lncRNAs across species is very low compared to protein-coding transcripts. However, the Tbx5-divergent lncRNA is observed among a wide range of species. Tbx5 is also a dosage-sensitive gene. These findings prompted us to examine the function of the Tbx5-divergent lncRNA.
Analysis of the Tbx5-divergent lncRNA
Tbx5 is a transcription factor that is known to be essential for the development of the heart and forelimb. Holt-Oram syndrome is a dominant disorder caused by a single-allele mutation of TBX5 and is characterized by hypoplasia of the forelimb, abnormalities in the thumb, and atrial and/or ventricular septal defects [31, 32, 33]. Importantly, the phenotypes of Holt-Oram syndrome show a high degree of variance, indicating that the dose of TBX5 is crucial in normal heart development.
We first quantified the expression level of the transcript in the heart ventricle, atrium and forelimb during normal development by quantitative RT-PCR (Fig. 3c). We found that the expression level of Tbx5ua was increased in the ventricle as development progressed, which was inconsistent with the expression pattern of Tbx5. We also examined the expression level of the Tbx5 isoform that is also transcribed from the bidirectional promoter (Isoform 2, RefSeq: XM_006530280.1). The expression level of that isoform was stable during the entire developmental process, which was also different from the expression pattern of Tbx5ua (Additional file 10). Next, we compared the expression level of the lncRNA in both of the ventricles at E11.5 because it is well-known that the expression level of Tbx5 is higher in the left ventricle than in the right ventricle and that the steep gradient is crucial for establishing a proper ventricular septum [37, 38]. We observed that Tbx5ua expression was almost the same between the left and right ventricles at E11.5, while we confirmed the differential expression level of Tbx5 (Fig. 3d). These results suggest that Tbx5ua is not just a byproduct of Tbx5 and is regulated separately as a different product.
Tbx5ua-knockdown (KD) mice were embryonic lethal with severe abnormalities in the heart
Chimeric KD embryos showed right ventricular hypoplasia at E9.5 (Fig. 4b, c, Additional file 12A). Hematoxylin and eosin (HE) staining of the cryosections showed that the ventricular walls of E9.5 KD mice were irregular and lacked trabeculae in some parts of the ventricle (Fig. 4d, Additional files 12B and 13). None of the embryos showed a visible forelimb abnormality of the kind observed in Tbx5-deficient embryos. By E13.5, all of the KD embryos were dead with a pale body (Fig. 4e). The hearts showed severe ventricular hypoplasia (Fig. 4f), which was probably the cause of the lethality. The forelimbs seemed completely normal even at this stage, which was a significant difference between the phenotype of the Tbx5ua KD mice and that of the mouse model of Holt-Oram syndrome (i.e., Tbx5 heterozygous knockout) (Fig. 4e). The phenotypes among KD embryos were similar and heart-specific, suggesting that they are attributable to the genomic modification. In situ hybridization of Tbx5 revealed normal mRNA expression in the KD ventricle (Additional file 14A). In situ hybridization of Nppa, which often shows an altered expression pattern in embryos with abnormal morphogenesis, showed expanded expression around the pre-ventricular septal region of KD embryos (Additional file 14B).
To comprehensively investigate the genes affected by Tbx5ua knockdown, we performed RNA-seq with the RNAs extracted from the ventricles of tetraploid chimeric embryos derived from either KD or WT ES cells. We used three embryos for each group and used the Smart-Seq2 protocol to generate libraries from the small amount of RNA. By gene ontology analysis, we found that the genes involved in heart development were significantly enriched among the genes that were determined to be significantly changed (False Discovery Rate; FDR < 0.10, Additional file 15A). However, none of the structural genes that are important for cardiomyocyte contraction were changed (Additional file 15C), suggesting the possibility that Tbx5ua has a critical role in morphogenesis rather than in cell differentiation. Finally, we conducted principal component analysis (PCA) on the RNA-seq data (Additional file 15D). The two groups were evidently distinguished only by considering the first principal component.
In this study, we found that many cardiac transcription factor genes have neighboring spliced lncRNAs, especially bidirectional ones. The clear correlation of the expression levels of some bidirectional pairs suggests their regulatory roles. A typical example of such lncRNAs is Upperhand, which is divergent to Hand2. The transcription of Upperhand, but not its mature transcript, was shown to be necessary for the transcription of Hand2 by altering the local epigenetic environment. Many lncRNAs are known to regulate local transcription through the recruitment of epigenetic-altering protein complexes. Since the expression level of transcription factors is generally low and many of them are haploinsufficient, even a relatively small fluctuation could lead to severe consequences. It is possible that divergent lncRNAs are enriched among dose-sensitive genes to stabilize the expression level of adjacent genes.
An alternative hypothesis is that these transcription factors could be setting up optimal transcriptional environments for lncRNAs to evolve. As some transcription factor genes can cause direct lineage reprogramming, they are thought to define cell types. Thus, the use of these preexisting transcriptional environments is a cost-efficient way to evolve cell type-specific lncRNAs. Some studies have demonstrated that bidirectional transcription is a general phenomenon and that a so-called transcription ripple effect exists [40, 41]. These findings also support our idea by showing that the preexisting transcriptional environment enables precursor transcripts to evolve into defined, functional ones. In summary, active transcription factor genes may have been good sources from which lncRNA genes could evolve due to the cell lineage-specific and active epigenetic environment.
We showed that Tbx5ua is conserved from mammals to birds. Comparison of the Tbx5ua sequence between mouse and chicken showed low similarity, but this does not mean that the function is not conserved, as previous studies have shown that precise conservation at the sequence level is not necessarily required for the functional conservation of lncRNAs [42, 43]. Tbx5ua was not found in the NCBI genomic annotations of reptiles, amphibians or fish at the corresponding loci. In fact, by reanalyzing publicly available RNA-seq data, including RNA-seq of the adult heart of chicken, anole and frog (GSE41338), we confirmed that Tbx5ua is expressed only in chicken among these species at the adult stage (Additional file 16). It is interesting that Tbx5ua is conserved in two-ventricle animals, which possess a complete ventricular septum, but not in animals with non-septated hearts. It is possible that the acquisition of Tbx5ua contributed to the evolution of a complete ventricular septum.
We showed that Tbx5ua lncRNA is required for proper heart development. Since we knocked down Tbx5ua by prematurely terminating its transcription, the loading of the transcription complex at the transcription start site is not inhibited. Thus, if the transcription of Tbx5ua itself is important for altering the local transcriptional environment, our KD scheme is not sufficient to assess the true function of Tbx5ua. Although preliminary, our data suggested that the expression pattern of Tbx5 protein is altered in the KD mice (Additional file 17). While we do not have any evidence supporting a direct role of Tbx5ua in regulating Tbx5, the function of Tbx5ua might be atypical for a divergent lncRNA, since many such lncRNAs, like Upperhand, have been shown to alter the transcription of neighboring genes. How the left-sided expression of Tbx5 is regulated remains an important unsolved issue in understanding the molecular mechanism of heart development.
This study revealed that many genes involved in the heart development, particularly transcription factor genes, are associated with spliced lncRNAs that are derived from nearby genomic regions. Furthermore, many of these lncRNAs were divergently transcribed from the promoter of protein-coding genes. We find that bidirectional lncRNAs are enriched among haploinsufficient genes, suggesting that they have functional roles for the regulation of dose-sensitive genes.
Total RNA from embryonic and adult mice was extracted with Sepasol-RNA I Super G (Nacalai #09379–55). The cDNA libraries for paired-end RNA-seq for the screening of lncRNAs were prepared from 1 μg of RNA with the TruSeq Stranded Total RNA Library Prep Kit (Illumina #RS-122-2201) according to Illumina’s instructions. The cDNA libraries for tetraploid chimeric mice were prepared by the Smart-Seq2 protocol according to the original paper with 12 cycles of preamplification and 9 cycles of enrichment PCR.
Total RNA was extracted with Sepasol-RNA I Super G (Nacalai #09379–55). cDNA samples were prepared using ReverTra Ace qPCR RT Master Mix with gDNA Remover (Toyobo #FSQ-301). Real-time PCR was performed with SYBR Premix Ex Taq II (Takara #RR820). The PCR conditions were as follows: 95 °C for 30 s followed by 50 cycles of 95 °C for 5 s and 60 °C for 30 s, and a subsequent dissociation curve measurement. We used Gapdh as an internal control. Gene-specific primers are listed in Additional file 18: Materials and Methods.
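The text does not state how relative expression was calculated from the real-time PCR data. As an illustration only, the sketch below assumes the commonly used 2^(−ΔΔCt) method with Gapdh as the reference gene; the function name, argument names and Ct values are hypothetical, and the method presumes roughly equal amplification efficiency for the target and reference primers.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the 2^(-ddCt) method (assumed, not stated in the text).

    ct_target / ct_reference         : Ct of the target gene and of the reference
                                       gene (e.g. Gapdh) in the sample of interest
    ct_target_cal / ct_reference_cal : the same two Ct values in the calibrator sample
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold two cycles earlier,
# relative to the reference gene, in the sample than in the calibrator.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
```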
C57BL/6J mice were purchased from CLEA Japan. For the first round of chimeric mice generation, we used mice from CLEA Japan. For the second experiment, we used tetraploid embryos from Ark Resource and recipient mice from Sankyo Labo Service. Mice were sacrificed by cervical dislocation.
Generation of genome edited ES cells
ES cells were cultured on MEF feeder cells in ES culture medium (i.e., Knockout DMEM (Gibco #10829018), 20% Knockout Serum Replacement (Gibco #10828028), 1× GlutaMAX (Gibco #10566016), 1× NEAA (Sigma #M7145), 1 mM sodium pyruvate (Gibco #11360070), 10⁻⁴ M 2-mercaptoethanol, 1000 U/ml LIF (Wako #198-15781)).
The guide RNA (gRNA) target sequence used to induce a double-strand break was 5′-GTCACTGCCGCTCCAATCCTCGG-3′. We designed the gRNA with Cas-OFFinder (http://www.rgenome.net/cas-offinder/) so that the number of off-target sites was as small as possible. Our gRNA has no potential off-target sites with 0, 1 or 2 mismatches and just 4 potential off-target sites with 3 mismatches and a proper PAM sequence, none of which is exonic. Homology-directed repair donors were constructed so that the NeoR- or EGFP-expressing cassette was flanked by ~1,000 bp 5′ and 3′ homology arms cloned from genomic DNA. ES cells were transfected with a Cas9-expressing plasmid, a gRNA-expressing plasmid and the donor plasmid, along with a negative control lacking the gRNA-expressing plasmid. After two days, the ES cells were passaged onto SNL feeder cells and cultured for 8 days with 250 μg/ml G418, and surviving EGFP-positive colonies were manually picked. After one more cycle of single-colony picking to ensure that the ES cells were clonal, they were subjected to cell-permeable Cre treatment to remove the selection cassettes, and EGFP-negative colonies were then picked to obtain cells without selection cassettes. Finally, the ES cells from each colony were genotyped and karyotyped.
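To make the structure of the target sequence explicit, the sketch below splits the 23-nt site quoted above into a 20-nt protospacer and a 3-nt PAM, checks that the PAM matches NGG, and counts mismatches against a candidate site. It is not a reimplementation of Cas-OFFinder; the helper names and the example off-target sequence are hypothetical, and the NGG rule assumes a standard SpCas9, which the text does not specify.

```python
TARGET_SITE = "GTCACTGCCGCTCCAATCCTCGG"  # 23-nt target quoted in the text


def split_target(site):
    """Split a 23-nt target site into (20-nt protospacer, 3-nt PAM)."""
    if len(site) != 23:
        raise ValueError("expected a 23-nt site (20-nt protospacer + 3-nt PAM)")
    return site[:20], site[20:]


def has_ngg_pam(pam):
    """True if the PAM matches NGG (assumes SpCas9)."""
    return len(pam) == 3 and pam[1:] == "GG"


def count_mismatches(protospacer, candidate):
    """Number of positions at which two equal-length sequences differ."""
    if len(protospacer) != len(candidate):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(protospacer, candidate))


protospacer, pam = split_target(TARGET_SITE)

# Made-up candidate site differing from the protospacer at three positions.
candidate = "ATCACTGCCGATCCAATCCA"
n_mismatches = count_mismatches(protospacer, candidate)
```

A genome-wide off-target search additionally has to scan both strands of every chromosome, which is what a dedicated tool such as Cas-OFFinder is for.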
Generation of tetraploid chimeric mice
Generation of chimeric mice was performed as described previously.
Immunohistochemistry for Tbx5 was performed as follows. Antigen retrieval was performed by microwaving the sections in 10 mM citric acid (pH 6.0). The sections were then permeabilized for 10 min in 0.2% Triton X in PBS at RT. Blocking was performed with 10% Blocking One (Nacalai #03953–95) in PBST. Tbx5 antibody (Santa Cruz Biotechnology #sc-17866) was diluted 1/100 in 5% Blocking One/PBT and the secondary antibody (Invitrogen #A-11037) was diluted 1/200.
In situ hybridization was performed as follows. First, cryosections were permeabilized in 0.2 N HCl for 15 min. After washing with PBT three times, the sections were re-fixed with freshly made 4% PFA for 15 min. After washing, the sections were hybridized with DIG-labeled probes at 70 °C overnight. The next day, the sections were washed with 0.2× SSC three times. After blocking the sections with 10% sheep serum for 1 h, 1/1000-diluted anti-DIG-AP Fab fragment (Roche #11093274910) was added and incubated for 1 h at RT. After washing with TBST, the sections were washed with NTMT and colored with BM purple.
We would like to thank Fumihiro Sugiyama for generously sharing the B6J-ES cells; Akitsu Hotta for kindly sharing the CRISPR/Cas9 plasmids; Kikuko Takeuchi and Masao Takeuchi for instructions on karyotyping; and Yuri Nakagawa, Yuki Kato and Katsuhiko Shirahige for conducting RNA-seq. We are grateful to Prof. Atsushi Miyajima for helpful discussions regarding this work. We also thank the animal centers of the University of Tokyo and the University of Tsukuba.
The research was supported by Grants-in-Aid for JSPS Research fellowship (YH), by the Takeda Science Foundation (JKT), by JSPS through the “Funding Program for Next Generation World-Leading Researchers (Next Program)”, initiated by the Council for Science and Technology Policy (CSTP; KKT) and by Grants-in-Aid for Scientific Research from the MEXT (Ministry of Education, Culture, Sports, Science and Technology of Japan; JKT). The funding body had no role in the design of the study and collection, analysis, and interpretation of data.
Availability of data and materials
The datasets supporting the conclusions of this article are available through the NCBI Gene Expression Omnibus (GEO) repository under the accession GSE93324 and GSE93357 (http://www.ncbi.nlm.nih.gov/geo/).
YT and JKT performed the tetraploid complementation assay under the direction of ST and TF. SF and TF also helped the study of tetraploid aggregation. YH performed all of the other experiments and analyses. YH wrote the manuscript with the help of KKT and JKT. KKT and JKT supervised the project. All authors read and approved the final version of the manuscript.
Ethics approval and consent to participate
All experimental procedures and animal care were performed according to the animal ethics committee of the University of Tokyo (2806).
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 17. Gloss BS, Dinger ME. The specificity of long noncoding RNA expression. Biochim Biophys Acta - Gene Regul Mech. 2015;1859:16–22.
- 44. Rabbow E, Rettberg P, Barczyk S, Bohmeier M, Panitz C, Horneck G, et al. The Evolutionary Landscape of Alternative Splicing in Vertebrate. Science. 2012;12:374–87.
- 47. Münst B, Patsch C, Edenhofer F. Engineering cell-permeable protein. J Vis Exp. 2009;34:e1627.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.