Genome-wide identification of oil biosynthesis-related long non-coding RNAs in allopolyploid Brassica napus
Long noncoding RNAs (lncRNAs) are transcripts longer than 200 bp that do not encode proteins but nonetheless have been shown to play important roles in various biological processes in plants. Brassica napus is an important seed oil crop worldwide and the target of many genetic improvement activities. To better understand the function of lncRNAs in regulating plant metabolic activities, we carried out a genome-wide identification of lncRNAs in Brassica napus with a focus on lncRNAs involved in lipid metabolism. Twenty ribosomal RNA-depleted strand-specific RNA-seq (ssRNA-seq) datasets were generated using RNAs isolated from B. napus seeds at four developmental stages. For comparison, we also included 30 publicly available RNA-seq datasets generated from poly(A)-enriched mRNAs isolated from various Brassica napus tissues in our analysis.
A total of 8905 lncRNA loci were identified, including 7100 long intergenic noncoding RNA (lincRNA) loci and 1805 loci generating long noncoding natural antisense transcripts (lncNATs). Many lncRNAs were identified in only one of the two dataset types (ssRNA-seq or poly(A) RNA-seq), suggesting that B. napus has a large lncRNA repertoire and that it is necessary to use libraries prepared from different tissues and developmental stages, as well as different library preparation approaches, to capture the whole spectrum of lncRNAs. Analysis of coexpression networks revealed that the regulatory modules include networks containing lncRNAs and protein-coding genes related to oil biosynthesis, indicating a possible role of lncRNAs in the control of lipid metabolism. For example, several lncRNAs are potential regulators of BnaC08g11970D, which encodes oleosin1, a protein found in oil bodies and involved in seed lipid accumulation. We also observed that the expression levels of B. napus lncRNAs are positively correlated with their conservation levels.
We demonstrated that the B. napus genome has a large number of lncRNAs and that these lncRNAs are expressed broadly across many developmental stages and in different tissue types. We also provide evidence indicating that specific lncRNAs appear to be important regulators of lipid biosynthesis, forming regulatory networks with transcripts involved in lipid biosynthesis, and that these lncRNAs are conserved in other species of the Brassicaceae family.
Keywords: Brassica napus, lncRNA, Coexpression, Oil biosynthesis, Conservation
Non-coding RNAs (ncRNAs) are transcripts without a clear protein-coding capacity that have been found in the transcriptomes of plants and animals at an increasing frequency in recent years. The roles of ncRNAs are still not fully known, but they have been suggested to be involved in regulation of gene expression, translation, cell-cycle progression and other cellular functions [2, 3]. There are diverse kinds of ncRNAs that have been generally grouped into housekeeping and regulatory ncRNAs. The housekeeping ncRNAs include transfer RNAs (tRNAs), small nuclear RNAs (snRNAs), small nucleolar RNAs (snoRNAs) and ribosomal RNAs (rRNAs). The regulatory ncRNAs fall into two subclasses in plants. One type is the small RNAs (sRNAs), including microRNAs (miRNAs) and small interfering RNAs (siRNAs), with a size of 20–24 nucleotides (nt). sRNAs achieve their functions via two main mechanisms: transcriptional gene silencing (TGS) and posttranscriptional gene silencing (PTGS). The other type is long non-coding RNAs (lncRNAs), defined as transcripts longer than 200 nt. LncRNAs have been shown to function in response to a wide range of biotic and abiotic stresses in plants [4, 5, 6, 7]. LncRNAs are grouped according to their genomic location and orientation relative to their nearby protein-coding genes. Long intergenic noncoding RNAs (lincRNAs) are located in the interval between two genes. Long noncoding natural antisense transcripts (lncNATs) are those overlapping with protein-coding genes in the opposite orientation. Long intronic noncoding RNAs are generated from introns of other transcripts, and sense lncRNAs are those partially overlapping with protein-coding genes on the same strand [8, 9]. LncRNAs are usually lowly expressed and tissue-specific. Plant lncRNAs have been shown to be involved in transcriptional gene silencing, gene expression regulation, chromatin structure remodeling and other epigenetic mechanisms [11, 12, 13, 14, 15].
With the development of high-throughput sequencing technologies and the ability to generate large numbers of transcriptomes, there has been an ever-increasing number of lncRNAs identified in plants, including Arabidopsis [16, 17, 18, 19, 20, 21, 22], rice [11, 23, 24, 25, 26], maize [27, 28, 29], wheat [30, 31], and cotton [32, 33, 34]. Some lncRNA candidates have been identified in B. napus and B. rapa, one of the two ancestors of B. napus [36, 37], and in synthesized Brassica hexaploids, but to date a genome-wide identification of lncRNAs in B. napus has not been reported.
B. napus, also known as oilseed rape, is second only to soybean as an oil crop, with a world production of over 60 million tons. B. napus is an allotetraploid (AnAnCnCn) that evolved from a spontaneous hybridization event between B. rapa (ArAr) and B. oleracea (CoCo) about 7500 to 12,500 years ago. With the availability of the B. napus genome sequence, it is now possible to identify and characterize lncRNAs at the whole-genome level in this important oil crop.
Oil biosynthesis is one of the key biological processes in B. napus and a major focus of experimental research [40, 41]. Up to now, the role of ncRNAs in lipid and fatty acid metabolism in B. napus has been investigated only to a very limited extent [42, 43, 44]. Some miRNAs were found to be differentially expressed in cultivars with different seed oil content. Shen et al. (2015) found that 122 lipid-related genes are potentially regulated by 158 miRNAs. Recently, Wang et al. (2017) further showed that 11 miRNAs may have regulatory relationships with 12 lipid-related genes.
To further investigate the possible role of lncRNAs in the control of oil biosynthesis in B. napus, we have conducted a comprehensive analysis of lncRNAs at multiple stages of seed development. We also collected 30 publicly available RNA-seq datasets generated from different tissues of B. napus. We show that the Brassica napus genome contains a large number of tissue- and developmental stage-specific lncRNAs and that some of these form part of regulatory networks specifically involved in the control of lipid biosynthesis. We also show that some of these regulatory lncRNAs are conserved in other species of the Brassicaceae family, including the two progenitors of B. napus (B. rapa and B. oleracea) and A. thaliana.
Genome-wide identification of lncRNAs in B. napus
Combining results from the two datasets, we identified 8905 non-redundant lncRNA loci, of which 7100 were lincRNAs and 1805 were lncNATs (Additional file 2: Table S2). In total, 13,763 transcripts were identified from the 8905 non-redundant lncRNA loci, mainly due to alternative splicing events. The number of lincRNAs and lncNATs identified in the Cn subgenome was higher than that in the An subgenome (lincRNAs: 4130 versus 2763, 1.5-fold difference; lncNATs: 1076 versus 767, 1.4-fold difference; Additional file 3: Figure S1). This difference may be due to the difference in size between the An (314.2 Mb) and Cn (525.8 Mb) subgenomes. Compared to the ssRNA-seq datasets, the poly(A) RNA-seq datasets had a higher proportion of lncNATs (20.5%, 1501 transcripts, versus 10.4%, 808 transcripts) and a much lower proportion of single-exon transcripts (4.9%, 357 transcripts, versus 44.0%, 3417 transcripts, in the ssRNA-seq datasets) (Fig. 2b). Only about 20–30% of the lncRNA loci (1561, including 1402 lincRNAs and 159 lncNATs) were identified in both datasets (Fig. 2c), suggesting that, to obtain a full set of potential lncRNAs, it is necessary to use both library preparation and sequencing methods in lncRNA identification.
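The fold differences quoted above can be checked with a few lines of arithmetic. The sketch below uses only the counts and subgenome sizes reported in this paragraph; the function names are illustrative, and the per-Mb density is an extra derived quantity that is not reported in the text.

```python
# Counts and subgenome sizes as reported in the paragraph above.
counts = {
    "lincRNA": {"Cn": 4130, "An": 2763},
    "lncNAT": {"Cn": 1076, "An": 767},
}
size_mb = {"An": 314.2, "Cn": 525.8}


def fold_difference(cn_count, an_count):
    """Ratio of Cn to An locus counts."""
    return cn_count / an_count


def density_per_mb(count, subgenome):
    """Loci per Mb of the given subgenome (derived here, not reported in the text)."""
    return count / size_mb[subgenome]


size_ratio = size_mb["Cn"] / size_mb["An"]
print(f"Cn/An subgenome size ratio: {size_ratio:.2f}")
for kind, c in counts.items():
    fold = fold_difference(c["Cn"], c["An"])
    print(
        f"{kind}: Cn/An = {fold:.2f}; "
        f"An = {density_per_mb(c['An'], 'An'):.2f} loci/Mb, "
        f"Cn = {density_per_mb(c['Cn'], 'Cn'):.2f} loci/Mb"
    )
```

Under these numbers the Cn/An count ratios are somewhat smaller than the Cn/An size ratio, which is compatible with subgenome size being a major contributor to the difference in locus counts.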
The properties of lncRNAs in allopolyploid B. napus
To gain a comprehensive understanding of the lincRNAs and lncNATs in B. napus, we compared several different features of the lincRNAs, lncNATs and mRNAs: exon numbers, transcript length, A/U content, relationship with transposable elements (TEs), and chromosome distribution.
[Table: Comparison of the properties of lincRNA, lncNAT and mRNA in B. napus. Rows: single exon (%), two exons (%), transcript length (bp), A/U content ranking, TE in An (%), TE in Cn (%).]
Coexpression analysis revealed potential function of lncRNAs in lipid biosynthesis
Among the eight lipid-related genes identified in our study was BnaC08g11970D, an ortholog of the Arabidopsis oleosin1 encoding gene AT4G25140. Oleosin is a protein found in oil bodies and involved in seed lipid accumulation. BnaC08g11970D is co-expressed with 9 lncRNA loci, including 8 in the 30 vs 10–20 DAF comparison and 4 in the 30 vs 25 DAF comparison. Three (lnc_008548, lnc_014257 and lnc_030111) of the 9 lncRNA loci were found to be co-expressed with BnaC08g11970D in both comparisons (Fig. 5; Additional file 8: Figure S6; Additional file 9: Table S3).
Among the other lipid biosynthesis-related genes of note are BnaC01g01840D, BnaA09g51510D and BnaC08g46110D. BnaC01g01840D is annotated as a patatin-related phospholipase A and is co-expressed with 4 lncRNAs. BnaA09g51510D and BnaC08g46110D may have roles in acetyl-CoA biosynthesis, and are co-expressed with 7 and 2 lncRNAs, respectively. BnaC09g41580D and BnaA05g33500D are predicted to encode one of the two ∆9 palmitoyl-ACP desaturases responsible for biosynthesis of ω-7 fatty acids in the maturing endosperm (Additional file 9: Table S3). The lncRNAs co-expressed with these two lipid-related genes may regulate their expression and thereby play a role in lipid biosynthesis in B. napus.
Conservation of lncRNAs in B. napus
[Table: Conservation of lncRNAs from B. napus (Bna) in A. thaliana (Ath), B. oleracea (Bol) and B. rapa (Bra). Columns, in two groups: Bna vs Ath, Bna vs Bol, Bna vs Bra.]
Several studies have investigated the roles of small noncoding RNAs in lipid biosynthesis through small RNA sequencing and degradome sequencing [42, 43, 44], but no genome-wide study of lncRNAs has previously been carried out in B. napus. In this study, we carried out a genome-wide study of lncRNAs in B. napus based on the newly sequenced B. napus genome, rRNA-depleted ssRNA-seq datasets generated from seeds at different developmental stages, and publicly available poly(A) RNA-seq datasets generated from diverse tissues. As a result, 7100 lincRNA loci and 1805 lncNAT loci were identified.
A large number of lncRNAs have been identified in many different plant species [11, 17, 19, 27, 32]. In Arabidopsis and rice, about half of the reported lncRNAs were unspliced and contained only a single exon [11, 17, 19]. This feature was observed in B. napus lncRNAs identified from rRNA-depleted total RNA, but not in the lncRNAs identified from poly(A)-enriched mRNA (Additional file 4: Figure S2). Most B. napus lncRNAs, particularly lncNATs, identified from the poly(A)-enriched mRNA datasets contain two exons. Consequently, the average length of lncRNA transcripts (929 bp for lincRNAs and 985 bp for lncNATs) was longer in B. napus than in other plants. LncNATs had a higher proportion of multi-exon transcripts than lincRNAs (72% vs 60%). Compared to lncNATs, lincRNAs are more likely to overlap with or be derived from TEs, probably because of their genomic position. TE-derived lncRNAs appeared more likely to undergo alternative splicing than non-TE-derived ones (18% vs 13%) (Additional file 17: Figure S8, Additional file 18: Table S10).
Two common features reported for lncRNAs are their low expression level and tissue-specific expression pattern [10, 11, 32]. Although we found that the expression levels of both lincRNAs and lncNATs identified from the poly(A) RNA-seq datasets were lower than those of mRNAs (Additional file 19: Figure S9A), the expression levels of both lincRNAs and lncNATs identified from the rRNA-depleted ssRNA-seq datasets were higher than those of mRNAs (Additional file 19: Figure S9B). Similar to B. napus homoeologous genes, on average, the An subgenome homoeologous lncRNAs seemed to have a higher expression level than the Cn subgenome ones (Additional file 20: Figure S10). In addition to the difference in exon numbers, lncRNAs identified from total RNA and mRNA also differ in their transcript length, A/U content, and degree of overlap with TEs (Additional file 4: Figure S2). These results, together with the low level of overlap between the lncRNAs identified from total RNA and those identified from mRNA, suggest that capturing a full set of lncRNAs and uncovering as many features of the lncRNA population as possible requires RNA isolated from as diverse a set of tissues and developmental stages as possible.
Oil content is the most important agronomic trait of B. napus, and increasing seed oil content is the final objective of many rapeseed breeding programs. Identifying genes involved in lipid biosynthesis regulation during seed development, including protein-coding and non-coding ones, is an important first step towards improvement of the crop through genetic engineering. LncRNAs have been shown to play an important role in many aspects of plant development [15, 52, 53, 54]. Although it is now feasible to perform large-scale lncRNA identification, it is still a challenge to study the function of lncRNAs and uncover the mechanism(s) underlying lncRNA-mediated regulation. Based on the rationale that genes involved in the same pathway(s) tend to be co-expressed, we reasoned that lncRNAs co-expressed with lipid-related genes would have a potential role in regulation of oil biosynthesis and accumulation in rapeseed. We found 13 lncRNAs whose expression patterns were significantly correlated with those of 8 lipid-related genes (Additional file 9: Table S3). Furthermore, these coexpression relationships were not related to the genomic location of the lncRNAs and lipid-related genes. Many of the coexpression relationships were further confirmed by qRT-PCR analysis of transcript levels in randomly selected B. napus cultivars. Among the coexpression modules, the relationships between several lncRNAs and BnaC08g11970D are of particular interest. BnaC08g11970D is predicted to encode a protein homologous to oleosin1 of Arabidopsis, which contains a hydrophobic hairpin domain that is located on the surface of lipid droplets, stabilizing them and facilitating lipid accumulation. The expression level of BnaC08g11970D increases dramatically during the developmental stage of rapid seed oil accumulation (Figs. 1, 6), strongly suggesting a role of this gene in oil accumulation.
LncRNAs co-expressed with this gene would thus be ideal candidates for further studies investigating their potential role(s) in regulating the expression and function of BnaC08g11970D. In summary, our findings point to the importance of examining lncRNAs as a possible source of novel information and tools for Brassica improvement in the future.
Plant materials and generation of RNA-seq libraries
Brassica napus L. cv KenC-8 plants were grown in the field (Hangzhou, China) in 2015 and 2016. Flowers were tagged on the day of blooming (i.e. 0 days after flowering (DAF)). Every 5 days from 5 DAF up to 50 DAF, seeds from 10 individual plants were harvested, pooled and used in oil content analysis. Based on the seed oil content change profile (Fig. 1), seeds from four developmental stages were used in transcriptome analysis: early little oil accumulation (10–20 DAF), early rapid accumulation (25 DAF), middle rapid accumulation (30 DAF), and 40 DAF, the last represented by two samples. Seeds harvested at these four stages were frozen immediately in liquid nitrogen and used in RNA extraction. Total RNA was isolated using BiooPure™ RNA Isolation Reagents, and rRNA was removed using the Ribo-Zero Kit (Epicentre). RNA-seq libraries were constructed using the Illumina TruSeq Stranded RNA Kit and sequenced on the Illumina HiSeq 4000 (paired-end 150 bp).
Public datasets used in this study
In total, we downloaded 45 publicly available RNA-seq datasets from the National Center for Biotechnology Information (NCBI), including 30 poly(A) RNA-seq datasets from B. napus (accession numbers PRJEB5461, PRJEB2588, PRJNA262144, and PRJNA338132), 7 poly(A) RNA-seq datasets from B. oleracea (accession number PRJNA183713), and 8 poly(A) RNA-seq datasets from B. rapa (accession number PRJNA185152).
Identification of lncRNAs
All of the raw reads from transcriptome sequencing were processed using Trimmomatic (Version 3.0) with the default parameters for quality control. The clean data were then mapped to the B. napus genome using TopHat (Version 2.1.1). For each mapping result, Cufflinks (Version 2.1.1) was used for transcript assembly. For strand-specific RNA-seq datasets, the parameter “--library-type fr-firststrand” was employed. All transcriptomes were merged with the annotation file from the reference genome to generate a final transcriptome using Cuffmerge. Cuffdiff was used to estimate the abundance of all transcripts based on the final merged transcriptome. We then used the following six filters to shortlist the bona fide lncRNAs from the final transcriptome assembly: (1) transcripts without strand information were removed; (2) all single-exon transcripts that were within a 500-bp flanking region of known transcripts and in the same direction as the known transcripts were discarded; (3) transcripts overlapping with mRNAs annotated in the reference genome were deleted; (4) transcripts with FPKM scores < 0.5 (2 for single-exon transcripts) or shorter than 200 bp were discarded; (5) the coding potential value of each transcript was calculated using CPC, and those with CPC scores > 0 were discarded; (6) the remaining transcripts were searched against the Pfam database by HMMER to remove transcripts containing known protein domains. The remaining transcripts were regarded as expressed candidate lncRNAs.
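The six filters can be read as a chain of predicates applied to each assembled transcript. The following is a minimal Python sketch of that logic, not the authors' actual scripts: the `Transcript` record and its field names are hypothetical, the per-transcript annotations (proximity to known transcripts, mRNA overlap, CPC score, Pfam hit) are assumed to have been computed beforehand by the tools named above, and filter (4) is interpreted here as discarding a transcript when either the expression or the length criterion fails.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    # Hypothetical record; the fields summarize upstream tool output.
    tid: str
    strand: str                   # "+", "-", or "." when unknown
    n_exons: int
    length: int                   # transcript length in bp
    fpkm: float                   # expression value used for filtering
    near_known_same_strand: bool  # within 500 bp of a known transcript, same direction
    overlaps_mrna: bool           # overlaps an annotated mRNA
    cpc_score: float              # coding potential score from CPC
    has_pfam_domain: bool         # HMMER hit against Pfam


def is_candidate_lncrna(t: Transcript) -> bool:
    """Apply filters (1) to (6) in order; return True if the transcript passes all."""
    if t.strand not in ("+", "-"):                   # (1) no strand information
        return False
    if t.n_exons == 1 and t.near_known_same_strand:  # (2) likely fragment of a known gene
        return False
    if t.overlaps_mrna:                              # (3) overlaps an annotated mRNA
        return False
    min_fpkm = 2.0 if t.n_exons == 1 else 0.5        # (4) expression and length cutoffs
    if t.fpkm < min_fpkm or t.length < 200:
        return False
    if t.cpc_score > 0:                              # (5) predicted coding potential
        return False
    if t.has_pfam_domain:                            # (6) known protein domain
        return False
    return True


# Toy records with made-up values, for illustration only.
examples = [
    Transcript("t1", "+", 2, 950, 1.3, False, False, -0.8, False),
    Transcript("t2", ".", 1, 600, 5.0, False, False, -1.0, False),
    Transcript("t3", "-", 1, 450, 1.0, False, False, -0.5, False),
]
candidates = [t.tid for t in examples if is_candidate_lncrna(t)]
print(candidates)
```

In this toy set, t2 fails filter (1) and t3 fails the stricter single-exon expression cutoff in filter (4), so only t1 is retained.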
Analysis of seed oil content
Seeds harvested at each developmental stage were dried in an incubator at 70 °C until their weight became stable. Isolation and GC analysis of seed lipids for total oil content and fatty acid composition (expressed as μg/mg of total seed weight) were performed as previously described [62, 63].
The value of expression chosen for boxplot
For each lncRNA and mRNA, the maximum FPKM across all samples was selected as its expression value, and these values were used to generate expression distributions as boxplots.
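As a small illustration of this choice, the sketch below takes a per-transcript table of FPKM values across samples and keeps the maximum for each transcript. The transcript names and values are hypothetical, and the plotting step itself is omitted.

```python
from statistics import median

# Hypothetical FPKM values per transcript across four samples (illustration only).
fpkm_table = {
    "lnc_A": [0.0, 1.2, 3.5, 0.8],
    "lnc_B": [0.6, 0.4, 0.9, 0.7],
    "mRNA_A": [12.0, 30.5, 8.1, 22.4],
}


def max_fpkm(table):
    """Return the maximum FPKM across samples for each transcript."""
    return {tid: max(values) for tid, values in table.items()}


expression = max_fpkm(fpkm_table)
lnc_values = [v for tid, v in expression.items() if tid.startswith("lnc_")]
print(expression)
print("median of lncRNA maxima:", median(lnc_values))
```

Using the maximum rather than the mean avoids penalizing transcripts that are expressed in only a few tissues or stages, which matters for tissue-specific lncRNAs.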
Coexpression network construction
Weighted gene coexpression network analysis (WGCNA) was used to predict the potential roles of lncRNAs in lipid biosynthesis. First, we defined a gene coexpression similarity by the Pearson correlation. Second, an adjacency function was employed to convert the coexpression similarity to connection strengths with a soft thresholding power in each comparison. Third, hierarchical clustering with the topological overlap matrix was used to identify network modules consisting of highly correlated gene expression patterns. Finally, a summary profile (eigengene) for each module was used to correlate eigengenes with traits (oil content and DAF) and to calculate the correlation between each gene and the traits by defining Gene Significance (GS). The software Cytoscape was employed to visualize the networks.
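The first three steps can be sketched in a few functions. WGCNA itself is an R package; the pure-Python code below is not its API but a minimal illustration of the underlying calculations for an unsigned network, assuming an adjacency of |r|^β and the standard topological overlap formula. The expression profiles and the soft-thresholding power β = 6 are hypothetical values chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def adjacency_matrix(profiles, beta):
    """Unsigned soft-threshold adjacency: a_ij = |cor(i, j)| ** beta, zero diagonal."""
    n = len(profiles)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                adj[i][j] = abs(pearson(profiles[i], profiles[j])) ** beta
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity (diagonal is zero)
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom


# Hypothetical profiles across four samples: a gene, a lncRNA tracking it, an unrelated gene.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 1.0, 3.0, 2.0],
]
adj = adjacency_matrix(profiles, beta=6)
tom = topological_overlap(adj)
print("adjacency gene-lncRNA vs gene-unrelated:", adj[0][1], adj[0][2])
print("TOM gene-lncRNA vs gene-unrelated:", tom[0][1], tom[0][2])
```

Raising the correlation to a power suppresses weak correlations while keeping strong ones close to 1, and the topological overlap then rewards pairs that also share neighbors. Module detection would proceed by hierarchical clustering on 1 − TOM, which is not shown here.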
Positional synteny of lncRNAs
The synteny or co-linearity of lncRNAs among the four species (B. napus, B. rapa, B. oleracea and A. thaliana) was detected by MCScanX. BLASTp was employed to determine the synteny by pairwise comparison with the parameters of E-value <1e-5 and max_target_seqs < 6. For each lncRNA, its 10 flanking protein-coding loci were retrieved from the annotation of each genome. Homology tests of lncRNAs and flanking genes among the four species were performed by BLASTn, and the top 5 hits of each B. napus lncRNA were chosen for comparison of their flanking genes. A syntenic lncRNA pair among B. napus, B. rapa, B. oleracea and A. thaliana was defined as a pair sharing at least one identical upstream or downstream flanking protein-coding gene [42, 65].
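The pair definition in the last sentence can be expressed as a set intersection. The sketch below is one possible reading of it, under the assumption that "identical" flanking genes means genes linked by a homology relationship between the two genomes; the gene identifiers and the `homologs` mapping are hypothetical placeholders for the BLAST and MCScanX output.

```python
def is_syntenic_pair(bna_flanking, other_flanking, homologs):
    """Return True if at least one flanking gene of the B. napus lncRNA has a
    homolog among the flanking genes of the candidate lncRNA in the other species.

    bna_flanking   -- flanking protein-coding genes of the B. napus lncRNA
    other_flanking -- flanking protein-coding genes of the BLASTn hit
    homologs       -- dict mapping a B. napus gene to its homologs in the other genome
    """
    translated = set()
    for gene in bna_flanking:
        translated.update(homologs.get(gene, ()))
    return len(translated & set(other_flanking)) >= 1


# Hypothetical identifiers, for illustration only.
homologs = {
    "BnaGeneA": {"BraGeneA"},
    "BnaGeneB": {"BraGeneB1", "BraGeneB2"},
}
print(is_syntenic_pair(["BnaGeneA", "BnaGeneC"], ["BraGeneX", "BraGeneA"], homologs))
print(is_syntenic_pair(["BnaGeneC"], ["BraGeneA"], homologs))
```

Each of the top BLASTn hits of a B. napus lncRNA would be passed through this test in turn, and a hit is kept as syntenic when the function returns True.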
Sequence conservation of lncRNAs
To analyze the sequence conservation of lncRNAs, all the lncRNAs derived from B. napus were used as the query dataset and searched against lncRNAs from B. rapa, B. oleracea and A. thaliana and their genome sequences with BLASTn. The cutoff thresholds for significant hits were an E-value <1e-5, coverage > 40% and identity > 50% for the matched regions.
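These cutoffs can be applied to BLASTn tabular output with a short filter. The sketch below assumes the default 12-column tabular format (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and assumes that "coverage" means the fraction of the query covered by the aligned region; the original text does not specify either point. The subject identifier, query length and hit values are made up.

```python
def parse_hit(line):
    """Parse one line of BLAST default tabular output (-outfmt 6)."""
    f = line.rstrip("\n").split("\t")
    return {
        "qseqid": f[0],
        "sseqid": f[1],
        "pident": float(f[2]),
        "qstart": int(f[6]),
        "qend": int(f[7]),
        "evalue": float(f[10]),
    }


def is_conserved_hit(hit, query_length, max_evalue=1e-5, min_cov=40.0, min_ident=50.0):
    """Apply the E-value, coverage (%) and identity (%) cutoffs to a single hit."""
    coverage = 100.0 * (abs(hit["qend"] - hit["qstart"]) + 1) / query_length
    return (
        hit["evalue"] < max_evalue
        and coverage > min_cov
        and hit["pident"] > min_ident
    )


# Hypothetical hit for a 1000-bp query: 600 aligned query bases at 87.5% identity.
line = "lnc_008548\tsubject_1\t87.5\t600\t70\t5\t101\t700\t2001\t2600\t1e-120\t450"
hit = parse_hit(line)
print(is_conserved_hit(hit, query_length=1000))
```

Because a query can have several partial hits on the same subject, a fuller implementation might merge the aligned intervals before computing coverage; this sketch evaluates each hit on its own.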
Quantitative reverse transcription (qRT)-PCR analysis
Total RNA isolated from seed samples of four cultivars at two stages (10–20 DAF and 30 DAF) was used for first-strand cDNA synthesis using a HiScript II 1st Strand cDNA Synthesis kit (Vazyme) according to the manufacturer’s protocol. The cDNA was used as template in qRT-PCR (ChamQ SYBR qPCR Master Mix-Q311, Vazyme). Real-time PCR was performed using the LightCycler 96 (Roche). The reactions were performed at least in triplicate with three independent experiments, and the data were analyzed by the 2^-ΔΔCt method. The primers used in our study, including those for the reference gene (EF-1α), are listed in Additional file 21: Table S11. All values are presented as fold changes of 30 DAF relative to 10–20 DAF. Student’s t-test was performed to determine significant changes (P < 0.05).
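For readers unfamiliar with the 2^-ΔΔCt calculation, the following sketch works through it with hypothetical Ct values, taking 10–20 DAF as the calibrator and EF-1α as the reference gene, as in the text. The function name and the numbers are illustrative only.

```python
def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_calib, ct_ref_calib):
    """Relative expression by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference) within each sample
    ddCt = dCt(test sample) - dCt(calibrator sample)
    """
    d_test = ct_target_test - ct_ref_test
    d_calib = ct_target_calib - ct_ref_calib
    return 2.0 ** (-(d_test - d_calib))


# Hypothetical Ct values: target gene vs EF-1a at 30 DAF (test) and 10-20 DAF (calibrator).
fold = fold_change_ddct(
    ct_target_test=22.0, ct_ref_test=18.0,    # 30 DAF: dCt = 4
    ct_target_calib=26.0, ct_ref_calib=18.0,  # 10-20 DAF: dCt = 8
)
print(fold)  # ddCt = -4, so the fold change is 2**4 = 16.0
```

A fold change above 1 indicates higher expression at 30 DAF than at 10–20 DAF. The method assumes that the amplification efficiencies of the target and reference primers are both close to 100%.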
In this study, a total of 8905 lncRNA loci were identified, including 7100 lincRNA loci and 1805 loci generating lncNATs. We demonstrated that the B. napus genome has a large number of lncRNAs and that these lncRNAs are expressed broadly across many developmental stages and in different tissue types. We also provide evidence indicating that specific lncRNAs appear to be important regulators of lipid biosynthesis, forming regulatory networks with transcripts involved in lipid biosynthesis, and that these lncRNAs are conserved in other species of the Brassicaceae family. Taken together, our data provide a basis for further study of the roles of lncRNAs in oil biosynthesis in B. napus.
We thank Lixi Jiang for providing equipment to measure fatty acids and Maoteng Li and Weijun Zhou for providing materials. We also thank Prof. Michael P. Timko for revising the English language.
The work presented here was funded by grants made to LF by the National Basic Research Program of China (2015CB150200), the 111 Project (B17039), the Sino-Germany PPP Project and the Jiangsu Collaborative Innovation Center for Modern Crop Production. The authors are solely responsible for the experimental design, data interpretation, and conclusions drawn herein, and all results and findings are in the public domain and freely distributed.
Availability of data and materials
All the sequencing data generated in this study was submitted to NCBI with accession number PRJNA492185.
ES, XZ, QZ, LF, CY and XC designed research and analyzed data. ES, SH, HC, LZ, QL and XC conducted experiments; ES, QZ and XC wrote the paper. All the authors read and approved the paper.
Ethics approval and consent to participate
Whole plants and siliques were collected for these experiments by Prof. Shujin Hua from the Zhejiang Academy of Agricultural Sciences (ZAAS). The fields from which the materials were collected are maintained by ZAAS and are freely accessible to other researchers on request. During the execution of these experiments, all plant materials collected were either consumed or destroyed at the conclusion of the work. Individuals wishing to access these fields and similar plant collections may contact the authors, in particular SH, who oversees these fields. All collection of plant material was done in compliance with applicable institutional, national, and international guidelines, and no specific permissions and/or licenses were required in order to comply with the Convention on the Trade in Endangered Species of Wild Fauna and Flora, since these plants fall outside of its jurisdiction.
Consent for publication
The authors declare that they have no competing interests.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.