Dynamic patterns of fat content and FA compositions in developing BSF
To investigate the dynamic patterns of fat accumulation in developing BSFL, we analyzed the crude fat (CF) content at eight different stages (1-d-L, 4-d-L, 8-d-L, 12-d-L, E-prepupa, L-prepupa, E-pupa, and L-pupa) (Fig. 1b). The CF content was lower during the early stages (1-d-L to 4-d-L) and the late stages (E-pupa to L-pupa), whereas higher fat content was recorded at the mature larva stage (12-d-L) and the prepupa stages (E-prepupa to L-prepupa). CF increased markedly from 7.5% at 4-d-L to 17.8% at 8-d-L and peaked at 26.1% at the E-prepupa stage. However, after the E-prepupa peak (26.1%), a sharp decline in CF (17.2%) was noted from E-pupa to L-pupa.
Moreover, to explore the FA composition of developing BSFL, the dynamic spectra of FA accumulation across developmental phases were analyzed (Fig. 2). Lauric acid (C12:0) accounted for a large relative proportion of FAs during development. The proportion of lauric acid (C12:0) declined from the 1-d-L stage (72.3%) to the 4-d-L stage (7.7%), while the proportions of palmitic acid (C16:0), oleic acid (C18:1) and linoleic acid (C18:2) increased. Lauric acid (C12:0) then increased rapidly from 4-d-L to L-prepupa, with a peak value at L-pupa (75.5%), while the proportions of oleic acid (C18:1), palmitic acid (C16:0) and linoleic acid (C18:2) decreased from 4-d-L to L-prepupa. The proportions of myristic acid (C14:0) and stearic acid (C18:0) fluctuated from 4-d-L to L-pupa. Overall, 90.4% of the FAs present in developing BSF were short-chain FAs such as lauric acid (C12:0), myristic acid (C14:0) and palmitic acid (C16:0); therefore, it was concluded that BSFL achieves rapid fat accumulation by synthesizing short-chain FAs early in its development, which makes them an ideal feedstock for high-performance biodiesel production.
Illumina sequencing and de novo assembly of developmental BSF
To investigate the molecular regulatory mechanism of rapid fat accumulation in developing BSF, RNA was extracted from E-egg, L-egg, 1-d-L, 4-d-L, 8-d-L, 12-d-L, E-prepupa, L-prepupa, E-pupa, L-pupa, F-adult, and M-adult. A total of 24 cDNA libraries (two replicates per stage) were constructed and deeply sequenced on the Illumina HiSeq™ X Ten system, generating 218,295,450,000 nt from these RNA-Seq samples. After the filtering step, an average of 60,637,625 clean reads was obtained in each sample, with an average Q20 percentage of 97.44% and an average GC percentage of 38.4% (Additional file 1: Table S1). After assembly, 70,475 unigenes were obtained with an N50 of 1749 nt, a total length of 74,988,057 nt and an average length of 1064 nt (Additional file 2: Table S2). The length of the unigene sequences ranged mainly from 300 to 3000 nt, with 28,121 (39.9%) in the range of 1000 to 3000 nt and 11,335 (16.08%) longer than 2000 nt, and the number of unigenes decreased as the sequence length increased (Additional file 3: Figure S1). These results indicated that the assembly is of high quality. All clean reads were deposited in the National Center for Biotechnology Information (NCBI) Short Read Archive (SRA) database under accession number PRJNA506627.
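The assembly statistics quoted above (N50 and average unigene length) can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names and the toy length list are hypothetical, and only the last line uses figures reported in the text.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembled length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def mean_length(lengths):
    """Return the arithmetic mean of the sequence lengths."""
    return sum(lengths) / len(lengths)


# Hypothetical toy lengths (nt), used only to exercise the functions.
toy_lengths = [300, 500, 800, 1200, 2000, 3200]
print(n50(toy_lengths))          # N50 of the toy set
print(mean_length(toy_lengths))  # mean length of the toy set

# Consistency check on the reported totals: total length / unigene count.
print(round(74_988_057 / 70_475))
```

The final line divides the reported total length by the reported number of unigenes, which reproduces the stated average length of 1064 nt.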
Functional annotation and classification of BSF unigenes
To investigate the function of the assembled unigenes in developing BSF, all 70,475 unigenes were searched against public databases, including NR (NCBI non-redundant protein sequences), Gene Ontology (GO), Swiss-Prot (a manually annotated and reviewed protein sequence database), Kyoto Encyclopedia of Genes and Genomes (KEGG), Clusters of Orthologous Groups of proteins (COG), and NT (NCBI non-redundant nucleotide sequences). A total of 41,375 (58.7%) unigenes had matches with known genes (Table 1), with 37,960 (53.8%), 24,500 (34.7%), 29,277 (41.5%), 25,758 (36.5%), 19,406 (27.5%), and 19,108 (27.1%) matches in NR, GO, Swiss-Prot, KEGG, COG and NT, respectively. The remaining 29,100 (41.3%) unigenes had no matches, which may be because they represent tissue-specific novel genes or because they are short sequences lacking a characterized protein domain and therefore yield no BLAST hits.
Table 1 Functional annotation of BSF unigenes in public databases
Similarity analysis between the BSF unigenes and NR was performed using BLAST (Additional file 4: Figure S2). The results showed that 51.9% of the annotated unigenes had strong homology, with e values below 1e−45 (Additional file 4: Figure S2. 4A). There were 36.6%, 26.2% and 14.1% of putative proteins showing 40–60%, 60–80% and 80–100% similarity with known proteins in NR, respectively (Additional file 4: Figure S2. 4B). From the species distribution of NR BLAST matches (Additional file 4: Figure S2. 4C), 24.1% of unigenes had strong homology with Drosophila. Among other species within Diptera, 12.3% of unigenes had matches to sequences from Aedes aegypti, followed by Culex pipiens quinquefasciatus (6.7%) and Anopheles gambiae PEST (5.4%) (Additional file 4: Figure S2. 4C), all of which are mosquitoes.
A total of 24,500 unigenes were categorized into three main GO functional categories (biological process, cellular component, and molecular function) and 61 sub-categories (Additional file 5: Figure S3). Among the 61 sub-categories, ‘cellular process’ and ‘single-organism process’ were the two largest, containing 17,197 (70.2%) and 15,010 (61.3%) unigenes, respectively. Large numbers of unigenes also belonged to other sub-categories such as ‘cell’, ‘cell part’ and ‘metabolic process’, which contained 14,663 (59.8%), 14,621 (59.7%) and 13,921 (56.8%) unigenes, respectively. These results suggested that many metabolic activities occur during the development of BSF. Only a few unigenes belonged to the sub-categories ‘chemoattractant activity’, ‘chemorepellent activity’ and ‘nutrient reservoir activity’.
A total of 19,406 (27.5%) unigenes were categorized into 25 COG classifications (Additional file 6: Figure S4). Among these classifications, the cluster ‘General function prediction only’ represented the largest group, containing 6218 (32.1%) unigenes. This indicated the existence of a large number of unknown genes in BSF, which may have excellent potential for exploration. The second largest group was ‘Carbohydrate transport and metabolism’ with 3215 (16.6%) unigenes, followed by ‘Transcription’ with 2850 (14.7%) unigenes, and ‘Posttranslational modification, protein turnover, chaperones’ with 2476 (12.7%) unigenes. Only six unigenes (0.03%) were assigned to ‘Nuclear structure’, which was the smallest group.
A total of 25,758 unigenes were categorized into five KEGG categories (A: Cellular Processes, B: Environmental Information Processing, C: Genetic Information Processing, D: Metabolism, and E: Organismal Systems), 41 sub-categories and 259 pathways (Additional file 7: Figure S5). Among the five categories, ‘Metabolism’ contained the largest number of unigenes, with 9134 (35.5%), followed by ‘Organismal Systems’ with 7934 (30.8%) unigenes, ‘Cellular Processes’ with 5480 (21.3%) unigenes, ‘Genetic Information Processing’ with 4640 (18.1%) unigenes, and ‘Environmental Information Processing’ with 3829 (14.8%) unigenes. Among the 41 sub-categories, ‘Digestive system’ was the largest with 2124 (8.3%) unigenes, which may reflect the ability of BSF to utilize wastes efficiently, followed by ‘Signal transduction’ with 2051 (7.9%) unigenes, and ‘Transport and catabolism’ with 2032 (7.9%) unigenes. The smallest group was ‘Biosynthesis of other secondary metabolites’, which contained 53 (0.21%) unigenes.
To investigate the fat accumulation mechanism of BSF, we screened the results of the KEGG pathway annotations. A total of 1810 (7.1%) unigenes were matched to 15 canonical lipid metabolism pathways among the 259 pathways (Fig. 3). Among these 15 pathways, ‘Glycerolipid metabolism’ had the largest number of unigenes (362), followed by ‘Glycerophospholipid metabolism’ with 351 unigenes, ‘Fatty-acid metabolism’ with 198 unigenes, ‘alpha-Linolenic acid metabolism’ with 142 unigenes, ‘Steroid hormone biosynthesis’ with 121 unigenes, and ‘Biosynthesis of unsaturated fatty acids’ with 115 unigenes. The pathway ‘Primary bile acid biosynthesis’ contained only 27 unigenes, while ‘FA biosynthesis’ contained 42 unigenes and ‘FA elongation’ contained 72 unigenes.
Analysis of differentially expressed unigenes of developing BSF
The differential expression patterns of specific unigenes associated with BSF fat accumulation were investigated by calculating the ‘fragments per kilobase of transcript per million mapped reads’ (FPKM) value. Differentially expressed unigenes were screened from all assembled BSF unigenes using a false discovery rate (FDR) ≤ 0.05 together with an absolute value of log2 (ratio) ≥ 1 (Additional file 8: Table S3). The differentially expressed unigenes were matched to the GO database (Additional file 9: Table S4) and the KEGG database (Additional file 10: Table S5). Differentially expressed unigenes were concentrated in the early and late stages, with 6911 identified during the early stages (1-d-L and 4-d-L) and 8793 identified during the late stages (E-pupa and L-pupa), while very few were identified at 8-d-L and 12-d-L (Fig. 4a). The differentially expressed unigenes involved in lipid metabolism were likewise concentrated in the early and late stages, with 220 identified at 1-d-L and 4-d-L, and 262 identified at E-pupa and L-pupa (Fig. 4b). These results suggest that changes in lipid metabolism occur mainly during the early stages and the late stages.
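The screening criteria described above can be sketched in Python as follows. This is a minimal illustration, not the software used in the study: it uses the standard FPKM definition, the function names are hypothetical, and the pseudocount (which only guards against division by zero) is an assumption that the text does not mention.

```python
import math


def fpkm(fragments, transcript_length_nt, total_mapped_fragments):
    """Standard FPKM: fragments per kilobase of transcript per million
    mapped fragments, i.e. fragments * 1e9 / (length_nt * total_mapped)."""
    return fragments * 1e9 / (transcript_length_nt * total_mapped_fragments)


def log2_ratio(fpkm_sample, fpkm_control, pseudocount=0.0):
    """log2 of the expression ratio; the optional pseudocount is an
    assumption added here to avoid dividing by zero."""
    return math.log2((fpkm_sample + pseudocount) / (fpkm_control + pseudocount))


def is_differential(log2_fc, fdr, fdr_cutoff=0.05, log2_cutoff=1.0):
    """Apply the thresholds stated in the text: FDR <= 0.05 and |log2 ratio| >= 1."""
    return fdr <= fdr_cutoff and abs(log2_fc) >= log2_cutoff


# Hypothetical counts for one unigene in two samples.
control = fpkm(fragments=100, transcript_length_nt=1000, total_mapped_fragments=10_000_000)
sample = fpkm(fragments=400, transcript_length_nt=1000, total_mapped_fragments=10_000_000)
fold = log2_ratio(sample, control)
print(control, sample, fold, is_differential(fold, fdr=0.01))
```

Both thresholds must hold at once, so a unigene with a large fold change but an FDR above 0.05 is not counted as differentially expressed.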
Five unigenes were found to have sustained up-regulation in the early stage (1-d-L, 4-d-L and 8-d-L), while four unigenes were found to have sustained up-regulation in the late stage (L-prepupa, E-pupa and L-pupa) (Fig. 4c). The five unigenes with sustained up-regulation in the early stage were related to triacylglycerol lipase (lip, EC:3.1.1.3) (Unigene22405_All), lipoprotein lipase (LPL, EC:3.1.1.34) (Unigene25391_All), carboxylesterase 1 (CES1, EC:3.1.1.1) (CL7976.Contig2_All), glucuronosyltransferase (UGT, EC:2.4.1.17) (CL182.Contig2_All), and beta-galactosidase (GLB1, EC:3.2.1.23) (CL2559.Contig1_All). The lip and LPL enzymes are involved mainly in the triacylglycerol degradation pathway. GLB1 can catalyze the decomposition of lactose to d-galactose and alpha-d-glucose, which can be further decomposed to provide energy. UGT can catalyze the conversion of UDP-glucuronate and beta-d-glucuronoside.
Among the four unigenes with sustained up-regulation in the late stage, two were related to aldehyde reductase (AKR1B, EC:1.1.1.21) (CL7623.Contig2_All, CL3836.Contig1_All), one was related to elongation of very long-chain fatty-acid protein 4 (ELOVL4, EC:2.3.1.199) (CL8829.Contig3_All), and one was related to (3R)-3-hydroxyacyl-CoA dehydrogenase/3a,7a,12a-trihydroxy-5b-cholest-24-enoyl-CoA hydratase/enoyl-CoA hydratase 2 (HSD17B4, EC:1.1.1.- 4.2.1.107 4.2.1.119) (CL2509.Contig4_All). ELOVL4 and HSD17B4 are associated with biosynthesis of unsaturated fatty acids and fatty-acid elongation.
The distribution of down-regulated unigenes that are involved in lipid metabolism was also analyzed (Fig. 4d). Two unigenes were found to have sustained down-regulation in early stage (1-d-L, 4-d-L and 8-d-L). Among them, one unigene was related to AKR1B, and another unigene was related to diacylglycerol kinase (ATP) (DGK, EC:2.7.1.107) (CL3020.Contig2_All).
Meanwhile, two unigenes were found to have sustained down-regulation in late stage (E-prepupa, L-prepupa, E-pupa and L-pupa). Among them, one was related to 3-hydroxy acid dehydrogenase/malonic semialdehyde reductase (ydfG, EC:1.1.1.381 1.1.1.-) (CL10341.Contig4_All), with sustained down-regulation in E-prepupa, L-prepupa and E-pupa. The enzyme ydfG is a member of the 3-hydroxyacyl-CoA dehydrogenase family and can reduce malonic semialdehyde with NADPH to 3-hydroxypropionate. The other unigene was related to aldehyde dehydrogenase (NAD+) (ALDH, EC:1.2.1.3) (CL258.Contig4_All), with sustained down-regulation in L-prepupa, E-pupa and L-pupa.
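One way to read "sustained" regulation is that a unigene appears in the up-regulated (or down-regulated) set of every stage in a group. The text does not state how the overlaps in Fig. 4c and 4d were computed, so the Python sketch below is an assumption; the function name and the unigene identifiers are hypothetical placeholders.

```python
def sustained(regulated_by_stage, stages):
    """Return the unigenes present in the regulated set of every listed stage
    (a plain set intersection; an assumed reading of 'sustained')."""
    sets = [set(regulated_by_stage[stage]) for stage in stages]
    return set.intersection(*sets) if sets else set()


# Hypothetical up-regulated sets per stage (placeholder IDs, not real data).
up_regulated = {
    "1-d-L": {"unigene_A", "unigene_B", "unigene_C"},
    "4-d-L": {"unigene_A", "unigene_C"},
    "8-d-L": {"unigene_A", "unigene_D"},
}

early_stages = ["1-d-L", "4-d-L", "8-d-L"]
print(sustained(up_regulated, early_stages))
```

Under this reading, a unigene that is up-regulated in only two of the three early stages is excluded from the sustained set.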
Expression patterns of enzymes involved in pyruvate and acetyl-CoA formation in developing BSF
To investigate the expression patterns of genes associated with pyruvate formation, putative genes that are related to enzymes required for glycolysis were obtained from Illumina sequencing analysis. Among the 122 putative genes that are associated with glycolysis in BSF, 12 of them were related to hexokinase (HK, EC:2.7.1.1, e value: 6e−9 to 0), 5 of them were related to ADP-dependent glucokinase (ADPGK, EC:2.7.1.147, e value: 2e−18 to 5e−140), 12 of them were related to glucose-6-phosphate isomerase (GPI, EC:5.3.1.9, e value: 3e−15 to 0), 16 of them were related to 6-phosphofructokinase 1 (PFK, EC:2.7.1.11, e value: 1e−6 to 0), 3 of them were related to fructose-1,6-bisphosphatase I (FBP, EC:3.1.3.11, e value: 7e−50 to 9e−158), 5 of them were related to fructose-bisphosphate aldolase, class I (ALDO, EC:4.1.2.13, e value: 2e−11 to 0), 11 of them were related to glyceraldehyde 3-phosphate dehydrogenase (GAPDH, EC:1.2.1.12, e value: 1e−14 to 0), 4 of them were related to phosphoglycerate kinase (PGK, EC:2.7.2.3, e value: 2e−19 to 0), 11 of them were related to 2,3-bisphosphoglycerate-dependent phosphoglycerate mutase (gpmA, EC:5.4.2.11, e value: 3e−8 to 2e−147), 2 of them were related to 2,3-bisphosphoglycerate-independent phosphoglycerate mutase (gpmI, EC:5.4.2.12, e value: 3e−29 to 6e−56), 6 of them were related to enolase (ENO, EC:4.2.1.11, e value: 1e−12 to 0), and 22 of them were related to pyruvate kinase (PK, EC:2.7.1.40, e value: 1e−7 to 0).
Temporal transcript analysis was performed to examine the dynamic expression patterns of putative genes involved in glycolysis. The putative genes from the glycolysis pathway were highly expressed during the early and late stages of BSFL development (Fig. 5a). When the temporal FPKM profiles of putative genes encoding the isozymes HK, ADPGK, gpmA and gpmI were compared, HK had higher expression in the early stage, while ADPGK was up-regulated at L-prepupa (Fig. 6a). Meanwhile, the expression pattern of gpmI was more consistent with the CF accumulation pattern (Fig. 6b).
The pyruvate dehydrogenase complex (PDC) is an important enzyme complex for acetyl-CoA formation. There were 8, 7, 6 and 11 putative genes related to PDC subunits E1-α, E1-β, E2 and E3, respectively, with e values ranging from 6e−16 to 3e−170, 5e−25 to 4e−172, 5e−6 to 0 and 2e−6 to 0, respectively. The expression levels of these genes gradually decreased before 8-d-L but increased from 8-d-L onward, with the highest expression level at L-prepupa (Fig. 5b). Since the putative genes that are associated with acetyl-CoA formation are also putative genes involved in glycolysis (Fig. 5a), the PDC subunits can respond to BSFL development, with acetyl-CoA production for CF accumulation occurring mainly in the early stage. Although the PDC subunits had high expression levels in late stages such as E-pupa and L-pupa, the CF content did not increase significantly. This suggests that acetyl-CoA was used to provide energy rather than for CF accumulation in the late stage.
Expression patterns of enzymes involved in acetyl-CoA transportation and FA biosynthesis in developing BSF
By Illumina sequencing analysis, 8 and 5 putative genes were identified as related to citrate synthase (CS, EC:2.3.3.1, e value: 3e−6 to 0) and ATP-citrate lyase (ACLY, EC:2.3.3.8, e value: 0), respectively. CS catalyzes the condensation of acetyl-CoA with oxaloacetate to produce citrate, and citrate is preferentially exported to the cytosol via the tricarboxylate transporter. We likewise analyzed the dynamic expression patterns of putative genes associated with acetyl-CoA transportation; the putative genes of CS and ACLY had expression patterns similar to those of the genes for pyruvate and acetyl-CoA biosynthesis (Fig. 5c).
The putative genes involved in FA biosynthesis in developing BSF were identified by Illumina sequencing analysis. There were 11 putative genes related to acetyl-CoA carboxylase (ACC, EC:6.4.1.2, e value: 2e−7 to 0), 22 related to fatty-acid synthase, animal type (FASN, EC:2.3.1.85, e value: 2e−6 to 0), 4 related to [acyl-carrier-protein (ACP)] S-malonyltransferase (FabD, EC:2.3.1.39, e value: 9e−14 to 2e−156), and 4 related to 3-oxoacyl-[ACP] synthase II (FabF, EC:2.3.1.179, e value: 3e−6 to 7e−153). Since the first step of FA biosynthesis is catalyzed by ACC, ACC is considered a major rate-controlling enzyme in this pathway. Additionally, lauric acid (C12:0) is the main component of BSFL FAs, which indicated that FASN is able to produce lauric acid (C12:0). Interestingly, when the unigenes were matched to the canonical pathways of fatty-acid biosynthesis, we observed that FASN catalyzes a series of reactions in this pathway (Additional file 11: Figure S6). As shown by temporal transcript analysis, the putative genes involved in FA biosynthesis had higher expression in the early stage and in the late stage (Fig. 5d).
Expression patterns of enzymes involved in triacylglycerol synthesis in developing BSF
Triacylglycerol (TAG) biosynthesis begins with acyl-CoA formation. Two isozymes in the acyl-CoA biosynthesis pathway were identified by Illumina sequencing analysis, with 10 putative genes related to long-chain-fatty-acid-CoA ligase (ACSBG, EC:6.2.1.3, e value: 7e−11 to 0) and 6 putative genes related to long-chain acyl-CoA synthetase (ACSL, EC:6.2.1.3, e value: 1e−17 to 0). Temporal transcript analysis of ACSBG and ACSL showed that ACSBG had a high expression level in developing BSFL (Fig. 6c).
By Illumina sequencing analysis, 54 putative genes for TAG biosynthesis were identified. Temporal transcript analysis showed that the putative genes for TAG biosynthesis were highly expressed during early stages (1-d-L to 4-d-L) and late stages (E-prepupa to E-pupa) (Fig. 5e). Since differential expression analysis indicated that TAG degradation occurs mainly during the early stages, these results indicated that rapid TAG accumulation occurs mainly during the late stages.
Glycerol-3-phosphate O-acyltransferase (GPAT) catalyzes the first step of TAG biosynthesis. It plays a critical role in the conversion of glycerol 3-phosphate and acyl-CoA to 1-acyl-sn-glycerol 3-phosphate. Among the putative genes related to GPAT, 13 were related to glycerol-3-phosphate O-acyltransferase 1/2 (GPAT1_2, EC:2.3.1.15, e value: 2e−10 to 6e−133), and 4 were related to glycerol-3-phosphate O-acyltransferase 3/4 (GPAT3_4, EC:2.3.1.15, e value: 2e−124 to 6e−177). Temporal transcript analysis of GPAT1_2 and GPAT3_4 showed that GPAT3_4 had a high expression level (Fig. 6d).
In the second step of TAG biosynthesis, an additional FA is transferred to 1-acyl-sn-glycerol 3-phosphate by members of the 1-acylglycerol-3-phosphate acyltransferase (AGPAT) family to produce 1,2-diacyl-sn-glycerol 3-phosphate. Interestingly, three isozymes of AGPAT and one putative acyltransferase were identified as catalyzing this step, with 11 putative genes related to AGPAT1_2 (EC:2.3.1.51, e value: 2e−14 to 5e−108), 3 putative genes related to AGPAT3_4 (EC:2.3.1.51 2.3.1.-, e value: 3e−8 to 1e−123), 1 putative gene related to AGPAT8 (EC:2.3.1.51 2.3.1.-, e value: 3e−8), and 1 putative gene related to lysophospholipid acyltransferase 1/2 (MBOAT1_2, EC:2.3.1.51 2.3.1.-, e value: 6e−152). Temporal transcript analysis of the AGPAT isozymes and MBOAT1_2 showed that the expression pattern of AGPAT3_4 is consistent with that of the putative genes involved in TAG biosynthesis (Fig. 6e).
In the third step of TAG biosynthesis, phosphatidate is dephosphorylated to provide 1,2-diacylglycerol (DAG) for the biosynthesis of TAG. Two isozymes of phosphatidate phosphatase were identified by Illumina sequencing analysis, with 11 putative genes that were related to phosphatidate phosphatase (PLPP1_2_3, EC:3.1.3.4, e value: 2e−27 to 8e−105), and 8 putative genes that were related to phosphatidate phosphatase LPIN (LPIN, EC:3.1.3.4, e value: 3e−12 to 0). Temporal transcript analysis of PLPP1_2_3 and LPIN showed that PLPP1_2_3 had high expression level, and the expression patterns are consistent with the ones from putative genes involved in TAG biosynthesis (Fig. 6f).
In the last step of TAG biosynthesis, acyl-CoA:diacylglycerol acyltransferase (DGAT) transfers an acyl group from acyl-CoA to DAG to form TAG. In this study, only two putative genes were identified as related to diacylglycerol O-acyltransferase 1 (DGAT1, EC:2.3.1.20 2.3.1.75 2.3.1.76, e value: 0) in developing BSF, which suggests that DGAT1 is specific to TAG biosynthesis in BSF.
Experimental validation and analysis of key genes involved in BSF fat accumulation
To assess the accuracy of sequencing and assembly of the BSF transcriptome, the relative expression levels and temporal transcript patterns of putative genes involved in fat accumulation were analyzed. Four putative genes encoding vital enzymes, namely FAS, ACC, ACSBG and DGAT1, were selected for qRT-PCR (Additional file 12: Table S6). The qRT-PCR results showed that the relative expression levels of these selected genes were mostly consistent with the FPKM comparative ratios (with 1-d-L as the control) (Fig. 7). These results indicated that the unigene assembly is accurate and reliable, and that it is feasible to use the DESeq method to select differentially expressed genes. Both ACC and FAS, which are involved in FA biosynthesis, had high expression levels at the 4-d-L stage.
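The comparison with 1-d-L as the control can be sketched as a ratio of each stage's value to the control value, followed by a check that the two methods change in the same direction. The Python sketch below is illustrative only: the function names and all numbers are hypothetical, and the direction check is an assumed simplification of "mostly consistent", not the criterion used in the study.

```python
def relative_to_control(values_by_stage, control="1-d-L"):
    """Express every stage value as a ratio to the control stage value."""
    base = values_by_stage[control]
    if base == 0:
        raise ValueError("control value must be non-zero")
    return {stage: value / base for stage, value in values_by_stage.items()}


def same_direction(ratio_a, ratio_b):
    """True when both ratios lie on the same side of 1 (both up or both down)."""
    return (ratio_a - 1) * (ratio_b - 1) >= 0


# Hypothetical values for one gene (not data from the study).
fpkm_values = {"1-d-L": 10.0, "4-d-L": 40.0, "8-d-L": 5.0}
qpcr_relative = {"1-d-L": 1.0, "4-d-L": 3.2, "8-d-L": 0.7}

fpkm_ratios = relative_to_control(fpkm_values)
for stage, ratio in fpkm_ratios.items():
    print(stage, ratio, same_direction(ratio, qpcr_relative[stage]))
```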