Refining a steroidogenic model: an analysis of RNA-seq datasets from insect prothoracic glands
The prothoracic gland (PG), the principal steroidogenic organ of insects, has been proposed as a model for steroid hormone biosynthesis and regulation.
To validate the robustness of the model, we present an analysis of accumulated transcriptomic data from PGs of two model species, Drosophila melanogaster and Bombyx mori. We identify that the common core components of the model in both species are encoded by nine genes. Five of these are Halloween genes whose expression differs substantially between the PGs of these species.
We conclude that the PGs can be a model for steroid hormone synthesis and regulation within the context of mitochondrial cholesterol transport and steroid biosynthesis but beyond these core mechanisms, gene expression in insect PGs is too diverse to fit in a context-specific model and should be analysed within a species-specific framework.
KeywordsSteroidogenesis Ecdysone Steroid hormone Prothoracic gland Ring gland Bombyx mori genome Halloween genes Insect orthology
5-hydroxytryptamine (serotonin) receptor 1A
Acyl-CoA cholesterol acyltransferase
Diazepam binding inhibitor
Acyl-CoA binding domain containing 3
Acyl-CoA thioesterase 2
Acyl-CoA synthetase 4
Aldo Keto Reductase 2E4
Adenine nucleotide transporter
ATPase family AAA domain-containing protein 3
Brookhaven Protein Database
DNA Data Bank of Japan
Ecdysone-induced protein 75
Fragments Per Kilobase of transcript per Million of mapped reads
fushi tarazu transcription factor 1
Germ cell-expressed bHLH-PAS
hormone receptor 4
hormone receptor-like 46
Isocitrate dehdydrogenase 2
inner mitochondrial membrane
Mitochondrial lon protease homolog
malate dehydrogenase 2
Metastatic lymph node 64
Niemann Pick C type 1a
Niemann Pick C type 1b
outer mitochondrial membrane
optic atrophy type 1
Phosphotidylinositol 3 kinase
protein kinase A (cAMP-dependent)
Protein kinase C
Reads Per Gene Model
Ribosomal protein S6
Smad on X
Steroidogenic Acute Regulatory protein
Orthologue of Steroidogenic Acute Regulatory protein encoding gene
Target of Rapamycin
tuberous sclerosis complex gene 1
tuberous sclerosis complex gene 2
translocator protein (18-kDa)
voltage dependent anion channel
In all animal models studied so far, steroidogenic cells are defined by the expression of cholesterol-metabolizing enzymes to convert it to biologically active steroids [1, 2]. Although vertebrates synthesize and release a variety of steroids depending on their site of synthesis, they are all characterized by the cholesterol side chain cleavage by cytochrome P450 (CYP) CYP11A . On the other hand, ecdysozoa are considered to synthesize, in general, a class of steroids characterized by the presence of the cholesterol side chain [2, 3]. In insects, steroidogenesis results in the synthesis of molecules called ecdysteroids and among insect species there are variations in the exact molecule that is secreted by the prothoracic glands (PGs). The Drosophila melanogaster ring glands (Dm RGs), a composite organ that comprises three glands, the corpora cardiaca (CC), the corpora allata (CA) and the prothoracic gland (PG) , secrete Makisterone A  in addition to ecdysone, while other species secrete 3-dehydroecdysone or a mixture of 3-dehydroecdysone and ecdysone as in the case of Bombyx mori (Bm) . Irrespective of the secreted end product, there exists a core component of ecdysteroidogenesis-regulating enzymes in insects that have been termed Halloween genes  and numerous studies have clarified their critical role in ecdysteroidogenesis .
Beyond the well-established role of vertebrate steroidogenic enzymes described extensively in authoritative reviews [1, 8, 9, 10], research on mammalian cells has identified a large complex of proteins, termed the transduceosome [10, 11, 12], which mediates the transport of cholesterol across the outer mitochondrial membrane (OMM) to the inner mitochondrial membrane (IMM). This enables the conversion of cholesterol to pregnenolone, the precursor of all mammalian steroids, by the mitochondrial CYP11A1 , which is the rate-limiting step in steroidogenesis. The activity of this large multi-protein complex is regulated by signalling molecules such as the MAPK3/ERK1/2, which phosphorylates and activates StAR [10, 14, 15, 16] and the cAMP-dependent protein kinase-regulatory subunit Iα (PKA-RIα), which also mediates phosphorylation and activation of StAR .
While in mammalian cells the transport of cholesterol across the OMM to the IMM is the rate-limiting step in steroidogenesis  and the step most acutely influenced by second messengers , in insects the rate-limiting step of steroidogenesis has been termed the “Black Box” [2, 7], the reaction between 7-dehydrocholesterol (7dC) and D4-diketol considered to be taking place before the transport of 2,22-dideoxyecdysone to the mitochondria where it is further converted to ecdysone by the Halloween enzymes, Disembodied and Shadow [2, 7, 17]. Thus, the order of biosynthetic reactions that lead to steroid synthesis is different between insects and mammals: In insects, conversion of cholesterol to ecdysone is considered to begin outside the mitochondria by the microsome-localized [2, 17, 18] protein termed Neverland, which converts cholesterol to 7dC . This step is followed by the biosynthetic reactions taking place within the “Black box” considered but not proven to be mediated by two other enzymes, Spook and Shroud [2, 7, 17]. All research evidence [2, 7, 17] show that the subsequent biochemical conversions lead to the generation of 2,22-dideoxyecdysone, which is transported to the mitochondria only after the conversion of 2,22,25-trideoxyecdysone to 2,22-dideoxyecdysone [2, 7, 17] by another Halloween enzyme, the microsome-localized Phantom . In mammals, after the rate-limiting step of cholesterol transport from the OMM to the IMM is accomplished , the first modification of cholesterol takes place by the action of CYP11A1 [1, 13], followed by the efflux of pregnenolone out of the mitochondria to be further modified by other CYP enzymes . Efflux of the mitochondrial steroid product pregnenolone to the endoplasmic reticulum (ER) is thought to occur by free diffusion  while influx of the steroidogenesis substrate, cholesterol, to the mitochondria has been shown to occur through mitochondria-associated membranes (MAMs) where the protein ATAD3 is a critical component of the process . Although the mechanism of ecdysone efflux from the mitochondria in PGs is unknown, research on Dm RGs has shown that secretion of steroid hormones such as ecdysone may not be simply taking place by diffusion across the plasma membrane, but may be actively released from steroidogenic cells by a mechanism that involves calcium signalling and an ABC transporter termed Atet, which fills secretory vesicles with steroid hormone .
While the order of steroid biosynthesis reactions appears to be different between mammals and insects, the mechanism of intracellular cholesterol trafficking that takes place in the early steroidogenic steps  appears to be mechanistically similar in steroidogenic mammalian and insect cells. One well-documented common feature is the involvement of Niemann-Pick C1 protein in cholesterol trafficking from late lysosomes to the ER in both mammalian and insect cells [8, 23, 24, 25]. To sum up the current knowledge of the steroidogenic cascade, in PG cells [2, 26] suitable sterols are first delivered to the ER to be converted to 2,22-dideoxyecdysone, then transported to the mitochondria to be converted to ecdysone which then diffuses from the mitochondria to be actively released by the action of Atet . In mammalian cells, cholesterol moves from the late lysosomes to the ER [8, 10] and then to the mitochondria through the action of the transduceosome . It is then converted to pregnenolone, which diffuses to the ER to be converted to tissue-specific steroid(s) whose precise mode of secretion is currently unknown .
Several studies have made available transcriptomic data from RGs of model insect species taking advantage of developments in transcript analysis and the powerful genetic model of Drosophila melanogaster [4, 22, 27, 28, 29, 30, 31, 32]. While each study highlights a different set of regulatory mechanisms, the wealth of all this available data poses one important question: Are patterns of steroidogenic gene expression in the insect PGs so highly conserved to justify it being proposed as a model for steroid hormone biosynthesis and regulation ? In other words, is there a universal cascade in insect steroidogenic cells from cholesterol import to steroid secretion or is the steroidogenic process species-specific and not context-specific?
To answer these questions and draw conclusions on what universally applies among the various studies in insects, we carried out an analysis of six datasets from two model insects, Drosophila melanogaster (Dm) and Bombyx mori (Bm), encompassing eight different studies that also include phenotypic data from RG-specific gene silencing by RNAi [4, 22, 27, 28, 29, 30, 31, 32]. Our aim was to refine this steroidogenic model, the PGs, by identifying which genes are commonly expressed in the PGs of the two model insect species. To be able to compare RNA-seq data from Dm RGs  with data from Bm PGs, we re-analysed our RNA-seq data from Bm [27, 28] using Cufflinks . We also re-analysed another Bm RNA-seq dataset [29, 30] and consolidated the datasets with microarray data that identified Dm RG-specific transcripts . These datasets were then enriched with data from prothoracicotropic hormone (PTTH)-stimulated PGs from Bm and Dm  and data that identified the resulting phenotype of RG-specific gene knockdown, an amazing feat made possible only in Drosophila melanogaster . We find that beyond a small subset of genes that are shared by the two species, there is very little in common in the signalling mechanisms and expression patterns of Halloween genes that drive ecdysteroidogenesis between these species. On the other hand, we find that the components of the transduceosome that has been identified in mammalian cells [10, 13, 16, 34] are present in PG cells of both species. The PGs can be viewed as a steroidogenic model within a refined framework that includes the transduceosome regulatory cascade, the steroid biosynthesis reactions and transporter-mediated cholesterol and steroid trafficking. Our analysis does not support the existence of highly similar gene regulatory networks and signalling cascades between Drosophila melanogaster and Bombyx mori which should be analysed in a species-specific context.
Analysis of RNA-seq data from the prothoracic gland of Bombyx mori and the ring gland of Drosophila melanogaster
We analysed our previously published RNA-seq data (SRA Acc. No. SRP062258)  de novo using Cufflinks  to draw comparisons with RNA-seq data from Dm RGs . Our Bm RNA-seq data from V-0 and V-6 of the 5th instar match the available Dm RNA-seq data  and provides insights on the ecdysteroidogenic mechanisms between these two developmental time points. On V-0, the high juvenile hormone titre as well as the neurally-derived Bommo-FMRFamide (BRFa) suppress the ecdysteroidogenic activity of the PGs [35, 36], whereas on V-6, which is the day of the onset of wandering behaviour in Bm, the juvenile hormone titre is low, PG cells secrete high amounts of ecdysteroids [35, 36] and the PG cells are not fully stimulated by prothoracicotropic hormone .
To assess the quality of the Cufflinks analysis, we plotted the Cufflinks values (Additional file 3: Table S3) against the Bowtie2 values on gene models from our previous analysis [27, 28] (shown also in Additional file 3: Table S3). We find a poor correlation between the two values in both V-0 and V-6 (Fig. 1b and c) an indication that about 20% of the identified Bm gene models are either erroneously annotated or do not match with the transcripts provided by Cufflinks. Instead of arbitrarily using the median FPKM value, as we did previously with RPGM values , or use an arbitrary value of 1, as done for RNA-seq data from Dm RGs , above which all genes are considered as expressed, we determined the number of genes that are expressed on V-0 and V-6 by analysing by qPCR the expression of a sample of 650 genes throughout the 5th instar and the first day of pupal stage (Fig. 1d and e). We then plotted the qPCR data (expressed as transcript per prothoracic gland cell)  against the FPKM values from Cufflinks and the RPGM values from PANDORA (Fig. 1d and e, see also Additional file 7: Table S7) [38, 39]. The results showed that a value of ≥2.873 FPKM for V-0 and ≥ 3.061 FPKM for V-6 can be used as a safe estimate of gene expression (see Additional file 7: Table S7). These values corresponded to expression levels of BmBestrophin-4 on V-0 and BmLMBR1L  on V-6, respectively (Additional file 7: Table S7). Comparing the qPCR results with the RPGM data of these 2 genes from our previous analysis , we find that a value of ≥0.0259 RPGM for V-0 and ≥ 0.0253 RPGM for V-6 can be used as a safe estimate of gene expression based on the PANDORA analysis  (Additional file 7: Table S7). Plotting these three values (i.e. FPKM, RPGM and transcripts per cell) for each gene confirmed our previous assumption that a large number of gene models in the KAIKObase database are erroneously annotated (Fig. 1d and e). This observation is numerically illustrated in the Venn diagram (Fig. 1f) which shows that 1025 loci identified by Cufflinks as expressed in PG cells, do not correspond to Bm gene models. The Venn diagram shows that there are 6008 genes that can be identified as expressed in Bm PGs (Fig. 1f) by both methods of evaluation of gene expression. By plotting in a Venn diagram the identified genes, the differentially expressed and the alternatively spliced genes between V-0 and V-6 in Bm PGs, only a minor proportion (0.023%) of differential expression can be attributed to alternative splicing (Additional file 8: Figure S1A), and the same minor proportion (0.027%) is observed when only the expressed genes in V-0 and V-6 are considered (Additional file 8: Figure S1B).
We next used g:Cocoa  to discover overrepresented and underrepresented gene clusters according to their gene ontology annotation (Fig. 1g and Additional file 3: Table S3). The results show that genes involved in ribosome biogenesis are expressed on V-0 and genes involved in cellular protein modification are overrepresented on V-6 (Fig. 1g and Additional file 3: Table S3). Another noteworthy observation is that genes involved in steroid hormone signalling and synthesis are underrepresented on V-0 (Fig. 1g) providing further support to our choice of the developmental time point to be examined (i.e. V-0).
To identify common and tissue-specific genes between the two species, we annotated and identified the Bm and Dm orthologues (Additional file 9: Table S8) and generated 4 distinct uncorrelated datasets: 1) of the 272 PG-specific genes derived from RNA-seq data [29, 30] (Additional file 10: Table S9 and Additional file 11: Table S10 and Additional file 12: Table S11), 2) of the 472 expressed and Dm RG-specific genes derived from RNA-seq data  (Additional file 12: Table S11), 3) of the 208 Dm RG-specific transcripts derived from microarray data  (Additional file 12: Table S11) and 4) of the 665 genes whose levels differed by 10-fold from V-0 to V-6 and vice versa (Additional file 12: Table S11). We then plotted the common genes identified in these datasets in a Venn diagram (Fig. 2b) which shows that there are only 9 genes expressed in a PG-specific manner in both species. These are the 5 Halloween genes identified in previous studies , the prothoracicotropic hormone receptor, Torso , the Niemann-Pick C1 gene encoding the cholesterol transporter NPC1, a gene expressing the organic-anion transporter 74D  and the ecdysone-inducible gene E3 (ImpE3)  (Fig. 2b). A sixth Halloween gene (i.e Shroud)  had a constantly high < 10-fold difference between V-0 and V-6 of Bm (Fig. 2b). Most of the other genes shown in the grid (Fig. 2b) have already been described as having a critical role in ecdysteroidogenesis with Atet  being the most recently assigned one.
To determine which of the genes present in these 4 datasets are regulated by the PTTH signalling cascade, we used RNA-seq data from PTTH-treated Bm PGs  which we annotated (Additional file 13: Table S12) to add an additional dataset of Dm orthologues of PTTH-regulated genes (Additional file 14: Table S13) . By plotting all 5 datasets together in a Venn diagram (Fig. 2c) we identify that only 4 Halloween genes are regulated by the Torso signalling pathway, in addition to the Torso and OATP74D genes (Fig. 2c). These results are in complete agreement with previously reported results which showed that the ecdysteroidogenesis-promoting PTTH upregulates the JH-dependent expression of the Halloween genes Neverland, Spook, Disembodied and Phantom, but not the JH-independent expression of Shadow and Shroud [45, 46].Among the other genes, most notable is the expression of a gene encoding for a NAD kinase (CG33156) which is a PG-specific  and RG-specific  gene but not highly expressed in the PGs and RGs of both species (see Additional file 12: Table S11).
We next asked how many PG-specific genes are up- or down-regulated by PTTH in Bm PGs using previously published datasets [4, 29, 30]. To answer this, we annotated and compared the RNA-seq data of PTTH-stimulated PGs  with our re-analysed RNA-seq data derived from a previous study . The results (Fig. 2d and e) show that PTTH up-regulates 20 PG-specific genes and down-regulates 5 PG-specific genes while there was no statistically significant gene enrichment in all the up-regulated or down-regulated genes. A similar comparison in Dm done by another research group  using transgenic flies yielded a set of genes (Additional file 13: Table S12 and Additional file 14: Table S13) that were different from those shown in Fig. 2d and Fig. 2e.
Next, we integrated the phenotypic data based on RG-specific in vivo RNAi screen of 12,504 Dm genes  with our transcriptomic data based on orthologous genes between the two species. Our analysis (see Additional file 9: Table S8, Additional file 14: Table S13, Additional file 15: Table S14 and Additional file 16: Table S15) for the different phenotypes observed in Dm (Fig. 2f)  shows that there is a core subset of highly expressed genes of Bm whose orthologues in Dm are developmentally vital (see Additional file 9: Table S8, Additional file 15: Table S14 and Additional file 16: Table S15). Covering almost 1/3 of the highly-expressed genes in PGs (125 of 350) (Fig. 2f), the number of highly expressed genes of Bm which are orthologous to Dm genes is proportionally decreasing in less severe phenotypes (Fig. 2f and Additional file 9: Table S8, Additional file 15: Table S14 and Additional file 16: Table S15). In our view, this is an indication that ecdysteroidogenesis diverged significantly between these two species and subtler regulatory mechanisms come into play to shape ecdysteroidogenesis by the PGs as development ensues.
Component dynamics of critical signalling pathways in the prothoracic gland of Bombyx mori and the ring gland of Drosophila melanogaster
To identify any dynamic interplay between the expression of ACs and PDEs in the PGs, we identified and analysed the expression of PDE encoding genes present in the Bm genome. We identified 9 PDE genes present in the Bm genome (Fig. 3c). Multiple sequence alignment of their catalytic domains (Additional file 18: Figure S2C) revealed that the critical amino acids for cAMP hydrolysis [52, 53, 54] and IBMX binding  are highly conserved in Bm phosphodiesterases. Domain analysis using HMMER (Additional file 18: Figure S2D) showed that four of the nine proteins are cAMP-specific while the remaining five contain domains and residues similar to cGMP-specific PDEs (Additional file 18: Figure S2C) [52, 53, 54]. Analysis of expression of phosphodiesterase genes in Bm PGs (Fig. 3c) showed that only one phosphodiesterase gene (BmPDE4L), encoding the Bm orthologue of Dunce , is expressed in PGs, while another gene, BmPDE11L, had very low expression levels in the late 5th instar. A posterior probability tree showed that most Bm PDEs are closely related to their orthologues in Dm (Fig. 3d). Based on RNA-seq data , 4 PDE encoding genes, among them Dunce, are expressed in the Dm RGs at the wandering stage (Additional file 16: Table S15).
To resolve the discrepancy between gene expression and ability to be activated by forskolin , we measured the cAMP levels of Bm PGs upon activation by forskolin, IBMX or ionomycin (Fig. 3e). The results show that forskolin produces massive increases in cAMP levels only on V-5 and V-6 when BmAC1L is highly expressed but not on V-7 when BmAC5L is highly expressed (Fig. 3f). IBMX produced a 10-fold less increase in cAMP levels on V-5, V-6 and V-7 (Fig. 3e). The pattern of IBMX-induced cAMP increases probably results from the catalytic activity of multiple ACs (Fig. 3a), while ionomycin did not produce any substantial increases in cAMP (Fig. 3e) although it is a potent stimulator of ecdysteroid secretion .
We then plotted the expression pattern of BmPDE4L, the Dunce orthologue, against the expression pattern of BmAC1L, the Rutabaga orthologue, to identify the cAMP signalling dynamics during the final larval instar of Bm (Fig. 3g). In agreement with results from other research groups , our results show that the cAMP signalling pathway is dominant in the middle of the 5th instar, until the first major release of PTTH which takes place in the scotophase of V-6 and upregulates the expression of BmPDE4L (Additional file 13: Table S12) to suppress cAMP production in the final part of the 5th instar (Fig. 3g).
Finally, we identified and analysed the expression of PLC encoding genes present in the Bm genome to be able to compare gene expression patterns for this family of genes since proteins encoded by these genes have been shown to mediate ecdysteroidogenesis and ecdysteroid secretion [22, 60, 61, 66]. We identified 2 PI-PLC X domain containing-protein (dcp) and 4 PLC genes present in the Bm genome (Fig. 4c). Multiple sequence alignment of the catalytic domain of the 4 PLCs revealed that they all have the residues critical for catalytic activity [67, 68] while PLCβ1 and PLCβ4 contain domains necessary for interaction with Gαq  (Fig. 4c — and Additional file 19: Figure S3C and Figure S3D). However, only PLCβ4 is expressed in Bm PGs (Fig. 4c) in contrast with the Dm RGs where both PLCβ1 and PLCβ4 are expressed [22, 31]. Knockdown of PLCβ1 in RGs results in major developmental delay  (Additional file 17: Table S16). It is noteworthy that BmPLCγ shows an unusual expression pattern in Bm PGs with high expression on V-4 and V-5 and a drop in its expression on V-7 (Fig. 4c). A posterior probability tree of Bm and Dm PLCs and PI-PLC X dcps (Fig. 4d) shows that, apart from 2 exceptions, there is almost a one-to-one orthology between the PLCs of both species. Based on RNA-seq data , all but one of the genes encoding for a PLC or a PI-PLC X dcp are expressed in Dm RGs (Additional file 17: Table S16).
Expression profiles of cytochrome P450 and ecdysteroidogenesis-regulating genes in the prothoracic glands of Bombyx mori
Next, we focused on the CYP family of proteins to benchmark under a single method of absolute quantification the large number of CYP genes identified in publications from different research groups, reviewed in [2, 26], as playing a role in ecdysteroid synthesis and secretion in Dm and Bm. To accomplish this, we re-analysed and annotated datasets of CYP proteins [70, 71] and integrated them with data from the transcriptome datasets (see Additional file 20: Table S17). A posterior probability tree of Bm CYP proteins (Additional file 21: Figure S4) was congruent with previous analysis resolving 4 major clans of CYP proteins . Three of the proteins belonging to the mitochondrian clan (Mito. clan) are encoded by Halloween genes known as Shadow(CYP315A1), Disembodied(CYP302A1) and Shade(CYP314A1) and two other proteins belonging to the CYP2 clan  are encoded by Halloween genes known as Spook (CYP307A1) and Phantom(CYP306A1) [2, 26].
We analysed the expression of all 82 genes in Bm PGs throughout the 5th instar and the first day of the pupal stage and we also analysed the expression of 15 other genes all known or implicated in ecdysteroidogenesis or ecdysteroid metabolism in peripheral tissues.
Among the 82 CYP genes, 23 were found to be expressed in PG cells although most of them had peaks of expression on specific days but were barely detectable in most of the other days examined (Fig. 5). Among those expressed, CYP307A1, the Bm orthologue of Spook and Spookier, had the highest expression and in specific developmental time points it was the most abundant transcript of all 650 examined genes (see Additional file 7: Table S7). Besides the CYP genes, we examined the expression of the short-chain dehydrogenase/reductase (SDR) gene, non-molting glossy/shroud (BmShroud) , the cholesterol 7,8-dehydrogenase gene, neverland , the Halloween genes haunted and ghost  which encode for the subunits of the coat protein complex of COPII vesicles, Sec23 and Sec24CD, respectively, the mitochondrial NADPH:adrenodoxin oxidoreductase gene, Dare , the Bm orthologue of Start1(BmStart1) , the pre-mRNA splicing factor gene, Ecdysoneless (BmEcdysoneless) , the ecdysone oxidase (BmEO) and 3-dehydroecdysone-3α-reductase (Bm3DE-3α-Red.) genes , the 3-dehydroecdysone 3β-reductase (Bm3DE 3β-Red.) gene [78, 79], the 3-dehydroecdysone reductase gene, AKR2E4  and two closely related isoforms (aldo-keto reductase AKR2E4-like and AKR2E4-like2, (BMgn009799 and BMgn012152, respectively), the glutathione S-transferase gene, Noppera-bo , Woc which encodes for 7,8-dehydrogenase modulator zing-finger protein  and the Dm orthologue of the nuclear receptor HR4 .
There were two patterns of expression immediately evident from the transcript expression analysis in Bm PGs. Namely, the mitochondrial genes Disembodied, Shadow and Dare had expression peaks on V-5 followed by reduction of their expression levels late in the 5th instar when ecdysteroid secretion is increasing (Fig. 5 and Additional file 22: Figure S5). On the other hand, the microsomal-localized genes Spookier, Shroud, Neverland as well as BmStart1, BmNoppera-bo, BmAKR2E4-like and BmAKR2E4-like2 (Fig. 5 and Additional file 22: Figure S5) had patterns of expression that were matching the peaks and troughs of ecdysteroid secretion [27, 36]. Contrary to these patterns, CYP18A1 which encodes for an ecdysteroid 26-hydroxylase, an ecdysone inactivating enzyme , had a pattern of steadily high expression in the first half of the 5th instar followed by a decrease after V-6 (Fig. 5 and Additional file 22: Figure S5). Comparing these patterns with those reported for Dm , we observe that only Neverland, Spookier and CYP18A1 had patterns of expression that matched those in Bm PGs (compare Fig. 5 and Fig. 3 in ). Finally, we investigated the expression in Bm PGs of HR4, the gene encoding for the nuclear receptor HR4 which has been proposed as a key regulator of PTTH signalling in both Bm and Dm PGs [4, 72]. Our results (Additional file 23: Figure S6) show that HR4 is undetectable in Bm PGs until V-6, has a peak of expression on V-7 and then falls to undetectable levels on V-8. This result is consistent with the documented upregulation of HR4 expression by PTTH stimulation of Bm PGs  after the reported secretion of PTTH on the scotophase between V-6 and V-7 in the hybrid used in this study . Therefore, the peak of HR4 expression on V-7 is the result of PTTH stimulation of Bm PGs but its transcript levels are barely detectable on the other days of the 5th instar (Additional file 23: Figure S6).
Data synthesis towards refining the model
We also compared the expression levels of plasma membrane receptors based on our previous results  and those available from Dm RNA-seq data  (see Additional file 24: Table S18) to find that only Torso has a PG-specific and PG-enriched mode of expression (Fig. 2 and Additional file 24: Table S18) in both species. There were 4 Dm genes encoding for methuselah receptors and a gene encoding for a putative prostaglandin receptor (CG7497), that had > 10-fold enrichment in Dm RGs (see Additional file 24: Table S18)  while related receptors or orthologues of these genes are expressed in Bm PGs . Beyond that, there is very little similarity on the class A and class B GPCR genes that are expressed in the Bm PGs and Dm RGs on this particular developmental time point of the two species. However, receptors that belong to class D Frizzled/Smoothened group  are expressed in both species (Additional file 24: Table S18). The same is true for genes encoding for receptor tyrosine kinases and receptor serine/threonine kinases such as Egfr, Babo, wit and tkv which are expressed in PG cells of both species in agreement with previously published results [27, 85]. Our analysis provides no clues to the nature of an unknown GPCR and its cognate ligand implicated as mediators of the calcium signalling event that stimulates ecdysone release from Dm RGs .
We extended the comparison of PGs from both species by comparing the expression of genes involved in ecdysteroidogenesis as presented for Dm RGs  with their corresponding orthologues in Bm PGs (Fig. 6b). The results show that besides Neverland, Shroud and RpS6, there is very little similarity in relative transcript abundance between the two species (Fig. 6b). Most striking is the difference in the expression of Spook/Spookier between the two species (Fig. 6b). Whereas Spook is expressed at very low levels in Dm RGs, its Bm orthologue, BmSpookier(CYP307A1), is the most abundant transcript in Bm PGs (Fig. 6b). The most abundant transcript in Dm was Phantom (Fig. 6b) which also had high levels of expression in Bm PGs (Fig. 5).
Bombyx mori gene ID
Flybase gene ID
FPKM Prothoracic gland
FPKM Ring gland
L1, L2, L3 arrest
In this study we performed an analysis of six different datasets presented in eight different publications to define the framework within which the insect PG cells can be a model for steroid biosynthesis and regulation . To accomplish our goal, we integrated these datasets into a single file (Additional file 12: Table S11) and attempted to identify common regulatory components of the steroidogenic cascade of insect PGs. We extended our search of common regulatory components into vertebrate steroidogenic models to provide a detailed comparison between insect and vertebrate endocrine events at a molecular and cellular signalling level in light of the latest advances in mitochondria cholesterol transport in mammalian steroidogenic cells [1, 8, 10, 13, 34, 86].
Despite the major technical differences between the examined studies, which comprise microarray data , RNA-seq data [27, 28, 29, 30, 31] and phenotypic data , the integrated datasets are in agreement with most, if not all, the published literature on insect steroidogenesis. Although we were initially discouraged by the potential fallibility of the cross-comparison of the datasets, we find that the various datasets are congruent in a species-specific context but less so in a cross-species context.
Using the same criteria as those used in the publications originally describing the datasets, we find that there are few common regulatory components of steroidogenesis conserved between Bm and Dm (Fig. 2) that can set the PG cells apart from other cells in the insect body. In all 6 datasets examined, only 5 Halloween genes , Torso , NPC1 , OATP74D [42, 89], and ImpE3  (Fig. 2b) met the set criteria, while several genes failed to meet one of the criteria set for each dataset (Fig. 2b). The latter group contains Atet  that is not specifically expressed in Bm PGs, CYP18A1  that is not specifically expressed in Dm RGs and Shroud and Noppera-bo that showed a steadily high level of expression in Bm PGs (i.e not > 10-fold change between V-0 and V-6). It should be noted that NPC1 , OATP74D [42, 89] and ImpE3  have been reported to be expressed in other insect tissues as well, so the results presented here show that they are highly, but not exclusively, expressed in the PGs relative to other tissues. Although the role of several of the genes presented in Fig. 2b and c has been well-documented in insect steroidogenic cells [2, 7, 30, 41, 88], we identified several genes whose role in ecdysteroidogenesis should be further explored (Fig. 2). One of these is the OATP74D gene which encodes for a protein with cytoplasmic localisation when overexpressed in a heterologous system  but its PG-specific expression in both species merits further investigation. Another example is CYP18A1, a gene encoding for a CYP enzyme with 26-hydroxylase activity  involved in ecdysteroid inactivation , whose enriched expression in Bm PGs (Fig. 5) when ecdysone secretion is low  indicates that there may exist an interplay between ecdysteroid synthesis and catabolism within the PG cells that dictates the rate of ecdysone release.
One striking difference in the levels of expression of the Halloween genes involved in ecdysteroid synthesis in both species is evident when comparing the levels of expression of Spook/Spookier (Fig. 6). The role of this enzyme in ecdysteroidogenesis remains unknown , but the very low levels of its expression in Dm RGs indicate that this may be the “rate-limiting” step in ecdysteroidogenesis in this species. On the contrary, Spookier, the Bm orthologue of Spook , is the most abundant transcript in Bm PGs (Fig. 6b), an indication that it may not be the “rate-limiting” step in ecdysteroidogenesis in Bm PGs. Its placement in the “Black box” of the ecdysteroidogenic cascade in insects  may represent its unknown enzymatic role in ecdysteroidogenesis but should not represent its rate-limiting role in the process. In comparison with other Halloween enzymes, there are more examples of post-transcriptional [92, 93] and post-translational  regulation of Spook expression and this suggests that this enzyme is under a dynamic regulatory mechanism that may be different in lepidopteran species  than in Drosophila melanogaster.
Τhe cAMP signalling pathway was the first cellular signalling mechanism that was associated with ecdysteroidogenesis  but the importance of this signalling pathway in ecdysteroidogenesis receives less attention ever since the ERK-mediated signal transduction mechanism of PTTH was identified . Recent phenotypic data from RG-specific RNAi-mediated gene knockdown [22, 32] shows that the IP3/DAG signalling pathway is the dominant signalling pathway in Dm PGs since RNAi-mediated gene knockdown of the cAMP signalling pathway in Dm PGs gives no observed phenotype . Conversely, a large number of publications have shown that the cAMP signalling pathway is the prominent mechanism of ecdysteroidogenesis in lepidopteran species . Our results further show that the cAMP signalling pathway is most critical in the middle of the 5th instar of Bm, as judged by the sharp increase in expression of BmAC1L (Fig. 3a), the Bm orthologue of Rutabaga, in concert with the low levels of expression of BmPDE4L (Fig. 3b), the Bm orthologue of Dunce, the only phosphodiesterase expressed in Bm PGs. However, the very high levels of expression of BmAC5L (Fig. 3a)  present a puzzle that may be difficult to resolve regarding the role of ACs in PGs and at the same time highlight the inconsistent pattern of gene expression emerging from the comparison between the two species.
This pattern is even more inconsistent between the two species when it comes to transcription factors expressed in the PGs  (Additional file 12: Table S11). Of the 13 DNA-binding proteins identified as specifically expressed in the Dm RGs , only 2 of their orthologues, timeless (tim) and vvl are expressed in Bm PGs (Additional file 3: Table S3) and only 1 of these (timeless (tim)) shows specific expression in the Bm PGs (in comparison with its levels in Br-CC-CA complexes (Additional file 11: Table S10 and Additional file 12: Table S11). Even the proposed transcription factor network in Dm PGs  does not apply for the Bm PGs. Identification of gene orthologues proposed to participate in interactions of RG-enriched transcription factors with some of their putative RG targets, based on REDfly and modENCODE data [4, 94, 95], shows that only Traf4 is expressed in Bm PGs while Snail expression in RGs  is not supported by the RNA-seq data  and no orthologue of Snail is identified as expressed in Bm PGs (Additional file 12: Table S11).
The transcription factor HR4 has been proposed as an example of a factor that shows no transcript enrichment in the Dm RG, but appears to be a key regulator of PTTH signalling in both Bm and Dm . Our results support this notion only within the context of PTTH signalling since HR4 expression was undetectable until V-6 and had a peak of expression only on V-7, the day after a major secretion of PTTH takes place in Bm . Thus, HR4 expression can serve as a marker of PTTH/Torso transcriptional activation of PGs but its precise role in Bm PGs, beyond the PTTH signalling cascade, remains elusive given the complex patterns of expression of the Halloween genes (Fig. 5).
We believe that the diversity in gene expression we observe in cells of the same tissue from two different insect species is likely to have an underlying biological basis that is intrinsically complex and reflects the evolutionary distance between the two species. Therefore, caution should be exercised when interpreting data on steroid synthesis and secretion from one species as a universal paradigm, when in fact it may be a species-specific paradigm. This rule, though, should not apply to the components of the transduceosome [13, 34] (Table 1) which seems to be the key regulatory mechanism in unravelling the inherent differences between mammalian and insect steroidogenesis.
The hybrid J106xDAIZO of Bombyx mori was used in this study. Hybrid animals were generated from the pure races, J106 and DAIZO, that are bred and maintained in the insect rearing facility of the Department of Biology, Naitonal and Kapodistrian University of Athens. In this hybrid, the 5th instar period lasts about ~ 208 h, the onset of pupal commitment occurs after 60 h (day 3) and the onset of wandering behaviour occurs 144 h (day 6) after the final larval ecdysis. This hybrid has a short period of cocoon spinning that lasts ~ 38 h followed by a period of ~ 26 h before pupal metamorphosis. In this study, each day of the 5th (V) instar is designated with its numerical number (i.e.V-0, V-1 etc.) while the first day of the pupal stage is designated as P-0. Larvae were reared on fresh mulberry leaves under a 12:12-L:D photoperiod at 25 ± 1 °C and 60% relative humidity. Larvae were staged after every larval ecdysis, and the day of each ecdysis was designated as day 0 (V-0). Since larvae mainly moult to the final (5th) instar during the scotophase, all larvae that ecdysed during the scotophase were segregated immediately after the onset of photophase. This time was designated as 0 h of the 5th instar and 4 h later samples of PGs were taken (V-0 samples), while samples of PGs from day 6 (V-6) were taken 144 h later, at the onset of wandering behaviour.
Quantification of cAMP levels in prothoracic glands
Larvae were anesthetized by submersion in water and PGs were dissected rapidly (~ 2 min/animal) from each larva in sterile saline (0.85% NaCl). The glands were pre-incubated in Grace’s medium (Invitrogen) for 15–30 min and then each gland was incubated for 30 min in 50 μl of Grace’s medium at 25 ± 1 °C in a well of a 96-well plate (Greiner) with or without either forskolin (50 μM), 3-isobutyl 1-methylxanthine (IBMX) (100 μM), or ionomycin (50 μM) (Calbiochem) for 30 min at 25 ± 1 °C. The cAMP levels in PGs were quantified by an enzyme immunoassay  kit according to manufacturer’s instructions (Cayman Chemical Company). Prothoracic glands (n = 4–5) were removed after incubation and homogenized in 100 μl acidified ethanol (ethanol/1 N HCl, 100:1 v/v). The homogenate was centrifuged (2000 g, 15 min) and the supernatant was removed and saved. The pellet was resuspended in 200 μl ethanol/DW (2:1 v/v) and centrifuged as above. The supernatants were combined, dried in vacuo, and the residue was dissolved in phosphate buffer provided with the kit and acetylation of standards or samples by acetic anhydride was performed according to the instructions in the kit.
Transcriptomic analysis of Bombyx mori PG cells
Prothoracic glands from V-0 and V-6 (onset of wandering stage) of the 5th instar were isolated as described above, meticulously cleared of any associated tissue or debris and total RNA was isolated immediately upon gland removal with TRIzol (Invitrogen) according to the manufacturer’s instructions. The lllumina® mRNA-Seq Sample Prep Kit was used to process the samples according to the manufacturer’s instructions (1,004,898 Rev.D). Briefly, mRNA was isolated from total RNA using oligo-dT magnetic beads. After fragmentation of the mRNA, cDNA synthesis was performed and the resulting cDNA was ligated with the sequencing adapters and amplified by PCR. Quality and yield after sample preparation was measured with the Agilent 2100 Bioanalyzer (Agilent Technologies). The size of the resulting products was consistent with the expected size distribution with a broad peak between 200 and 500 bp on a DNA 1000 chip. A concentration of 17 pM of DNA was used for clustering and DNA sequencing on lllumina cBot and HiSeq2500 (HCS v2.2.58 software) according to manufacturer’s protocols.
The Bombyx mori reference genome
To analyse the raw files from RNA-seq experiments that compared transcript abundance between Bm prothoracic glands from V-0 and V-6 of the 5th instar [27, 28] as well as public Bm RNA-Seq data from Bm PG cells and Br-CC-CA complexes at the wandering stage of the 5th instar, which is equivalent to V-6 of our samples [29, 30] (DDBJ sequence read archive accession number: DRA002282), we constructed a Bm reference genome by retrieving sequences assembled in scaffolds and anchored to chromosomes from the public data repository of KAIKObase (http://sgp.dna.affrc.go.jp/pubdata/genomicsequences.html). Contigs of Bm genome assembled to scaffolds but not anchored to chromosomes that were less than 20 kb were excluded and the remaining contigs were assembled to a final FASTA file which was used to construct a Bowtie2  (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) index for subsequent use with TopHat2  and Bowtie2  aligners. Comprehensive gene sets were retrieved again from KAIKObase (http://sgp.dna.affrc.go.jp/ComprehensiveGeneSet/) and a similar procedure as described above was followed to construct a gene file containing genes anchored to chromosomes and additional genes which were inferred in the scaffold sequences so as to construct a final Bm gene file in GTF format to supply it to the TopHat2 aligner. From the final GTF file, BED files suitable for visualization in the University of California, Santa Cruz (UCSC) Genome Browser (https://genome.ucsc.edu/) were also constructed.
Short read mapping
A total of 1.63 Gb from PG samples and 1.45 Gb from Br-CC-CA samples [29, 30] assembled in the FASTQ files containing single-end 36 bp sequence reads, from DDBJ sequence read archive accession number DRA002282, or paired-end 125 bp sequence reads from our own experiments [27, 28], were subjected to quality control using the FastQC package and mapped on the reference genome using TopHat2  with the standard parameters for reads obtained with Illumina platforms and the --GTF parameter was supplied with additional transcript annotation data for the Bm reference genome as described above. In the case of our paired-end data, we also applied the following: the --read-gap-length and --read-edit-dist parameters were set to 3 (default is 2) to allow some more freedom in the strictness of the overall alignment procedure since the Bm genome is not fully annotated yet. For both datasets, after completing a first round of spliced alignment with TopHat2, reads which failed to map to the reference genome were supplied to the bedtools bamtofastq command from the BEDTools suite (https://github.com/arq5x/bedtools2) to create a FASTQ subset of the original raw short reads. These short reads subset was subjected to a second round of unspliced alignment with Bowtie2 in sensitive mode (options applied: --local --very-sensitive-local --maxins 1000 --dovetail) to allow mapping of part of this subset back to the reference genome. Mapping of this subset back to the reference genome occurred when a sufficient and continuous proportion of each read was successfully aligned. This procedure allowed for the alignment of paired-end reads located quite further than the average pair distance. These two rounds of alignment procedure led to increased alignment rates.
Cufflinks analysis of Bombyx mori PG cells transcripts
The raw reads of our RNA-seq data have been deposited into the NCBI Short Read Archive (SRA, http://www.ncbi.nlm.nih.gov/sra/) under accession number SRP062258 . This record combines 3 biological replicates from V-0 and 3 biological replicates from V-6 with accession numbers SAMN03978782 and SAMN03978783, respectively. To compare and complement the results of the differential expression analysis on our RNA-seq data performed using PANDORA [27, 28] as well as explore potential differentially expressed isoforms, we used the Cufflinks pipeline . Specifically, we first reassembled and quantified the Bm transcriptome for each sample using the Cufflinks command with parameters --multi-read-correct, −-max-mle-iteration 10,000, −-max-bundle-frags 100,000, −-overhang-tolerance 10, −-max-multiread-fraction 0.5, −-no-faux-reads --upper-quartile-norm. In addition, we provided a guide for the transcriptome assembly of Cufflinks using a GFF file we assembled from the procedure described in the previous section through the cufflinks option --GTF-guide. Subsequently, we merged the assembled and quantified transcriptomes using the cuffmerge command. Finally, we executed cuffdiff for the contrast V-6 versus V-0 with the following arguments: --multi-read-correct, −-max-mle-iteration 10,000, −-max-bundle-frags 100,000.
In order to perform differential expression analysis of the RNA-Seq data, we used the Bioconductor  package metaseqR with the PANDORA  method which has been shown to improve the overall accuracy by optimising the trade-off between true positives and false hits in differential expression analysis through the systematic combination of multiple analysis algorithms. Specifically, and regarding the metaseqR workflow, the BAM files, one for each RNA-seq sample [27, 29, 30], were summarised to a gene read counts table, using the Bioconductor package GenomicRanges  and the Bm genes derived as described above. Prior to the statistical testing procedure, the gene read counts were filtered for possible artefacts that could affect the subsequent statistical testing procedures. Genes presenting any of the following were excluded from further analysis: i) genes whose average reads per 100 bp was less than the 25th quantile of the total normalised distribution of average reads per 100 bp, ii) genes with read counts below the median read counts of the total normalised count distribution. The resulting gene counts table was subjected to differential expression analysis for the contrasts Br-CC-CA versus PG cells , using the Bioconductor packages DESeq , edgeR , limma , NBPSeq , NOISeq  and baySeq . To combine the statistical significance from multiple algorithms so as to optimise the trade-off between true positives and false hits, we applied the PANDORA  weighted p-value method across all results. In the final read counts table, which was used for differential expression analysis, each row represented one gene, each column one RNA-Seq sample and each cell the corresponding read counts associated with each row and column.
RNA-seq data visualisation
To create a UCSC Genome Browser visualization track data hub, BAM files resulting from the alignment procedure, of our RNA-seq data  and that of another research group [29, 30], were converted to BED format (https://genome.ucsc.edu/FAQ/FAQformat.html#format1) using the bedtools bamtobed command from the BEDTools suite (http://bedtools.readthedocs.org/en/latest/index.html#) with the -split option to report RNA-seq reads split by the TopHat2 algorithm as separate alignments, referred hereafter as “tags”. The RNA signal from these files was extracted by reformatting them in BedGraph (http://genome.ucsc.edu/goldenPath/help/bedgraph.html) format using the bedtools genomecov command from the BEDTools suite with the -bg option and then to bigwig (https://genome.ucsc.edu/goldenpath/help/bigWig.html) format using the bedGraphToBigWig program supplied by UCSC. The bigWig tracks were visualised in a custom UCSC Genome Browser track data hub, hosting the Bm (bmori2) reference genome and the normalised (total signal of 1010) RNA-seq samples. Two separate track data hubs were generated. One for our RNA-seq data [27, 28] for V-0 and V-6 of the 5th instar at (http://epigenomics.fleming.gr/tracks/hs_trackhubs/ekpa_dedos_2/hub.txt) and another for the RNA-seq data generated by another research team [29, 30] (DDBJ Sequence Read Archive Accession number: DRA002282) which compares transcript abundance between Bm PGs and Br-CC-CA complexes at the onset of wandering stage at (http://epigenomics.fleming.gr/tracks/hs_trackhubs/ekpa_dedos_1/hub.txt). Each link must be pasted in the My Hubs tab of the UCSC Genome Browser application.
Data assembly and bioinformatic analysis
Τo identify candidate coding regions within transcript sequences as predicted by Cufflinks analysis, we used the TransDecoder algorithm  (https://github.com/TransDecoder/TransDecoder/wiki). Specifically, we used TransDecoder utilities to convert the assembled cufflinks transcriptome file to a FASTA file containing the corresponding cDNAs and input this file to TransDecoder with standard parameters. Subsequently, we applied standard Blast of the TransDecoder identified longest ORFs output against the Uniprot database (with parameters -max_target_seqs 1 -evalue 1e-5) and against the Trembl database (with parameters -max_target_seqs 1 -evalue 1e-6). We also looked for domain homologies against the Pfam database using again TransDecoder facilities. The final protein coding potential was assessed using the prediction algorithm of TransDecoder and the results of the previous steps. To produce a comprehensive protein dataset that was used to assign orthologues, the TransDecoder output was subjected to Blastp analysis on NCBI against the Bm annotated genes (annotation 102) https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Bombyx_mori/102/. Results were filtered to retain the best hit and % identity for each protein sequence. Then, the Cufflinks, TransDecoder  (https://github.com/TransDecoder/TransDecoder/wiki), PANDORA [27, 28] and Blastp analysis files were integrated into a single file that was used to further analyse our results. In addition, we re-analysed, as described above, and integrated in our final analysis the raw files from RNA-seq data that compared transcript abundance between Bm PG cells and Br-CC-CA complexes at the wandering stage of the 5th instar [29, 30]. Once tabulated this dataset was used to define as PG-specific genes that had: 1) meta p-value < 0.05, and 2) meta FDR value < 0.05, and 3) expression ratio PG/Br > 1, and 4) Br RPGM <median value (0.4902) and 5) PG RPGM value > 0.0259 (i.e. the value found to be the lowest for PG expression by qPCR (see below), and 6) Cufflink values above those found for expressed genes (see below). The final list contained 272 PG-specific genes (see Additional file 11: Table S10).
With these four datasets, we could define gene expression patterns in the wandering stage of both insect species and also identify the genes that are predominantly, although not exclusively, expressed in PG cells. To determine the tissue specificity of genes that are up-regulated and down-regulated by prothoracicotropic hormone stimulation of PG cells, we analysed and annotated RNA-seq data previously described . All these datasets were integrated with phenotypic data from Dm RG-specific RNAi experiments [22, 32] coupled with assignment of gene orthologues between the two species using OrthoDB , g:Orth  and OrthoMCL .
KEGG pathway maps (http://www.genome.ad.jp/kegg/pathway.html) where available for Bm [109, 110] or Dm  were used to identify proteins that are components of signalling pathways and examine their presence in the various datasets. For AC encoding genes present in Bm genome, Dm AC protein sequences  were searched by Blastp against the Bm RefSeq annotated proteins (annotation 102). Hits were downloaded, manually curated to identify partial sequences, alternatively spliced variants and duplicate entries and then validated against the Bm genome database (http://sgp.dna.affrc.go.jp/KAIKObase/) and annotated to their corresponding gene(s). Closely related Bm proteins were also retrieved using a combination of literature searches and available databases such as the UniprotKB (http://www.uniprot.org), the SilkDB database (http://metazoa.ensembl.org/Bombyx_mori/Info/Index) and tBLASTN queries against the KAIKOBLAST server (http://sgp.dna.affrc.go.jp/KAIKObase/) to identify and exclude any possibility that a protein of interest was left undetected. Selected hits were then searched using the profile hidden Markov model on HMMER (http://www.ebi.ac.uk/Tools/hmmer/) to identify the Pfam-A domains present on each protein sequence. Additional domain analysis of the retrieved protein sequences was carried out in PROSITE (http://au.expasy.org/prosite/) and InterPro (http://www.ebi.ac.uk/interpro/). Signal peptide and transmembrane domains were analysed by SignalP4.1 (http://www.cbs.dtu.dk/services/SignalP/) and TMHMM server v.2.0 (http://www.cbs.dtu.dk/services/TMHMM/), respectively. Exactly the same process, as the one described above, was followed for 3′, 5′ cyclic nucleotide phosphodiesterase encoding genes (hereafter referred to as phosphodiesterases(PDEs)) present in Bm genome using Dm PDE protein sequences , for protein kinase C (PKC) encoding genes present in Bm genome using Dm PKC protein sequences , and for phospholipase C encoding genes present in Bm genome using Dm PLC protein sequences including the phospholipase C X-domain containing protein sequences . For cytochrome P450 (CYP) analysis, we used sequences available from the cytochrome P450 homepage  (http://drnelson.uthsc.edu/silkworm.pub.htm) as well as results presented in previous studies [70, 71]. By combining the available sequences from these sources and validating them against the GenBank database, we conclude that there are 82 cytochrome P450 genes in the Bm genome. Our results largely agree with previous studies [70, 71] apart from the following cases: CYP6AE3P  protein sequence is identical to CYP6AE21  and CYP6AE2  protein sequence is identical to CYP6AE8 . The full length sequences of proteins encoded by CYP333un1, CYP340A5P and CYP340un1, that were previously reported to be pseudogenes , were recovered (see Additional file 20: Table S17) and these genes are not predicted to be pseudogenes. CYP340A1-de9b  is part of a misplaced genomic scaffold and CYP4G45  is probably an erroneous entry as suggested in (http://drnelson.uthsc.edu/silkworm.pub.htm) . The 82 annotated protein sequences were searched using HMMER3 (http://hmmer.janelia.org/) with the default parameters. As described before , the insect CYP genes code for proteins of approximately 500 amino acids each having five conserved motifs: the helix C motif (WxxxR), the helix I (oxygen binding) motif (Gx[E/D]T[T/S]), the helix K motif (ExLR), the PERF motif (PxxFxP[E/D]RF) and the heme-binding motif (PFxxGxRxCx[G/A]) all found in the sequences we annotated.
To annotate the secondary structure of selected domains shown in figures, the putative three-dimensional structure of the protein indicated in figure legends was predicted at the automated comparative protein modelling server (Swiss-Model) (https://www.swissmodel.expasy.org/) with the optimized mode using the coordinates of the best model for each protein group available from the Brookhaven Protein Database (BPD) .
For Gene Ontology (GO) term enrichment analysis, the Gene Ontology Consortium term description and hierarchical ranking (http://amigo2.berkeleybop.org/amigo/landing) was adopted. We used g:Cocoa  to identify clusters of biological processes and molecular functions that were highly represented (p-value) in our datasets with Bombyx mori as the species and default settings.
Sequence alignments and phylogenetic analysis
Protein sequences were assembled from GenBank or Flybase (see above) and aligned using the software MSAProbs  v0.9.7 with default parameters. In each of the five protein datasets we examined, phylogenetic relationships were inferred from maximum likelihood and Bayesian inference. We used Prottest v3.4.2  to identify the appropriate phylogenetic model for each of the 5 different alignments. According to Bayesian Information Criterion (BIC) framework, the best fit model for ACs, was LG + I + G + F, which uses a general amino acid replacement matrix  with a proportion of invariable sites (+I) , a gamma distribution for modelling the rate heterogeneity (+G) , and empirical amino acid frequencies (+F) . Likewise, the best fit model for PDEs was VT + I + G + F which uses a general amino acid replacement matrix , the best fit model for PKCs was VT + I + G, the best fit model for PLCs was LG + I + G and the best fit model for CYPs was LG + I + G + F. MrBayes 3.2.6  was used for Bayesian inference tree calculation on 2 parallel sets of 4 chains (3 hot and 1 cold) run for 100,000 generations until convergence was reached (‘average standard deviation of split frequencies’ < 0.01). Temperature heating parameter was set to 0.2 (temp = 0.2) and burn-in was set to 25% (burninfrac = 0.25). The average standard deviation of split frequencies was 0.000000 for the analysis of ACs, PDEs, PKCs and PLCs and 0.007888 for CYPs. Maximum likelihood trees were produced with RAxML 8.2.3 using the LG4X PROTGAMMA model  and 1000 bootstrap replicates . All analyses were run on the CIPRES server at the San Diego Supercomputer Center (SDSC). Trees were visualized and annotated in figtree v1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/).
Total RNA isolation and cDNA synthesis from prothoracic glands
Prothoracic gland isolation was carried out as described above, glands were meticulously cleared of any associated tissue or debris and total RNA was isolated immediately with TRIzol (Invitrogen) according to manufacturer’s instructions (n = 7 for each day of the investigated developmental period). Integrity of total RNA from each sample was determined using gel electrophoresis and RNA quality was determined by measuring the absorbance at 260 and 280 nm (A260/280 of all samples > 1.9). First strand cDNA was synthesized from 2 μg total RNA with 200 U Superscript®III reverse transcriptase (Invitrogen) in 20 μl reaction volumes using an oligo20 primer (Invitrogen) according to manufacturer’s instructions. The resulting cDNA was diluted with nuclease-free water (Thermo Fisher Scientific) before use in quantitative PCR.
Quantitative PCR analysis of Bombyx mori genes
Quantitative real-time PCR was carried out with the SYBR® Green dye in 96-well PCR micro plates (Applied Biosystems) on a 7500 Real-Time PCR System (Applied Biosystems). Fluorescence emission of the products and subsequent calculations were carried out with the Sequence Detection System software v2.0.6 (Applied Biosystems). The reaction mixture (10 μl total volume per well) included 20 ng cDNA, 0.8 μl nuclease-free water (Thermo Fisher Scientific), 5 μl Kapa SYBR® Fast Universal 2X qPCR Master Mix (Kapa Biosystems), 0.2 μl of 50× Rox Low passive reference dye (Kapa Biosystems) and primers at a final concentration of 200 nmol/l. Reactions (n = 3–7) to amplify 230–270 bp amplicons were performed under the following conditions: 95 °C for 3 min as an initial step followed by 40 cycles of 95 °C for 15 s and 60 °C for 60 s. After amplification, dissociation curves were produced (60 °C to 95 °C at a heating rate of 0.1 °C/sec and acquiring fluorescence data every 0.3 °C) to discriminate the main reaction products from other nonspecific ones or primer-dimers and PCR products were subjected to electrophoresis on 2% w/v agarose gels to corroborate the presence of a unique amplicon (see Additional file 25: Table S19 for detailed description). Each qPCR run always included a no-cDNA template control and reverse transcription negative controls. For each of the replicates (n = 7) genomic DNA as a qPCR template was also analysed. All the aforementioned negative controls gave no detectable quantification cycle (Cq) value, proving the lack of any contamination or nonspecific signal. Absolute quantification analysis via a standard curve approach was utilized to calculate the transcript levels of 650 genes (see legend of Additional file 7: Table S7). Serial dilutions (4 to 4 × 106 copies/reaction) of known concentration of Bombyx mori Torso (BmTorso) in plasmid pBRAcPA  (a generous gift from Dr. Michael O’Connor, University of Minnesota, USA), linearised with NheI, was used in each qPCR run to construct the standard curve(s). Concentration of the standard solutions of the plasmid was determined by spectrophotometry and was converted to copy numbers per microliter using the following formula: Copy number/μl = 6.023 × 1023 copies/mol × DNA concentration (g/μl)/molecular weight (g/mol). Standard curves were constructed by plotting the threshold cycle (Cq; ΔRn = 0.25) values against the initial copy number of BmTorso containing plasmid. Copy numbers of transcripts in the samples were calculated by interpolating the Cq value of the sample within the generated standard curve(s). We conducted validation experiments to test the requirements for applying the above methodology. We estimated reaction efficiency (E) with the formula, E% = [− 1 + 10(− 1/slope)] × 100 using dilution series of sample cDNA, incorporating several orders of magnitude (100–0.1 ng) and Cq values were plotted against log10[cDNA quantity]. Primers used for qPCR are listed in Additional file 26: Table S20 and the MIQE guidelines-adopted  complete protocols of the qPCR assays are described in Additional file 25: Table S19 [27, 28]. All primers were designed with an online tool (http://primer3plus.com/cgi-bin/dev/primer3plus.cgi) using the following custom settings: 1) amplicons size range 230–270 bp, 2) primer size minimum 18 bases, optimum 20 bases, maximum 23 bases, 3) primer Tm minimum 59 °C, optimum 60 °C, maximum 62 °C, maximum Tm difference 1 °C, 4) Primer GC% minimum 30, optimal 50, maximum 80, and the other settings were left to default values. Among the various combinations of primers returned by the online tool, the pair that combined the following criteria was selected: 1) minimal penalty value, 2) amplicon at the 3′ end of the cDNA sequence, 3) unique hit in NCBI Primer-Blast (http://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi?LINK_LOC=BlastHome) webtool using the RefseqmRNA database and Bombyx mori as the species, 4) unique hit in the KAIKObase website using Blastn.
Research reported in this publication was supported from funds by the National and Kapodistrian University of Athens under award number Kapodistrias:11240 and by an IKY Fellowship of Excellence for postgraduate studies in Greece-Siemens program awarded to Alexandros Alexandratos.
Availability of data and materials
All data generated or analysed during this study are included in this published article and its supplementary information files. Data generated by other research groups [4, 29, 30, 31, 32] and analysed in this study is publicly available data. The raw reads of Bombyx mori PGs transcriptome data [27, 28] are available from the NCBI Short Read Archive (SRA, http://www.ncbi.nlm.nih.gov/sra/), under the accession number SRP062258. The raw reads of Bombyx mori PGs and Br-CC-CA complexes transcriptome data [29, 30] are available from the NCBI Short Read Archive (SRA, http://www.ncbi.nlm.nih.gov/sra/), under the accession number DRP002346. Data for all microarray and RNA-seq experiments from Drosophila melanogaster and Bombyx mori, respectively, as reported in , are available from Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/), under the accession number GSE80485. Raw sequence reads and processed data files, as reported in , including the table of FPKM estimates output by Cuffdiff, are available from the National Center for Biotechnology Gene Expression Omnibus (GEO), under the accession number GSE76304.
PM, AA, IN and SGD conceived and designed the experiments. PM, AA, IN and SGD performed the experiments. PM and SGD provided software and resources. PM, AA and SGD analysed the data. PM, AA and SGD wrote the paper. All authors have read and approved the final manuscript.
Ethics approval and consent to participate
There are no specific guidelines and permits required for the use of silkworms in scientific experiments in Greece. The research abides by the Presidential Decree No 160/1991, which implements the EEC Directive 86/609/EEC, and provides the guidelines for protection of animals used for experimental and other scientific purposes, the Presidential Decree No 1197/1981 that allows experiments on animals for scientific purposes to be conducted by biologists and recommendations 2007/526 EEC.
Consent for publication
Dr. Panagiotis Moulos is a co-founder of HybridStat Predictive Analytics G.P. and a member of its scientific advisory board.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 2.Lafont R, Dauphin-Villemant C, Warren JT, Rees HH. Ecdysteroid chemistry and biochemistry. In: Reference Module in Life Sciences: Elsevier; 2017.Google Scholar
- 13.Rone MB, Midzak AS, Issop L, Rammouz G, Jagannathan S, Fan J, Ye X, Blonder J, Veenstra T, Papadopoulos V. Identification of a dynamic mitochondrial protein complex driving cholesterol import, trafficking, and metabolism to steroid hormones. Mol Endocrinol. 2012;26(11):1868–82.PubMedPubMedCentralCrossRefGoogle Scholar
- 15.Poderoso C, Converso DP, Maloberti P, Duarte A, Neuman I, Galli S, Maciel FC, Paz C, Carreras MC, Poderoso JJ, et al. A mitochondrial kinase complex is essential to mediate an ERK1/2-dependent phosphorylation of a key regulatory protein in steroid biosynthesis. PLoS One. 2008;3(1):e1443.PubMedPubMedCentralCrossRefGoogle Scholar
- 18.Yoshiyama-Yanagawa T, Enya S, Shimada-Niwa Y, Yaguchi S, Haramoto Y, Matsuya T, Shiomi K, Sasakura Y, Takahashi S, Asashima M, et al. The conserved Rieske oxygenase DAF-36/Neverland is a novel cholesterol-metabolizing enzyme. J Biol Chem. 2011;286(29):25756–62.PubMedPubMedCentralCrossRefGoogle Scholar
- 31.Christesen D, Yang YT, Somers J, Robin C, Sztal T, Batterham P, Perry T. Transcriptome analysis of Drosophila melanogaster third instar larval ring glands points to novel functions and uncovers a cytochrome p450 required for development. G3 Genes Genomes Genetics. 2017;7(2):467–79.PubMedGoogle Scholar
- 32.Danielsen ET, Moeller Morten E, Yamanaka N, Ou Q, Laursen Janne M, Soenderholm C, Zhuo R, Phelps B, Tang K, Zeng J, et al. A Drosophila genome-wide screen identifies regulators of steroid hormone production and developmental timing. Dev Cell. 2016;37(6):558–70.PubMedPubMedCentralCrossRefGoogle Scholar
- 37.Figueroa-Montiel A, Ramos MA, Mares RE, Dueñas S, Pimienta G, Ortiz E, Possani LD, Licea-Navarro AF. In silico identification of protein disulfide isomerase gene families in the De novo assembled transcriptomes of four different species of the genus conus. PLoS One. 2016;11(2):e0148390.PubMedPubMedCentralCrossRefGoogle Scholar
- 44.Niwa R, Namiki T, Ito K, Shimada-Niwa Y, Kiuchi M, Kawaoka S, Kayukawa T, Banno Y, Fujimoto Y, Shigenobu S, et al. Non-molting glossy/shroud encodes a short-chain dehydrogenase/reductase that functions in the ‘black box’ of the ecdysteroid biosynthesis pathway. Development. 2010;137(12):1991–9.PubMedCrossRefGoogle Scholar
- 58.Smith W, Rybczynski R. 1 - Prothoracicotropic Hormone. In: Gilbert LI, editor. Insect Endocrinology. San Diego: Academic Press; 2012. p. 1–62.Google Scholar
- 63.Pu Y, Peach ML, Garfield SH, Wincovitch S, Marquez VE, Blumberg PM. Effects on ligand interaction and membrane translocation of the positively charged arginine residues situated along the C1 domain binding cleft in the atypical protein kinase C isoforms. J Biol Chem. 2006;281(44):33773–88.PubMedCrossRefGoogle Scholar
- 64.Kazanietz MG, Bustelo XR, Barbacid M, Kolch W, Mischak H, Wong G, Pettit GR, Bruns JD, Blumberg PM. Zinc finger domains and phorbol ester pharmacophore. Analysis of binding to mutated form of protein kinase C zeta and the vav and c-raf proto-oncogene products. J Biol Chem. 1994;269(15):11590–4.PubMedGoogle Scholar
- 73.Niwa R, Sakudoh T, Matsuya T, Namiki T, Kasai S, Tomita T, Kataoka H. Expressions of the cytochrome P450 monooxygenase gene Cyp4g1 and its homolog in the prothoracic glands of the fruit fly Drosophila melanogaster (Diptera: Drosophilidae) and the silkworm Bombyx mori (Lepidoptera: Bombycidae). Appl Entomol Zool. 2011;46(4):533.CrossRefGoogle Scholar
- 92.Komura-Kawa T, Hirota K, Shimada-Niwa Y, Yamauchi R, Shimell M, Shinoda T, Fukamizu A, O’Connor MB, Niwa R. The Drosophila zinc finger transcription factor Ouija board controls Ecdysteroid biosynthesis through specific regulation of spookier. PLoS Genet. 2015;11(12):e1005712.PubMedPubMedCentralCrossRefGoogle Scholar
- 104.Di Y, Schafer Daniel W, Cumbie Jason S, Chang Jeff H. The NBP negative binomial model for assessing differential gene expression from RNA-Seq. In: Statistical Applications in Genetics and Molecular Biology, vol. 10; 2011.Google Scholar
- 107.Zdobnov EM, Tegenfeldt F, Kuznetsov D, Waterhouse RM, FA S√£o, Ioannidis P, Seppey M, Loetscher A, Kriventseva EV. OrthoDB v9.1: cataloging evolutionary and functional annotations for animal, fungal, plant, archaeal, bacterial and viral orthologs. Nucleic Acids Res. 2017;45(D1):D744–9.PubMedCrossRefGoogle Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.