Background

The marine diatom Pseudo-nitzschia multiseries Hasle (Hasle) (Ps-n) produces the neurotoxin domoic acid (DA), which causes amnesic shellfish poisoning (ASP)[14]. DA is a neuroexcitatory, water-soluble amino acid that exhibits structural similarity with the neurotransmitter glutamate[5]. DA binds with high affinity to glutamate receptors, leading to excitation and ultimately cell death of neurons exposed to this toxin[6]. Production of DA by Ps-n, and at least 14 other members of the genus Pseudo-nitzschia, has been verified in oceanic regions throughout the world, primarily in coastal and upwelling zones[7, 8]. The documented effects of DA on humans, birds, finfish, cephalopods, and marine mammals, and the economic costs of shellfishery closures due to DA contamination, has generated ongoing interest in understanding the regulation and control of DA production in this genus[7, 911]. Yet, the biosynthetic pathways leading to DA production and the genes that govern these pathways remain unresolved[12, 13].

Numerous studies on Ps-n growth dynamics have shown that DA production does not begin until early stationary phase, i.e. toxin is not typically produced in detectable amounts during the exponential growth phase (reviewed in[9]). In other studies that exposed Ps-n to conditions that slowed cell division during the mid-exponential phase, cells produced low levels of toxin. Therefore, toxin production appears to be associated with stages in the cell cycle when cell division has slowed or stopped due to some limiting nutrient factor, most notably silicon (Si)[10, 14]. In addition, several bacterial isolates have been shown to enhance DA production by Ps-n[1517]. Ps-n can produce DA in axenic cultures[2, 18], yet, reintroduction of bacteria to axenic cultures results in increased Ps-n DA production[1517].

In this study, we developed a Ps-n cDNA library and used it to construct a microarray in order to screen for genes that were differentially expressed under high-toxin-producing versus low-toxin-producing conditions. A total of 5,265 Ps-n cDNAs were printed in replicate, and mRNAs from cells that were in late-exponential growth phase were compared to those that were in stationary phase in both axenic and non-axenic cultures. Using these array data, we identified candidate reference and target genes for further study. Eleven reference genes were evaluated for stability in reverse-transcription quantitative PCR (RT-qPCR) analyses of Ps-n mRNA from Si-limited cultures. We performed a GeNorm analysis to validate transcripts that did not vary across conditions. Using the validated reference transcripts, we then confirmed the differential regulation of several transcripts whose expression correlates with DA production. These findings will facilitate future work aimed at elucidating the DA biosynthesis pathway and identifying transcriptional biomarkers indicative of DA production.

Results

Pseudo-nitzschia growth and toxin production for microarray studies

Samples for microarray analysis were obtained from three biological experiments using Ps-n strain CL-125. These trials included one axenic and two non-axenic cultures, all grown in standard medium f/2. DA production began at the onset of stationary phase and continued to increase over time in all three experiments (Figure 1). Final DA concentrations, expressed on a per mL basis, were ~30 times lower in the axenic growth experiment compared to the non-axenic growth experiments, as expected based on previous studies[2, 1518]. Previous studies also indicated that Si is the limiting nutrient for Ps-n cells grown in batch cultures with medium f/2[9, 10, 14]; therefore, we presume that the cells in these experiments were Si-limited during stationary phase. Samples were harvested for microarray analysis during the late-exponential and stationary phases to compare gene expression between low-toxin-producing vs. high-toxin-producing cells. These time points are indicated by arrows in Figure 1a,1b.

Figure 1
figure 1

Change in cell number and DA production as a consequence of growth under non-axenic and axenic conditions. a) Ps-n strain CL-125, Non-axenic culture experiments 1 (solid) and 2 (open). b) Ps-n strain CL-125, Axenic culture experiment. Cells were harvested for RNA extraction on the days indicated by arrows. c) Increase in cell number (squares) and in DA concentration (circles) of Ps-n strain GGB1 in non-axenic, triplicate cultures. Cells were harvested for RNA extraction on days 3–10. d) Nutrient concentrations over time in GGB1 cultures (nitrite/nitrate, phosphate, silicate). Data for (c) and (d) represent the mean change of triplicate samples (± 1 SD).

Identification and validation of reference transcripts

Our initial goal was the identification of transcripts whose expression levels were stable between late-exponential and stationary phases, which could then be used for normalization of other transcripts’ expression levels under these conditions. We selected eleven candidate reference genes to evaluate in RT-qPCR studies based on their stability in the microarray results as well as their biological roles and use as controls in previous studies (Table 1; Additional file1). These included transcripts encoding: dynein, histone H3, cyclophilin, ubiquitin, elongation factor 1 alpha (EF-1α), phosphoglycerate kinase 1 (PGK), eukaryotic initiation factor 2 (eIF-2), a JmjC-domain containing protein (JmjC), an AAA-domain containing ATPase, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), and 18s rRNA. RT-qPCR primer sets for each candidate reference gene were designed and tested, and exhibited high sequence specificity and PCR efficiency under our assay conditions with an annealing temperature of 60°C (Table 2).

Table 1 Expression data for candidate reference genes from Pseudo-nitzschia multiseries ( Ps-n ) cDNA microarray analysis a
Table 2 RT-qPCR reference gene primer sequences and characteristics for all candidate reference genes

To validate the stability of candidate reference genes, biological triplicates of Ps-n strain GGB1 were grown under non-axenic, Si-limited conditions. RNA was harvested at multiple time points during the late-exponential and stationary phases (Figure 1c). The initiation of DA production again corresponded with the onset of stationary phase, which in this study was on day four. Initial silicate concentrations were reduced to 37.2 μM in the culture medium (vs. 107 μM in the standard f/2 medium). The measured silicate concentration was below 1.0 μM by day four (Figure 1d), corresponding with the entry into stationary phase. Nitrate and phosphate appear to be present in sufficient quantities throughout the experiment (Figure 1d)[9, 10, 14], further supporting that growth of these cultures was Si-limited.

In an initial GeNorm analysis, all eleven candidate genes were tested for stability across a subset of samples from the Si-limitation experiment, including replicates from days 3, 4, and 10 (Figure 2a). The four genes with the best stability values (M-value), i.e. JmjC (0.33), dynein (0.33), histone H3 (0.37) and cyclophilin (0.39), along with EF-1α (0.62), were further evaluated for stability using GeNorm analysis across the complete set of expression data for the Si-limitation experiment (Figure 2b). All five genes showed acceptable stability (M-value <0.5) when evaluated across the complete set of data. GeNorm pairwise-variation analysis determined that only two genes (JmjC, dynein) were necessary for subsequent normalization. However, since the JmjC, dynein, and histone H3 genes had equivalent M-values and were matched for 1st rank, we used all three for normalization of the expression data as described below. The expression profiles of the reference genes show the stability of these top-ranked genes, and the slight variability of the EF-1α and cyclophilin genes (Figure 3).

Figure 2
figure 2

Average expression stability (M-value) of the reference genes determined by GeNorm analysis. An individual reference gene is tested against the other reference genes in a pairwise variation that serially excludes the least stable genes from the analysis. The most stable reference genes exhibit the lowest M-values. The accepted cut-off for stability of reference genes is an M-value of 0.50. a) In the initial analysis, which tested a subset of mRNA samples under Si-limited conditions for all eleven of our potential reference genes, four reference genes were determined to be acceptably stable. b) In the analysis of the Si-limited growth experiment, these four reference genes and EF1-alpha were tested. All five showed acceptable stability across the mRNAs, and JmjC, Dynein, and Histone H3 were tied for the 1st rank.

Figure 3
figure 3

RT-qPCR analysis of candidate reference genes for normalization from Pseudo-nitzschia multiseries . Bars represent the mean change (± 1 SD) in expression relative to Day 3. Open bars represent measurements using primers that spanned an exon-exon junction; grey bars represent measurements using standard primers that did not span an exon-exon junction. Means were not significantly different (p < 0.05), except EF-1α. Means with different letters were significantly different (p < 0.05). Statistical analyses were performed using a general linear model ANOVA with Bonferroni post-hoc test, 95% confidence intervals.

Identification and verification of differentially expressed transcripts

In the microarray study, only those transcripts that were up- or down-regulated in all three trials were considered further (Tables 3 and4; Additional file1). Higher transcript levels in stationary (high-toxin-producing) as compared to late-exponential (low-toxin-producing) phase were observed for 12 transcripts, corresponding to 76 cDNA clones printed on the Ps-n array (Table 3; Additional file1); reduced transcript levels under these conditions were observed for six genes, corresponding to 17 cDNA clones printed on the array (Table 4; Additional file1). In addition to those genes identified based on the overall false discovery rate (FDR), we also included genes predicted to encode glutamate dehydrogenase and fucoxanthin-chlorophyll a-c binding protein (FCP) in our results, as both were of interest in this study, and the local false discovery rates (LFDR) indicated valid changes in gene expression. Fold-change differences were consistently lower in the axenic growth experiment; however, the patterns of expression were comparable across the three growth experiments.

Table 3 Transcripts at higher levels in stationary (high-toxin-producing) as compared to late-exponential (low-toxin-producing) phase in Pseudo-nitzschia multiseries ( Ps-n ) as determined by cDNA microarray analysis
Table 4 Transcripts at lower levels in stationary (high-toxin-producing) as compared to late-exponential (low-toxin-producing) phase in Pseudo-nitzschia multiseries ( Ps-n ) as determined by cDNA microarray analysis

Eight genes were selected for further study using RT-qPCR (Table 5). Of these, six genes had higher transcript levels during the stationary phase (Figure 4a), and were predicted to encode a cycloisomerase, an SLC6 transporter, an aldo-keto reductase, glutamate dehydrogenase, phosphoenolpyruvate carboxykinase (PEPCK), and a small heat shock protein. The cycloisomerase, SLC6 transporter, and aldo-keto reductase genes all had a statistically significant step-wise increase in transcript abundance from days 3 to 5 that correlated with the gradual increase in DA production (Figures 1c, and4a). The other up-regulated genes showed similar trends during this time period. FCP showed decreased transcript levels during the transition from exponential to stationary phase; the phosphofructokinase (PFK) transcript levels, which were down-regulated in the microarray experiment, were not statistically different as measured by RT-qPCR (Figure 4b). Of note, the absence of DNA contamination in these studies is shown visually by the parallel results from amplification of cDNA using both standard and exon-exon spanning primer sets for the cyclophilin, SLC6 transporter, and PFK genes (Figures 3 and4). Potential connections between these transcripts and DA production are discussed below.

Table 5 RT-qPCR target gene primer sequences and characteristics
Figure 4
figure 4

RT-qPCR analysis of Pseudo-nitzschia multiseries genes whose expression was up-regulated (a) or down-regulated (b) in microarray analysis. Bars represent the mean change (± 1 SD) in expression relative to Day 3. Open bars represent measurements using primers that spanned an exon-exon junction; grey bars represent measurements using standard primers that did not span an exon-exon junction. Means with different letters were significantly different (p < 0.05). Statistical analyses were performed using a general linear model ANOVA with Bonferroni post-hoc test, 95% confidence intervals.

Discussion

Our data support the validity of the reference genes, JmjC, dynein, and histone H3, as suitable controls for normalization of Ps-n mRNAs under conditions of Si-limitation, as cells transition from late-exponential to stationary phase (i.e. from minimal to high DA production). Multiple reference genes are typically more effective for accurate normalization[19, 20]; therefore, we recommend the use of the geometric mean of these three reference genes for normalization of Ps-n RT-qPCR expression data under these conditions. The stability of a histone H3 gene, whose expression often varies with growth phase in other species, may be attributed to regulation at the level of translation vs. transcription[21]. Alternatively, the presence of four histone H3 homologs in the Ps-n genome[22], as revealed by BLAST analysis, presents the possibility that we have identified a replication-independent family member, as shown in other studies[2327]. Of note, our microarray data indicate that the standard “housekeeping” genes, GAPDH and actin, may not be suitable reference genes in this experimental system as their transcript levels varied (Table 1, Additional file1). We tested GAPDH for stability in the RT-qPCR GeNorm analysis, and it showed the least stable M-value of the candidate reference genes tested. Similar results were observed in the diatom Phaeodactylum tricornutum[28]. These results highlight the importance of validating reference genes prior to use for normalization. The primer sets provided in Table 2 should prove useful for examining changes in Ps-n gene expression as populations transition from late-exponential to stationary phase, when DA production is stimulated, and for testing these candidate genes for use as controls under other growth conditions.

The overall correlation between the array and RT-qPCR data in these studies offers proof of concept for the reliability of these expression data. Also, the differential expression of these genes was evidenced in two different Ps-n strains, CL-125 and GGB1, which were isolated from Atlantic and Pacific coastal regions, respectively. The down-regulation of an FCP gene during stationary phase, when photosynthesis and chlorophyll presumably decline e.g.[29], also supports the validity of these results. The down-regulation of FCP gene expression has been correlated with stationary phase and decreases in light-harvesting pigments in the related pennate diatom Phaeodactylum tricornutum[30], as well as other marine algae[31, 32]. The pathways leading to chlorophyll and DA production are both predicted to draw on a pool of glutamate[12, 33], so the down-regulation of FCP in Ps-n correlates well with the onset of DA production.

Our results provide a framework to further investigate the control of toxin production in Ps-n. Previous 13C- and 14C-labeling studies suggested a model involving condensation of an activated glutamate intermediate derived from the citric acid cycle with an isoprenoid intermediate, and subsequent cyclization as a mechanism to generate DA[12]. These studies allow us to generate hypotheses regarding the biological function of the genes identified in our study relative to DA production. For example, up-regulation of a gene encoding a putative Ps-n cycloisomerase is intriguing as its product may be directly involved in the proposed cyclization step leading to the pyrrolidine ring in DA. Alternatively, the cycloisomerase, similar to other enzymes in the related pfam20282 group, may be involved in converting aromatic compounds into citric acid cycle intermediates, proposed to feed the pathway leading to DA synthesis[3438]. The identification of a differentially expressed transcript encoding a member of the SLC6 amino acid transporter family is also interesting. The translated Ps-n open reading frame aligns most closely with characterized γ-aminobutyric (GABA) neurotransmitter transporters[39], suggesting the hypothesis that the Ps-n transporter is involved in movement of DA, or a synthetic precursor, into or out of cells.

Our discovery of the up-regulation of a predicted cycloisomerase belonging to the lactonase/lactonizing family, as well as the SLC6 transporter, entertains the speculation that these gene products are involved in communication between Ps-n cells, or Ps-n and bacteria. The parallels with GABA in plant signaling pathways[4042] pose a potential role for DA in Ps-n biology, which has not yet been defined[7, 9]. For example, GABA produced by wounded plant tissues appears to control the lactone quorum-sensing signal in Agrobacterium tumefaciens by regulating the A. tumefaciens lactonase gene[41]. Bacterial production of lactones in Ps-n cultures is correlated with increased DA production[16, 17, 43], suggesting a possible relationship between DA and quorum sensing[44]. Characterization of the predicted cycloisomerase’s enzymatic properties will be of significant interest in relation to these hypotheses. Similarly, demonstration that the SLC6 transporter is involved in movement of DA into or out of the cell would be a valuable contribution to understanding the role of DA in Ps-n biology. While we have taken the perspective that DA may function in signaling pathways, including quorum sensing or pheromone communication[7, 8, 42], some studies suggest that DA may function as a chelating agent[4547]. Hence, studying the transport of DA into and out of Ps-n cells directly would contribute to describing the role(s) of DA in Ps-n biology. A family of four SLC6 transporters was identified in the recently released Ps-n draft genome[22, 39], so characterization of this entire family should advance our understanding of Ps-n biology.

Many of the differentially expressed genes in this study relate to general metabolic pathways. Therefore, further investigation is needed to resolve the role of these genes in relation to both the growth state and DA synthesis in Ps-n. For example, the up-regulation of PEPCK, as well as the potential down-regulation of PFK, suggests a change in energy metabolizing pathways as Ps-n cells transition from exponential to stationary phase, consistent with a shift to gluconeogenesis and carbon metabolism through the citric acid cycle[48, 49]. Similarly, glutamate dehydrogenase, which catalyzes the reversible conversion of glutamate and the citric acid cycle intermediate α-ketoglutarate, is a key enzyme involved in nitrogen and energy metabolism[50, 51]. In addition, the differential expression of a predicted acyl-CoA synthetase (Table 3) suggests the possibility that lipids and fatty acids are being broken down, and while this may be a physiological response to growth-limiting conditions, the products could then be channeled as precursors into DA synthesis. Previous studies have shown that Ps-n lipid content decreases in response to Si deficiency during stationary phase[9, 52]. Acyl-CoA synthetases are also involved in amino acid acylation, so could be directly involved in the condensation of the glutamate and isoprenoid-like moieties[13, 53, 54].

A small heat shock protein gene was most highly up-regulated later in the stationary phase as determined by RT-qPCR, suggestive of its expression relative to physiological stress. The aldo-keto reductase transcript levels showed a step-wise progression from the exponential into the stationary phase, with the highest expression levels later in stationary phase, as well. The expression patterns of these genes may be useful for monitoring the physiological state of Ps-n cells. The aldo-keto reductase may also have a functional role in DA synthesis, as the labeling studies indicated that the C7’ in DA is selectively oxidized to a carboxyl group[12]. Several of the genes that were identified as being up-regulated in this study have not been previously characterized from diatoms and represent potential targets for further studies of DA synthesis.

The enhancement of DA production by co-existing microbes is a complex and fascinating aspect of DA biology[7]. A limited number of genes in our study indicated significantly different expression patterns between the non-axenic vs. axenic growth experiments. For example, a subtilisin-like gene, predicted to encode a secreted protease, was up-regulated in the non-axenic cultures relative to the axenic culture (Additional file2). In addition, microbes may influence the metabolic pathways predicted to be involved in fatty acid production. Ramsey et al.[12] suggested that the principal pathway to the isoprenoid portion of DA is via an alternative glyceraldehyde 3-phosphate (G3P)-independent route, and it is interesting to note that glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was up-regulated only in the axenic growth experiment (Table 3), suggesting that bacteria may influence this pathway. Future studies will focus on the specific roles that co-existing microbes play in the regulation of Ps-n genes and domoic acid production.

The Ps-n genome is predicted to include 19,703 genes[22]; thus, the estimated 3,675 non-redundant transcripts monitored via this microarray represent ~20% of the genome. Future studies using RNA sequencing methods will determine if other transcripts related to those highlighted here are also differentially expressed in correlation with DA production.

Conclusions

Our study identified a number of significantly up- and down-regulated genes that provide the basis for future studies on DA production, growth state, stress, and amino acid transport in Ps-n. The identified transcripts may be particularly useful as early indicators of toxin production and the switch of Ps-n cells to an alternative growth state. The reliability of RT-qPCR data will be enhanced by use of the validated internal reference genes presented in this study. To our knowledge, this is the first identification and validation of reference genes for RT-qPCR studies in Ps-n.

Methods

Pseudo-nitzschia multiseries strains and culture conditions

Ps-n strain CL-125 was isolated by Claude Léger (Fisheries and Oceans Canada, Gulf Fisheries Centre, Moncton, New Brunswick, Canada) from a sample collected on September 23, 2000, in Mill River (a brackish water estuary), Prince Edward Island, Canada. Cultures for the microarray studies were grown in 0.2 μm-filtered, autoclaved seawater (from Woods Hole, MA) enriched with f/2 nutrients[55] and amended with 10-8 μM Se. These batch cultures were grown in 15 L of f/2 medium in 19 L borosilicate carboys, and incubated at 20°C. The irradiance was maintained at 100 μmol photons m-2 s-1, with a 14:10-h light:dark (L:D) cycle for the cDNA library cultures, and continuous light for the experimental cultures. The cultures were aerated using aquarium pumps with sterile cotton and activated carbon filters and were constantly mixed with magnetic stirrers. An axenic culture of CL-125 was obtained by antibiotic treatment for 72 h, using 1.6:0.8 mg mL-1 penicillin:streptomycin[2]. These were tested for culturable bacteria by incubation in Bacto-peptone broth (Difco Laboratories, Detroit, MI, USA; 1 g L-1 seawater) and 2216 Marine Agar (Difco) at ~20°C for at least 20 d.

Ps-n strain GGB1 was isolated by Michael Carlson and Kyle Frishkorn (University of Washington, Seattle, WA, USA) in July 2010, from Puget Sound, WA, USA. Cultures for the RT-qPCR study were grown in 0.45 μm-filtered, autoclaved seawater (from Portsmouth Harbor, Newcastle, NH, USA) enriched with f/2 nutrients[55], except that the initial Si was lowered to 37.2 μM and ferric sequestrene was replaced with Na2EDTA • 2H2O and FeCl3 • 6H2O (Provasoli-Guillard National Center for Marine Algae and Microbiota). Before inoculation of experimental cultures, cells were maintained in exponential growth for at least two preceding transfers. Triplicate GGB1 experimental cultures were grown in 2.6 L of f/2 medium in 3-L polycarbonate baffled flasks, and incubated at 15°C. The irradiance was maintained at ~100 μmol photons m-2 s-1, with a 16:8-h L:D cycle. Flasks were aerated by constant mixing supplied by magnetic stirrers.

Sampling, toxin, and nutrient analysis

The microarray study included three biological replicates: two non-axenic cultures and one axenic culture. Samples were taken every two to three days for cell counts and domoic acid (DA) analysis. Cell concentrations were estimated by averaging the number of cells enumerated by light microscopy, using a Neubauer hemacytometer chamber in triplicate counts of individual samples preserved in Lugol’s iodine. DA was analyzed in whole-culture samples (cells plus medium[1]), using HPLC of the FMOC (fluorenylmethoxycarbonyl) derivative[56]. The lower limit of detection was 15 ng mL-1 for the first non-axenic experiment, 3 ng mL-1 for the second non-axenic experiment, and 7.5 ng mL-1 for the axenic experiment. RNA was prepared from cells harvested during an initial time point from the late-exponential (low-toxin-producing) growth phase, and a final time point during the stationary (high-toxin-producing) phase (Figure 1a,1b; RNA extraction protocol outlined below).

For RT-qPCR evaluation, three non-axenic biological replicate cultures were sampled daily, from the time of inoculation until day 10 of growth, for cell counts, whole-culture DA and nutrients. Cell count samples were taken by preserving 5 mL of culture with 250 μL of formalin and stored at 4°C until cells were counted (400 cells or the entire slide) on a Sedgwick-Rafter slide. Whole-culture DA samples were taken by freezing 15 mL of culture at -20°C; samples were sonicated at 50% power on ice for 2 min and filtered through a 0.2-μm filter prior to analysis. DA samples were analyzed using the Abraxis Domoic Acid ELISA kit ([57], Warminster, PA, USA). The limit of detection was 0.06 ng mL-1. Filtered samples were stored at -80°C for nutrient analyses. Silicate was measured using the molybdate method[5860]; phosphate was measured by the ascorbic acid-molybdate method[61, 62]. Nitrite and nitrate were measured on an auto-analyzer (Lachat Instruments, Loveland, CO, USA) using a copper-cadmium reduction and colorimetric assay[61, 63, 64]. Total RNA was extracted daily from each flask beginning on day three of growth (Figures 1c,3,4; RNA extraction protocol outlined below). One RNA sample was lost on the initial day of extraction, so this resulted in two biological replicates for this time point (Day 3). The remaining RNA samples for the profile through to Day 10 included three biological replicates for each time point.

Microarray and cDNA library construction

Ps-n strain CL-125 cells from non-axenic cultures were harvested during the late-exponential through mid-stationary phases, under predominantly toxin-producing conditions. Cultures were split into 250–500 mL aliquots that were centrifuged for 15 min at 1000 g. The loose pellets were pooled, and centrifuged again briefly to remove any residual culture medium. Total RNA was extracted immediately by homogenizing the cells in TRIzol® (Invitrogen Corporation, Carlsbad, CA, USA). Insoluble material was removed by low-speed centrifugation of the samples, which increased quality and yield of the resulting total RNA. Precipitating twice with salt and ethanol also contributed to high-quality total RNA, as indicated by both 260/280 O.D. ratios and gel electrophoresis. Poly (A)+ RNA was then isolated from total RNA using a biotin-labeled oligo(dT)20-streptavidin kit (Roche Molecular Biochemicals, Indianapolis, IN, USA) following the manufacturer’s instructions.

First-strand cDNA was prepared from 5 μg poly (A)+ RNA using Superscript II (Invitrogen, Grand Island, NY, USA), NC-p7 (an RNA chaperone), and oligo pd(TZ) (an oligo-dT primer with some of the internal thymidine residues replaced with 3-nitropyrrole to minimize mispriming to internal A-rich sequences). Double-stranded cDNA was generated using RNase H, E. coli DNA polymerase I, and E. coli ligase. The ends of the cDNAs were polished with T4 DNA polymerase, and BstXI adaptors were ligated to the cDNA ends. The cDNAs were then fractionated on sucrose gradients, ligated into pMD1 (a pUC-based vector) and transformed, by electroporation, into E. coli DH10B cells[65]. Following an initial library plating, 19,200 individual colonies were picked and stored at -80°C in 15% glycerol for further analysis.

EST sequencing, assembly, and annotation

2,220 cDNA inserts were sequenced to verify the quality of the library and to begin gene discovery. Many of the cDNAs were sequenced more than once in the 5′- and 3′- direction, yielding a set of 3,533 Ps-n ESTs. These sequences are deposited in the NCBI dbEST database [GenBank accession numbers FD476666-FD480212]. (Note: Of the sequenced cDNAs, 1,889 were a subset of the 5,265 Ps-n cDNAs printed on the microarray (see below)). Sequence reactions were run on an automated DNA sequencer (ABI 3700 with dye terminators); and selected cDNAs were sequenced at ACGT, Inc. (Wheeling, IL, USA). ESTs were edited using Seqman (DNAStar, Inc., Madison, WI, USA), and ContigExpress (VectorNTI, Carlsbad, CA, USA), as well as manually edited to remove low quality data, poly (A) tails, and vector sequence. The Ps-n ESTs were assembled into consensus or contig sequences, using a criterion of 95% identity over more than 50 nucleotides (Seqman, DNAStar). Average sequence length of the individual reads was 639 bp (after editing). The ESTs were assembled into 1,550 non-redundant (NR) sequences, indicating a redundancy of ~43% within our Ps-n library. From this, we estimate that approximately 3,675 unique genes were printed on the Ps-n microarray since 5,265 Ps-n cDNAs from the Ps-n library were printed on the microarray. The ESTs, final assembled NR sequences, and annotations are provided in Additional files2,3,4 and5.

Assembled sequences were compared against NCBI’s NR and Swissprot databases using the Basic Local Alignment Search tools, blastx and tblastn, via theBlast2Go application[6668]. The Ps-n sequences from this study were also compared against the unpublished Pseudo-nitzschia multiseries CLN-47 genome sequence, Assembly v1 (October 2011), sequenced by the US Department of Energy Joint Genome Institute (JGI)[22]. The reference and target genes specifically discussed in this paper were further annotated using NCBI’s ORF Finder[69] and BlastP[66, 67], and the Center for Biological Sequence Analysis’ (Technical University of Denmark) ChloroP 1.1, TargetP 1.1, and SignalP 4.1[70], as well as searching for the conserved diatom AFAP chloroplast targeting domain[7173].

Microarray construction

cDNA preparation: 5,265 clones from the Ps-n cDNA library were grown overnight in Luria broth with carbenicillin (50 μg mL-1) at 37°C on a shaker table. Three “vector-only” and twelve Homo sapiens cDNA control clones (J. Pelletier) were also grown under these conditions. Ten microliters of bacterial culture were used in 100-μL PCR reactions with primers T7 forward (TAATACGACTCACTATAGGG) and M13 reverse (CAGGAAACAGCTATGAC), which flanked the cloning site of the pMD1 vector. PCR amplification was performed using HiFI Taq polymerase (Invitrogen Corporation). An initial DNA denaturation step at 94°C for 2 min was followed by 35 amplification cycles (0:30 melting at 94°C, 0:30 annealing at 55°C, 1:00 extension at 68°C). PCR products were purified using MultiScreen size-exclusion filter plates (Millipore, Billerica, MA, USA). The DNA was then resuspended in 100 μL of nuclease-free de-ionized water and transferred to clean plates using a mechanical pipetting station. DNA quality was verified by 1% agarose gel electrophoresis for eight samples per 96-well plate; DNA concentration was determined by PicoGreen fluorescent staining[63]. Fifty μL of each PCR product was dried by vacuum centrifugation and then resuspended in 10 μL of 1.5 M Betain /3X SSC print buffer, yielding an average final concentration of 600 ng μL-1.

Ps-n cDNA probes were printed onto CMT-GAPS slides (Corning, Corning, NY, USA), using a MicroGrid 610 TAS array printer (Biorobotics, Woburn, MA, USA) with quill pins. A total of 5,169 Ps-n cDNAs were printed in duplicate, and 96 were printed in quadruplicate; in addition, 3 “vector-only” cDNAs, 12 H. sapiens control cDNAs, and 10 control cDNAs from the SpotReport Alien Array Validation System (Stratagene, La Jolla, CA, USA) were printed in duplicate, resulting in a final chip that included 10,772 features. Spots were printed with a 32 print-tip head, producing a lay-out represented by 8 × 4 grids. Each grid was sub-divided into two sections, representing replicate spots. Individual features were 13 μm in diameter and were separated by 130 μm (from one spot to the next). 0.005 μL of ~600 ng μL-1 DNA (2–3 ng) was transferred to each spot. Final Ps-n arrays displayed a strong signal-to-noise ratio, with virtually no background, as demonstrated visually (Figure 5). Experimental hybridization results also confirmed the high degree of reproducibility between replicate spots on the Ps-n chip (See Additional files1 and2; and, the corresponding Gene Expression Omnibus (GEO)[74] file, accession number GSE46845).

Figure 5
figure 5

Scanning fluorescence image of the Pseudo-nitzschia multiseries ( Ps-n ) cDNA microarray hybridized with Cy3- and Cy5-labeled cDNAs from non-toxin-producing vs. toxin-producing cells. 5,169 individual Ps-n cDNAs and 25 control cDNAs were printed in replicate, and 96 Ps-n cDNAs were printed in quadruplicate, yielding a final chip including 10,772 features. A representative grid is enlarged to illustrate the sub-division of each 8 × 4 grid into two replicate sections, differentiated by the arrowed lines.

RNA preparation and microarray hybridizations

RNA was prepared from cells harvested during both the late-exponential (low-toxin-producing) and stationary (high-toxin-producing) phases for all three biological replicates. Eight liters of culture were harvested at an initial time point during the mid- to late-exponential growth phase and the remaining 7 L were harvested at a final time point during the stationary phase (Figure 1a,1b). Cell suspensions were centrifuged in 0.5-L aliquots for 15 min at 1,000 g, which resulted in loose pellets that were pooled, split among 2–4, 50-mL conical tubes and spun again briefly to remove any remaining liquid. Ten to 20 mL of TRIzol (depending on cell pellet volume) were added to the conical tubes, and the pellets were homogenized for 60 s, frozen in liquid N, and stored at -80°C until RNA extraction. Total RNA was extracted, as above, cleaned with RNeasy columns (Qiagen, Valencia, CA, USA) and run on formaldehyde agarose gels to confirm the quality of the RNA.

Ten micrograms of Ps-n RNA from each harvest were spiked with mRNA from the SpotReport Alien Array Validation System, incubated for 10 min at 65°C with oligo-dT and then cooled at 25°C for 5 min. Four microliters of 1 mM Cy3- or Cy5-conjugated dUTPs were added to each RNA sample and the mixtures were incubated at 42°C for 2 min. A master mix, including 4.5 μL of 0.2 M DTT, 18 μL of 5X 1st strand buffer, 1.8 μL of 25 mM dATP, dGTP and dCTP, 1.8 μL of 10 mM dTTP, and 2 μL of Superscript II reverse transcriptase, was added to each RNA mixture and incubated for 1 h at 42°C. After 1 h, an additional 1 μL of Superscript II was added to each and the reactions were incubated at 42°C for another hour. Starting RNA was degraded by addition of stop solution (3 μL of 0.5 M EDTA, pH 8; 3 μL of 1 N NaOH) and incubated for 30 min at 60°C. Labeled cDNA was cleaned using RNeasy columns (Qiagen); Cy3-labeled cDNA and the corresponding Cy5-labeled cDNA that were to be compared were combined and loaded onto the same column. The labeled target cDNA pools were then hybridized to the probe cDNAs on the Ps-n cDNA microarrays (construction described above). Ps-n microarrays were processed before hybridization by holding them face-down over a steaming water bath for a few seconds, and then snap-drying them on a 95°C heat block. The DNA was immobilized onto the slides by UV cross-linking at 65 mJoules. Cross-linked slides were soaked for 15 min in freshly prepared succinic anhydride/sodium borate solution with gentle agitation, soaked for 2 min in boiling nuclease-free, de-ionized water and finally, rinsed in 95% ethanol, and spun dry. Processed microarrays were pre-hybridized at room temperature for 1 h. Pre-hybridization solution was composed of 50% formamide, 5X SSC, 0.1% SDS, and 1% BSA. Hybridization buffer was composed of 50% formamide, 10X SSC, 0.2% SDS, and 0.26% salmon sperm. Labeled cDNA was denatured prior to hybridization by heating for 2 min at 80°C, while the cassette and microarray were pre-warmed at 42°C. The cDNA was then loaded onto the array, and arrays were hybridized for 16 h at 42°C in humidified chambers. Hybridized arrays were washed successively in 1X SSC, 0.03% SDS, 0.1X SSC, 0.01% SDS, and 0.1X SSC, and dried by brief centrifugation.

Replicate hybridizations were repeated within each biological experiment and dye-swapped to account for differences in dye labeling and detection efficiencies. Non-axenic experiments 1 and 2 each included six technical replicates, while the axenic experiment included four technical replicates.

Microarray image analysis and normalization

Dual-channel arrays were scanned at 595 nm (Cy3) and 685 nm (Cy5) on ArrayWoRx scanners (Applied Precision, Inc., Issaquah, WA, USA). The scanning system converts signal from fluors to “pixel” values, which allows the data to be saved as tiff files. DigitalGenome software (MolecularWare, Cambridge, MA, USA) was then used to integrate annotated chip information with the tiff files and to visualize, edit and export the data for normalization. A loess algorithm was applied to the spot mean intensity values across replicate arrays within each biological experiment to correct for systematic biases using S+ArrayAnalyzer software (Insightful Corp., Seattle, WA, USA)[75, 76]. Quality control included analyzing final intensity ratios for the control set of data after normalization. The normalized intensity data for each control spike that corresponded to time zero (T0) and time final (TF) experimental mRNAs in the labeling reactions were analyzed using linear regression analysis to verify that the mean integrated intensity across the control spots was equal (slope ≈ 1). The slope of the linear regression of T0 to TF control intensity values averaged across arrays approached 1 for all three biological experiments: Non-axenic experiment 1 = 0.95, R2 = 0.97; Non-axenic experiment 2 = 0.91, R2 = 0.95; Axenic experiment = 0.91, R2 = 0.98. Any negative values, outliers (defined as two standard deviations away from the mean for individual spots), and any spots that did not include data for at least three replicate arrays within each dataset, were removed from further analysis. These parameters resulted in the removal of data for <1.5% of the original 10,772 features printed on array (final datasets: Axenic experiment 1 = 10,723; Axenic experiment 2 = 10,614; Non-axenic experiment = 10,630).

Microarray statistics

Significance analysis of gene expression was performed using a t-test algorithm modified for multiple tests: Significance Analysis of Microarrays (SAM)[77] [http://www-stat.stanford.edu/~tibs/SAM/]. SAM reports those genes with statistically significant differences between treatments based on an overall false discovery rate (FDR). A score d(i) is assigned to each gene based on changes in gene expression relative to the standard deviation of repeated measurements. The FDR is an estimate of the percentage of genes identified by chance that would have an observed relative difference d(i) greater than the expected relative difference dE(i) set by an adjustable threshold, delta. An FDR of 1% estimates that for every 100 genes called significant, less than one would be identified incorrectly. The FDR may be adjusted by changing the delta and fold-change thresholds. While SAM does not report individual p-values, each gene is assigned its own “local FDR” (LFDR), which is the comparable statistical measurement to identify individual genes with changes in expression. LFDR can be used to review the data beyond the defined set of differentially expressed genes based on FDR. Hong et al.[78] demonstrated an LFDR of 10% as a reliable cut-off to successfully identify changes in expression for specific genes. While the FDR is considered the most reliable measure of the statistically accurate gene list within an experiment, the LFDR offers a second method for reviewing the statistical likelihood of changes in expression for a particular gene. In our study, we used the overall FDR to define the initial set of differentially expressed genes. We used the LFDR to confirm the overall change in gene expression for those transcripts that had multiple cDNAs printed on the microarray.

Initially, each dataset in our study was analyzed independently. Non-axenic experiments 1 and 2 were analyzed for statistical significance using a relatively stringent fold-change cut-off of 2.5 to target genes that were substantially up- or down-regulated during the transition to stationary phase, when toxin was produced. A delta value of 0.275 resulted in overall FDRs that were <1% in both of these experiments. Expression levels were consistently lower in the axenic experiment as compared to the non-axenic experiments. For example, the cDNAs with positive fold-change differences averaged 4.07 ± 0.97 in Non-axenic experiment 1, 3.85 ± 1.17 Non-axenic experiment 2, and 1.92 ± 0.54 in the Axenic growth experiment. Therefore, a lower fold-change cut-off of 1.5, and a delta value of 0.275, resulted in a comparable FDR that was <2.5% for the Axenic experiment. Only those transcripts that were determined to be significantly up- or down-regulated in all three biological experiments were further analyzed.

Two layers of replicates (replicate cDNAs on the array and replicate hybridizations) were accounted for by first analyzing the replicates spots as uncollapsed, independent data points using the normalization and statistical analysis described above. Then, replicate spots were averaged and collapsed, accordingly, depending on whether or not they fell into a greater contig. For singletons, both replicate spots were required to be statistically significant in all three biological experiments to be further considered. In this case, replicate spots were averaged for a final expression ratio and standard deviation. For cDNA features that fell into a larger contig, 90% of the cDNAs that fell within the contig were required to be significantly differentially expressed (based on initial SAM analysis or LFDR) to further consider the overall contig as up- or down-regulated. In this case, the replicates were collapsed by averaging the mean ratios of all cDNAs for a final fold-change ratio and standard deviation for the overall contig. The individual cDNAs within contigs served as additional replicates for these transcripts, and the results confirmed the consistent change in gene expression (see Additional file1). The final fold-change values for those transcripts that were statistically higher or lower in stationary (toxin-producing) as compared to late-exponential (low-toxin-producing) growth phase are presented in Tables 3 and4. Additional data files complying with MIAME format[79] were deposited at the GEO[74] data repository, accession number GSE46845.

Primer design and validation for RT-qPCR

Primers were designed manually to be within a length of 18–26 nucleotides with a GC content between 50-65%. These values resulted in high sequence specificity and melting temperatures (Tm) that worked well under our assay conditions using an annealing temperature of 60°C. JmjC forward, ATPase reverse and SLC6 Ex-Ex forward had lower GC contents than the original specifications, but still worked efficiently under these assay conditions. All primer sets were designed for PCR amplicons of 50–200 bp in length. Primers were synthesized by Integrated DNA Technologies, Inc. (Coralville, IA, USA) and purified by standard desalting. Efficiencies of amplification were initially determined for each primer set by running standard curves with 5-fold serial dilutions of Ps-n cDNA derived from stationary phase cultures, as well as genomic DNA. PCR conditions are described, below. Primer sequences and information can be found in Tables 2 and5. Reported efficiencies in the tables correspond with the initial cDNA standard curve analyses. Standard curves were also run using 2-fold serial dilutions of pooled cDNA from the experimental samples combined in equal amounts. The primer sets used in the Si-limitation experimental analyses showed efficiencies >95%, with R2 values >0.99. Primers that span an exon-exon junction were designed for one reference and two of the target genes; these primer sets did not yield a product using gDNA as a template.

RNA isolation, cDNA synthesis and RT-qPCRs

Total RNA was extracted daily from each flask beginning on day three of growth. Cells were collected from 250 mL of culture by filtering onto a 5.0-μm, 47-mm membrane filter (MF-Millipore mixed cellulose ester). Filters were transferred to 50 mL conical tubes and 3 mL of TRIzol were added. Cells were washed quickly and gently from the filter and homogenized at full speed with a Polytron homogenizer (Kinematica, Inc, Bohemia, NY) for 90 s. Samples were incubated at room temperature for 5 min following homogenization and centrifuged at 3000 g for 10 min at 4°C in order to pellet cellular debris. 200 μL of chloroform was added for every 1 mL of homogenate and samples were incubated for 3 min with periodic shaking at room temperature. Samples were centrifuged at 12,000 g for 20 min at 4°C in order to separate the aqueous and organic phases. 75-80% of the aqueous phase was transferred to fresh tubes and an equal volume of 70% EtOH was added. The RNA-EtOH mixture was cleaned using RNeasy mini columns with on-column RNAase-free DNAase digestion (Qiagen). Clean RNA samples were eluted in 100 μL of DEPC-treated water, and stored at -80°C until analyzed. RNA concentrations were analyzed using a Nanodrop 2000 spectrophotometer (Wilmington, DE, USA), and RNA quality was verified by gel electrophoresis using Lonza 1.2% RNA cassettes (Walkersville, MD, USA). RNA samples were diluted to 20 ng μL-1. 600 ng of total RNA from each sample was reverse transcribed using the iscript cDNA synthesis kit in 50 μL volume reactions, using both poly A and random hexamer primers (Bio-Rad Laboratories, Inc., Hercules, CA, USA). The reverse transcription reaction was carried out by incubating at 25°C for 5 min, followed by 42°C for 30 min. The enzyme was deactivated by heating to 85°C for 5 min and samples were held at 4°C until retrieved and stored at -20°C.

RT-qPCR reactions were set up as follows: 10 μL of SYBR Green PCR mix (Bio-Rad Laboratories, Inc.), 0.75 μL of cDNA, 0.2 μM of forward primer, 0.2 μM of reverse primer, and nuclease-free water to a final volume of 20 μL. An exception was that the β-tubulin Ex-Ex efficiency was within the 95-105% range using 0.4 μM for both forward and reverse primers. Each experimental cDNA was amplified in triplicate for each primer set using the following cycling parameters: 1) 95°C for 3 min; 2) 95°C for 10 s; 3) 60°C for 15 s; 4) 72°C for 30 s (plate read); 5) repeat 39 more cycles of steps 2–4; 6) 72°C for 10 min; 8) melting curve analysis from 65-95°C in 0.5°C increments every 5 s; and 9) hold at 4°C. Cq values were determined for each reaction at 150 relative fluorescent units.

Evaluation of DNA contamination in RT-qPCRs

qPCR reactions were run on 1 μL of each RNA from the Si-limitation growth experiment to test for amplification due to contaminating DNA. The absence of DNA contamination was further confirmed by using both standard and exon-exon spanning primer sets in parallel for one control and two target genes in the RT-qPCR reactions (Figures 3 and4).

Analysis of candidate reference gene expression stability

Cq values were inputted into the GeNorm plus algorithm[20, 80] and a stability value (M-value) was calculated for each gene (Figure 2A,2B). Genes with the lowest M-values are considered the most stable; M-values <0.5 are ideal for use in normalization of qPCR data. The optimum number of reference genes to use for normalization is determined by calculating the geometric average of the two, three, four, and five most stable genes. The pairwise-variation between subsequent normalization factors is calculated and when the variation is below 0.15, the amount of change caused by the addition of the new control gene is considered negligible, and therefore unnecessary to include in subsequent normalization calculations.

Normalization, quantification, and statistics of RT-qPCR analysis

The arithmetic mean of triplicate technical replicates was calculated and used for subsequent calculations. The ΔCq values were calculated by the difference between each sample and the average Cq of the chosen reference point, which was the first time point (T3). Relative quantities (RQ) were calculated by exponentiation of the ΔCqs (2ΔCq). RQ values of the target genes, for each sample, were divided by the geometric average of the chosen reference genes’ RQ values, resulting in a normalized relative quantity (NRQ)[19]. The arithmetic mean and standard deviation of biological replicate NRQs were then calculated and plotted (Figures 3 and4). NRQ values were log transformed (Log2 NRQ) into Cq’ values for statistical analysis[81]. Statistically significant changes in gene expression were determined using a general linear model analysis of variance (ANOVA) with Bonferroni post-hoc test. Statistical analyses were performed with 95% confidence intervals (p < 0.05) using Minitab16 statistical software (State College, PA, USA). Additional files6 and7 contain the complete set of RT-qPCR data and statistical results. RT-qPCR data analyses and reporting were in accordance with MIQE guidelines[82, 83].

Availability of supporting data

The microarray data supporting the results discussed in this paper are included within the paper, and in Additional files1 and2. The microarray data are also available in the NCBI’s GEO repository, and are accessible through the GEO Series accession number GSE46845. The sequence data and annotations are presented in Additional files2,3,4 and5. The ESTs are also available through the NCBI dbEST database [GenBank accession numbers FD476666-FD480212]. The RT-qPCR data and statistical results are available in Additional files6 and7.