Background

Enzymes used in industry often originate from extremophilic organisms to fit the operating conditions for biocatalysis required by the process [1, 2]. These enzymes have evolved to function in ecological niches that may include very low or very high temperature, extreme pH, high salt concentration or high pressure. Recombinant production of some of these enzymes presents a significant challenge for their use in biotechnology and industry due to the demands of these applications for large amounts of highly pure protein [2, 3]. This is particularly relevant in the case of cold-adapted (psychrophilic) enzymes [4, 5] as the intrinsic instability of these proteins, which is a consequence of adaptations that allow them to function at low temperatures, makes them more challenging to produce than their mesophilic and thermophilic homologues (reviewed in [6, 7]). Little attention has been given to heterologous expression of psychrophilic enzymes, despite their large range of technological applications and commercial value (reviewed in [8]). In this study, we present a vector suite designed to optimize E. coli-based expression of putatively cold-adapted proteins which are encoded by genes sourced from the marine Arctic environment. We have opted to control the expression of our recombinant proteins by a cold-shock inducible promoter, the E. coli cold-shock protein A (cspA) promoter [9, 10]. This promoter drives high levels of protein expression, comparable to the commonly used T7-system, but is induced by a downshift in temperature rather than addition of an exogenous chemical inducer [9, 10]. In addition, we have utilized several well-documented solubilization tags, such as the maltose-binding protein (MBP) [11–13], the nascent chain chaperone trigger factor (TF) [14, 15], E. coli thioredoxin (TRX) [16] and small ubiquitin-like modifier (SUMO) [17] as fusion partners to our candidate proteins, combined with the hexahistidine-tag (His) for purification [18–21].
Using a restriction-free (RF) cloning method [22, 23], we have built up a suite of vectors encoding these fusion partners, based on the pCold-II vector that encodes the cspA promoter [24]. This vector suite facilitates the parallel testing of several important factors for recombinant expression, solubility and functionality of putatively cold-active enzymes encoded by environmental genes of Arctic origin.

Methods

Origins and cloning of candidate genes

The genes used in this study are part of a large bioprospecting project (the Norwegian MARZymes initiative), which aims to discover and develop commercially useful, cold-active enzymes sourced from the Arctic environment. Fosmid libraries were generated from metagenomic DNA purified from two sediment samples that were collected from either the sea floor of the Barents Sea (candidates MZ0003, MZ0009, MZ0012, MZ0013), or the intertidal zone at Kapp Wijk, Svalbard (MZ0047). The details of library preparation are outside of the scope of this publication, and in the case of the Kapp Wijk sample are described elsewhere [25]. The Fosmid inserts were sequenced using 454 technology [26], and the relevant gene sequences have been deposited to the European Nucleotide Archive [ENA: LM651246, LM651247, LM651248, LM651249 and LM651250]. Open reading frames were found in the Fosmid sequences using the MetaGeneMark prediction tool [27]. Gene candidates and putative activities were identified based on homology searches using t-BLAST [28]. Sequences were analyzed for the presence of signal peptides, which would target trafficking across the cytoplasmic membrane in the native host, by the SignalP 4.1 server [29] using the default cutoff values. Both Gram-positive and Gram-negative organism settings were run as the species these genes originate from are unknown. Domain information was assigned using the Pfam database with a threshold E-value of 1.0 [30]. Structural homologues and predicted tertiary structures were analyzed using the homology modeling software SWISS-MODEL [31] in the ‘automodel’ mode. The molecular weight and pI were calculated using the ProtParam tool at the ExPASy server [32].

The candidate genes described in this paper represent a sub-set of ‘difficult’ targets from the bioprospecting project, which had previously failed in recombinant expression trials using T7-based promoter systems. Previously, the genes were cloned as constructs with a variety of truncations, signal sequences or fusion tags and then expressed in various cell strains at different temperatures in attempts to obtain sufficient amounts of soluble product for characterization (Additional file 1: Table S1–S3). Truncations were designed to express individual fragments or domains annotated from Pfam or trimmed to remove predicted signal peptides. The reasons for failed expression included insoluble production as inclusion bodies or toxicity resulting in death of the expression cultures (Additional file 1: Table S1–S3). Sub-cloning of genes into the pCold-II-based vectors was done by RF-cloning [22, 23], as described below. Several of the candidates were truncated based on the above-mentioned bioinformatic analyses of the sequences, our previous examples of successful T7-based expression (Additional file 1: Table S1–S3), and on the observation that N-terminal and C-terminal truncations can reduce flexibility and increase stability [33]. The regions cloned are listed in Table 1.

Table 1 Summary of in silico analysis and cloning information of candidate sequences

Construction of a pCold-II vector suite

To enable tag-removal at a later stage of purification, a Tobacco Etch Virus (TEV) protease site was introduced to the pCold-II vector (Takara) by annealed oligonucleotide cloning. 50 μM of each of the oligonucleotides TEV5’ and TEV3’ (Additional file 1: Table S4) were hybridized at 95 °C for 5 min, and cooled to 4 °C. 0.75 μM of the annealed oligos were phosphorylated by 10 U T4 polynucleotide kinase (NEB) in T4 kinase buffer supplemented with 1 mM ATP (Sigma). After stopping the reaction with 20 mM EDTA, 0.5 pmol annealed oligos were ligated in a 1:10 ratio of oligos to BamHI (NEB) and CIAP (Finnzymes) treated pCold-II vector, using T4 DNA ligase (NEB). Vectors encoding different fusion partners were generated by introducing genes for the solubility tags between the existing His-tag and TEV site of the pCold-II-TEV vector using RF-cloning [22, 23, 34]. Genes encoding the TRX-, MBP-, TF- and SUMO-tags were amplified from templates using the primers listed in Additional file 1: Table S4, which were designed using an online tool described in [35]. The E. coli trx gene encoding residues 2–109 of TRX [UniProt: P0AA25] was amplified from pET32a(+) (Novagen). The E. coli malE gene encoding residues 27–391 of MBP [UniProt: P0AEX9] was amplified from pHMGWA [36]. Genomic DNA isolated from a DH5α strain served as template for the E. coli tig gene encoding residues 2–432 of TF [UniProt: B7UJQ9], and a codon-optimized yeast smt3 gene (GenScript) served as template for residues 2–98 of SUMO [UniProt: Q12306]. The megaprimers containing fusion partner genes were amplified using Phusion polymerase (Finnzymes), purified with the QIAquick PCR purification kit (Qiagen) and inserted in the vector by linear plasmid amplification. To remove parental DNA, the PCR products were digested with 20 U DpnI (NEB) and transformed into E. coli XL1-Blue cells by conventional heat shock. Sanger sequencing was used to confirm correct cloning of all vectors.
Information about the vectors used in this study is summarized in Table 2.

Table 2 Expression vectors used in this study

Insertion of candidate genes into the pCold-II vector suite

To introduce candidate genes into the panel of pCold-II vectors, megaprimers of each candidate were generated by PCR and then inserted into vectors by RF-cloning as described above. Clone selection was carried out by PCR screening of single colonies using a Taq DNA polymerase master mix (Ampliqon). Plasmids were isolated using the Wizard plasmid purification kit (Promega), and Sanger sequencing confirmed successful insertions of sequences to vectors.

Recombinant protein expression

Recombinant proteins were expressed in 24 deep well plates (DW24) according to the protocol described in [37]. Plasmids were transformed into the E. coli host strains Rosetta2(DE3)pLysS, BL21CodonPlus(DE3)RIL or ArcticExpress(DE3)RIL by heat shock. Cells were grown in 4 ml 2YT medium supplemented with 2 % (w/v) D-glucose and relevant antibiotics (100 μg/ml ampicillin, 25 μg/ml kanamycin, 12.5 μg/ml tetracycline, 20 μg/ml gentamycin and/or 34 μg/ml chloramphenicol) at 37 °C in a Micro-Expression Plate Shaker (GlasCol) at 450 rpm for 2–4 h until cells reached an optical density at 600 nm (OD600) of 0.5–0.9. Recombinant expression was induced by a temperature downshift to 15 °C and addition of 0.4 mM IPTG. After 16–20 h, cells were harvested by centrifugation, resuspended in 1 ml of 50 mM Tris-HCl pH 8.0, 250 mM NaCl, 1× cOmplete protease inhibitor cocktail (Roche) and lysed by three 5 s pulses at 25 % amplitude on a VibraCell sonicator (Sonics) with a 3 mm microtip. Samples collected before and after centrifugation at 3461 x g for 30 min were analyzed on 4–20 % MiniProtean TGX precast SDS-PAGE gels (BioRad) to determine the total amounts of induced and soluble protein, respectively. Successful expression of recombinant protein was assessed as the presence of a band in the post-induction sample at the molecular weight predicted by its primary sequence, and was scored on a qualitative scale from 0 to 3: 0, no visible recombinant protein; 1, visible recombinant protein at similar levels as endogenous host proteins (<10 % of total); 2, visible recombinant protein at higher levels than endogenous host proteins (≈10–50 % of total); and 3, visible recombinant protein at very high levels (>50 % of total). Failure of the cells to grow after transformation was taken as a sign of toxicity, and these samples were assigned a score of −1.
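The scoring scale above can be written down as a small function. This is only a sketch for illustration: in the study the bands were scored by eye, so the numeric input, the function name `expression_score` and the exact handling of the 10 % and 50 % boundaries are assumptions rather than part of the published protocol.

```python
from typing import Optional


def expression_score(fraction_of_total: Optional[float], cells_grew: bool = True) -> int:
    """Map an estimated band fraction (0-1 of total protein) to the -1..3 scale.

    The thresholds mirror the text; treating exactly 0.10 and 0.50 as
    score 2 is an assumption, since the original scale is qualitative.
    """
    if not cells_grew:
        return -1  # no growth after transformation, taken as toxicity
    if fraction_of_total is None or fraction_of_total <= 0:
        return 0   # no visible recombinant protein
    if fraction_of_total < 0.10:
        return 1   # similar to endogenous host proteins
    if fraction_of_total <= 0.50:
        return 2   # higher than endogenous host proteins
    return 3       # very high levels


# Hypothetical band fractions, for illustration only
scores = [expression_score(f) for f in (0.0, 0.05, 0.30, 0.80)]
```

A culture that failed to grow would be recorded with `expression_score(None, cells_grew=False)`.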
To determine the solubility of recombinant proteins, the density of the protein bands was quantified by integration of the band intensity to area from SDS-PAGE gels after Coomassie staining using the QuantityOne software (BioRad) and normalized to the culture OD600 at harvest to correct for different growth rates. Solubility was calculated as a percentage of the total expression yield after subtraction of background intensity.
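The calculation described above can be sketched as follows. The function names and the example numbers are hypothetical, and the use of a single background value for both lanes is an assumption; the actual quantification was done in QuantityOne.

```python
def normalized_intensity(band: float, background: float, od600: float) -> float:
    """Background-subtracted band intensity per unit of culture OD600."""
    if od600 <= 0:
        raise ValueError("OD600 at harvest must be positive")
    # Clamp at zero so a band fainter than background is not reported as negative
    return max(band - background, 0.0) / od600


def percent_soluble(total_band: float, soluble_band: float,
                    background: float, od600: float) -> float:
    """Soluble fraction as a percentage of the total expression yield."""
    total = normalized_intensity(total_band, background, od600)
    soluble = normalized_intensity(soluble_band, background, od600)
    if total == 0:
        return 0.0  # nothing expressed, so solubility is reported as zero
    return 100.0 * soluble / total


# Hypothetical integrated intensities (arbitrary units) from one culture
example = percent_soluble(total_band=1200.0, soluble_band=300.0,
                          background=200.0, od600=2.0)
```

Because the total and soluble lanes come from the same culture, the OD600 term cancels in the percentage; the normalization matters when expression yields are compared between cultures that grew to different densities.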

Chitinase assay

To measure chitinase activity, a fluorometric assay (Sigma) was performed according to the manufacturer’s instructions. Activity from 10 μl of crude extract was assayed with 0.72 μg substrate in 0.1 M Na2HPO4 at 20 °C in 100 μl final volume. Three different chemicals, all labeled with 4-methylumbelliferone (4MU), 4MU-N-acetyl-β-D-glucosaminide, 4MU-N,N’-diacetyl-β-D-chitobioside and 4MU-β-D-N,N’,N”-triacetylchitotriose, were used as substrate analogs of chitin. Reactions were stopped with 1 M alkaline sodium carbonate. The release of fluorescent 4MU was measured using an excitation wavelength of 360 nm and an emission wavelength of 450 nm in a SpectraMax microplate reader (Molecular Devices). The background fluorescence intensities (relative values) were subtracted from the experimental values, which were then normalized to the semi-quantitative measure of soluble protein (as explained above). A chitinase extract from Trichoderma viride (Sigma) was used as a positive control.
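The normalization step can be illustrated with a minimal sketch. The function name, the substrate labels and all numbers below are hypothetical; `soluble_protein` stands for the semi-quantitative gel-based measure described above, in arbitrary units.

```python
def normalized_activity(sample_fluorescence: float,
                        background_fluorescence: float,
                        soluble_protein: float) -> float:
    """Background-subtracted fluorescence per unit of soluble protein (A.U.)."""
    if soluble_protein <= 0:
        raise ValueError("soluble protein measure must be positive")
    return (sample_fluorescence - background_fluorescence) / soluble_protein


# Hypothetical raw readings for one construct on the three 4MU substrates
readings = {"4MU-GlcNAc": 250.0, "4MU-(GlcNAc)2": 1200.0, "4MU-(GlcNAc)3": 5200.0}
background = 200.0       # hypothetical background fluorescence
soluble_protein = 50.0   # hypothetical semi-quantitative gel measure

activities = {substrate: normalized_activity(value, background, soluble_protein)
              for substrate, value in readings.items()}
```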

Results and discussion

Selection of candidates and rationale for their recombinant expression

The genes tested in this study are part of a bioprospecting project that aims to discover new enzymes with potential commercial applications, and includes two putative carbohydrate esterases, two putative glycosyl hydrolases and a putative ATP-dependent DNA ligase. All genes originate from metagenomic DNA that was purified from marine Arctic sediment samples. Candidate genes with sequence homology to proteins of known activity were selected based on tBLAST searches of open reading frames [28] and Pfam domain assignments [30] (Fig. 1, Table 1). Four of the candidates have probable extracellular locations in the native hosts based on the prediction of leader peptides, and three candidates encode two or more cysteine residues in their native polypeptide, some of which could be involved in forming disulfide bonds in the folded structures. The mature proteins have predicted isoelectric points ranging from 4.4 to 9.9, one or two separate domains (for MZ0003 there was no domain prediction), and masses from 32 to 48 kDa. In some cases only portions of the genes were expressed to remove signal peptides, to express only annotated Pfam domains, or to minimize the numbers of low complexity regions and hydrophobic patches in an attempt to improve solubility (Fig. 1). Our candidates are typical representatives of enzyme discovery projects, which cover a set of sequences with diverse properties. Because the DNA was sourced from a permanently cold environment, we presume these genes derive from cold-adapted organisms and wanted to ensure low-temperature expression conditions for the recombinant protein products. The challenge was how to efficiently produce such varied and challenging enzymes in an optimal but universal way, with an acceptable yield and throughput. We decided to use E. coli as a host for expression, although this host is known to have lower growth rates and less efficient protein production at low temperatures [38, 39]. 
To compensate for this, we tested the commercial pCold-II vector [24], and utilized its cspA promoter to direct high levels of cold-shock induced expression in E. coli. The cspA 5′UTR region, which is included in the promoter region of the vector (Fig. 2), adopts a highly stable structure allowing for efficient protein synthesis at low temperatures, such as 15 °C [9, 40–43]. Basal expression from the cspA promoter is repressed by a constitutively expressed LacI repressor protein that binds the lac operator [24], while the 5′UTR contains a sequence that enhances translation of the recombinant protein [44–46] (Fig. 2). The pCold-II vector encodes an N-terminal His-tag for purification; to allow its removal after expression, we introduced a TEV protease cleavage site into the multiple cloning site of pCold-II (Fig. 2, Table 2). The genes from our five candidates were successfully inserted into the vector using the RF-cloning method [22, 23, 34].

Fig. 1

Environmental candidate sequences. Cartoon of the sequences of all candidates annotated with leader sequences (black boxes) and predicted Pfam domains (white boxes, borders are given by residue number), drawn to scale. No Pfam domain predictions were found for MZ0003. The presence of known catalytic residues and cysteines is highlighted with black and white circles, respectively. The length of native proteins are given next to each candidate, and cloned regions are indicated with arrowed lines

Fig. 2

Cloning and expression region of pCold-II-TEV. The cold-shock protein A promoter (cspA), the lac operator, ribosome binding site (RBS) and the translation-enhancing element (TEE) regulate gene expression from pCold-based vectors. The cspA-derived 5′UTR region allows low-temperature induction as it is stable at temperatures around 15 °C, but highly unstable at 37 °C. A TEV protease recognition sequence was introduced at the BamHI restriction site of the cloning region of pCold-II to allow removal of the vector-encoded His-tag from recombinant proteins. The cartoon is drawn to scale

The cspA promoter gave high total expression and some soluble protein for all candidates

In initial E. coli screens, T7 promoter-driven expression of the five candidates caused problems with insolubility and toxicity (Additional file 1: Tables S1–S3).

To evaluate the performance of the cspA promoter for production of our metagenomic proteins, recombinant expression was tested in three common expression strains: BL21CodonPlus(DE3)RIL, ArcticExpress(DE3)RIL and Rosetta2(DE3)pLysS (Fig. 3a). All these strains contain genes encoding rare tRNAs to compensate for the codon usage bias in the Arctic-sourced genes (Table 1) [47–49]. In addition, the strain ArcticExpress(DE3)RIL co-expresses the cold-active chaperonins Cpn60 and Cpn10 from Oleispira antarctica, which impart an active protein folding system at low temperatures and are expected to be advantageous for expression of psychrophilic proteins [50]. Successful expression of recombinant proteins, identified from their size on SDS-PAGE, was scored according to the qualitative scale from −1 to 3 described in the Methods section. We found that cspA drove high expression levels for all candidate proteins in at least one of the strains tested. Interestingly, none of the candidates had negative effects on cell viability, either during transformation, from inoculation to log-phase growth or after induction of protein expression (Fig. 3a). Consequently, we were able to express candidate proteins in all strains, including the ArcticExpress(DE3)RIL strain, which had previously been unusable in combination with several targets (MZ0003, MZ0013 and MZ0047) under T7-driven expression (Additional file 1: Tables S1–S3). The explanations for this have not been elucidated; however, our data is consistent with a recent report where combining the cspA promoter system with ArcticExpress was successful for expression of cold-adapted proteins [51] and expands the repertoire of expression strains that can be used for these, and most likely other, candidate proteins.

Fig. 3

Heterologous expression of candidate genes driven by the cspA promoter. a cspA promoter driven heterologous expression of candidate proteins in three E. coli expression strains; BL21CodonPlus(DE3)RIL (light grey bars), ArcticExpress(DE3)RIL (medium grey bars) and Rosetta2(DE3)pLysS (black bars). A qualitative scale for scoring expression success was used as described in the main text. b Normalized protein expression (upper panel) and solubility (lower panel) of candidates after cspA driven expression in BL21CodonPlus(DE3)RIL. Error bars show the standard deviation between two independent experiments

The proportion of recombinant protein in the soluble fraction was evaluated for all candidates expressed under cspA in the BL21CodonPlus(DE3)RIL expression strain, according to a semi-quantitative scale. All candidates were expressed in the soluble fraction with a proportion of approximately 10 % or lower compared to the total expression yield (Fig. 3b). Our data is consistent with previous publications reporting that the pCold-vectors can facilitate soluble expression [24, 52, 53] although the extent of solubility depends on the particular protein being expressed.

While the solubility of these candidates expressed in the cspA system was modest, it may still be of importance for poorly expressed proteins where T7 does not appear to be a viable option, as in the cases of MZ0009 and MZ0047, where complete insolubility and toxicity had previously precluded recombinant production of these targets (Additional file 1: Table S1–S3).

Building a cold-shock inducible vector suite encoding fusion partners

As the final yields of soluble protein were still not sufficient for up-scaling for most of our candidates, we were inspired by previous findings [52, 53] to fuse established solubility partners to our recombinant proteins in an attempt to improve their solubility (Fig. 4). We selected the small, soluble protein TRX [16], the large, highly soluble MBP [11–13], and two proteins that have not previously been used in combination with the cspA system: the ubiquitin-like SUMO protein [17] and the co-chaperone TF [14, 15], to be used as fusion partners. Previously, these partners have proved successful for soluble expression in large and systematic studies [54–56]. The vector suite was designed to facilitate an efficient and parallel cloning workflow of candidates as fusions to partners. Only three primers per candidate are required to clone each candidate gene into the vector suite: one forward primer to generate the His, His-TRX, His-MBP and His-TF fusion constructs, a second forward primer for the SUMO fusion constructs required to engineer the immediate transition from SUMO-tag to candidate protein, and a reverse primer which was used for all five (Additional file 1: Table S4). Complementing a recent report [57], we demonstrated the application of RF-cloning in building a tailored vector suite and enabling parallel cloning of candidate sequences. All vectors were tested in the E. coli Rosetta2(DE3)pLysS strain for soluble expression of fusion partners without candidate genes, and as expected, the majority of recombinant proteins were soluble (data not shown).
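The three-primer scheme described above can be laid out as a simple lookup. The sketch below is illustrative only: the primer labels (`_fwd_TEV`, `_fwd_SUMO`, `_rev`) and the function names are hypothetical and do not correspond to the primer names in Additional file 1: Table S4.

```python
# The five fusion formats of the vector suite, as named in the text
FUSIONS = ["His", "His-TRX", "His-MBP", "His-TF", "His-SUMO"]


def primer_plan(candidate: str) -> dict:
    """Return the (forward, reverse) primer pair used for each fusion format."""
    plan = {}
    for fusion in FUSIONS:
        if fusion == "His-SUMO":
            # SUMO needs its own forward primer for a direct tag-to-candidate junction
            forward = f"{candidate}_fwd_SUMO"
        else:
            # One shared forward primer serves the other four fusion formats
            forward = f"{candidate}_fwd_TEV"
        plan[fusion] = (forward, f"{candidate}_rev")  # one reverse primer for all five
    return plan


def unique_primers(plan: dict) -> set:
    """Collect the distinct primers needed to build every construct in the plan."""
    return {primer for pair in plan.values() for primer in pair}


plan = primer_plan("MZ0009")
needed = unique_primers(plan)
```

For one candidate, `needed` holds three primer names covering all five constructs, rather than the ten that a separate primer pair per vector would require.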

Fig. 4

Cartoon of fusion-proteins expressed from the cold-inducible vector suite. Cartoon of the affinity tag (black box) and solubility tags (light grey box) in fusion to candidate sequences (dark grey box) in the recombinant constructs that were generated in this study. Fusion partners are removable in all constructs either by TEV or SUMO protease cleavage; their recognition sites are indicated by triangles. Fusion partners are drawn to scale

Improvement in solubility with several fusion partners

Small-scale expression of all five candidates from the pCold-II-based vector suite gave significant amounts of recombinant protein, with the exception of the TRX fusion constructs (Table 3). To evaluate the solubility of each candidate expressed from the vector suite, we inspected the SDS-PAGE results of cleared lysates relative to the total yields of recombinant protein produced using the qualitative scores described above. As the His-tag alone does not aid solubility, this construct served as a reference for comparing the improvement in solubility imparted by the fusion partners. We found that the large fusion partners His-MBP and His-TF cloned N-terminal to the candidates gave an outstanding improvement in solubility compared to the His-TRX- and His-SUMO-tags (Table 3). This is in line with previous reports where MBP exceeds smaller tags for soluble expression of both psychrophilic and mesophilic prokaryotic proteins using T7-driven [58] and cspA-driven expression vectors [53]. Several studies have also been conducted on yeast, plant, mammalian and insect proteins showing a similar performance of MBP [13, 53, 59]. His-TRX-fusion proteins had varying expression levels between independent experiments, and for some constructs (MZ0003, MZ0009 and MZ0047) cell viability was negatively impacted during transformation. Previous findings have shown that the TRX-tag does not work consistently in combination with the pCold system [53]; however as we found that the pCold-II-TRX-TEV vector gave soluble expression of the tag alone, an explanation for the observed toxic effect in our system has not been elucidated.

Table 3 Semi-quantitative analysis of fusion construct expression and solubility in Rosetta2(DE3)pLysS

To summarize, we found that fusion tags, in particular MBP and TF, can be successfully utilized to improve solubility of putatively cold-active enzymes. For all candidates tested in this study, at least one condition was identified which gave rise to sufficient levels of soluble fusion protein to make further attempts at recombinant production feasible.

Using the vector suite, a functional chitinase was produced

As a proof of principle, the candidate MZ0009 was chosen for functional investigation. This candidate showed a marked improvement in expression under the cspA promoter in Rosetta2(DE3)pLysS; previous T7-based experiments resulted in completely insoluble protein (Additional file 1: Tables S1–S3), while cspA-controlled expression produced 5–10 % soluble product with only a His-tag (Fig. 3b), which increased to 80–100 % when His-MBP- and His-TF-tags were used (Table 3). Semi-quantitative evaluation of MZ0009 solubility with different tags showed an improvement in the order: TF > MBP > SUMO = His (Fig. 5a and b). The two latter tags showed considerable variation in solubility between parallel experiments, and were therefore considered equal. As described above, no transformants were obtained when MZ0009 was cloned in fusion with the His-TRX-tag.

Fig. 5

Comparison of expression level, solubility and activity of the cold-shock inducible, recombinant chitinase. a A representative SDS-PAGE gel showing the total protein produced (T) and the soluble fraction (S) in lysed Rosetta2(DE3)pLysS extracts of MZ0009 expressed from the pCold-vector suite. Asterisks indicate the presence of recombinant proteins of theoretical expected mass (given below figure). M, Protein standard in kilodaltons (kDa). b Semi-quantitative calculation of soluble fraction of the MZ0009 fusion proteins. Error bars show variation in two independent experiments. c Chitinolytic activity, presented as normalized fluorescence intensities, towards three synthetic 4-methylumbelliferone-labelled (4MU) chitin analogs in the cleared lysates containing MZ0009 fused to fusion proteins. Background activity from expression of ‘empty’ vectors is indicated with a dashed baseline. A.U., arbitrary units. Error bars show variation in two replicates in one representative experiment

Based on amino acid sequence similarity, the MZ0009 candidate was annotated as a chitinase belonging to the glycoside hydrolase family 18 (GH18) [60, 61]. This class of enzymes hydrolyzes the glycosidic β-1,4-linkage between N-acetylglucosamine (GlcNAc) units of the chitin biopolymer, and can be endo- or exo-acting [62–65]. A putatively cold-adapted chitinase of the GH18-family is industrially relevant for treatment of chitin-rich biomass at low temperatures, for biocontrol of phytopathogens in cold environments and for prevention of microbial spoilage of refrigerated food [64, 66–68].

The conserved catalytic DxDxE motif [69, 70] was found in MZ0009 based on an alignment of the MZ0009 sequence with characterized GH18 chitinases (Additional file 2: Figure S1). The motif includes the essential catalytic residues Asp147, Asp149 and Glu151, corresponding to Asp140, Asp142 and Glu144, respectively, in the well-characterized ChiB from Serratia marcescens [71–73]. To confirm this sequence annotation, clarified lysates containing MZ0009 fused to each of the four tags were assayed for chitinase activity (Fig. 5c) using three synthetic 4-methylumbelliferone (4MU)-labeled chitin analogs. The His-, His-MBP- and His-SUMO-tagged fusion-constructs of MZ0009 displayed pronounced chitinolytic activity on the 4MU-β-D-N,N’,N”-triacetylchitotriose (4MU-(GlcNAc)3) substrate (Fig. 5c), while lower levels of chitinase activity were detected on the 4MU-N,N’-diacetyl-β-D-chitobioside (4MU-(GlcNAc)2) substrate with the His-SUMO and His-MBP constructs (Fig. 5c). As exo-chitinolytic enzymes act on terminal N-acetylglucosamine residues to remove either monosaccharides or disaccharides, endochitinases are required to release fluorescent products from 4MU-(GlcNAc)3 [74], indicating that MZ0009 is an endochitinase. Interestingly, the His-SUMO-fusion construct, which gave little or no improvement in solubility compared to the His-only construct (Fig. 5a and b), showed the most pronounced chitinolytic activity (Fig. 5c). This indicates that soluble expression, which was equivalent for both constructs, does not necessarily equate to a functional enzyme, as the His-tagged control was significantly less active than the His-SUMO fusion. As none of the solubility tags had chitinolytic activity when expressed alone (data not shown), this improvement in activity can only be ascribed to an influence of the fusion partner on MZ0009 folding. Our data strengthens the observation that MBP and SUMO partners can be used to promote correct folding of difficult-to-express proteins [58, 75].
In addition to confirming the successful folding of the MZ0009 fusion constructs, we have provided a preliminary characterization of an endo-chitinase. As a proof of concept, a functional chitinase, representing the first recombinant endo-chitinase from an environmental Arctic source, was produced using the cold-shock inducible vector suite.

Although approximately 80 % of the His-TF fusion construct of MZ0009 was expressed as a soluble protein, this fusion construct had no significant activity towards 4MU-(GlcNAc)3 in our chitinolytic assay (Fig. 5c). We attempted to remove the His-TF-tag utilizing the TEV-protease site between the tag and candidate (Fig. 2), but TEV protease treatment in a cleared lysate resulted in only approximately 10 % cleavage of the fusion protein (data not shown), which may indicate that the TEV-site in the linker (Table 2) was inaccessible to the TEV protease. A known drawback of fusion tags is that they may interfere with the activity of the candidate protein by sterically hindering access of substrate to the catalytic site, or by causing the candidate to adopt a non-functional, though soluble, conformation [76, 77]. Although this property can be successfully exploited for expression of toxic targets [78], it is generally undesirable and generates false-positive results as the candidate may remain inactive upon removal of the tag. The importance of the length and nature of the linker joining the candidate to the fusion partner has been recognized and discussed in several studies [79–82]; however, a systematic study of optimal linker length has not been carried out for TF-fusion constructs and will be required to fully exploit its potential as a solubility and co-folding partner in recombinant protein expression.

Conclusions

Our aim was to develop an efficient screening system for putatively cold-active enzymes intended for biotechnological purposes, using metagenomic DNA from the marine Arctic environment as a source of candidate genes. To facilitate this, a vector suite based on the cspA promoter was designed for fast cloning and low-temperature heterologous expression. Our data show that the cspA promoter can be utilized for high-level, low-temperature expression of putatively cold-adapted candidate genes. The cspA system also mitigated the toxic effects that were observed with several candidates under T7-driven expression. We found that combining the candidate genes with the established MBP fusion partner substantially improved the solubility of the recombinant proteins. We have also extended the utility of the SUMO and TF fusion partners for soluble protein production under cspA promoter control. In workflows typical of an enzyme discovery project, a robust production procedure to obtain sufficient quantities of active enzyme is typically part of the initial screening phase. From this perspective, parallel testing of different fusion tags is useful as it can help improve the efficiency of the production procedure. In summary, the vector suite provides a low-temperature optimized system for heterologous expression of putatively cold-active enzymes in fusion to both small and large fusion partners. The process of generating all tag-candidate combinations is simplified by using our parallel cloning strategy for candidate gene insertion.

As a proof of concept, we showed that the Arctic-sourced MZ0009 candidate, a putative GH18 family member, was expressed in a soluble and functional form in the optimized expression system. Recombinant MZ0009 showed activity towards synthetic chitin analogs when cell lysates with overexpressed fusion proteins were tested in an activity assay. The highest activity was found for the 4MU-(GlcNAc)3 substrate, indicating that the MZ0009 protein is an endo-chitinase.

We envision this expression system being employed in further bioprospecting endeavors for psychrophilic proteins, and suggest that it can provide a good starting point for enzyme discovery and development. Moreover, the inclusion of His-tag and either TEV or SUMO protease cleavage sites in all constructs would allow large-scale purification and tag removal to proceed directly from successful screens for crystallization studies. Finally, we suggest that such a cold-shock inducible system could be advantageous for the heterologous expression of toxic mesophilic and thermophilic proteins, where properties of the proteins are deleterious to the host cell growth [83, 84]. If such proteins were expressed at temperatures below their activity optima, we would expect a decrease in their toxicity during recombinant production [85].