Contribution of nonconsensus base pairs within ArsR binding sequences toward ArsR-DNA binding and arsenic-mediated transcriptional induction
A transcriptional reporter is the key component of bacterial biosensors, which monitor the induction or repression of a reporter gene in response to environmental change. The interaction of a transcription factor with its consensus sequence, generated with a position weight matrix (PWM) model, is crucial for the sensitivity of the reporter. However, recent studies suggest that the PWM model, which assumes that individual consensus base pairs contribute independently to protein interaction, is often insufficient to explain complex regulation, such as the effect of nonconsensus sequences on protein-DNA binding affinity. In the present study, we employed a simpler prokaryotic arsenic repressor (ArsR) regulation system to assess protein-DNA recognition. The contribution of nonconsensus base pairs within ArsR binding sequences toward ArsR-DNA binding and arsenic-mediated transcriptional induction was studied.
We constructed a series of arsenic-responsive reporters, each comprising two copies of ArsR binding sequences from different sources. We found that high arsenic-mediated induction specifically requires the binding sequence from Escherichia coli to be placed at the first binding sequence; however, no such preference was observed for the second binding sequence, which could be from Acidithiobacillus ferrooxidans, plasmid R773, Synechococcus, or a core binding sequence of arsR. By creating a series of reporters that differed at the nonconsensus base pairs of the second binding sequence, we observed that some constructs bound ArsR weakly while others bound it strongly. Most interestingly, although a number of these reporters showed similar binding affinity to ArsR, their arsenic-dependent induction differed significantly.
The results indicated that nonconsensus base pairs could have a profound influence on protein binding and may also modulate post-binding function. These findings provide new insights into the complex regulation of gene expression and facilitate the development of transcriptional reporter-based biosensors.
Keywords: Bacterial biosensor; Arsenic bioreporter; Nonconsensus base pair; Arsenic repressor; Arsenic binding sequence; Protein-DNA recognition
AFBS: binding sequences from A. ferrooxidans
arsRBC: R773 arsR operon
arsRBCBS: binding sequence from arsRBC
CS: consensus sequence of arsRBC and cadCA
ECBS: binding sequences from E. coli
EDTA: ethylene diamine tetraacetic acid
EMSA: electrophoretic mobility shift assay
PWM: position weight matrix
RLU: relative light units
SDS: sodium dodecyl sulfate
smt2/1BS: binding sequence from Synechococcus smt2/1
TRITON X-100: polyethylene glycol octylphenol ether
The interaction between a transcription factor (TF) and its corresponding DNA binding sequence is crucial in gene regulation [1, 2]. The base composition determines the binding affinity of the sequence to the TF. The contribution of individual base pairs to the interaction with a TF can be assessed by their conservation. Algorithms typically use the statistically simple position weight matrix (PWM) model to represent a binding consensus sequence [3, 4]. Moreover, the binding consensus sequence can be determined by sequencing a group of DNA fragments or oligonucleotides selected by a TF using methods such as ChIP-seq or SELEX. Nevertheless, the binding sequences of more than 40% of TFs remain unknown. In Escherichia coli (E. coli), most TFs bind to a single binding site in chromosomal DNA; one example is the arsenic repressor (ArsR), a metalloregulatory transcriptional repressor that binds its operator/promoter (O/P) sequence. Because ArsR binding sequences are abundant in microbial chromosomes, aligning these sequences and analyzing them with a PWM can identify the binding consensus sequence or motif. However, recent studies suggest that the PWM model, which assumes that individual consensus base pairs contribute independently to protein interaction, is often insufficient to explain various kinds of complex regulation, such as the effect of nonconsensus sequences on protein-DNA binding affinity. In the present study, we employed a simpler prokaryotic ArsR regulation system to assess protein-DNA recognition.
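To make the independence assumption of the PWM model concrete, the following minimal Python sketch builds a log-odds PWM from a handful of aligned sites and scores a sequence by summing per-position terms. The aligned sites are invented for illustration and are not ArsR binding sequences; the pseudocount of 1 and the uniform background frequency of 0.25 are likewise assumptions, not values from this study.

```python
import math
from collections import Counter

# Toy aligned sites, invented for illustration only (NOT real ArsR sites).
TOY_SITES = ["ATCGAA", "ATCGTA", "ATCGAT", "ATGGAA"]
BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Return one {base: log2-odds score} dict per alignment column."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for col in range(length):
        counts = Counter(site[col] for site in sites)
        pwm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score(pwm, seq):
    """Sum per-position scores; each position contributes independently."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match PWM length")
    return sum(column[base] for column, base in zip(pwm, seq))


pwm = build_pwm(TOY_SITES)
# The consensus is the highest-scoring base in each column.
consensus = "".join(max(col, key=col.get) for col in pwm)
print(consensus)
```

Because the score is a plain sum over columns, changing one base shifts the score by the same amount regardless of the neighboring bases. A model of this form therefore cannot capture context-dependent effects of the kind attributed to nonconsensus positions.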
ArsR, belonging to the Smt/ArsR family, is a regulatory protein that controls the expression of the genes involved in arsenical resistance via interaction with the arsenic-responsive operon. In the absence of arsenicals, ArsR binding prevents RNA polymerase from interacting with the O/P sequence of its target genes [7, 9]. Upon arsenic binding, the protein dissociates from the promoter, subsequently activating gene expression [9, 10, 11]. The ArsR protein is well characterized in plasmid R773 and the E. coli chromosome. Both of these ArsR proteins are able to form homodimers, each with a Cys32-Val-Cys-Asp-Leu-Cys arsenic-binding sequence located at the start of the DNA binding domain. ArsR from Acidithiobacillus ferrooxidans (A. ferrooxidans) does not have the binding sequence at this location; instead, its cysteine residues are located at amino acid positions 95, 96, and 102. Both the binding sequences and the consensus sequences of Smt/ArsR family proteins, including those in A. ferrooxidans, have been characterized [6, 7, 12, 13, 14].
In a previous study, we created two arsenic reporters, pLHPars9 and pLLPars9, in order to rapidly and cost-effectively monitor arsenic on site and measure arsenic bioavailability. The bioreporters pLHPars9 and pLLPars9 comprised either a high or a low copy-number plasmid, along with the common elements of an ArsR-luciferase fusion and two added binding sequences, one each from the E. coli (ECBS) and A. ferrooxidans (AFBS) chromosomes, placed before the R773 arsR operon (arsRBC). Both of these reporters were highly sensitive to arsenite, with a low detection limit of 0.04 μM arsenite (~5 μg/L), and differed in their metal specificity, with pLLPars9 being more specific to arsenite and pLHPars9 responding to both arsenite and antimonite. The only difference between pLHPars9 and pLLPars9 is their copy number.
In the present study, we constructed a set of arsenic bioreporters comprising two copies of different binding sequences. We found that high arsenic-mediated induction specifically requires ECBS to be placed at the first binding sequence; however, no such preference was observed for the second binding sequence. By creating a series of reporters that differed at the nonconsensus base pairs of the second binding sequence, we tested the interaction of these probes with the protein. Interestingly, while some of the nonconsensus base pairs resembling the consensus are needed for the interaction with the ArsR protein, some of the nonconsensus base pairs also appear to affect the post-binding function of the TF.
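The conversion between the molar detection limit and a mass concentration depends on which species the mass refers to. The short sketch below works through both cases. The molar masses are standard rounded values, and the choice between elemental arsenic and sodium arsenite (NaAsO2) as the basis is an assumption, since the text here does not state which one the approximate mass figure uses.

```python
# Molar masses in g/mol (standard atomic weights, rounded).
M_AS = 74.92        # elemental arsenic
M_NAASO2 = 129.91   # sodium arsenite, NaAsO2 (assumed salt form)


def micromolar_to_ug_per_l(conc_um, molar_mass):
    """Convert umol/L to ug/L: (umol/L) * (g/mol) = ug/L."""
    return conc_um * molar_mass


limit_um = 0.04  # detection limit quoted in the text
as_elemental = micromolar_to_ug_per_l(limit_um, M_AS)
as_salt = micromolar_to_ug_per_l(limit_um, M_NAASO2)
print(round(as_elemental, 1), round(as_salt, 1))  # prints: 3.0 5.2
```

Under these assumptions, 0.04 μM corresponds to roughly 5 μg/L when expressed as sodium arsenite and roughly 3 μg/L when expressed as elemental arsenic.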
Arsenic transcriptional induction with a promoter containing ECBS binding sequence in arsenic bioreporters
Furthermore, we replaced the AFBS moiety within ECBS-AFBS with the binding sequence of Synechococcus smt2/1 (smt2/1BS) or arsRBC (arsRBCBS) to create the reporters pECBS-smt2/1BS and pECBS-arsRBCBS, and compared the luciferase activities of cell lysates prepared from their transformed cells with or without arsenic treatment. As presented in Fig. 1b, the induction of ECBS-smt2/1BS and ECBS-arsRBCBS declined moderately, losing approximately 15–25% of the fold induction of ECBS-AFBS. This suggested that AFBS at this position is not crucial for induction and can be substituted by other ArsR binding sequences. When we replaced the ECBS moiety within ECBS-AFBS with the binding sequence of smt2/1 or arsRBC to create the reporters psmt2/1BS-ECBS and parsRBCBS-ECBS, we found that the ratio of luciferase activities declined significantly, losing approximately 70% compared to ECBS-AFBS, as shown in Fig. 1b. These results demonstrated that ECBS needs to be the first binding sequence in order to respond robustly to arsenic.
The consensus sequence of a DNA-binding protein can be determined by comparing a group of binding sequences. The consensus base pairs are believed to be crucial for the protein to bind DNA, whereas the nonconsensus base pairs are considered unimportant for binding. Arsenic binding proteins from different microbes are DNA-binding proteins, and the DNA sequences that they bind display a consensus sequence. The data above indicated that the second binding sequence within the biosensors was relatively flexible and might therefore tolerate bioengineering manipulation, such as substitution with a consensus sequence. According to the consensus sequence of arsRBC and cadCA, we designed a binding sequence CS (Fig. 1c) and swapped it with the AFBS moiety to construct pECBS-CS, with 3 Ts in between. The luciferase assay revealed that pECBS-CS showed no significant difference in the response to arsenic treatment when compared to pECBS-AFBS, suggesting that CS can be used to replace AFBS within the biosensors. However, when we swapped ECBS with CS to make pCS-AFBS, the response changed significantly (Fig. 1d). Moreover, when we replaced the ECBS of ECBS-CS with arsRBCBS or AFBS to construct parsRBCBS-CS and pAFBS-CS, they lost induction significantly, like the other constructs described above that lack ECBS at the first position. These results with CS indicated that ECBS must be the first binding sequence.
Arsenic cannot remove the repressor protein from AFBS-ECBS and CS-ECBS binding sequences
Next, we compared the ECBS-CS and CS-ECBS probes with EMSA. The ECBS-CS probe revealed two shifted bands in control cell lysate and much weaker shifted bands in arsenic-treated cell lysate (Fig. 2b). Again, we observed no significant difference in the intensity of the shifted bands between control and treated cells with the CS-ECBS probe. This result is in accordance with that for AFBS-ECBS, suggesting that the repressor protein is not removed from the CS-ECBS probe under arsenic treatment.
Arsenic removal of the repressor protein from ECBS-CS binding sequence required a linker of 3Ts
Moreover, to examine whether the absence of the 3 Ts caused steric hindrance for binding of the dimers to the binding sequences or prevented the removal of the bound repressor protein from the bound sequence, we performed EMSA with biotin-labeled probes of ECBS-CS and ECBS-CS(− 3 T). As shown in Fig. 3c, like the ECBS-AFBS probe, the ECBS-CS probe displayed two shifted bands with arsenic-untreated cells but a significant decline with arsenic-treated cells. Without the linker, ECBS-CS(− 3 T) displayed two shifted bands in both control and treated cells, indicating the absence of any interference with protein binding and thus ruling out the possibility of steric hindrance. Therefore, this result suggested that the absence of the linker hampered arsenic-mediated removal of the repressor protein from the bound sequence.
Fast analysis of the DNA binding sequences of ArsR with DNA filter assay
The above results indicated that two ArsR binding elements within the biosensors were needed for a sensitive response to arsenic treatment. The first element must be from E. coli, whereas the second was more flexible and could be, for example, arsRBCBS or CS. Although the arsRBC of pECBS-arsRBCBS and the CS of pECBS-CS contained the same consensus sequence, their responses to arsenic treatment differed moderately. We assumed that the difference could arise only from the contribution of nonconsensus base pairs of the second binding sequence. To investigate the contribution of the nonconsensus base pairs to both binding and induction, we constructed a series of probes and reporters that differed exclusively at the nonconsensus base pairs in the second binding site within ECBS-arsRBCBS. According to the consensus sequence of arsRBC, only 4 base pairs are not conserved. Investigating different combinations of these 4 base pairs required testing a large series of probes.
To examine the feasibility of the filter assay, we employed the probes ECBS-AFBS, AFBS-ECBS, ECBS-CS, CS-ECBS and ECBS-CS(− 3 T) and mixed them with lysates prepared from arsenic-treated and mock-treated cells. The probe mixtures with lysates were first validated with EMSA before being used for the filter assay. The filter assay indicated that the binding of the ECBS-AFBS probe with the lysate without arsenic treatment was much stronger than with the lysate with arsenic treatment, and the ratio of the binding intensities of control to treated cells was about 5-fold (Fig. 4b). As expected, the AFBS-ECBS probe bound strongly with both control and arsenic-treated cell lysates, and no obvious difference in binding was observed. The result with the ECBS-CS probe was similar to that with the ECBS-AFBS probe, and the binding ratio of control to treated cells was about 3-fold. In addition, both the ECBS-CS(− 3 T) and CS-ECBS probes displayed no difference in binding between the two lysates (Fig. 4b). These results demonstrated that the filter assay was in general accordance with EMSA. The only difference was that the filter assay could not resolve two distinct shifted bands as EMSA does.
Next, we used both assays to test two-fold serial dilutions of lysates from both arsenic-treated and control cells with the ECBS-AFBS probe. As shown in Fig. 4c, EMSA could detect the complex in 1:16 diluted lysates and the filter assay in 1:64 diluted lysates, indicating that the filter assay is 4 times more sensitive than EMSA. Therefore, the filter binding assay was able to analyze the binding of several probes with the target protein effectively and quickly.
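To illustrate why varying only 4 positions already produces a large probe series, the sketch below enumerates every assignment of bases to 4 free positions. It assumes, for illustration only, that each position may take any of the four bases; which substitutions were actually tested is not specified in this passage.

```python
from itertools import product

BASES = "ACGT"
N_FREE_POSITIONS = 4  # nonconsensus positions, per the text


def enumerate_variants(n_positions, alphabet=BASES):
    """Yield every assignment of bases to the free positions."""
    for combo in product(alphabet, repeat=n_positions):
        yield "".join(combo)


all_variants = list(enumerate_variants(N_FREE_POSITIONS))
print(len(all_variants))  # 4**4 = 256
```

A full enumeration under this assumption gives 256 variants, which motivates a binding assay with higher throughput than gel-based EMSA.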
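The sensitivity comparison follows directly from the dilution endpoints. As a minimal sketch, the function name below is illustrative and the endpoints are the ones quoted in the text:

```python
import math


def relative_sensitivity(last_dilution_a, last_dilution_b):
    """Ratio of the highest detectable dilution factors (b relative to a)."""
    return last_dilution_b / last_dilution_a


emsa_limit = 16    # complex still detected at 1:16
filter_limit = 64  # complex still detected at 1:64

fold = relative_sensitivity(emsa_limit, filter_limit)
extra_steps = math.log2(fold)  # additional two-fold dilution steps
print(fold, extra_steps)  # prints: 4.0 2.0
```

In a two-fold series, a 4-fold gain corresponds to detecting the complex two dilution steps further along.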
Identification of alterations at nonconsensus base pairs crucial in protein binding using DNA filter assay
The original E. coli ArsR binding sequence was identified as acacattcg TT AA GT CA TA TA (TG) TT TT TG AC TT A. Based on the comparison with other ArsR binding sequences, we noted an extra tail of 9 base pairs at the 5′ end that is unlikely to contribute to arsenic-mediated induction. To reduce the cost of oligonucleotide synthesis, we removed 5 base pairs at the 5′ end to make sECBS, ttcg TT AA GT CA TA TA (TG) TT TT TG AC TT A. Functional analysis using reporters in which the shorter version, sECBS, replaced ECBS revealed no difference (data not shown).
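The truncation can be checked with a few lines of Python. The two strings are copied from the text exactly as written, including spacing and parentheses; the function name is illustrative.

```python
# Sequences as written in the text (spacing and parentheses preserved).
ECBS = "acacattcg TT AA GT CA TA TA (TG) TT TT TG AC TT A"
SECBS_REPORTED = "ttcg TT AA GT CA TA TA (TG) TT TT TG AC TT A"


def trim_5prime(site, n):
    """Drop the first n characters (bases) from the 5' end of the site."""
    return site[n:]


removed = ECBS[:5]            # the 5 bases that are cut off
secbs = trim_5prime(ECBS, 5)  # shortened binding sequence
print(removed, secbs == SECBS_REPORTED)  # prints: acaca True
```

Removing the first 5 bases (acaca) leaves 4 of the original 9 tail bases (ttcg) in front of the core sequence.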
Contribution of nonconsensus base pairs to protein binding and luciferase induction
To investigate the induction of a few binding sequences, we used sECBS-CS12m, sECBS-CS9m, and sECBS-CS15m to replace the ECBS-AFBS of pECBS-AFBS, in order to construct psECBS-CS12m, psECBS-CS9m, and psECBS-CS15m. As expected, psECBS-CS12m resulted in better induction than the other two reporters, psECBS-CS9m and psECBS-CS15m (Fig. 6b). Furthermore, we compared psECBS-CS12m with pECBS-AFBS and pECBS-CS. The transformants were treated with 10 μM arsenite for 15, 30, 60 and 120 min. As shown in Fig. 6c, arsenic-mediated induction of psECBS-CS12m was significantly better than that of either pECBS-AFBS or pECBS-CS.
Arsenic, as a naturally occurring element, is widely distributed throughout the environment. Long-term exposure to arsenic from drinking water and food can cause human diseases. Preventing further exposure requires rapid and cost-effective on-site analytical techniques to monitor arsenic in water supplies. Bacteria-based assays, which monitor arsenic-induced gene expression, are an emerging technology for detecting arsenical contamination. Compared to the traditional capital equipment-based methods, which are inappropriate for on-site detection, bacteria-based assays are robust and inexpensive for detecting arsenic in the field. More significantly, they can measure arsenic bioavailability, which accounts for the difference between exposure and dose. The crucial component of bacteria-based assays is the reporter, comprising a promoter/operator (or an operon) and a reporter gene. Ideally, a good reporter should display high sensitivity and specificity, low endogenous background, and a wide dynamic range of response. In our previous study aimed at making a sensitive arsenic reporter, we constructed the pLLPars9 reporter (the same construct as pECBS-AFBS in this study) and demonstrated that it is equivalent to some of the best reporters constructed to date in response to arsenic. In this study, we demonstrated that the reporter psECBS-CS12m is significantly better than pLLPars9.
Metal-inducible operons contain an imperfect 12-2-12 inverted repeat, except the smt operon, which has two inverted repeats, S2/S1 and S4/S3. Each repeat is occupied by an ArsR homodimer. In the previous study, following the example of the smt operon, we designed a construct with two binding sequences, ECBS-AFBS, and found that the induction of the luciferase reporter in response to arsenic treatment was better than with either single binding sequence. We also found that the induction with these two different binding sequences was better than with two identical sequences from either EC or AF. In the present study, we found that ECBS must be at the front position and that the induction declined dramatically if it was replaced by other binding sequences such as smt2/1 or arsRBC. In contrast, AFBS at the second position can be replaced by other binding sequences without affecting the induction to a significant degree. This indicated that an appropriate order of these two binding sequences is important for achieving maximal induction. As the two binding sequences bind two dimers, the complex could be stabilized by dimer-dimer interaction. Changing the order of these two binding sequences, as in AFBS-ECBS, could still allow the binding of two dimers, but the order might affect arsenic interaction with the repressor protein or removal of the repressor protein from the binding sequences. Therefore, unlike in ECBS-AFBS, the arsenic binding sites within AFBS-ECBS might be hidden by steric effects, which would prevent arsenic binding or dissociation of the repressor from the binding sequence.
Protein-DNA recognition has been increasingly appreciated to be more complex than previously thought. Although the simple PWM model has been widely used to define the DNA binding motifs of individual TFs, recent studies suggest that this model, which assumes that individual consensus base pairs contribute independently to protein interaction, is often insufficient to explain various kinds of complex regulation, such as the relevant dinucleotides or trinucleotides crucial to protein-DNA recognition [21, 22, 23, 24], the significant difference of low-affinity binding sites from the consensus sequence [25, 26], novel DNA-binding specificities of multi-protein complexes formed with a TF [27, 28, 29], and the effect of flanking sequences on the binding affinity. In the present study, we employed a simpler prokaryotic ArsR regulation system to assess protein-DNA recognition. We found that base pairs at nonconsensus positions within the second binding sequence could result in lower binding to the target protein in some constructs, such as ECBS-CS15m, and in higher-affinity binding in others, such as ECBS-CS9m, although both still maintained the consensus sequence. PWM was unable to explain these results. More interestingly, our study demonstrated that one of the base pairs at a nonconsensus position could also affect induction, a function beyond DNA binding. We found that sECBS-CS12m bound the target protein as well as ECBS-CS did; however, its response to arsenic was much stronger than that of ECBS-CS. Their similar basal binding levels but differential induction rates suggest that arsenic-mediated removal of the binding protein from the DNA binding sequence is faster for CS12m than for CS.
Therefore, as with AFBS-ECBS, the interaction of these nonconsensus base pairs with the repressor protein could influence arsenic binding or the arsenic-induced conformational change of the repressor protein, leading to differential turnover of the bound protein from the binding sequence. This is exemplified by the observations that AFBS-ECBS was no longer sensitive to arsenic while sECBS-CS12m became more sensitive to arsenic.
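A 12-2-12 inverted repeat consists of a 12-base half-site, a 2-base spacer, and a second 12-base half-site that approximates the reverse complement of the first. The sketch below counts how many positions of the second half-site match that reverse complement, which is one simple way to quantify how "imperfect" a repeat is. The example sequence is invented for illustration and is not an ArsR operator.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def inverted_repeat_matches(site, half=12, spacer=2):
    """Count right half-site positions equal to the left half's reverse complement."""
    site = site.upper()
    if len(site) != 2 * half + spacer:
        raise ValueError("site length must equal 2 * half + spacer")
    left = site[:half]
    right = site[half + spacer:]
    expected_right = reverse_complement(left)
    return sum(a == b for a, b in zip(expected_right, right))


# Hypothetical half-site, used only to exercise the function.
left_half = "ATTCGTACGGAT"
perfect = left_half + "AA" + reverse_complement(left_half)
# Introduce one mismatch at the last position to make the repeat imperfect.
imperfect = perfect[:-1] + ("A" if perfect[-1] != "A" else "C")

print(inverted_repeat_matches(perfect), inverted_repeat_matches(imperfect))
```

A perfect repeat scores 12 of 12 matches, and each deviation in the second half-site lowers the count by one.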
In the present study, we found that nonconsensus base pairs played important roles in protein-DNA binding and gene transcriptional regulation. More sensitive and accurate biosensors for arsenic detection can be developed through the design of nonconsensus base pairs. Our current findings illustrate an innovative strategy to construct better reporters, which will facilitate the development of more sensitive biosensors to monitor environmental arsenic via the induction of reporter gene expression.
Reporter constructs with different orders and sources of binding sequences were made by modifying the binding sequence of pLLPars9, which was renamed pECBS-AFBS in this study. The sense and antisense strand sequences were synthesized and annealed to generate double-stranded fragments with XbaI and HindIII sticky ends, which were subsequently cloned into the XbaI and HindIII sites of pLLPars9 to replace ECBS-AFBS and make the constructs pAFBS-ECBS, pECBS-smt2/1BS, pECBS-arsRBCBS, psmt2/1BS-ECBS, parsRBCBS-ECBS, pCS-AFBS, parsRBCBS-CS, pAFBS-CS, pECBS-CS, and pECBS-CS(− 3 T). Five base pairs at the 5′ end of ECBS were removed to make sECBS. Different nucleotides were assigned at the nonconsensus positions of CS to make CS1-16m. Then sECBS and CS1-16m were cloned into the XbaI and HindIII sites of pLLPars9 to replace ECBS-AFBS and make the constructs sECBS-CS9m, sECBS-CS12m and sECBS-CS15m.
E. coli DH5α competent cells were transformed with the recombinant plasmids constructed in this study. Single colonies were picked and inoculated in 2 mL Luria-Bertani (LB) media supplied with 25 μg/mL chloramphenicol and grown for 12–16 h at 37 °C with vigorous shaking. The overnight culture was diluted 1:50 in a 1.5 mL microcentrifuge tube with pre-warmed, freshly prepared 2 mL LB media supplied with chloramphenicol. The diluted cells were cultured for an additional 3 h at 37 °C until the optical density (OD) reached 0.5. Cells were treated with or without 10 μM sodium arsenite [As (III)] for 60 min at 37 °C. The cell samples were sonicated to lyse the cells, and the protein concentration was measured with the Bradford Protein Assay (Bio-Rad, Cat#5000201) to confirm equal protein concentrations among the treated and untreated cell samples. Twenty μL of induced sample was mixed with 50 μL of luciferase substrate, and the luciferase activities were measured on a luminescence plate reader (Veritas).
Preparation of cell lysates
One mL of cell culture with or without sodium arsenite was centrifuged at 10,000 g for 1 min and the pellet was resuspended in 300 μL of lysis buffer (10 mM Tris-HCl, pH 8.0, 0.1 M NaCl, 1 mM ethylene diamine tetraacetic acid (EDTA), and 0.1% [w/v] polyethylene glycol octylphenol ether (TRITON X-100)). Seven point five μL of a freshly prepared lysozyme solution (10 mg/mL in 10 mM Tris-HCl, pH 8.0; final concentration 0.25 mg/mL) was added and mixed well by tapping the tube gently, and the lysis mixture was incubated for 10–20 min at room temperature. After centrifugation, the supernatant was used for the electrophoretic mobility shift assay (EMSA) or the filter assay.
One to 3 μg of cell lysate was mixed with 2 μL of 5× binding buffer and 1 μL of poly d(I-C) and incubated on ice for 5 min. One μL of biotin-labeled probe was added to the mixture and incubated at 22 °C for 30 min. Each reaction mixture was separated on a 6.5% non-denaturing polyacrylamide gel at 100 V at 4 °C in 0.5× Tris-borate-EDTA (TBE) for about 50 to 60 min. After the gel was transferred onto an NC membrane and the membrane was blocked with 15 mL of blocking buffer for 20 min at room temperature, the biotin-labeled probe on the blot was detected with streptavidin-HRP and chemiluminescent substrates (enhanced chemiluminescence by luminol, Pierce). The image was acquired using an imager.
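The stated final lysozyme concentration can be checked with a simple dilution calculation, C_final = C_stock × V_stock / (V_stock + V_buffer). The sketch assumes that the pellet volume is negligible, so that the total volume is the 300 μL of lysis buffer plus the 7.5 μL of lysozyme stock; the function name is illustrative.

```python
def final_concentration(stock_conc, stock_vol, base_vol):
    """Concentration after adding stock_vol of stock to base_vol of buffer."""
    return stock_conc * stock_vol / (stock_vol + base_vol)


# 7.5 uL of 10 mg/mL lysozyme added to 300 uL of lysis buffer.
lysozyme_mg_per_ml = final_concentration(stock_conc=10.0, stock_vol=7.5, base_vol=300.0)
print(round(lysozyme_mg_per_ml, 3))  # prints: 0.244
```

The result, about 0.244 mg/mL, agrees with the nominal 0.25 mg/mL given in the protocol.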
Filter assay method
In this assay, 2 μL of cell lysate (2–10 μg) was mixed with 10 μL of 2× Binding Buffer Mix (40 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES), pH 7.6, 20 mM ammonium sulfate, 2 mM dithiothreitol (DTT), 20 mM KCl, and 0.4% Tween-20), 1 μL of biotin-labeled probe, and 7 μL of ddH2O in a 96-well PCR plate. After incubation at room temperature for 30 min, the reaction mix was loaded onto a prewashed filter assay plate and incubated on ice for 20 min, following which it was centrifuged at 600 g for 2 min. The flow-through was discarded and the plate was washed 4 times with filter wash buffer (100 mM Tris-HCl, pH 7.6, 2.5 mM EDTA, and 0.1% Tween-20). The bound probe was eluted with elution buffer (0.5% SDS). The eluted probe was heated at 95 °C for 3 min before hybridization. Hybridization was carried out by adding the eluted DNA probe to a plate pre-coated with the corresponding DNA and incubating at 42 °C overnight. After washing, the bound probe was eluted from the filter and collected for quantitative analysis through DNA plate hybridization. The captured DNA probe was detected with streptavidin-HRP, and the signals were read by a luminescence plate reader (Beckman Coulter, LD-400) and reported as relative light units (RLUs). Induction fold was the ratio of the luminescence of arsenic-treated cells to that of arsenic-untreated cells.
We thank Professors Yinghua Cen and Xiangdong Fu for reading the manuscript.
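The two calculations implied by this protocol, the working strength of the binding buffer and the induction fold, can be sketched as follows. The reaction volumes are taken from the text; the RLU readings are hypothetical placeholders chosen only to exercise the function, not measured data.

```python
def induction_fold(rlu_treated, rlu_untreated):
    """Ratio of luminescence in arsenic-treated to untreated cells."""
    if rlu_untreated <= 0:
        raise ValueError("untreated signal must be positive")
    return rlu_treated / rlu_untreated


# Reaction volumes from the filter assay protocol, in microliters.
volumes_ul = {"lysate": 2, "2x binding buffer": 10, "probe": 1, "ddH2O": 7}
total_ul = sum(volumes_ul.values())
# A 2x stock making up half of the total volume ends up at 1x.
buffer_strength = 2 * volumes_ul["2x binding buffer"] / total_ul

# Hypothetical RLU readings (NOT data from this study).
example_fold = induction_fold(rlu_treated=12000, rlu_untreated=800)
print(total_ul, buffer_strength, example_fold)  # prints: 20 1.0 15.0
```

The components add up to a 20 μL reaction in which the 2× buffer is diluted to 1×, so the working concentrations are half of the listed stock values (for example, 20 mM HEPES).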
YW, MX, GS, JG, and XL developed the protocol. XC, XJ, CT, and JY performed the experiments. XC, XJ, MX, and XL analyzed the data and wrote the manuscript. XC and XJ contributed equally to this work. All authors read and approved the final manuscript.
This work was supported by the High-level Leading Talent Introduction Program of GDAS (2016GDASRC-0208) and the Science and Technology Planning Project of Guangzhou City (201707020021) to XL, the National Natural Science Foundation of China (21677042) and the Natural Science Foundation of Guangdong Province (2018B0303110010) to XC, and the National Natural Science Foundation of China (91851202, 51678163) to MX.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
- 10. Wu J, Rosen BP. Metalloregulated expression of the ars operon. J Biol Chem. 1993;268(1):52–8.
- 11. Shi W, Wu J, Rosen BP. Identification of a putative metal binding site in a new family of metalloregulatory proteins. J Biol Chem. 1994;269(31):19826–9.
- 16. Jomova K, Jenisova Z, Feszterova M, Baros S, Liska J, Hudecova D, Rhodes CJ, Valko M. Arsenic: toxicity, oxidative stress and human disease. J Appl Toxicol. 2011;31(2):95–107.
- 18. Soangra R, Majumder B, Roy P. Whole cell arsenic biosensor - a cheap technology for bioavailable arsenic (As) determination. Eur J Adv Eng Technol. 2015;2:52–61.
- 29. Funabiki T, Kreider BL, Ihle JN. The carboxyl domain of zinc fingers of the Evi-1 myeloid transforming gene binds a consensus sequence of GAAGATGAG. Oncogene. 1994;9(6):1575–81.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.