Computational prediction of membrane-tethered transcription factors
Sequestration of transcription factors in the membrane is emerging as an important mechanism for the regulation of gene expression. A handful of membrane-spanning transcription factors has been previously identified whose access to the nucleus is regulated by proteolytic cleavage from the membrane. To investigate the existence of other transmembrane transcription factors, we analyzed computationally all proteins in SWISS-PROT/TrEMBL for the combined presence of a DNA-binding domain and a transmembrane segment.
Using Pfam hidden Markov models and four transmembrane-prediction programs, we identified with high confidence 76 membrane-spanning transcription factors in SWISS-PROT/TrEMBL. Analysis of the distribution of two proteins predicted by our method, MTJ1 and DMRT2, confirmed their localization to intracellular membrane compartments. Furthermore, elimination of the predicted transmembrane segment led to nuclear localization for each of these proteins.
Our analysis uncovered a wealth of predicted membrane-spanning transcription factors that are structurally and taxonomically diverse, 56 of which lack experimental annotation. Seventy-five of the proteins are modular in structure, suggesting that a single proteolysis may be sufficient to liberate a DNA-binding domain from the membrane. This study provides grounds for investigations into the stimuli and mechanisms that release this intriguing class of transcription factors from membranes.
Keywords: Hidden Markov Model; Transmembrane Helix; Transmembrane Segment; Bipartite Nuclear Localization Signal; Regulated Intramembrane Proteolysis
A critical step in regulating many transcriptional responses is the import of transcription factors from the cytosol to the nucleus. Many transcription factors are held outside the nucleus in a complex with cytosolic proteins or with membrane receptors, and translocate to the nucleus in response to various stimuli. Alternatively, transcription factors may be inserted directly into the membrane, thereby preventing their access to the nucleus. A handful of such proteins has been shown to be released from membranes by a process known as regulated intramembrane proteolysis (RIP). This process is best understood for SREBP-1 and SREBP-2, two basic leucine zipper (bZIP) transcription factors that normally reside in the membrane of the endoplasmic reticulum and Golgi apparatus. When cellular sterol levels dip, SREBPs are liberated from the membrane in a two-step mechanism: Site-1 protease, a site-specific protease, cleaves the protein within the Golgi lumen, and Site-2 protease, an integral membrane protease, then cleaves a membrane-spanning helix. Once liberated from the membrane, these transcription factors are transported to the nucleus, where they initiate expression of genes involved in cholesterol uptake and biosynthesis.
Several more examples of membrane-tethered transcriptional regulators have recently been identified by biochemical means, notably ATF6, G13, CadC, ToxR, Lzip (Luman), Notch and SPT23. All appear to undergo proteolytic cleavages to release a fragment that is targeted to DNA or the nucleus, but may use different proteases. For example, ATF6 uses the same proteolytic machinery as SREBPs, whereas Notch is cleaved by different proteases. Tumor necrosis factor (TNFα)-converting enzyme catalyzes the cleavage of the extracellular domain of Notch, followed by presenilin/gamma-secretase-like activity to liberate the intracellular fragment. Thus, the release of some membrane-bound nuclear proteins involves regulated cleavages in the lumenal or extracellular space, followed by a cleavage by an integral membrane protease to release an active fragment.
Transmembrane transcription factors (TMTFs) are easily overlooked by conventional biochemistry. Transcription factors are generally assumed to be soluble proteins and, consequently, membrane fractions are often discarded during purification. Moreover, the nuclear form of the protein may be rapidly degraded and thus difficult to detect, as is the case for SREBPs. Lastly, the subcellular distributions of transcription factors are often not examined. Cell-fractionation studies of other transcription factors show smaller-molecular-weight forms of these proteins enriched in the nucleus, suggestive of a cleavage event [14,15]. We thus investigated the prevalence of transmembrane transcription factors using computational tools to search for membrane-spanning proteins that contain conserved DNA-binding domains.
Results and discussion
Computational analysis of protein databases reveals a large number of predicted transmembrane transcription factors
Our analysis predicted a surprisingly large and diverse set of membrane-tethered DNA-binding proteins. Seventeen of the 53 DNA-binding domains chosen for this analysis were represented in the final set of TMTFs. Of these, the most abundant is the zf-C4 (zinc-finger type C4) nuclear hormone receptor DNA-binding domain, found in 14 proteins in Caenorhabditis elegans and avian erythroblastosis virus. TMTFs in Arabidopsis were the most diverse, and were associated with eight different DNA-binding domains. All but two proteins have DNA-binding domains that could be separated from the rest of the protein by a single hypothetical cleavage event, if singly predicted transmembrane segments are discounted (Figure 1). DNA-binding domains were also frequently juxtaposed to bipartite nuclear localization signals, suggesting that transmembrane and DNA-binding domains in TMTFs are modular. Thus, the overall topology of these proteins is consistent with other known TMTFs. C. elegans has an impressive 25 predicted TMTFs, suggesting that RIP may be particularly important in the regulation of transcriptional responses in the worm. Interestingly, 56 of the 76 identified proteins lack any experimental annotation.
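The idea that a single hypothetical cleavage could separate a DNA-binding domain from the rest of the protein can be expressed as a small interval check. The sketch below is one possible reading of that criterion, not the procedure used in this study: it treats a protein as modular when every predicted transmembrane segment lies entirely on one side of the DNA-binding domain. The function name and all residue coordinates are hypothetical.

```python
from typing import List, Tuple

# (start, end) in 1-based, inclusive residue coordinates
Interval = Tuple[int, int]


def single_cut_releases_domain(dbd: Interval, tm_segments: List[Interval]) -> bool:
    """Return True if one cut could free the DNA-binding domain (DBD).

    Assumption for illustration: a single cut suffices when all
    transmembrane (TM) segments lie wholly N-terminal or wholly
    C-terminal to the DBD, so that a cut between the DBD and the
    nearest TM segment leaves no membrane anchor on the DBD fragment.
    """
    if not tm_segments:
        return False  # nothing tethers the protein to the membrane
    dbd_start, dbd_end = dbd
    all_c_terminal = all(tm_start > dbd_end for tm_start, _ in tm_segments)
    all_n_terminal = all(tm_end < dbd_start for _, tm_end in tm_segments)
    return all_c_terminal or all_n_terminal


# Hypothetical coordinates, chosen only to exercise both outcomes.
modular = single_cut_releases_domain((60, 110), [(480, 500)])
flanked = single_cut_releases_domain((200, 260), [(50, 70), (400, 420)])
```

In this sketch the first protein counts as modular, whereas the second, whose DNA-binding domain is flanked by transmembrane segments on both sides, would need more than one cut.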
We deliberately used a stringent method to increase the likelihood of identifying only bona fide TMTFs and, as expected, most experimentally known TMTFs were detected by our analysis, including CadC, Lzip, ToxR and all SWISS-PROT/TrEMBL orthologs of SREBP-1 and SREBP-2. Also found were several well-characterized proteins whose predicted membrane insertion had not been recognized. For example, the human doublesex-related protein DMRT2, Drosophila B-H2 (BarH2) protein, C. elegans UNC-86, and mouse OASIS protein are predicted TMTFs. Two known TMTFs, ATF6 and SPT23, did not satisfy our minimum criteria. The transmembrane helix of ATF6 was predicted by only two programs: PSORT and HMMTOP. The immunoglobulin DNA-binding domain (TIG) of SPT23 is found in both cell-surface proteins and transcription factors and was therefore excluded from the set of DNA-binding domains. These results indicate that reducing the stringency of our prediction method will expand the number of predicted TMTFs.
TMTFs translocate to the nucleus on deletion of the predicted transmembrane helix
DMRT2, a human homolog of C. elegans mab-3, was identified in our analysis as having a carboxy-terminal transmembrane segment (Figure 1). mab-3 encodes a transcription factor known for its role in sex determination in worms. DMRT2 has gained recent attention as a candidate gene for sex-reversal phenotypes in humans. To verify our prediction that DMRT2 is a membrane-tethered transcription factor, we examined the subcellular localization of full-length and truncated forms of DMRT2 in COS-7 cells (Figure 2b). Full-length DMRT2 is localized primarily, but not exclusively, to vesicles outside the nucleus. A carboxy-terminal truncation containing the DNA-binding domain is, however, concentrated almost entirely in the nucleus. These results are consistent with the idea that DMRT2 is cleaved from the membrane to produce a nuclear fragment. Interestingly, transformer protein TRA-2A, an indirect activator of MAB-3, has been identified recently as a membrane-tethered nuclear protein [24,25]. Thus, RIP may be a conserved mechanism common to sex determination in humans and worms.
We have used computational methods to investigate the prevalence of membrane-tethered transcription factors. The identification of 76 predicted TMTFs by our method, and the supporting cell biology, indicate that membrane-tethering may be a common mechanism for regulating transcriptional responses. As stringent criteria were used to identify transmembrane segments and DNA-binding domains, we believe that the actual number of TMTFs is likely to be much larger. Compared to other signal transduction mechanisms, tethering transcription factors in the membrane provides an expeditious route to the nucleus in response to stimuli that must be communicated across a membrane. Our understanding of this process will be enhanced as more TMTFs are studied and the signals for membrane cleavage and their proteases are discovered.
Materials and methods
Pfam hidden Markov models for 53 DNA-binding domains (see DNA-binding domains below) were used to search proteins in SWISS-PROT/TrEMBL (October 2000 release; 388,909 proteins) with p-value < 0.0019 (0.01/53). SwissPfam proteins identified as having any of the 53 domains were also included in our analysis. The resulting 9,261 proteins were then analyzed for the presence of transmembrane helices. Default parameters were used for HMMTOP, PHDhtm [18], and TMHMM (version 2). A higher stringency (-5.0) than default was used for PSORT II (ALOM2). Transmembrane segments predicted by individual programs were considered overlapping if ten or more amino acids were shared by each segment. Proteins containing transmembrane helices predicted by at least three of the four programs were included in the final set. Bipartite nuclear localization signals were identified using PSORT II. Three predicted TMTFs were discounted as false-positives on the basis of partial or complete overlap of transmembrane helices with other Pfam domains (O01612, O23045 and Q13771).
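The consensus rule described above (segments overlap when they share ten or more residues, and a helix must be supported by at least three of the four programs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: it assumes each program's output has already been parsed into residue intervals, it counts agreement against one seed segment at a time rather than requiring all segments to overlap mutually, and every coordinate in the example is hypothetical.

```python
from typing import Dict, List, Tuple

# (start, end) in 1-based, inclusive residue coordinates
Segment = Tuple[int, int]

MIN_SHARED_RESIDUES = 10  # overlap threshold stated in the text
MIN_PROGRAMS = 3          # at least three of the four programs


def shared_residues(a: Segment, b: Segment) -> int:
    """Number of residues common to two segments (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def supported_segments(
    predictions: Dict[str, List[Segment]],
) -> List[Tuple[Segment, List[str]]]:
    """Return (segment, agreeing programs) for each well-supported segment.

    A seed segment is supported by its own program and by every other
    program that predicts a segment sharing enough residues with it.
    """
    hits = []
    for program, segments in predictions.items():
        for seed in segments:
            agreeing = sorted(
                other
                for other, other_segments in predictions.items()
                if other == program
                or any(
                    shared_residues(seed, s) >= MIN_SHARED_RESIDUES
                    for s in other_segments
                )
            )
            if len(agreeing) >= MIN_PROGRAMS:
                hits.append((seed, agreeing))
    return hits


def passes_consensus(predictions: Dict[str, List[Segment]]) -> bool:
    """True if at least one helix is supported by enough programs."""
    return bool(supported_segments(predictions))


# Hypothetical parsed output for two proteins.
three_programs = {
    "HMMTOP": [(301, 321)],
    "PHDhtm": [(305, 322)],
    "TMHMM": [(303, 325)],
    "PSORT II": [],
}
two_programs = {
    "HMMTOP": [(670, 690)],
    "PHDhtm": [],
    "TMHMM": [],
    "PSORT II": [(672, 688)],
}
kept = passes_consensus(three_programs)
dropped = passes_consensus(two_programs)
```

Under this rule the first protein is retained, while the second, whose helix is supported by only two programs, is excluded. Lowering `MIN_PROGRAMS` to 2 would admit it, at the cost of stringency.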
The following Pfam models for DNA-binding domains were used (abbreviated as in Pfam): 7 kDa DNA-binding; AP2-domain; ARID; ASNC trans reg; AT hook; Arg repressor; B3; BAH; BRO; Bac DNA-binding; basic; bZIP; CBFB NFYA; CSD; CUT; copper-fist; DM-domain; E2F TDP; fork head; GATA; HALZ; HLH; homeobox; HSF DNA-binding; HTH 3; HTH 4; HTH 5; IRF; LexA DNA-binding; MBD; MetJ; Myb DNA-binding; MutS N; Myc-LZ; PHD; RFX DNA-binding; RHD; Runt; SAP; sigma70; SRF-TF; STAT; sigma54 factors; sigma70 ECF; T-box; TBP; yeast DNA-binding; Trans reg C; zf (zinc finger)-C2H2; zf-C2HC; zf-C4; zf-NF-X1; Zn-clus.
Full-length DMRT2 and MTJ1Δ were generated by PCR using Pfu polymerase (Stratagene) and cloned directionally into the BamHI/XbaI sites of pcDNA3 (Invitrogen). Truncated MTJ1, in which an ATG (methionine) was added immediately before amino acid 171 (Q61712), was amplified from expressed sequence tag (EST) AI790297 (Incyte Genomics) and a Myc tag was added at the carboxyl terminus. MTJ1Δ-forward primer: 5'-CGCGGATCCGCGATGGAAAAGCAACTGGATGAACTG-3'. MTJ1Δ-reverse primer: 5'-GCTCTAGAGCTACAGGTCCTCCTCCGAGATGAGTTTCTGTTCCATGCTTTTAGCCTGCTTTTTCTT-3'. The ATG in bold indicates the translation start site of truncated MTJ1. Full-length MTJ1 was prepared by digesting clone AI790297 with XhoI, blunting ends, then digesting with EcoRI. This fragment was then cloned into pcDNA3-MTJ1Δ, which was digested with BamHI, blunt-ended, and digested with EcoRI. Full-length and truncated DMRT2 (at amino acid 180; Q9Y5R5) were amplified from EST AI985131 (Incyte), and a Myc tag was added at the amino terminus. DMRT2-forward primer: 5'-CGCGGATCCGCGATGGAACAGAAACTCATCTCGGAGGAGGACCTGATGGCCGACCCGCAGG-3'. DMRT2-reverse primer: 5'-GCTCTAGAGCTAAAGATGGTTCATTATGTAC-3'. DMRT2Δ-reverse primer: 5'-GCTCTAGAGTCAGGCTCTGACTTGCCTCTG-3'.
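The primer design above can be checked with a few lines of code. The sketch below confirms that each forward primer carries a BamHI site and each reverse primer an XbaI site, and that the Myc epitope is encoded where the text places it. The primer sequences are copied from the text. The recognition sequences (GGATCC for BamHI, TCTAGA for XbaI) and the Myc epitope sequence (EQKLISEEDL) are standard values not stated in the text, and the helper names are illustrative.

```python
# Standard genetic code, built in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

BAMHI_SITE = "GGATCC"   # assumption: standard BamHI recognition sequence
XBAI_SITE = "TCTAGA"    # assumption: standard XbaI recognition sequence
MYC_TAG = "EQKLISEEDL"  # assumption: standard Myc epitope

# Primer sequences (5' to 3') as given in the text.
FORWARD_PRIMERS = {
    "MTJ1Δ": "CGCGGATCCGCGATGGAAAAGCAACTGGATGAACTG",
    "DMRT2": "CGCGGATCCGCGATGGAACAGAAACTCATCTCGGAGGAGGACCTGATGGCCGACCCGCAGG",
}
REVERSE_PRIMERS = {
    "MTJ1Δ": "GCTCTAGAGCTACAGGTCCTCCTCCGAGATGAGTTTCTGTTCCATGCTTTTAGCCTGCTTTTTCTT",
    "DMRT2": "GCTCTAGAGCTAAAGATGGTTCATTATGTAC",
    "DMRT2Δ": "GCTCTAGAGTCAGGCTCTGACTTGCCTCTG",
}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate from the first base, stopping at the first stop codon.

    A trailing partial codon is ignored.
    """
    protein = []
    for i in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Restriction sites expected for directional BamHI/XbaI cloning.
forward_ok = all(BAMHI_SITE in p for p in FORWARD_PRIMERS.values())
reverse_ok = all(XBAI_SITE in p for p in REVERSE_PRIMERS.values())

# Amino-terminal tag: read the DMRT2 forward primer from its first ATG.
dmrt2_fwd = FORWARD_PRIMERS["DMRT2"]
dmrt2_nterm = translate(dmrt2_fwd[dmrt2_fwd.find("ATG"):])

# Carboxy-terminal tag: read the sense strand of the MTJ1Δ reverse primer.
mtj1_cterm = translate(reverse_complement(REVERSE_PRIMERS["MTJ1Δ"]))
```

Here `dmrt2_nterm` begins with a methionine followed by the Myc epitope, and `mtj1_cterm` ends with the epitope immediately before a stop codon, consistent with the amino-terminal and carboxy-terminal tag placements described for DMRT2 and truncated MTJ1.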
Cell culture and immunocytochemistry
Plasmids were introduced into COS-7 cells (ATCC) by standard DEAE transfection, and the cells were grown in 10% FBS/DMEM. Cells were fixed 72 h post-transfection in 3% PFA in PBS, and Myc tags were detected with mouse anti-Myc antibodies (NeoMarkers, Fremont, CA) and Texas-Red-X goat anti-mouse antibodies (Molecular Probes, Eugene, OR) using standard procedures. Nuclei were counterstained with Hoechst 33258. Photomicrographs were taken on a Zeiss Axiophot.
We thank J. Rine and O. Kelly for critical comments on the manuscript, and D. He for assembling overlapping domains. This work was supported by the NIH (S.E.B. and W.C.S.). S.E.B. and W.C.S. are Searle Scholars.
- 4. Haze K, Okada T, Yoshida H, Yanagi H, Yura T, Negishi M, Mori K: Identification of the G13 (cAMP-response-element-binding protein-related protein) gene product related to activating transcription factor 6 as a transcriptional activator of the mammalian unfolded protein response. Biochem J. 2001, 355: 19-28. 10.1042/0264-6021:3550019.
- 23. Raymond CS, Parker ED, Kettlewell JR, Brown LG, Page DC, Kusz K, Jaruzelska J, Reinberg Y, Flejter WL, Bardwell VJ, et al: A region of human chromosome 9p required for testis development contains two genes related to known sexual regulators. Hum Mol Genet. 1999, 8: 989-996. 10.1093/hmg/8.6.989.
- 26. Ausubel FM: Current Protocols in Molecular Biology. New York: Greene Publishing Associates/Wiley-Interscience; 1988.