Abstract
Current approaches for identification and detection of transcription factor binding sites rely on an extensive set of known target genes. Here we describe a novel structure-based approach applicable to transcription factors with no prior binding data. Our approach combines sequence data and structural information to infer context-specific amino acid-nucleotide recognition preferences. These are used to predict binding sites for novel transcription factors from the same structural family. We apply our approach to the Cys2His2 Zinc Finger protein family, and show that the learned DNA-recognition preferences are compatible with various experimental results. To demonstrate the potential of our algorithm, we use the learned preferences to predict binding site models for novel proteins from the same family. These models are then used in genomic scans to find putative binding sites of the novel proteins.
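The pipeline outlined in the abstract (recognition preferences, then a binding site model, then a genomic scan) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `PREFS` table holds placeholder probabilities rather than preferences learned in the paper, the mapping of helix positions 6, 3 and -1 to the three bases of a DNA triplet follows the canonical Cys2His2 binding model, and the function names `finger_columns`, `build_pssm`, `log_odds` and `scan` are hypothetical. The paper's context-specific preferences are not reproduced here.

```python
import math

NUCS = "ACGT"

# Hypothetical recognition preferences P(nucleotide | helix position, amino acid),
# ordered as A, C, G, T. Placeholder numbers, NOT values learned in the paper.
PREFS = {
    (6, "R"): [0.05, 0.05, 0.85, 0.05],
    (3, "E"): [0.10, 0.70, 0.10, 0.10],
    (3, "H"): [0.30, 0.10, 0.50, 0.10],
    (-1, "R"): [0.05, 0.05, 0.85, 0.05],
}
UNIFORM = [0.25, 0.25, 0.25, 0.25]  # fallback when no preference is known


def finger_columns(residues):
    """Return three probability columns (5'->3') for one finger.

    `residues` maps helix position -> amino acid. Assumes the canonical
    model: position 6 -> base 1, position 3 -> base 2, position -1 -> base 3.
    """
    return [PREFS.get((pos, residues[pos]), UNIFORM) for pos in (6, 3, -1)]


def build_pssm(fingers):
    """Concatenate per-finger columns into one binding site model.

    Fingers are given N- to C-terminal; binding is assumed antiparallel,
    so the last finger contributes the 5'-most triplet.
    """
    pssm = []
    for finger in reversed(fingers):
        pssm.extend(finger_columns(finger))
    return pssm


def log_odds(pssm, site, background=0.25):
    """Log2-odds score of `site` under the model versus a uniform background."""
    return sum(math.log2(col[NUCS.index(base)] / background)
               for col, base in zip(pssm, site))


def scan(pssm, seq, threshold):
    """Slide the model along `seq`; return (offset, site, score) for each hit."""
    width = len(pssm)
    hits = []
    for i in range(len(seq) - width + 1):
        site = seq[i:i + width]
        score = log_odds(pssm, site)
        if score >= threshold:
            hits.append((i, site, score))
    return hits


# Two hypothetical fingers, listed N- to C-terminal.
fingers = [{6: "R", 3: "E", -1: "R"}, {6: "R", 3: "H", -1: "R"}]
pssm = build_pssm(fingers)
hits = scan(pssm, "TTAGGGGCGATT", threshold=5.0)
```

The sketch treats positions as independent and scans one strand only; a realistic scan would also score the reverse complement and use a genome-specific background.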
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kaplan, T., Friedman, N., Margalit, H. (2005). Predicting Transcription Factor Binding Sites Using Structural Knowledge. In: Miyano, S., Mesirov, J., Kasif, S., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2005. Lecture Notes in Computer Science, vol 3500. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11415770_40
DOI: https://doi.org/10.1007/11415770_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25866-7
Online ISBN: 978-3-540-31950-4
eBook Packages: Computer Science (R0)