Machine Learning Study of DNA Binding by Transcription Factors from the LacI Family

  • Gennady G. Fedonin
  • Mikhail S. Gelfand
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6282)


We studied 1372 LacI-family transcription factors and their 4484 DNA binding sites using machine learning algorithms and feature selection techniques. The Naive Bayes classifier and Logistic Regression were used to predict binding sites from transcription factor sequences. Prediction accuracy was estimated using 10-fold cross-validation. Experiments showed that the best prediction of nucleotide densities at selected site positions is obtained using only a few key protein sequence positions. These positions are consistently selected by forward feature selection based on the mutual information of factor-site position pairs.
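
The feature selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy alignment, and the greedy criterion (at each step, add the protein alignment column that maximizes the empirical mutual information between the selected columns taken jointly and one site position) are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two equal-length symbol lists."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed pairs of p(x, y) * log2(p(x, y) / (p(x) * p(y))).
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def forward_select(proteins, site_column, k):
    """Greedily pick up to k protein alignment columns (hypothetical criterion).

    proteins    -- aligned protein sequences of equal length, one per binding site
    site_column -- nucleotide observed at one site position, one per binding site
    """
    n_cols = len(proteins[0])
    selected = []
    for _ in range(min(k, n_cols)):
        best_score, best_col = None, None
        for j in range(n_cols):
            if j in selected:
                continue
            # Treat the residues at the selected columns plus the candidate as one symbol.
            joint = [tuple(p[i] for i in selected + [j]) for p in proteins]
            score = mutual_information(joint, site_column)
            if best_score is None or score > best_score:
                best_score, best_col = score, j
        selected.append(best_col)
    return selected


# Toy data (invented): only column 1 (K vs. E) determines the nucleotide.
toy_proteins = ["AKR", "AKQ", "AER", "AEQ"]
toy_site = ["G", "G", "T", "T"]
print(forward_select(toy_proteins, toy_site, 1))
```

On real alignments, joint mutual information estimated from counts grows quickly with the number of selected columns, so a sketch like this would need a stopping rule or a held-out accuracy check such as the cross-validation the abstract mentions.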


transcription factors, naive Bayes classifier, logistic regression, mutual information



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Gennady G. Fedonin (1)
  • Mikhail S. Gelfand (1)
  1. Institute for Information Transmission Problems (the Kharkevich Institute), RAS, Moscow, Russia
