Using Varying Negative Examples to Improve Computational Predictions of Transcription Factor Binding Sites

  • Faisal Rezwan
  • Yi Sun
  • Neil Davey
  • Rod Adams
  • Alistair G. Rust
  • Mark Robinson
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 311)


The identification of transcription factor binding sites (TFBSs) is a non-trivial problem because existing computational predictors produce many false predictions. Although combining these predictions with a meta-classifier, such as a Support Vector Machine (SVM), has been shown to improve the overall results, the improvement is not as significant as expected. The reason is that the predictors are unreliable on the negative examples taken from non-binding sites in the promoter region. Using negative examples from different sources when training an SVM is therefore one possible solution to this problem. In this study, we trained the classifier with different types of negative examples: examples taken far away from the promoter regions, examples produced by randomisation, and examples from the intronic regions of genes. We observed the effect of these negative examples on the prediction of TFBSs in yeast, and we also used a modified cross-validation method for this type of problem. We observed a substantial improvement in classifier performance, which could constitute a model for predicting TFBSs. The major contribution of this analysis is that, for the yeast genome, the position of binding sites can be predicted with high confidence using our technique, and the predictions are of much higher quality than those of the original prediction algorithms.
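As an illustration of swapping the source of negative examples, the following minimal Python sketch assembles a training set from promoter positives and an alternative negative pool. It is not the authors' implementation. The representation of each position as a vector of base-predictor outputs, the function names `build_training_set` and `randomised_negatives`, the column-wise shuffling used for randomisation, and all data values are assumptions made for this example.

```python
import random


def randomised_negatives(vectors, n, seed=0):
    """Return n synthetic negatives (hypothetical helper).

    Assumption: "randomisation" is modelled here by shuffling each
    predictor column independently across positions.
    """
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*vectors)]
    for col in columns:
        rng.shuffle(col)
    shuffled_rows = [list(row) for row in zip(*columns)]
    return shuffled_rows[:n]


def build_training_set(promoter_vectors, promoter_labels, negative_pool,
                       n_negatives, seed=0):
    """Keep promoter positives, but draw negatives from another pool.

    negative_pool could hold vectors from distal regions, intronic
    regions, or randomised_negatives(); promoter negatives are dropped.
    """
    rng = random.Random(seed)
    positives = [v for v, label in zip(promoter_vectors, promoter_labels)
                 if label == 1]
    k = min(n_negatives, len(negative_pool))
    negatives = rng.sample(negative_pool, k)
    X = positives + negatives
    y = [1] * len(positives) + [0] * len(negatives)
    return X, y


# Toy data: each row is one position, each column one base predictor's call.
promoter_vectors = [[1, 0, 1], [0, 0, 0], [1, 1, 1],
                    [0, 1, 0], [0, 0, 1], [1, 1, 0]]
promoter_labels = [1, 0, 1, 0, 0, 1]
distal_pool = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

X, y = build_training_set(promoter_vectors, promoter_labels,
                          distal_pool, n_negatives=3)
synthetic = randomised_negatives(promoter_vectors, n=3, seed=1)
```

The resulting `X` and `y` could then be passed to any SVM implementation. The sketch deliberately leaves out the classifier and the modified cross-validation scheme, because the abstract does not describe them in detail.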


Keywords (added by machine, not by the authors): Support Vector Machine; Transcription Factor Binding Site; False Prediction; Wellcome Trust Sanger Institute; Binding Site Prediction





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Faisal Rezwan (1)
  • Yi Sun (1)
  • Neil Davey (1)
  • Rod Adams (1)
  • Alistair G. Rust (2)
  • Mark Robinson (3)

  1. School of Computer Science, University of Hertfordshire, Hatfield, UK
  2. Wellcome Trust Sanger Institute, Hinxton, UK
  3. Benaroya Research Institute at Virginia Mason, Seattle, USA
