Improving Transcription Factor Binding Site Predictions by Using Randomised Negative Examples

Rezwan, Faisal; Sun, Yi; Davey, Neil; Adams, Rod; Rust, Alistair G.; Robinson, Mark

doi:10.1007/978-3-642-28792-3_28

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7223)

Included in the following conference series:

International Conference on Information Processing in Cells and Tissues

Abstract

It is known that much of the genetic change underlying morphological evolution takes place in cis-regulatory regions rather than in the coding regions of genes. Identifying these sites in a genome is a non-trivial problem. Experimental methods for finding binding sites exist, but they are limited in applicability, accuracy, availability or cost. Computational prediction algorithms, on the other hand, perform rather poorly. The aim of this research is to develop and improve computational approaches for the prediction of transcription factor binding sites (TFBSs) by integrating the results of computational algorithms and other sources of complementary biological evidence, with particular emphasis on the use of the Support Vector Machine (SVM). Data from two organisms, yeast and mouse, were used in this study. The initial results were not particularly encouraging, as the predictions were still of low quality. However, when the vectors labelled as non-binding sites in the training set were replaced by randomised training vectors, a significant improvement in performance was observed: a substantial improvement for the yeast data and an even greater improvement for the mouse data. In fact, the resulting classifier found over 80% of the binding sites in the test set, and moreover 80% of its predictions were correct.
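The two ideas in the abstract, replacing the non-binding training vectors with randomised ones and reporting both the fraction of sites found and the fraction of predictions that were correct, can be sketched as below. This is a minimal illustration and not the authors' procedure: the abstract does not say how the randomised vectors were generated, so the uniform per-feature sampling, the function names `randomise_negatives` and `recall_and_precision`, and the 0/1 label encoding are all assumptions. The resulting training set would then be passed to a classifier such as an SVM, which is not shown here.

```python
import random


def randomise_negatives(vectors, labels, seed=0):
    """Return a copy of the training set in which every vector labelled 0
    (non-binding) is replaced by a randomly generated vector.

    Assumption (not from the paper): each replacement value is drawn
    uniformly between the minimum and maximum observed for that feature.
    """
    rng = random.Random(seed)
    n_features = len(vectors[0])
    lows = [min(v[j] for v in vectors) for j in range(n_features)]
    highs = [max(v[j] for v in vectors) for j in range(n_features)]

    new_vectors = []
    for vec, label in zip(vectors, labels):
        if label == 1:
            # Binding-site (positive) vectors are kept unchanged.
            new_vectors.append(list(vec))
        else:
            # Non-binding (negative) vectors are replaced by random ones.
            new_vectors.append(
                [rng.uniform(lows[j], highs[j]) for j in range(n_features)]
            )
    return new_vectors, list(labels)


def recall_and_precision(true_labels, predicted_labels):
    """Recall: fraction of true binding sites that were predicted.
    Precision: fraction of predicted binding sites that are true."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision
```

In these terms, "finding over 80% of the binding sites" is a statement about recall, and "80% of the predictions were correct" is a statement about precision. Only the training negatives are randomised in this sketch; the test set is assumed to keep its original labels.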

Author information

Authors and Affiliations

School of Computer Science, University of Hertfordshire, College Lane, Hatfield, Hertfordshire, AL10 9AB, UK
Faisal Rezwan, Yi Sun, Neil Davey & Rod Adams
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
Alistair G. Rust
Benaroya Research Institute at Virginia Mason, 1201 9th Avenue Seattle, WA, 98101, USA
Mark Robinson

Editor information

Editors and Affiliations

Department of Electronics, University of York, YO10 5DD, York, UK
Michael A. Lones & Stephen L. Smith
MRC Laboratory of Molecular Biology, Hills Road, CB2 0QH, Cambridge, UK
Sarah Teichmann
The Institute of Bioengineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015, Lausanne, Switzerland
Felix Naef
Department of Electronics, University of York, YO10 5DD, York, UK
James A. Walker & Martin A. Trefzer

About this paper

Cite this paper

Rezwan, F., Sun, Y., Davey, N., Adams, R., Rust, A.G., Robinson, M. (2012). Improving Transcription Factor Binding Site Predictions by Using Randomised Negative Examples. In: Lones, M.A., Smith, S.L., Teichmann, S., Naef, F., Walker, J.A., Trefzer, M.A. (eds) Information Processing in Cells and Tissues. IPCAT 2012. Lecture Notes in Computer Science, vol 7223. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28792-3_28

DOI: https://doi.org/10.1007/978-3-642-28792-3_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28791-6
Online ISBN: 978-3-642-28792-3
eBook Packages: Computer Science, Computer Science (R0)
