Boosting Binding Sites Prediction Using Gene’s Positions

  • Conference paper
Algorithms in Bioinformatics (WABI 2011)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6833)

Abstract

Understanding transcriptional regulation requires reliable identification of the DNA binding sites recognized by each transcription factor (TF). Building an accurate bioinformatic model of TF-DNA binding is an essential step in differentiating true binding targets from spurious ones. Conventional approaches to binding site prediction are based on the notion of consensus sequences. They are formalized by so-called position-specific weight matrices (PWMs) and rely on the statistical analysis of the DNA sequences of known binding sites. To improve these techniques, we propose to use genome organization knowledge about the optimal positioning of co-regulated genes along the whole chromosome. For this purpose, we use machine learning approaches to optimally combine sequence information with positioning information. We present a new learning algorithm called PreCisIon, which relies on a TF binding classifier that optimally combines a set of PWMs and chromosomal position-based classifiers. This non-linear binding decision rule drastically reduces the rate of false positives, so that PreCisIon consistently outperforms sequence-based methods. This is shown by a cross-validation analysis in two model organisms: Escherichia coli and Bacillus subtilis. The analysis is based on the identification of binding sites for 24 TFs; PreCisIon achieved on average an AUC (area under the curve) of 70% and 60%, a sensitivity of 80% and 70%, and a specificity of 60% and 56% for B. subtilis and E. coli, respectively.
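
To make the sequence-based baseline mentioned in the abstract concrete, the following is a minimal sketch of generic PWM scoring. It is not the PreCisIon algorithm or the authors' code. The function names (`build_pwm`, `score`, `best_hit`), the pseudocount of 1, the uniform background of 0.25, and the toy sites are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per column) from aligned, equal-length sites.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for i in range(length):
        column = [s[i] for s in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds of a window as wide as the PWM."""
    return sum(col[base] for col, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    hits = [
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(hits)


if __name__ == "__main__":
    # Toy aligned sites, invented for this example only.
    known_sites = ["TTGACA", "TTGACT", "TTTACA", "TTGATA"]
    pwm = build_pwm(known_sites)
    print(best_hit(pwm, "CCGGTTGACAGGCC"))
```

A score of this kind uses sequence information only. The abstract describes combining such PWM-based classifiers with classifiers based on chromosomal position; how that combination is performed is not specified here, so it is left out of the sketch.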


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Elati, M., Fekih, R., Nicolle, R., Junier, I., Hérisson, J., Képès, F. (2011). Boosting Binding Sites Prediction Using Gene’s Positions. In: Przytycka, T.M., Sagot, MF. (eds) Algorithms in Bioinformatics. WABI 2011. Lecture Notes in Computer Science, vol 6833. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23038-7_9

  • DOI: https://doi.org/10.1007/978-3-642-23038-7_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23037-0

  • Online ISBN: 978-3-642-23038-7

  • eBook Packages: Computer Science (R0)
