Abstract
Understanding transcriptional regulation requires reliable identification of the DNA binding sites recognized by each transcription factor (TF). Building an accurate bioinformatic model of TF-DNA binding is an essential step in differentiating true binding targets from spurious ones. Conventional approaches to binding site prediction are based on the notion of consensus sequences. They are formalized by so-called position-specific weight matrices (PWMs) and rely on statistical analysis of the DNA sequences of known binding sites. To improve these techniques, we propose to use knowledge of genome organization, namely the optimal positioning of co-regulated genes along the whole chromosome. For this purpose, we use machine learning approaches to optimally combine sequence information with positioning information. We present a new learning algorithm called PreCisIon, which relies on a TF binding classifier that optimally combines a set of PWMs and chromosomal position-based classifiers. This non-linear binding decision rule drastically reduces the rate of false positives, so that PreCisIon consistently outperforms sequence-based methods. We show this with a cross-validation analysis in two model organisms: Escherichia coli and Bacillus subtilis. The analysis is based on the identification of binding sites for 24 TFs; PreCisIon achieved on average an AUC (area under the curve) of 70% and 60%, a sensitivity of 80% and 70%, and a specificity of 60% and 56% for B. subtilis and E. coli, respectively.
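As a point of reference for the sequence-based baseline mentioned above, the following is a minimal sketch of conventional PWM scoring. It is not the PreCisIon algorithm and does not include any positional information. The function names (`build_pwm`, `score`, `best_hit`), the pseudocount, the uniform background of 0.25, and the toy binding sites are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned, equal-length binding sites.

    Assumptions (not from the paper): a uniform background frequency
    and an additive pseudocount per base.
    """
    length = len(sites[0])
    pwm = []
    for i in range(length):
        # Start every base at the pseudocount so no weight is log(0).
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background)
                    for b in BASES})
    return pwm

def score(pwm, seq):
    """Sum the per-position log-odds weights for a candidate site."""
    return sum(column[base] for column, base in zip(pwm, seq))

def best_hit(pwm, region):
    """Slide the PWM along a region; return (best score, start offset)."""
    width = len(pwm)
    return max((score(pwm, region[i:i + width]), i)
               for i in range(len(region) - width + 1))

# Hypothetical aligned sites, for illustration only.
toy_sites = ["TTGACA", "TTGACA", "TTGACT", "TTTACA"]
toy_pwm = build_pwm(toy_sites)
top_score, offset = best_hit(toy_pwm, "GGGTTGACAGGG")
```

A purely sequence-based predictor would threshold `top_score`; the abstract's point is that such a rule yields many false positives, which motivates combining it with position-based classifiers.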
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Elati, M., Fekih, R., Nicolle, R., Junier, I., Hérisson, J., Képès, F. (2011). Boosting Binding Sites Prediction Using Gene’s Positions. In: Przytycka, T.M., Sagot, MF. (eds) Algorithms in Bioinformatics. WABI 2011. Lecture Notes in Computer Science, vol 6833. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23038-7_9
DOI: https://doi.org/10.1007/978-3-642-23038-7_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23037-0
Online ISBN: 978-3-642-23038-7
eBook Packages: Computer Science (R0)