Abstract
Promoter prediction is a well-known but challenging problem in computational biology. Eukaryotic promoter prediction, an important step in gene finding and in the elucidation of transcriptional control networks, is frustrated by the complex nature of promoters themselves. In this paper we explore a representational scheme that describes promoters based on a variable number of salient binding sites within them. The multiple instance learning paradigm is used to allow these variable-length instances to be reasoned about in a supervised learning context. We demonstrate that the procedure performs reasonably on its own and allows for a significant increase in predictive accuracy when combined with physico-chemical promoter prediction.
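To make the representation concrete, the following is a minimal Python sketch of how a sequence could be turned into a bag holding a variable number of binding-site instances, and how a bag could be labelled under the standard multiple-instance assumption (a bag is positive if at least one of its instances is positive). The motif matrix `PWM`, the score threshold, the (relative position, score) features, and all function names are illustrative assumptions; the abstract does not specify the features or learners actually used.

```python
# Hypothetical log-odds weight matrix for the 4-bp motif "TATA".
# The values are illustrative only and are not taken from any database.
MATCH, MISMATCH = 1.2, -1.0
PWM = [
    {base: (MATCH if base == wanted else MISMATCH) for base in "ACGT"}
    for wanted in "TATA"
]


def make_bag(sequence, pwm=PWM, threshold=4.0):
    """Scan a sequence and return a bag of (relative_position, score) instances.

    One instance is emitted per window whose score reaches the threshold,
    so the bag size varies from sequence to sequence (and may be zero).
    """
    width = len(pwm)
    bag = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            bag.append((start / len(sequence), score))
    return bag


def bag_is_positive(bag, instance_is_positive):
    """Standard multiple-instance assumption: positive iff any instance is."""
    return any(instance_is_positive(instance) for instance in bag)


if __name__ == "__main__":
    for seq in ("GGTATAGG", "TATATATA", "GGGGCCCC"):
        bag = make_bag(seq)
        # Toy instance-level rule (an assumption): a site in the first half counts.
        label = bag_is_positive(bag, lambda inst: inst[0] < 0.5)
        print(seq, len(bag), label)
```

Because every bag is just a list, sequences with different numbers of detected sites need no padding or truncation; a multiple-instance learner consumes the bags directly, and an empty bag is simply classified as negative under this assumption.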
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Uren, P.J., Cameron-Jones, R.M., Sale, A.H.J. (2008). Improving Promoter Prediction Using Multiple Instance Learning. In: Wobcke, W., Zhang, M. (eds) AI 2008: Advances in Artificial Intelligence. AI 2008. Lecture Notes in Computer Science, vol 5360. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89378-3_28
DOI: https://doi.org/10.1007/978-3-540-89378-3_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89377-6
Online ISBN: 978-3-540-89378-3
eBook Packages: Computer Science, Computer Science (R0)