Improving Promoter Prediction Using Multiple Instance Learning

Uren, P. J.; Cameron-Jones, R. M.; Sale, A. H. J.

doi:10.1007/978-3-540-89378-3_28

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5360)

Abstract

Promoter prediction is a well-known but challenging problem in computational biology. Eukaryotic promoter prediction, an important step in gene finding and in the elucidation of transcriptional control networks, is frustrated by the complex nature of promoters themselves. In this paper we explore a representational scheme that describes each promoter by a variable number of salient binding sites within it. The multiple instance learning paradigm allows these variable-length instances to be reasoned about in a supervised learning context. We demonstrate that the procedure performs reasonably on its own and allows for a significant increase in predictive accuracy when combined with physico-chemical promoter prediction.
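
The representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Site` features (position and match score), the example sequences, and the `summarise` helper are all hypothetical, and collapsing each bag into summary statistics is only one simple way of handing multiple instance data to a standard supervised learner.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Site:
    """One putative binding site; both features are illustrative assumptions."""
    position: int   # offset of the site within the sequence
    score: float    # motif match score for the site


Bag = List[Site]  # a sequence is a bag holding a variable number of sites


def summarise(bag: Bag) -> List[float]:
    """Collapse a variable-size bag into a fixed-length feature vector.

    Returns [site count, best score, mean score]; an empty bag maps to zeros.
    """
    if not bag:
        return [0.0, 0.0, 0.0]
    scores = [site.score for site in bag]
    return [float(len(bag)), max(scores), mean(scores)]


# Hypothetical sequences with three, one, and zero detected sites.
bags: Dict[str, Bag] = {
    "seq_a": [Site(12, 0.5), Site(40, 1.0), Site(77, 0.75)],
    "seq_b": [Site(5, 0.25)],
    "seq_c": [],
}

# Every bag now has the same dimensionality, whatever its size.
vectors = {name: summarise(bag) for name, bag in bags.items()}

for name, vector in vectors.items():
    print(name, vector)
```

Each sequence yields a vector of the same length regardless of how many sites it contains, so the vectors can be passed to any ordinary classifier. Dedicated multiple instance learners instead operate on the bags directly.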

Author information

Authors and Affiliations

School of Computing and Information Systems Faculty of Science, Engineering and Technology, University of Tasmania, Hobart and Launceston, Tasmania, Australia
P. J. Uren, R. M. Cameron-Jones & A. H. J. Sale

Editor information

Editors and Affiliations

School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
Wayne Wobcke
School of Mathematics, Statistics and Computer Science, Victoria University of Wellington, P.O. Box 600, 6140, Wellington, New Zealand
Mengjie Zhang

About this paper

Cite this paper

Uren, P.J., Cameron-Jones, R.M., Sale, A.H.J. (2008). Improving Promoter Prediction Using Multiple Instance Learning. In: Wobcke, W., Zhang, M. (eds) AI 2008: Advances in Artificial Intelligence. AI 2008. Lecture Notes in Computer Science, vol 5360. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89378-3_28

DOI: https://doi.org/10.1007/978-3-540-89378-3_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89377-6
Online ISBN: 978-3-540-89378-3
eBook Packages: Computer Science, Computer Science (R0)
