
Improving Promoter Prediction Using Multiple Instance Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5360)

Abstract

Promoter prediction is a well-known but challenging problem in computational biology. Eukaryotic promoter prediction, an important step in elucidating transcriptional control networks and in gene finding, is hampered by the complex nature of promoters themselves. In this paper we explore a representation that describes each promoter by a variable number of salient binding sites within it. The multiple instance learning paradigm is used so that these variable-length examples can be reasoned about in a supervised learning context. We demonstrate that the procedure performs reasonably well on its own and yields a significant increase in predictive accuracy when combined with physico-chemical promoter prediction.
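The abstract describes each promoter as a variable-size collection of binding sites, handled with multiple instance learning. The Python sketch below illustrates that kind of representation under stated assumptions. It is not the authors' method: the consensus-based position weight matrix (`consensus_pwm`), the scan threshold, the logistic instance scorer, the max-over-instances bag score (the standard multiple instance assumption that a bag is positive if at least one of its instances is) and the weighted-average `combine` helper are all hypothetical choices made for illustration. The paper's actual learners, instance features and combination scheme are not given in the abstract.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A toy position weight matrix: one {base: log-odds score} column per motif position.
PWM = List[Dict[str, float]]
# An instance is one putative binding site: (motif index, relative position, match score).
Instance = Tuple[int, float, float]


def consensus_pwm(consensus: str, match: float = 1.0, mismatch: float = -1.0) -> PWM:
    """Hypothetical helper: build a crude PWM that rewards the consensus base."""
    return [{b: (match if b == c else mismatch) for b in "ACGT"} for c in consensus]


def scan(seq: str, pwm: PWM, threshold: float) -> List[Tuple[int, float]]:
    """Slide the PWM along an uppercase ACGT sequence; keep windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(col[base] for col, base in zip(pwm, seq[i:i + width]))
        if score >= threshold:
            hits.append((i, score))
    return hits


def make_bag(seq: str, pwms: Sequence[PWM], threshold: float) -> List[Instance]:
    """One bag per sequence, holding however many sites pass the threshold (possibly none)."""
    bag = []
    for m, pwm in enumerate(pwms):
        for pos, score in scan(seq, pwm, threshold):
            bag.append((m, pos / len(seq), score))
    return bag


def instance_prob(inst: Instance, weights: Sequence[float], bias: float) -> float:
    """Toy logistic instance model; it uses the motif weight and score but ignores position."""
    m, _rel_pos, score = inst
    return 1.0 / (1.0 + math.exp(-(weights[m] * score + bias)))


def bag_prob(bag: List[Instance], weights: Sequence[float], bias: float) -> float:
    """Standard MI assumption: a bag is as positive as its most positive instance; empty bags score 0."""
    return max((instance_prob(i, weights, bias) for i in bag), default=0.0)


def combine(p_mil: float, p_phys: float, alpha: float = 0.5) -> float:
    """Hypothetical fusion with a physico-chemical predictor's score by weighted averaging."""
    return alpha * p_mil + (1.0 - alpha) * p_phys
```

For example, `make_bag("GCGCTATAAAGGC", [consensus_pwm("TATA")], 4.0)` produces a one-instance bag for the exact TATA match at position 4, while a sequence with no site above the threshold produces an empty bag, which this sketch scores as 0.0. Because each sequence contributes however many sites it contains, bags of different sizes need no padding or fixed-length feature vector, which is the motivation for the multiple instance framing.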


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Uren, P.J., Cameron-Jones, R.M., Sale, A.H.J. (2008). Improving Promoter Prediction Using Multiple Instance Learning. In: Wobcke, W., Zhang, M. (eds) AI 2008: Advances in Artificial Intelligence. AI 2008. Lecture Notes in Computer Science, vol 5360. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89378-3_28


  • DOI: https://doi.org/10.1007/978-3-540-89378-3_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89377-6

  • Online ISBN: 978-3-540-89378-3

  • eBook Packages: Computer Science, Computer Science (R0)
