Prediction of Binding Sites in the Mouse Genome Using Support Vector Machines

Sun, Yi; Robinson, Mark; Adams, Rod; Rust, Alistair; Davey, Neil

doi:10.1007/978-3-540-87559-8_10

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5164)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

Computational prediction of cis-regulatory binding sites is widely acknowledged as a difficult task. There are many different algorithms for searching for binding sites in current use. However, most of them produce a high rate of false positive predictions. Moreover, many algorithmic approaches are inherently constrained with respect to the range of binding sites that they can be expected to reliably predict. We propose to use SVMs to predict binding sites from multiple sources of evidence. We combine random selection under-sampling and the synthetic minority over-sampling technique to deal with the imbalanced nature of the data. In addition, we remove some of the final predicted binding sites on the basis of their biological plausibility. The results show that we can generate a new prediction that significantly improves on the performance of any one of the individual prediction algorithms.
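
The resampling step mentioned in the abstract can be illustrated with a minimal Python sketch. It randomly under-samples the majority class (non-binding positions) and then adds synthetic minority rows (binding positions) by SMOTE-style interpolation between nearest neighbours. The function names, the ratios, the value of k, and the use of Euclidean distance are all assumptions made for illustration; the abstract does not state the settings the authors used. The rebalanced set would then be handed to an SVM trainer, which is not shown here.

```python
import random
from math import dist


def undersample(majority, ratio, rng):
    """Randomly keep a fraction `ratio` of the majority-class rows."""
    keep = max(1, round(len(majority) * ratio))
    return rng.sample(majority, keep)


def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority rows, each interpolated between a
    randomly chosen row and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority rows.
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: dist(base, row))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def rebalance(majority, minority, under_ratio=0.5, over_factor=2, k=3, seed=0):
    """Hypothetical helper: under-sample label 0, over-sample label 1.

    under_ratio and over_factor are illustrative defaults, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    kept = undersample(majority, under_ratio, rng)
    new = smote(minority, (over_factor - 1) * len(minority), k, rng)
    X = kept + minority + new
    y = [0] * len(kept) + [1] * (len(minority) + len(new))
    return X, y
```

Because each synthetic row lies on a line segment between two real minority rows, over-sampling never places points outside the region already covered by the minority class. In practice the resampling would be applied to the training split only, so that the test data keeps its original class balance.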

Author information

Authors and Affiliations

Science and Technology Research School, University of Hertfordshire, United Kingdom, AL10 9AB
Yi Sun, Rod Adams & Neil Davey
Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
Mark Robinson
Institute for Systems Biology, 1441 North 34th Street, Seattle, WA, 98103, USA
Alistair Rust

Editor information

Véra Kůrková, Roman Neruda, Jan Koutník

About this paper

Cite this paper

Sun, Y., Robinson, M., Adams, R., Rust, A., Davey, N. (2008). Prediction of Binding Sites in the Mouse Genome Using Support Vector Machines. In: Kůrková, V., Neruda, R., Koutník, J. (eds) Artificial Neural Networks - ICANN 2008. ICANN 2008. Lecture Notes in Computer Science, vol 5164. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87559-8_10

DOI: https://doi.org/10.1007/978-3-540-87559-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87558-1
Online ISBN: 978-3-540-87559-8
eBook Packages: Computer Science (R0)
