Abstract
In this paper, we examine the problem of identifying motifs in DNA sequences. Transcription factor binding sites, which are functionally significant subsequences, are treated as motifs. To reveal such DNA motifs, our method applies fuzzy clustering to position weight matrices (PWMs). The Fuzzy C-Means (FCM) algorithm clearly predicted known motifs in the intergenic regions of GAL4, CBF1 and GCN4 DNA sequences. The paper also compares FCM with other clustering methods such as Self-Organizing Map and K-Means, and with the popular motif discovery tool Multiple Expectation Maximization for Motif Elicitation (MEME). We conclude that soft-clustering-based machine learning methods such as FCM are useful for finding patterns in biological sequences.
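The idea of fuzzy clustering over PWM-like vectors can be illustrated with a minimal sketch. This is not the authors' implementation: the one-hot encoding of k-mers as degenerate PWMs, the squared Euclidean distance, the farthest-point seeding, the fuzzifier m = 2 and the toy k-mers are all assumptions made for illustration. Only the standard FCM membership and centre updates are used.

```python
from typing import List, Sequence, Tuple

ALPHABET = "ACGT"


def one_hot(kmer: str) -> List[float]:
    """Encode a k-mer as a flattened k x 4 matrix (a degenerate PWM)."""
    return [1.0 if base == letter else 0.0 for base in kmer for letter in ALPHABET]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two flattened matrices."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centres(points: Sequence[Sequence[float]], c: int) -> List[List[float]]:
    """Farthest-point seeding: a deterministic choice assumed for this sketch."""
    centres = [list(points[0])]
    while len(centres) < c:
        far = max(points, key=lambda p: min(sq_dist(p, v) for v in centres))
        centres.append(list(far))
    return centres


def memberships(points, centres, m: float) -> List[List[float]]:
    """Standard FCM update: u_ij = 1 / sum_k (d_ij^2 / d_ik^2)^(1 / (m - 1))."""
    u = []
    for p in points:
        # Floor the distance so a point sitting on a centre does not divide by zero.
        d2 = [max(sq_dist(p, v), 1e-12) for v in centres]
        u.append([1.0 / sum((dj / dk) ** (1.0 / (m - 1.0)) for dk in d2) for dj in d2])
    return u


def fuzzy_c_means(
    points: Sequence[Sequence[float]], c: int, m: float = 2.0, iters: int = 50
) -> Tuple[List[List[float]], List[List[float]]]:
    """Alternate membership and centre updates for a fixed number of iterations."""
    centres = init_centres(points, c)
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centres, m)
        for j in range(c):
            w = [row[j] ** m for row in u]  # fuzzified weights for cluster j
            total = sum(w)
            centres[j] = [
                sum(wi * p[d] for wi, p in zip(w, points)) / total for d in range(dim)
            ]
    return centres, memberships(points, centres, m)


# Toy k-mers invented for illustration; they are not taken from the paper's data.
kmers = ["CGGAGT", "CGGAGA", "CGGACT", "TTTAAA", "TTTAAT", "TATAAA"]
centres, u = fuzzy_c_means([one_hot(k) for k in kmers], c=2)
for kmer, row in zip(kmers, u):
    print(kmer, [round(x, 3) for x in row])
```

Because every input is a one-hot matrix, each cluster centre is a weighted average whose columns sum to one, so a centre can be read directly as a PWM. The soft memberships let a k-mer contribute to more than one candidate motif, which is the property that distinguishes FCM from K-Means.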
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Karabulut, M., Ibrikci, T. (2008). Fuzzy C-Means Based DNA Motif Discovery. In: Huang, DS., Wunsch, D.C., Levine, D.S., Jo, KH. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues. ICIC 2008. Lecture Notes in Computer Science, vol 5226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87442-3_24
DOI: https://doi.org/10.1007/978-3-540-87442-3_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87440-9
Online ISBN: 978-3-540-87442-3
eBook Packages: Computer Science, Computer Science (R0)