Motif Location Prediction by Divide and Conquer

Alshalalfa, Mohammed; Alhajj, Reda

doi:10.1007/978-3-540-70600-7_8

Part of the book series: Communications in Computer and Information Science (CCIS, volume 13)

Included in the following conference series:

International Conference on Bioinformatics Research and Development

Abstract

Motif discovery has recently received considerable interest from both computational biologists and computer scientists. Identifying motifs is important for understanding the mechanisms that regulate gene expression. Although many algorithms have been proposed to solve this problem, only some of them use prior information about motifs. In this paper, we propose a method to limit the search space of existing motif discovery methods. Our method is based on the following observation: if some elements are conserved, then these elements may be part of a conserved motif. The proposed approach follows the divide and conquer concept: we divide each DNA sequence into four subsequences, one for each of the four letters {A, C, G, T} that represent the nucleotides. We then treat the subsequences for G as the major source for deciding on candidate motifs, because G is found in almost all transcription factor binding sites; the decision is supported and enhanced by the subsequences of the other three letters. We have applied this idea to the yst04 and hm03r datasets; the results are encouraging, as we have successfully predicted the locations of some of the motifs hidden within the analyzed sequences.
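
The abstract does not state how the four per-letter subsequences are represented. As a minimal sketch of the divide step only, the Python below assumes each subsequence is the list of 0-based positions at which that letter occurs. The function name `split_by_nucleotide` and the position-list representation are illustrative assumptions, not the authors' implementation, and the candidate-selection step based on G is not shown.

```python
NUCLEOTIDES = "ACGT"


def split_by_nucleotide(sequence):
    """Split a DNA sequence into four per-letter position lists.

    Assumption: a "subsequence" for a letter is the ordered list of
    0-based indices where that letter occurs. Characters outside
    {A, C, G, T} (for example N) are skipped.
    """
    positions = {letter: [] for letter in NUCLEOTIDES}
    for index, letter in enumerate(sequence.upper()):
        if letter in positions:
            positions[letter].append(index)
    return positions


if __name__ == "__main__":
    parts = split_by_nucleotide("GATTACAG")
    # The G positions would be the primary source for candidates;
    # the other three lists would support the decision.
    for letter in NUCLEOTIDES:
        print(letter, parts[letter])
```

Under this assumed representation, the four lists together partition the positions of the sequence that carry a standard nucleotide letter, so the original sequence can be recovered from them.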

Author information

Authors and Affiliations

Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
Mohammed Alshalalfa & Reda Alhajj

Editor information

Mourad Elloumi, Josef Küng, Michal Linial, Robert F. Murphy, Kristan Schneider, Cristian Toma

Cite this paper

Alshalalfa, M., Alhajj, R. (2008). Motif Location Prediction by Divide and Conquer. In: Elloumi, M., Küng, J., Linial, M., Murphy, R.F., Schneider, K., Toma, C. (eds) Bioinformatics Research and Development. BIRD 2008. Communications in Computer and Information Science, vol 13. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70600-7_8

DOI: https://doi.org/10.1007/978-3-540-70600-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70598-7
Online ISBN: 978-3-540-70600-7
eBook Packages: Computer Science, Computer Science (R0)
