Abstract
Motif discovery has recently received considerable interest from both computational biologists and computer scientists. Identifying motifs is important for understanding the mechanisms that regulate gene expression. Although many algorithms have been proposed to solve this problem, only some of them use prior information about motifs. In this paper, we propose a method that limits the search space of existing motif discovery methods. Our method is based on the following observation: if some elements are conserved, then these elements may be part of a conserved motif. The proposed approach follows the divide and conquer concept: we divide each DNA sequence into four subsequences, one for each of the four letters that represent the nucleotides, namely {A, C, G, T}. We then treat the subsequences for G as the major source for deciding on candidate motifs, because G is found in almost all transcription factor binding sites; the decision is supported and enhanced by the subsequences of the other three letters. We have applied this idea to the yst04 and hm03r datasets; the results are encouraging, as we have successfully predicted the locations of some of the motifs hidden within the analyzed sequences.
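The decomposition step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the four subsequences are represented, so the code assumes each one is a list of 0-based positions at which the letter occurs. The names split_by_nucleotide and g_spacing are hypothetical, and the spacing between consecutive G positions is shown only as one plausible way to expose conserved G patterns, not as the authors' actual procedure.

```python
ALPHABET = "ACGT"


def split_by_nucleotide(seq):
    """Split one DNA sequence into four per-letter position lists.

    Assumption: a "subsequence" is represented as the sorted list of
    0-based positions where that letter occurs in the sequence.
    """
    positions = {letter: [] for letter in ALPHABET}
    for index, base in enumerate(seq.upper()):
        if base in positions:  # ignore ambiguity codes such as N
            positions[base].append(index)
    return positions


def g_spacing(seq):
    """Return the gaps between consecutive G positions (hypothetical helper).

    Equal runs of gaps across several sequences would hint at a
    conserved arrangement of G's, which could seed candidate motifs.
    """
    g_positions = split_by_nucleotide(seq)["G"]
    return [later - earlier
            for earlier, later in zip(g_positions, g_positions[1:])]


if __name__ == "__main__":
    example = "ACGTGGA"
    print(split_by_nucleotide(example))
    print(g_spacing(example))
```

Under this representation the four position lists partition the index range of the sequence, so the split loses no information: the G list can drive candidate selection while the A, C, and T lists remain available to support or reject a candidate.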
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alshalalfa, M., Alhajj, R. (2008). Motif Location Prediction by Divide and Conquer. In: Elloumi, M., Küng, J., Linial, M., Murphy, R.F., Schneider, K., Toma, C. (eds) Bioinformatics Research and Development. BIRD 2008. Communications in Computer and Information Science, vol 13. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70600-7_8
DOI: https://doi.org/10.1007/978-3-540-70600-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70598-7
Online ISBN: 978-3-540-70600-7