MProfiler: A Profile-Based Method for DNA Motif Discovery

  • Doaa Altarawy
  • Mohamed A. Ismail
  • Sahar M. Ghanem
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5780)


Motif finding is one of the most important tasks in the study of gene regulation, which is essential to understanding biological cell functions. Recent studies show that the performance of current motif finders is not satisfactory. A number of ensemble methods have been proposed to improve the accuracy of the results, and their overall performance is better than that of stand-alone motif finders. A recent ensemble method, MotifVoter, significantly outperforms all existing stand-alone and ensemble methods. In this paper, we propose a method, MProfiler, that increases the accuracy of MotifVoter without increasing the run time by introducing an idea called center profiling. Our experiments show that the generated clusters improve on those of MotifVoter in both accuracy and cluster compactness. On 56 datasets, the final results of our method achieve an 80% improvement in the correlation coefficient (nCC) and a 93% improvement in the performance coefficient (nPC) over MotifVoter.
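The abstract does not define nCC or nPC. As a minimal sketch, assuming they follow the nucleotide-level definitions commonly used when assessing motif finders (counts of true/false positive and negative nucleotide positions), the two scores could be computed as below. The function names and the representation of binding sites as sets of positions are illustrative assumptions, not part of MProfiler or MotifVoter.

```python
import math


def nucleotide_counts(predicted, known, total_length):
    """Count nucleotide-level TP, FP, FN, TN.

    predicted, known: sets of nucleotide positions covered by predicted
    and by known binding sites (hypothetical representation).
    total_length: total number of nucleotide positions in the dataset.
    """
    tp = len(predicted & known)
    fp = len(predicted - known)
    fn = len(known - predicted)
    tn = total_length - tp - fp - fn
    return tp, fp, fn, tn


def npc(tp, fp, fn):
    """Performance coefficient: TP / (TP + FN + FP)."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def ncc(tp, fp, fn, tn):
    """Correlation coefficient between predicted and known positions."""
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    return (tp * tn - fn * fp) / denom if denom else 0.0


# Toy example: a 10-nucleotide prediction overlapping half of a known site.
tp, fp, fn, tn = nucleotide_counts(set(range(10, 20)), set(range(15, 25)), 100)
print(npc(tp, fp, fn), ncc(tp, fp, fn, tn))
```

Under these assumed definitions, nPC ranges from 0 to 1 and nCC from -1 to 1. Both functions return 0.0 when the denominator is zero, which is a convention chosen here for the sketch.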


Keywords: Bioinformatics, DNA Motif Finding, Clustering



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Doaa Altarawy (1)
  • Mohamed A. Ismail (1)
  • Sahar M. Ghanem (1)
  1. Computer and Systems Engineering Dept., Faculty of Engineering, Alexandria University, Alexandria, Egypt
