Motif Discovery Using Multi-Objective Genetic Algorithm in Biosequences

  • Conference paper
Advances in Intelligent Data Analysis VII (IDA 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4723)

Abstract

We propose an efficient method, based on a multi-objective genetic algorithm (MOGAMOD), for discovering optimal motifs in sequential data. The main advantage of our approach is that a large number of tradeoff (i.e., nondominated) motifs can be obtained in a single run with respect to three conflicting objectives: maximizing similarity, motif length and support. To the best of our knowledge, this is the first effort in this direction. MOGAMOD can be applied to any data set with a sequential character. Furthermore, it allows any choice of similarity measure for finding motifs. By analyzing the obtained optimal motifs, the decision maker can understand the tradeoffs between the objectives. We compare MOGAMOD with three well-known motif discovery methods: AlignACE, MEME and Weeder. Experimental results on a real data set extracted from the TRANSFAC database demonstrate that the proposed method compares favorably with the other methods in terms of runtime, the number of shaded samples and multiple motifs.
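The core idea in the abstract, scoring each candidate motif on three objectives to be maximized (similarity, length, support) and keeping only the nondominated candidates, can be sketched as follows. This is a minimal illustration under stated assumptions, not MOGAMOD itself: the genetic operators (selection, crossover, mutation) are omitted, and the similarity measure (fraction of matching positions in the best-aligned window), the support threshold `threshold`, and the helper names `best_window_similarity`, `evaluate`, `dominates` and `nondominated` are hypothetical choices rather than the paper's definitions.

```python
from typing import List, Sequence, Tuple

# (similarity, length, support): all three objectives are maximized
Objectives = Tuple[float, int, int]


def best_window_similarity(motif: str, seq: str) -> float:
    """Hypothetical similarity: fraction of matching positions in the best window of seq."""
    k = len(motif)
    if k == 0 or k > len(seq):
        return 0.0
    best = 0
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        best = max(best, sum(1 for a, b in zip(motif, window) if a == b))
    return best / k


def evaluate(motif: str, seqs: Sequence[str], threshold: float = 0.75) -> Objectives:
    """Score a candidate motif; the threshold value is an assumption for illustration."""
    sims = [best_window_similarity(motif, s) for s in seqs]
    supporting = [x for x in sims if x >= threshold]
    support = len(supporting)  # number of sequences that contain a close enough match
    # Mean similarity over the supporting sequences (0.0 if none support the motif)
    similarity = sum(supporting) / support if support else 0.0
    return similarity, len(motif), support


def dominates(a: Objectives, b: Objectives) -> bool:
    """Pareto dominance: a is no worse than b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def nondominated(cands: Sequence[str], seqs: Sequence[str],
                 threshold: float = 0.75) -> List[Tuple[str, Objectives]]:
    """Keep the candidates that no other candidate dominates (the tradeoff set)."""
    scored = [(m, evaluate(m, seqs, threshold)) for m in cands]
    return [(m, f) for m, f in scored
            if not any(dominates(g, f) for _, g in scored)]


seqs = ["ACGTACGT", "TTACGTTT", "GGGACGAA"]
candidates = ["ACGT", "ACG", "ACGTA", "TTTT"]
for motif, (sim, length, support) in nondominated(candidates, seqs):
    print(motif, round(sim, 3), length, support)
# ACGT 0.917 4 3
# ACG 1.0 3 3
# ACGTA 0.867 5 3
```

In this toy run `TTTT` is dropped because `ACGT` is at least as good on every objective and strictly better on similarity and support, while the three survivors trade similarity against length. Since the abstract states that any similarity measure may be used, `best_window_similarity` is the natural place to plug in a different scoring function; multi-objective GAs such as NSGA-II use this same dominance relation to rank a population.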

References

  1. Bailey, T.L., Elkan, C.: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: Proc. Second Int. Conf. ISMB, USA, pp. 28–36 (1994)

  2. Zhang, Y., Zaki, M.: EXMOTIF: Efficient structured motif extraction. Algorithms for Molecular Biology 1, 21 (2006)

  3. Zhang, Y., Zaki, M.: SMOTIF: Efficient structured pattern and motif search. Algorithms for Molecular Biology 1, 22 (2006)

  4. Pisanti, N., Carvalho, A.M., Marsan, L., Sagot, M.F.: RISOTTO: Fast extraction of motifs with mismatches. In: 7th Latin American Theoretical Informatics Symposium (2006)

  5. Che, D., et al.: MDGA: motif discovery using a genetic algorithm. In: Proc. GECCO 2005, USA, pp. 447–452 (2005)

  6. Congdon, C.B., et al.: Preliminary results for GAMI: a genetic algorithms approach to motif inference. In: Proc. CIBCB 2005, USA, pp. 1–8 (2005)

  7. Deb, K., et al.: A fast and elitist multi-objective genetic algorithm: NSGA II. IEEE Trans. Evolutionary Computation 6, 182–197 (2002)

  8. D’haeseleer, P.: What are DNA sequence motifs? Nat. Biotechnol. 24, 423–425 (2006)

  9. Kaya, M., Alhajj, R.: Integrating Multi-Objective Genetic Algorithms into Clustering for Fuzzy Association Rules Mining. In: IEEE International Conference on Data Mining (ICDM 2004), 1-4 November 2004, Brighton, UK (2004)

  10. Kaya, M., Alhajj, R.: Multi-Objective Genetic Algorithm Based Approach for Optimizing Fuzzy Sequential Patterns. In: 16th IEEE International Conference on Tools with Artificial Intelligence, 15-17 November 2004, Boca Raton, FL, USA (2004)

  11. Kaya, M.: Multi-Objective Genetic Algorithm Based Approaches for Mining Optimized Fuzzy Association Rules. Soft Computing Journal 10(7), 578–586 (2006)

  12. Li, M., Ma, B., Wang, L.: Finding similar regions in many strings. In: Proc. STOC, USA, pp. 473–482 (1999)

  13. Liu, F.M.M., et al.: FMGA: finding motifs by genetic algorithm. In: Proc. BIBE 2004 Taiwan, pp. 459–466 (2004)

  14. Notredame, C., Higgins, D.G.: SAGA: Sequence Alignment by Genetic Algorithm. Nucleic Acids Res. 24, 1515–1524 (1996)

  15. Paul, T.K., Iba, H.: Identification of weak motifs in multiple biological sequences using genetic algorithm. In: Proc. GECCO 2006, USA, pp. 271–278 (2006)

  16. Pavesi, G., et al.: Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes. Nucleic Acids Res. 32, W199–W203 (2004)

  17. Roth, F.P., et al.: Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat. Biotechnol. 16, 939–945 (1998)

  18. Sinha, S., Tompa, M.: YMF: a program for discovery of novel transcription factor binding sites by statistical overrepresentation. Nucleic Acids Res. 31, 3586–3588 (2003)

  19. Stine, M., et al.: Motif discovery in upstream sequences of coordinately expressed genes. In: CEC 2003, USA, pp. 1596–1603 (2003)

  20. Tatusova, T.A., Madden, T.L.: Blast2 sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiology Letters 174(2), 247–250 (1999)

  21. Thijs, G., et al.: A Gibbs sampling method to detect overrepresented motifs in the upstream regions of coexpressed genes. J. Comp. Biol. 9, 447–464 (2002)

  22. Thompson, W., et al.: Gibbs Recursive Sampler: Finding transcription factor binding sites. Nucleic Acids Research 31, 3580–3585 (2003)

  23. Tompa, M.: An exact method for finding short motifs in sequences with application to the ribosome binding site problem. In: Proc. Int. Conf. ISMB, Germany, pp. 262–271 (1999)

  24. Tompa, M., et al.: Assessing computational tools for the discovery of transcription factor binding sites. Nat. Biotechnol. 23, 137–144 (2005)

  25. Wingender, E., et al.: TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Research 24, 238–241 (1996)

  26. Paul, T.K., Iba, H.: Identification of weak motifs in multiple biological sequences using genetic algorithm. In: Proc. GECCO 2006, USA, pp. 271–278 (2006)

  27. Zitzler, E., et al.: Comparison of multiobjective evolutionary algorithms: empirical results. Evolutionary Computation 8(2), 173–195 (2000)

Editor information

Michael R. Berthold, John Shawe-Taylor, Nada Lavrač

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kaya, M. (2007). Motif Discovery Using Multi-Objective Genetic Algorithm in Biosequences. In: Berthold, M.R., Shawe-Taylor, J., Lavrač, N. (eds) Advances in Intelligent Data Analysis VII. IDA 2007. Lecture Notes in Computer Science, vol 4723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74825-0_29

  • DOI: https://doi.org/10.1007/978-3-540-74825-0_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74824-3

  • Online ISBN: 978-3-540-74825-0

  • eBook Packages: Computer Science, Computer Science (R0)