New Algorithms for Finding Monad Patterns in DNA Sequences

  • Conference paper
String Processing and Information Retrieval (SPIRE 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3246)

Abstract

In this paper, we present two new algorithms for discovering monad patterns in DNA sequences. Monad patterns are of the form (l,d)-k, where l is the length of the pattern, d is the maximum number of mismatches allowed, and k is the minimum number of times the pattern is repeated in the given sample. The time complexity of some of the best known algorithms to date is \(O(nt^{2}l^{d}\sigma^{d})\), where t is the number of input sequences, n is the length of each input sequence, and \(\sigma = |\Sigma|\) is the size of the alphabet. The first algorithm that we present in this paper takes \(O(n^{2}t^{2}l^{\frac{d}{2}}\sigma^{\frac{d}{2}})\) time and \(O(ntl^{\frac{d}{2}}\sigma^{\frac{d}{2}})\) space, and the second algorithm takes \(O(n^{3}t^{3}l^{\frac{d}{2}}\sigma^{\frac{d}{2}})\) time using \(O(l^{\frac{d}{2}}\sigma^{\frac{d}{2}})\) space. In practice, our algorithms have much better performance provided the d/l ratio is small. The second algorithm performs very well even for large values of l and d, as long as the d/l ratio is small.
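To make the (l,d)-k definition concrete, the following is a minimal brute-force sketch. It is not either of the algorithms presented in the paper. It enumerates all σ^l candidate patterns and is only practical for very small l. It assumes that "repeated k times in the sample" means the pattern occurs, with at most d mismatches, in at least k of the t input sequences; the abstract does not spell this out, and counting every occurrence instead would be an equally plausible reading. All function names (`hamming`, `occurs_in`, `support`, `find_monads`) are hypothetical and chosen for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which the equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_in(pattern, sequence, d):
    """True if some window of len(pattern) in sequence has at most d mismatches."""
    l = len(pattern)
    return any(
        hamming(pattern, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def support(pattern, sequences, d):
    """Assumed meaning of k: number of sequences containing an occurrence."""
    return sum(occurs_in(pattern, s, d) for s in sequences)


def find_monads(sequences, l, d, k, alphabet="ACGT"):
    """Exhaustive baseline: test every one of the sigma**l candidate patterns."""
    candidates = ("".join(p) for p in product(alphabet, repeat=l))
    return [c for c in candidates if support(c, sequences, d) >= k]


seqs = ["ACGTAC", "TTCGTT", "GGGGGG"]
print(find_monads(seqs, l=3, d=0, k=2))  # ['CGT']
```

With d = 0 only `CGT` appears in two of the three toy sequences. Raising d enlarges the set of accepted patterns quickly, which is the blow-up that the \(l^{d}\sigma^{d}\) and \(l^{\frac{d}{2}}\sigma^{\frac{d}{2}}\) factors in the bounds above account for.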

This research was partially supported by NSF grant number: ITR-0312724.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Satya, R.V., Mukherjee, A. (2004). New Algorithms for Finding Monad Patterns in DNA Sequences. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_40

  • DOI: https://doi.org/10.1007/978-3-540-30213-1_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23210-0

  • Online ISBN: 978-3-540-30213-1

  • eBook Packages: Springer Book Archive
