New Algorithms for Finding Monad Patterns in DNA Sequences

Satya, Ravi Vijaya; Mukherjee, Amar

doi:10.1007/978-3-540-30213-1_40

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3246)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

Abstract

In this paper, we present two new algorithms for discovering monad patterns in DNA sequences. Monad patterns are of the form (l,d)-k, where l is the length of the pattern, d is the maximum number of mismatches allowed, and k is the minimum number of times the pattern is repeated in the given sample. The time complexity of some of the best known algorithms to date is \(O(nt^{2}l^{d}\sigma^{d})\), where t is the number of input sequences, n is the length of each input sequence, and \(\sigma = |\Sigma|\) is the size of the alphabet. The first algorithm that we present in this paper takes \(O(n^{2}t^{2}l^{\frac{d}{2}}\sigma^{\frac{d}{2}})\) time and \(O(ntl^{\frac{d}{2}}\sigma^{\frac{d}{2}})\) space, and the second algorithm takes \(O(n^{3}t^{3}l^{\frac{d}{2}}\sigma^{\frac{d}{2}})\) time using \(O(l^{\frac{d}{2}}\sigma^{\frac{d}{2}})\) space. In practice, our algorithms have much better performance provided the d/l ratio is small. The second algorithm performs very well even for large values of l and d, as long as the d/l ratio is small.
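
To make the (l,d)-k definition concrete, the following Python sketch checks it by brute force. It is not either of the algorithms presented in the paper; it only illustrates the problem statement. The function names (`hamming`, `occurs_in`, `support`, `naive_monad_search`) are hypothetical, and the sketch assumes that "repeated k times in the sample" means the pattern occurs, with at most d mismatches, in at least k of the t input sequences.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_in(pattern: str, seq: str, d: int) -> bool:
    """True if some length-l window of seq is within d mismatches of pattern."""
    l = len(pattern)
    return any(
        hamming(pattern, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def support(pattern: str, seqs: list[str], d: int) -> int:
    """Number of sequences with at least one occurrence of pattern.

    Assumption: occurrences are counted per sequence, not per position.
    """
    return sum(occurs_in(pattern, s, d) for s in seqs)


def naive_monad_search(seqs: list[str], l: int, d: int, k: int,
                       alphabet: str = "ACGT") -> list[str]:
    """Test every one of the sigma^l candidate patterns.

    Exhaustive enumeration, so only practical for small l.
    """
    candidates = ("".join(c) for c in product(alphabet, repeat=l))
    return [p for p in candidates if support(p, seqs, d) >= k]


# Toy sample: ACGT is planted exactly in the first sequence and with
# one mismatch in each of the other two.
seqs = ["TTACGTAA", "GGACCTGG", "CCAGGTCC"]
found = naive_monad_search(seqs, l=4, d=1, k=3)
```

On this toy sample, `ACGT` qualifies as a (4,1)-3 pattern, while with d = 0 it is supported by the first sequence only. Enumerating all candidates costs time exponential in l, which is the kind of cost that bounds depending on d rather than l aim to avoid.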

This research was partially supported by NSF grant number: ITR-0312724.

Author information

Authors and Affiliations

School of Computer Science, University of Central Florida, Orlando, FL, USA, 32816-2362
Ravi Vijaya Satya & Amar Mukherjee

Editor information

Editors and Affiliations

Georgia Institute of Technology and Università di Padova
Alberto Apostolico
Department of Information Engineering, University of Padova
Massimo Melucci

About this paper

Cite this paper

Satya, R.V., Mukherjee, A. (2004). New Algorithms for Finding Monad Patterns in DNA Sequences. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_40

DOI: https://doi.org/10.1007/978-3-540-30213-1_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23210-0
Online ISBN: 978-3-540-30213-1
eBook Packages: Springer Book Archive