Abstract
In this paper, we present two new algorithms for discovering monad patterns in DNA sequences. Monad patterns are of the form (l,d)-k, where l is the length of the pattern, d is the maximum number of mismatches allowed, and k is the minimum number of times the pattern is repeated in the given sample. The time complexity of some of the best-known algorithms to date is \(O(nt^{2}l^{d}\sigma^{d})\), where t is the number of input sequences, n is the length of each input sequence, and \(\sigma = |\Sigma|\) is the size of the alphabet. The first algorithm that we present in this paper takes \(O(n^{2}t^{2}l^{\frac{d}{2}})\) time and \(O(ntl^{\frac{d}{2}}\sigma^{\frac{d}{2}})\) space, and the second algorithm takes \(O(n^{3}t^{3}l^{\frac{d}{2}}\sigma^{\frac{d}{2}})\) time using \(O(l^{\frac{d}{2}}\sigma^{\frac{d}{2}})\) space. In practice, our algorithms perform much better, provided the d/l ratio is small. The second algorithm performs very well even for large values of l and d, as long as the d/l ratio is small.
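To make the (l,d)-k definition concrete, the following Python sketch checks it by brute force. It is not either of the algorithms presented in this paper; it is a naive baseline resting on several assumptions: the alphabet is ACGT, an occurrence is any (possibly overlapping) length-l window in any input sequence within Hamming distance d of the pattern, and k counts those occurrences over the whole sample. The helper names (`hamming`, `count_occurrences`, `neighborhood`, `find_monads`) are hypothetical. The search relies on a simple observation: any pattern with at least one occurrence lies in the d-neighborhood of some input l-mer, so only those strings need to be tested.

```python
from itertools import combinations, product

# Assumption: DNA alphabet, so sigma = 4.
ALPHABET = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_occurrences(pattern: str, seqs: list[str], d: int) -> int:
    """Count length-l windows, over all sequences (overlaps allowed),
    within Hamming distance d of pattern."""
    l = len(pattern)
    return sum(
        1
        for s in seqs
        for i in range(len(s) - l + 1)
        if hamming(pattern, s[i:i + l]) <= d
    )


def neighborhood(word: str, d: int, alphabet: str = ALPHABET) -> set[str]:
    """All strings within Hamming distance d of word (its d-neighborhood)."""
    result = set()
    for m in range(d + 1):
        for positions in combinations(range(len(word)), m):
            # Each chosen position is changed to a *different* symbol.
            choices = [[c for c in alphabet if c != word[p]] for p in positions]
            for subs in product(*choices):
                w = list(word)
                for p, c in zip(positions, subs):
                    w[p] = c
                result.add("".join(w))
    return result


def find_monads(seqs: list[str], l: int, d: int, k: int) -> set[str]:
    """Naive (l,d)-k search: test every string in the d-neighborhood
    of every input l-mer, keeping those with at least k occurrences."""
    candidates = set()
    for s in seqs:
        for i in range(len(s) - l + 1):
            candidates |= neighborhood(s[i:i + l], d)
    return {p for p in candidates if count_occurrences(p, seqs, d) >= k}
```

For example, `find_monads(["ACGTACGT"], 4, 0, 2)` returns `{"ACGT"}`, since ACGT is the only 4-mer that occurs twice exactly. Each d-neighborhood contains \(\sum_{i=0}^{d}\binom{l}{i}(\sigma-1)^{i} = O(l^{d}\sigma^{d})\) strings, which is why neighborhood enumeration of this kind carries an \(l^{d}\sigma^{d}\) factor.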
This research was partially supported by NSF grant ITR-0312724.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Satya, R.V., Mukherjee, A. (2004). New Algorithms for Finding Monad Patterns in DNA Sequences. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_40
DOI: https://doi.org/10.1007/978-3-540-30213-1_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23210-0
Online ISBN: 978-3-540-30213-1
eBook Packages: Springer Book Archive