Abstract
Using our techniques for extracting approximate non-tandem repeats [1] on well-constructed maximal models, we derive an algorithm to find common motifs of length P that occur in N sequences with at most D differences under the edit distance metric. We compare the effectiveness of our algorithm with that of the more involved algorithm of Sagot [17] for edit distance on some real sequences. Her method had previously been implemented only for Hamming distance [12], [20], not for edit distance. Our resulting method turns out to be simpler, theoretically more efficient, and also more efficient in practice for moderately large P and D.
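To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not the authors' algorithm and not Sagot's suffix-tree method. It enumerates every length-P string over an assumed DNA alphabet and keeps those that occur in every one of the N sequences within edit distance D, assuming an occurrence means "some substring of the sequence is within D edits of the motif". Each occurrence test uses Sellers' semi-global edit-distance dynamic program. The function names `min_edit_occurrence` and `common_motifs` are hypothetical, chosen for illustration.

```python
from itertools import product
from typing import List


def min_edit_occurrence(motif: str, text: str) -> int:
    """Smallest edit distance between `motif` and any substring of `text`.

    Sellers-style semi-global DP: the row for the empty motif prefix is all
    zeros, so an occurrence may start at any text position, and the answer is
    the minimum over the last row, so it may end at any position.
    """
    prev = [0] * (len(text) + 1)  # empty motif prefix matches anywhere at cost 0
    for i, a in enumerate(motif, start=1):
        cur = [i] + [0] * len(text)  # motif[:i] vs. empty text: i deletions
        for j, b in enumerate(text, start=1):
            cur[j] = min(
                prev[j - 1] + (a != b),  # match (0) or substitution (1)
                prev[j] + 1,             # motif character left unmatched
                cur[j - 1] + 1,          # extra text character inside the occurrence
            )
        prev = cur
    return min(prev)


def common_motifs(seqs: List[str], P: int, D: int, alphabet: str = "ACGT") -> List[str]:
    """All length-P strings over `alphabet` occurring in every sequence with <= D edits.

    Naive baseline: tries all len(alphabet) ** P candidates (hypothetical helper,
    assumes the motif must occur in all N sequences).
    """
    hits = []
    for letters in product(alphabet, repeat=P):
        motif = "".join(letters)
        if all(min_edit_occurrence(motif, s) <= D for s in seqs):
            hits.append(motif)
    return hits
```

For example, `common_motifs(["AAAA"], P=2, D=1, alphabet="AC")` returns `["AA", "AC", "CA"]`: "CC" is excluded because it needs two edits to match any substring of "AAAA". The cost grows as |Σ|^P times the total sequence length times P, which is why practical methods such as Sagot's suffix-tree approach avoid enumerating every candidate.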
References
E. F. Adebiyi, T. Jiang, and M. Kaufmann. An efficient algorithm for finding short approximate non-tandem repeats (Extended Abstract). Bioinformatics, 17(1):S5–S13, 2001.
E. F. Adebiyi. Pattern Discovery in Biology and Strings Sorting: Theory and Experimentation. PhD thesis, 2002.
A. Blumer, A. Ehrenfeucht, et al. Average size of suffix trees and DAWGs. Discrete Applied Mathematics, 24:37–45, 1989.
J.-M. Claverie and S. Audic. The statistical significance of nucleotide position-weight matrix matches. Computer Applications in Biosciences, 12(5):431–439, 1996.
M. Crochemore and M.-F. Sagot. Motifs in sequences: localization and extraction. In Handbook of Computational Chemistry, Crabbe, Drew, Konopka, eds., Marcel Dekker, Inc., 2001. To appear.
D. Gusfield. Algorithms on strings, trees and sequences. Cambridge University Press, New York, 1997.
J. D. Helmann. Compilation and analysis of Bacillus subtilis σA-dependent promoter sequences: evidence for extended contact between RNA polymerase and upstream promoter DNA. Nucleic Acids Research, 23(13):2351–2360, 1995.
L. C. K. Hui. Color set size problem with applications to string matching. In Proceedings of CPM, vol. 644 of LNCS, 230–243, 1992.
S. Karlin, F. Ost, and B. E. Blaisdell. Patterns in DNA and amino acid sequences and their statistical significance. In M. S. Waterman, editor, Mathematical Methods for DNA Sequences, 133–158, 1989.
C. J. McInerny, J. F. Partridge, G. E. Mikesell, D. P. Creemer, and L. L. Breeden. A novel Mcm1-dependent element in the SWI4, CLN3, CDC6, CDC46, and CDC47 promoters activates M/G1-specific transcription. Genes and Development, 11:1277–1288, 1997.
E. Myers. A sub-linear algorithm for approximate keyword matching. Algorithmica, 12(4–5):345–374, 1994.
L. Marsan and M.-F. Sagot. Extracting structured motifs using a suffix tree: algorithms and application to promoter consensus identification. RECOMB, 2000.
P. Pevzner and S.-H. Sze. Combinatorial approaches to finding subtle signals in DNA sequences. ISMB, 269–278, 2000.
W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, Cambridge.
E. Rocke and M. Tompa. An algorithm for finding novel gapped motifs in DNA sequences. RECOMB, 228–233, 1998.
B. Schieber and U. Vishkin. On Finding Lowest Common Ancestors: Simplification and Parallelization. SIAM Journal on Computing, 17:1253–1262, 1988.
M.-F. Sagot. Spelling approximate repeated or common motifs using a suffix tree. LNCS 1380: 111–127, 1998.
J. F. Tomb et al. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature, 388, 539–547, 1997.
E. Ukkonen. Approximate string matching over suffix trees. LNCS 684: 228–242, 1993.
A. Vanet, L. Marsan, A. Labigne, and M.-F. Sagot. Inferring regulatory elements from a whole genome. An analysis of Helicobacter pylori σ80 family of promoter signals. J. Mol. Biol., 297:335–353, 2000.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Adebiyi, E.F., Kaufmann, M. (2002). Extracting Common Motifs under the Levenshtein Measure: Theory and Experimentation. In: Guigó, R., Gusfield, D. (eds) Algorithms in Bioinformatics. WABI 2002. Lecture Notes in Computer Science, vol 2452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45784-4_11
DOI: https://doi.org/10.1007/3-540-45784-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44211-0
Online ISBN: 978-3-540-45784-8
eBook Packages: Springer Book Archive