Fast approximate matching using suffix trees

Cobbs, Archie L.

doi:10.1007/3-540-60044-2_33

Conference paper
First Online: 01 January 2005

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 937)

Abstract

Let T be a text of length n and P a pattern of length m, both strings over a fixed finite alphabet Σ. We wish to find all approximate occurrences of P in T having weighted edit distance at most k from P: this is the approximate substring matching problem. We focus on the case in which T is fixed and preprocessed in linear time, while P and k vary over consecutive searches. We give an O(mq + t_occ) time and O(q) space algorithm, where q ≤ n depends on the problem instance and t_occ is the size of the output. The running time is proportional to the amount of matching and is, in the worst case, as fast as standard dynamic programming. The algorithm uses the suffix tree representation of the text. The best previous algorithm requires O(mq log q + t_occ) time and O(mq) space.
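For orientation, the "standard dynamic programming" baseline mentioned in the abstract can be sketched as below. This is not the suffix-tree algorithm of the paper; it is a minimal column-by-column dynamic program, running in O(mn) time and O(m) space, that reports the end positions of approximate occurrences. The function name `approx_end_positions` and the parameters `sub_cost` and `indel_cost` are illustrative assumptions, and a single uniform substitution cost and a single uniform insertion/deletion cost stand in for a general weighted edit distance.

```python
def approx_end_positions(text, pattern, k, sub_cost=1, indel_cost=1):
    """Return 1-based end positions j in `text` such that some substring
    ending at j is within weighted edit distance k of `pattern`.

    Assumption: uniform substitution and insertion/deletion costs
    (a simplification of a general weighted edit distance).
    """
    m = len(pattern)
    # col[i] = cheapest alignment of pattern[:i] with a substring of
    # text ending at the current text position (initially position 0).
    col = [i * indel_cost for i in range(m + 1)]
    hits = []
    for j, c in enumerate(text, start=1):
        # An occurrence may start anywhere, so the empty prefix costs 0.
        new = [0] * (m + 1)
        for i in range(1, m + 1):
            match = col[i - 1] + (0 if pattern[i - 1] == c else sub_cost)
            skip_text_char = col[i] + indel_cost
            skip_pattern_char = new[i - 1] + indel_cost
            new[i] = min(match, skip_text_char, skip_pattern_char)
        if new[m] <= k:
            hits.append(j)
        col = new
    return hits


print(approx_end_positions("abcdefg", "cxe", 1))  # [5]
print(approx_end_positions("banana", "ana", 0))   # [4, 6]
```

In the first call the only occurrence within distance 1 is "cde" (one substitution), which ends at position 5. This baseline rescans all of T for every query, whereas the setting of the abstract preprocesses a fixed T once and lets P and k vary.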

Supported by U.S. DOE Grant #DE-FG03-90ER60999.

Author information

Authors and Affiliations

Computer Science Division, University of California, Berkeley, CA 94720
Archie L. Cobbs

Editor information

Zvi Galil, Esko Ukkonen

Cite this paper

Cobbs, A.L. (1995). Fast approximate matching using suffix trees. In: Galil, Z., Ukkonen, E. (eds) Combinatorial Pattern Matching. CPM 1995. Lecture Notes in Computer Science, vol 937. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60044-2_33

DOI: https://doi.org/10.1007/3-540-60044-2_33
Published: 31 May 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60044-2
Online ISBN: 978-3-540-49412-6
eBook Packages: Springer Book Archive
