Fast two-dimensional approximate pattern matching
We address the problem of approximate string matching in two dimensions, that is, finding a pattern of size $m \times m$ in a text of size $n \times n$ with at most $k$ errors (substitutions, insertions and deletions). Although the problem can be solved by dynamic programming in $O(m^2 n^2)$ time, this is in general too expensive for small $k$. We therefore design a filtering algorithm that avoids verifying most of the text with dynamic programming. The filter is based on a one-dimensional multi-pattern approximate search algorithm. The average complexity of the resulting algorithm is $O(n^2 k \log_\sigma m / m^2)$ for $k < m(m+1)/(5 \log_\sigma m)$, which is optimal and matches the best previous result, which allows only substitutions. For higher error levels, we present an algorithm with time complexity $O(n^2 k / (w \sqrt{\sigma}))$, where $w$ is the size in bits of the computer word and $\sigma$ is the alphabet size. This algorithm works for $k < m(m+1)(1 - e/\sqrt{\sigma})$, where $e = 2.718\ldots$, a limit that cannot be improved. These are the first good expected-case algorithms for the problem. Our algorithms also work for rectangular patterns and rectangular texts, and can even be extended to the case where each row in the pattern and the text has a different length.
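The filtering idea can be illustrated with a minimal sketch. It rests on an assumption not stated in the abstract: that the cost of an occurrence is the sum of per-row edit distances, with pattern row i aligned to text row top + i. Under that assumption, an occurrence with at most $k$ errors spread over $m$ rows must contain at least one pattern row matching with at most $\lfloor k/m \rfloor$ errors (pigeonhole), so only text regions that produce such a row hit need verification. The function names `approx_end_positions` and `candidate_regions` are hypothetical, and the naive per-row Sellers dynamic programming stands in for the faster one-dimensional multi-pattern approximate search the filter actually relies on; the verification step is omitted.

```python
from typing import List, Set, Tuple


def approx_end_positions(pattern: str, text: str, k: int) -> List[int]:
    """Sellers-style dynamic programming: return every index j of `text` such
    that some substring ending at text[j] is within edit distance k of `pattern`."""
    m = len(pattern)
    # col[i] holds the best cost of matching pattern[:i] against a substring
    # ending at the previous text position (one DP column, reused in place).
    col = list(range(m + 1))
    ends: List[int] = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # C[i-1][j-1] for i = 1
        col[0] = 0  # an empty pattern prefix matches anywhere at no cost
        for i in range(1, m + 1):
            cur = min(
                col[i] + 1,                         # extra text character
                col[i - 1] + 1,                     # pattern character left unmatched
                prev_diag + (pattern[i - 1] != c),  # match or substitution
            )
            prev_diag, col[i] = col[i], cur
        if col[m] <= k:
            ends.append(j)
    return ends


def candidate_regions(pattern_rows: List[str], text_rows: List[str],
                      k: int) -> List[Tuple[int, int]]:
    """Hypothetical row filter. ASSUMPTION: an occurrence costs the sum of
    per-row edit distances, with pattern row i aligned to text row top + i.
    By pigeonhole some pattern row then has at most k // m errors, so every
    occurrence yields at least one (top row, end column) hit below."""
    m = len(pattern_rows)
    row_threshold = k // m
    last_top = len(text_rows) - m
    hits: Set[Tuple[int, int]] = set()
    for r, text_row in enumerate(text_rows):
        # Naive multi-pattern search: one Sellers pass per pattern row.
        for i, pattern_row in enumerate(pattern_rows):
            for j in approx_end_positions(pattern_row, text_row, row_threshold):
                top = r - i  # top text row of the candidate occurrence
                if 0 <= top <= last_top:
                    hits.add((top, j))
    return sorted(hits)


if __name__ == "__main__":
    pattern = ["abc", "bcd", "cde"]
    text = ["xxxxx", "xabcx", "xbcdx", "xcdex", "xxxxx"]
    print(candidate_regions(pattern, text, k=2))  # prints [(1, 3)]
```

Each hit only marks a region to be checked with full dynamic programming; the end column is a coarse anchor, not an exact alignment. Running one Sellers pass per pattern row costs far more than the average complexity stated above, so the sketch shows only why filtering is sound, not how it is made fast.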
Keywords: error level, edit distance, approximate match, approximate string match, computer word