Fast two-dimensional approximate pattern matching

  • Ricardo Baeza-Yates
  • Gonzalo Navarro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1380)

Abstract

We address the problem of approximate string matching in two dimensions, that is, finding a pattern of size $m \times m$ in a text of size $n \times n$ with at most $k$ errors (substitutions, insertions and deletions). Although the problem can be solved using dynamic programming in time $O(m^2 n^2)$, this is in general too expensive for small $k$. We therefore design a filtering algorithm that avoids verifying most of the text with dynamic programming. This filter is based on a one-dimensional multi-pattern approximate search algorithm. The average complexity of the resulting algorithm is $O(n^2 k \log_\sigma m / m^2)$ for $k < m(m+1)/(5 \log_\sigma m)$, which is optimal and matches the best previous result, which handles substitutions only. For higher error levels, we present an algorithm with time complexity $O(n^2 k / (w \sqrt{\sigma}))$, where $w$ is the size in bits of the computer word and $\sigma$ is the alphabet size. This algorithm works for $k < m(m+1)(1 - e/\sqrt{\sigma})$, where $e = 2.718\ldots$, a limit that cannot be improved. These are the first good expected-case algorithms for the problem. Our algorithms also work for rectangular patterns and rectangular text, and can even be extended to the case where each row in the pattern and the text has a different length.
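To make the dynamic-programming baseline and the idea of a row-based filter concrete, here is a minimal Python sketch. It assumes a row-wise error model, which is an assumption and not necessarily the paper's exact definition: pattern row i is compared with text row r+i, all rows of an occurrence end at the same column j, and the distance at (r, j) is the sum of the per-row Sellers edit distances. The filter uses a simple pigeonhole argument (if m row distances sum to at most k, some row has at most ⌊k/m⌋ errors). The helper names `sellers_row`, `row_wise_distance`, `naive_2d` and `filtered_2d` are hypothetical, and the inner loop over pattern rows merely stands in for a real one-dimensional multi-pattern approximate search, so the sketch demonstrates that the filter loses no occurrences, not that it is fast.

```python
from typing import List, Set, Tuple

Position = Tuple[int, int]  # (top text row of the occurrence, common end column)


def sellers_row(pattern: str, text: str) -> List[int]:
    """Sellers' dynamic programming for one row.

    best[j] is the minimum edit distance between `pattern` and any substring
    of `text` ending just before index j (len(text) + 1 entries in total).
    """
    m = len(pattern)
    col = list(range(m + 1))  # empty text prefix: i pattern chars deleted
    best = [col[m]]
    for ch in text:
        prev_diag = col[0]
        col[0] = 0  # a match may start at any text position
        for i in range(1, m + 1):
            cur = min(col[i] + 1,                          # extra text char
                      col[i - 1] + 1,                      # pattern char dropped
                      prev_diag + (pattern[i - 1] != ch))  # match / substitution
            prev_diag, col[i] = col[i], cur
        best.append(col[m])
    return best


def row_wise_distance(pattern: List[str], text: List[str], r: int, j: int) -> int:
    """Assumed 2D distance: sum of per-row distances, every row ending at column j."""
    return sum(sellers_row(pattern[i], text[r + i])[j] for i in range(len(pattern)))


def naive_2d(pattern: List[str], text: List[str], k: int) -> Set[Position]:
    """Baseline: (n-m+1) top rows x m rows x O(m*n) DP = O(m^2 n^2) for m x m in n x n."""
    m, n_cols = len(pattern), len(text[0])  # assumes a rectangular text
    hits = set()
    for r in range(len(text) - m + 1):
        rows = [sellers_row(pattern[i], text[r + i]) for i in range(m)]
        for j in range(n_cols + 1):
            if sum(row[j] for row in rows) <= k:
                hits.add((r, j))
    return hits


def filtered_2d(pattern: List[str], text: List[str], k: int) -> Set[Position]:
    """Pigeonhole filter: an occurrence with at most k errors over m rows has
    some row with at most k // m errors; only those candidates are verified."""
    m = len(pattern)
    threshold = k // m
    candidates = set()
    for t, text_row in enumerate(text):
        # Stand-in for a 1D multi-pattern approximate search: here we simply
        # run Sellers' DP once per pattern row (same answers, no speedup).
        for i, pattern_row in enumerate(pattern):
            r = t - i  # occurrence whose row i would lie on text row t
            if r < 0 or r + m > len(text):
                continue
            for j, d in enumerate(sellers_row(pattern_row, text_row)):
                if d <= threshold:
                    candidates.add((r, j))
    # Verification step: full row-wise distance for each surviving candidate.
    return {pos for pos in candidates if row_wise_distance(pattern, text, *pos) <= k}
```

For example, `filtered_2d(["ab", "cd"], ["xabx", "xcdx", "yyyy"], 0)` returns `{(0, 3)}`, the exact occurrence whose rows both end at column 3. Under this assumed model the filtered search returns the same positions as the baseline; any real speedup would come from replacing the per-row loop with a fast multi-pattern scan so that few candidates reach verification.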

Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Ricardo Baeza-Yates (1)
  • Gonzalo Navarro (1)
  1. Dept. of Computer Science, University of Chile, Santiago, Chile
