Bit-Parallel Approximate String Matching Algorithms with Transposition

  • Heikki Hyyrö
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2857)

Abstract

Using bit-parallelism has resulted in fast and practical algorithms for approximate string matching under the Levenshtein edit distance, which permits a single edit operation to insert, delete or substitute a character. Depending on the parameters of the search, currently the fastest non-filtering algorithms in practice are the O(k⌈m/w⌉n) algorithm of Wu & Manber, the O(⌈km/w⌉n) algorithm of Baeza-Yates & Navarro, and the O(⌈m/w⌉n) algorithm of Myers, where m is the pattern length, n is the text length, k is the error threshold and w is the computer word size. In this paper we discuss a uniform way of modifying each of these algorithms to permit also a fourth type of edit operation: transposing two adjacent characters in the pattern. This type of edit distance is also known as the Damerau edit distance. In the end we also present an experimental comparison of the resulting algorithms.
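
To make the fourth edit operation concrete, here is a minimal sketch of approximate string matching with transposition using a plain dynamic programming table. It is not one of the bit-parallel algorithms named in the abstract and it takes O(mn) time; it only illustrates the distance they are modified to support. The function name `damerau_search`, the `transpose` flag, and the choice of the restricted variant (a transposed pair is not edited further) are assumptions made for this illustration.

```python
def damerau_search(pattern, text, k, transpose=True):
    """Return 1-based end positions j such that some substring of `text`
    ending at j is within k edits of `pattern`.

    Edits: insert, delete, substitute, and (if `transpose` is True)
    swapping two adjacent characters. Illustrative sketch only.
    """
    m = len(pattern)
    prev2 = None                 # column j-2 of the table
    prev = list(range(m + 1))    # column 0: D[i][0] = i
    hits = []
    for j in range(1, len(text) + 1):
        cur = [0] * (m + 1)      # D[0][j] = 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            best = min(prev[i - 1] + cost,   # match or substitute
                       prev[i] + 1,          # extra text character
                       cur[i - 1] + 1)       # missing pattern character
            if (transpose and i > 1 and j > 1
                    and pattern[i - 1] == text[j - 2]
                    and pattern[i - 2] == text[j - 1]):
                best = min(best, prev2[i - 2] + 1)  # adjacent transposition
            cur[i] = best
        if cur[m] <= k:
            hits.append(j)
        prev2, prev = prev, cur
    return hits
```

With one error allowed, `damerau_search("abcd", "xxacbdxx", 1)` reports a match ending at position 6, because "acbd" is a single transposition away from "abcd". Passing `transpose=False` restricts the search to Levenshtein edits, under which the same call finds nothing, since the swap then costs two substitutions.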

Keywords

Edit Distance; Edit Operation; Pattern Length; Approximate String Match; Dynamic Programming Table
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Baeza-Yates, R., Navarro, G.: Faster approximate string matching. Algorithmica 23(2), 127–158 (1999)
  2. Damerau, F.: A technique for computer detection and correction of spelling errors. Comm. of the ACM 7(3), 171–176 (1964)
  3. Du, M.W., Chang, S.C.: A model and a fast algorithm for multiple errors spelling correction. Acta Informatica 29, 281–302 (1992)
  4. Harman, D.: Overview of the Third Text REtrieval Conference. In: Proc. Third Text REtrieval Conference (TREC-3), pp. 1–19. NIST Special Publication 500-207 (1995)
  5. Kukich, K.: Techniques for automatically correcting words in text. ACM Computing Surveys 24(4), 377–439 (1992)
  6. Levenshtein, V.: Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady 10(8), 707–710 (1966); Original in Russian in Doklady Akademii Nauk SSSR 163(4), 845–848 (1965)
  7. Myers, G.: A fast bit-vector algorithm for approximate string matching based on dynamic programming. Journal of the ACM 46(3), 395–415 (1999)
  8. Navarro, G.: A guided tour to approximate string matching. ACM Computing Surveys 33(1), 31–88 (2001)
  9. Navarro, G.: NR-grep: a fast and flexible pattern matching tool. Software Practice and Experience (SPE) 31, 1265–1312 (2001)
  10. Navarro, G., Baeza-Yates, R.: Improving an algorithm for approximate pattern matching. Algorithmica 30(4), 473–502 (2001)
  11. Sellers, P.: The theory and computation of evolutionary distances: pattern recognition. J. of Algorithms 1, 359–373 (1980)
  12. Ukkonen, E.: Algorithms for approximate string matching. Information and Control 64, 100–118 (1985)
  13. Ukkonen, E.: Finding approximate patterns in strings. J. of Algorithms 6, 132–137 (1985)
  14. Wright, A.: Approximate string matching using within-word parallelism. Software Practice and Experience 24(4), 337–362 (1994)
  15. Wu, S., Manber, U.: Fast text searching allowing errors. Comm. of the ACM 35(10), 83–91 (1992)

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Heikki Hyyrö (1)
  1. Department of Computer and Information Sciences, University of Tampere, Finland
