
International Workshop on Combinatorial Algorithms

IWOCA 2014: Combinatorial Algorithms, pp 49-61

Fast and Simple Computations Using Prefix Tables Under Hamming and Edit Distance

  • Carl Barton
  • Costas S. Iliopoulos
  • Solon P. Pissis
  • William F. Smyth
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8986)

Abstract

In this article, we introduce a new and simple data structure, the prefix table under Hamming distance, and present two algorithms to compute it efficiently: one asymptotically fast; the other very fast on average and in practice. Because the latter approach avoids the computation of global data structures, such as the suffix array and the longest common prefix array, it yields algorithms much faster in practice than existing methods. We show how this data structure can be used to solve two string problems of interest: (a) approximate string matching under Hamming distance; and (b) longest approximate overlap under Hamming distance. Analogously, we introduce the prefix table under edit distance, and present an efficient algorithm for its computation. In the process, we also define the border array under both distance measures, and provide an algorithm for conversion between prefix tables and border arrays.
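As a rough illustration of the data structure named in the abstract, the sketch below computes a prefix table under Hamming distance by direct comparison. It is not either of the paper's two algorithms: it is a naive quadratic-time baseline, and it assumes a definition the abstract does not spell out, namely that entry i holds the length of the longest substring starting at position i that matches the prefix of the same length with at most k mismatches. The names `hamming_prefix_table` and `approx_occurrences`, and the trick of concatenating pattern and text, are hypothetical choices made for this example.

```python
def hamming_prefix_table(x, k):
    """Naive O(n^2) sketch (assumed definition, not the paper's algorithm).

    table[i] is the largest length such that x[i:i+length] and x[0:length]
    differ in at most k positions.
    """
    n = len(x)
    table = [0] * n
    for i in range(n):
        mismatches = 0
        length = 0
        # Extend the match one character at a time until the string ends
        # or a (k+1)-th mismatch would be needed.
        while i + length < n:
            if x[length] != x[i + length]:
                if mismatches == k:
                    break
                mismatches += 1
            length += 1
        table[i] = length
    return table


def approx_occurrences(pattern, text, k):
    """Start positions in text where pattern occurs with at most k mismatches.

    Assumption for illustration: build the table over pattern + text and
    report positions whose entry is at least len(pattern).
    """
    m = len(pattern)
    table = hamming_prefix_table(pattern + text, k)
    return [i for i in range(len(text) - m + 1) if table[m + i] >= m]
```

With k = 0 the table reduces to the ordinary prefix table; for example, `hamming_prefix_table("abaab", 0)` gives `[5, 0, 1, 2, 0]`.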

Keywords

Edit Distance; Suffix Array; Approximate String Match; Dynamic Programming Matrix; Longest Common Prefix
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Carl Barton (1)
  • Costas S. Iliopoulos (1, 3)
  • Solon P. Pissis (1)
  • William F. Smyth (2)
  1. King’s College London, London, UK
  2. McMaster University, Hamilton, Canada
  3. University of Western Australia, Crawley, Australia
