Advertisement

Springer Nature is making SARS-CoV-2 and COVID-19 research free. View research | View latest news | Sign up for updates

Approximate Pattern Matching with the L 1, L 2 and L Metrics

Abstract

Given an alphabet Σ={1,2,…,|Σ|} text string T∈Σn and a pattern string P∈Σm, for each i=1,2,…,nm+1 define L p (i) as the p-norm distance when the pattern is aligned below the text and starts at position i of the text. The problem of pattern matching with L p distance is to compute L p (i) for every i=1,2,…,nm+1. We discuss the problem for d=1,2,∞. First, in the case of L 1 matching (pattern matching with an L 1 distance) we show a reduction of the string matching with mismatches problem to the L 1 matching problem and we present an algorithm that approximates the L 1 matching up to a factor of 1+ε, which has an \(O(\frac{1}{\varepsilon^{2}}n\log m\log|\Sigma|)\) run time. Then, the L 2 matching problem (pattern matching with an L 2 distance) is solved with a simple O(nlog m) time algorithm. Finally, we provide an algorithm that approximates the L matching up to a factor of 1+ε with a run time of \(O(\frac{1}{\varepsilon}n\log m\log|\Sigma|)\) . We also generalize the problem of String Matching with mismatches to have weighted mismatches and present an O(nlog 4 m) algorithm that approximates the results of this problem up to a factor of O(log m) in the case that the weight function is a metric.

This is a preview of subscription content, log in to check access.

References

  1. 1.

    Abrahamson, K.: Generalized string matching. SIAM J. Comput. 16(6), 1039–1051 (1987)

  2. 2.

    Amir, A., Farach, M.: Efficient 2-dimensional approximate matching of half-rectangular figures. Inf. Comput. 118(1), 1–11 (1995)

  3. 3.

    Amir, A., Farach, M., Muthukrishnan, S.: Alphabet dependence in parameterized matching. Inf. Process. Lett. 49, 111–115 (1994)

  4. 4.

    Amir, A., Lewenstein, M., Porat, E.: Faster algorithms for string matching with k mismatches. J. Algorithms 50(2), 257–275 (2004). Special SODA 2000 issue

  5. 5.

    Atallah, M.J., Chyzak, F., Dumas, P.: A randomized algorithm for approximate string matching. Algorithmica 29(3), 468–486 (2001)

  6. 6.

    Baker, B.S.: A theory of parameterized pattern matching: algorithms and applications. In: Proc. 25th Annual ACM Symposium on the Theory of Computing, pp. 71–80 (1993)

  7. 7.

    Chu, K.K.W., Wong, M.H.: Fast time-series searching with scaling and shifting. In: Proceedings of the Eighteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, May 31–June 2, 1999, Philadelphia, Pennsylvania, pp. 237–248. ACM, New York (1999)

  8. 8.

    Cole, R., Hariharan, R.: Verifying candidate matches in sparse and wildcard matching. In: Proc. of the 34th Annual ACM Symposium on Theory of Computing, pp. 592–601. ACM, New York (2002)

  9. 9.

    Cormen, T.H., Leiserson, C.E., Rivest, R.L.: Introduction to Algorithms. MIT Press, Cambridge (1992)

  10. 10.

    Cormode, G., Muthukrishnan, S.: The string edit distance matching problem with moves. In: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 667–676. ACM, New York (2002)

  11. 11.

    Faloutsos, C., Ranganathan, M., Manolopoulos, Y.: Fast subsequence matching in time-series databases. In: Proceedings 1994 ACM SIGMOD Conference, Mineapolis, MN, pp. 419–429 (1994)

  12. 12.

    Fischer, M.J., Paterson, M.S.: String matching and other products. In: Karp, R.M. (ed.) Complexity of Computation. SIAM-AMS Proceedings, vol.7, pp. 113–125 (1974)

  13. 13.

    Hirschberg, D.S.: Serial computations of Levenshtein distances. In: Apostolico, A., Galil, Z. (eds.) Pattern Matching Algorithms, pp. 123–141. Oxford University Press, London (1997)

  14. 14.

    Indyk, P.: Faster algorithms for string matching problems: matching the convolution bound. In: Proceedings of the 39th IEEE Annual Symposium on Foundations of Computer Science, pp. 166–173. IEEE Comput. Soc., Los Alamitos (1998)

  15. 15.

    Indyk, P.: Stable distributions, pseudorandom generators, embeddings and data stream computation. In: Proc. 41st Symp. Foundations of Computer Science, pp. 189–197. IEEE Press, New York (2000)

  16. 16.

    Indyk, P.: Algorithmic applications of low-distortion geometric embeddings. In: Proceedings of the 42th IEEE Annual Symposium on Foundations of Computer Science. IEEE Comput. Soc., Los Alamitos (2001)

  17. 17.

    Indyk, P., Koudas, N., Muthukrishnan, S.: Identifying representative trends in massive time series data sets using sketches. In: El Abbadi, A., Brodie, M.L., Chakravarthy, S., Dayal, U., Kamel, N., Schlageter, G., Whang, K.-Y. (eds.) VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, September 10–14, 2000, Cairo, Egypt, pp. 363–372. Morgan Kaufmann, San Mateo (2000)

  18. 18.

    Indyk, P., Lewenstein, M., Lipsky, O., Porat, E.: Closest pair problems in very high dimensions. In: Proceedings of the 31st International Colloquium on Automata, Languages and Programming (ICALP). Lecture Notes in Computer Science, vol. 3142, pp. 782–792. Springer, Berlin (2004)

  19. 19.

    Karloff, H.: Fast algorithms for approximately counting mismatches. Inf. Process. Lett. 48(2), 53–60 (1993)

  20. 20.

    Knuth, D.E., Morris, J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6, 323–350 (1977)

  21. 21.

    Levenshtein, V.I.: Binary codes capable of correcting, deletions, insertions and reversals. Sov. Phys. Dokl. 10, 707–710 (1966)

  22. 22.

    Lipsky, O., Porat, E.: L1 pattern matching lower bound. In: Proceedings of the 12th International Conference on String Processing and Information Retrieval (SPIRE), Bs. As., Argentina. Lecture Notes in Computer Science, vol. 3772, pp. 327–330. Springer, Berlin (2005)

  23. 23.

    Masek, W.J., Paterson, M.S.: A faster algorithm for computing string-edit distances. J. Comput. Syst. Sci. 20, 18–31 (1980)

  24. 24.

    Muthukrishnan, S.: New results and open problems related to non-standard stringology. In: Proc. 6th Combinatorial Pattern Matching Conference. Lecture Notes in Computer Science, vol. 937, pp. 298–317. Springer, Berlin (1995)

  25. 25.

    Muthukrishnan, S., Ramesh, H.: String matching under a general matching relation. Inf. Comput. 122(1), 140–148 (1995)

  26. 26.

    Perttu, S.: Combinatorial pattern matching in musical sequences. Master’s thesis, Department of Computer Science, University of Helsinki (2000)

Download references

Author information

Correspondence to Ely Porat.

Additional information

Research supported in part by US-Israel Binational Science Foundation.

Rights and permissions

Reprints and Permissions

About this article

Cite this article

Lipsky, O., Porat, E. Approximate Pattern Matching with the L 1, L 2 and L Metrics. Algorithmica 60, 335–348 (2011). https://doi.org/10.1007/s00453-009-9345-9

Download citation

Keywords

  • Design and analysis of algorithms
  • Combinatorial algorithms on words
  • Approximate string matching
  • Hamming distance