Period Recovery over the Hamming and Edit Distances

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9644)

Abstract

A string S of length n has a period P of length p if \(S[i]=S[i+p]\) for all \(1 \le i \le n-p\) and \(n \ge 2p\). The shortest such substring P is called the period of S, and S is said to be periodic in P. In this paper we investigate the period recovery problem: given a string S of length n, find the primitive period(s) P such that the distance between S and the string that is periodic in P is below a threshold \(\tau \). We consider the problem over both the Hamming distance and the edit distance. For the Hamming distance, we present an \(O(n \log n)\) time algorithm, where \(\tau \) is given as \(\frac{n}{(2+\epsilon )p}\), for \(0 < \epsilon < 1\). For the edit distance, \(\tau =\frac{n}{(4+\epsilon )p}\), and we provide an \(O(n^{4/3})\) time algorithm.
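To make the definitions concrete, the following is a minimal brute-force sketch of the Hamming distance variant. It is not the \(O(n \log n)\) algorithm of the paper: it tries every candidate length p, takes a majority vote within each residue class modulo p, and therefore runs in roughly quadratic time. The function names, the default value of `epsilon`, the 0-based indexing, and the reading of "below" as a strict inequality are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Tuple


def has_period(s: str, p: int) -> bool:
    """Exact periodicity test from the definition (0-indexed):
    s[i] == s[i + p] for all valid i, and len(s) >= 2p."""
    n = len(s)
    return p > 0 and 2 * p <= n and all(s[i] == s[i + p] for i in range(n - p))


def best_period_of_length(s: str, p: int) -> Tuple[str, int]:
    """Return the length-p string P closest to s, and the Hamming distance
    between s and the length-n prefix of P P P ...
    A majority vote per residue class i mod p minimises that distance."""
    chars = []
    mismatches = 0
    for r in range(p):
        column = s[r::p]  # all symbols at positions congruent to r mod p
        symbol, count = Counter(column).most_common(1)[0]
        chars.append(symbol)
        mismatches += len(column) - count
    return "".join(chars), mismatches


def is_primitive(w: str) -> bool:
    """w is primitive iff it is not a proper power of a shorter string,
    i.e. its first non-trivial occurrence in w + w is at offset len(w)."""
    return (w + w).find(w, 1) == len(w)


def recover_periods_naive(s: str, epsilon: float = 0.5) -> List[Tuple[str, int]]:
    """Report every primitive P whose periodic string is within Hamming
    distance tau = n / ((2 + epsilon) * p) of s (strictly below tau)."""
    n = len(s)
    found = []
    for p in range(1, n // 2 + 1):
        tau = n / ((2 + epsilon) * p)
        candidate, dist = best_period_of_length(s, p)
        if dist < tau and is_primitive(candidate):
            found.append((candidate, dist))
    return found


if __name__ == "__main__":
    corrupted = "abcabcabxabc"  # "abc" repeated four times with one substitution
    print(has_period("abcabcabc", 3))        # True
    print(has_period(corrupted, 3))          # False
    print(recover_periods_naive(corrupted))  # [('abc', 1)]
```

In this example the corrupted string is no longer exactly periodic, yet the single substitution stays below \(\tau = 12/(2.5 \cdot 3) = 1.6\), so the period "abc" is still recovered.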

A. Amir—Partially supported by the Israel Science Foundation grant 571/14, and grant No. 2014028 from the United States-Israel Binational Science Foundation (BSF).

M. Amit—Partially supported by the Israel Science Foundation grant 571/14, grant No. 2014028 from the United States-Israel Binational Science Foundation (BSF) and DFG.

G. M. Landau—Partially supported by the Israel Science Foundation grant 571/14, grant No. 2014028 from the United States-Israel Binational Science Foundation (BSF) and DFG.

D. Sokol—Partially supported by the United States-Israel Binational Science Foundation (BSF) grant No. 2014028.

Notes

  1. In previous work, the lemmas state “up to cyclic rotations” which means that one conjugate is counted/reported for each set of cyclic permutations of a given period P. Here we clarify this language by always finding the single best conjugate.

  2. The paper actually states \(O(n^3 \log n)\) time complexity. However, more recent work [4] for construction of a minimal augmented suffix tree can be used, reducing the time complexity of [1] to \(O(n^3)\).

  3. More precisely, this time complexity can be further improved to linear time preprocessing and \(O(\log \log n)\) time query, by replacing, in Crochemore et al. [9], the 2D range minimum query algorithm of Chazelle [6] with the algorithm of Chan [5].

References

  1. Amir, A., Eisenberg, E., Levy, A., Porat, E., Shapira, N.: Cycle detection and correction. ACM Trans. Algorithms 9(1), 13:1–13:20 (2012)

  2. Amit, M., Crochemore, M., Landau, G.M.: Locating all maximal approximate runs in a string. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 13–27. Springer, Heidelberg (2013)

  3. Bannai, H., I, T., Inenaga, S., Nakashima, Y., Takeda, M., Tsuruta, K.: The “runs” theorem. CoRR, abs/1406.0263v4 (2014)

  4. Brodal, G.S., Lyngsø, R.B., Östlin, A., Pedersen, C.N.S.: Solving the string statistics problem in time \(O(n \log n)\). In: Widmayer, P., Triguero, F., Morales, R., Hennessy, M., Eidenbenz, S., Conejo, R. (eds.) ICALP 2002. LNCS, vol. 2380, pp. 728–739. Springer, Heidelberg (2002)

  5. Chan, T.M.: Persistent predecessor search and orthogonal point location on the word RAM. ACM Trans. Algorithms (TALG) 9(3), 22 (2013)

  6. Chazelle, B.: A functional approach to data structures and its use in multidimensional searching. SIAM J. Comput. 17(3), 427–462 (1988)

  7. Crochemore, M.: An optimal algorithm for computing the repetitions in a word. Inf. Process. Lett. 12(5), 244–250 (1981)

  8. Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings, 392 p. Cambridge University Press, Cambridge (2007)

  9. Crochemore, M., Iliopoulos, C., Kubica, M., Radoszewski, J., Rytter, W., Waleń, T.: Extracting powers and periods in a string from its runs structure. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 258–269. Springer, Heidelberg (2010)

  10. Fine, N.J., Wilf, H.S.: Uniqueness theorems for periodic functions. Proc. Am. Math. Soc. 16, 109–114 (1965)

  11. Fischetti, V.A., Landau, G.M., Sellers, P.H., Schmidt, J.P.: Identifying periodic occurrences of a template with applications to protein structure. Inf. Process. Lett. 45(1), 11–18 (1993)

  12. Galil, Z., Giancarlo, R.: Improved string matching with \(k\) mismatches. SIGACT News 17(4), 52–54 (1986)

  13. Gusfield, D., Stoye, J.: Linear time algorithms for finding and representing all the tandem repeats in a string. J. Comput. Syst. Sci. 69(4), 525–546 (2004)

  14. Iliopoulos, C.S., Moore, D., Smyth, W.F.: A characterization of the squares in a Fibonacci string. Theor. Comput. Sci. 172(1–2), 281–291 (1997)

  15. Karp, R.M., Miller, R.E., Rosenberg, A.L.: Rapid identification of repeated patterns in strings, trees, and arrays. In: ACM Symposium on Theory of Computing (STOC) (1972)

  16. Kolpakov, R.M., Kucherov, G.: Finding maximal repetitions in a word in linear time. In: Proceedings of Symposium on Foundations of Computer Science (FOCS), pp. 596–604 (1999)

  17. Kolpakov, R.M., Kucherov, G.: Finding approximate repetitions under Hamming distance. Theor. Comput. Sci. 303(1), 135–156 (2003)

  18. Landau, G.M., Schmidt, J.P., Sokol, D.: An algorithm for approximate tandem repeats. J. Comput. Biol. 8(1), 1–18 (2001)

  19. Landau, G.M., Vishkin, U.: Fast parallel and serial approximate string matching. J. Algorithms 10(2), 157–169 (1989)

  20. Lothaire, M.: Applied Combinatorics on Words (Encyclopedia of Mathematics and its Applications). Cambridge University Press, New York (2005)

  21. Lyndon, R.C.: On Burnside’s problem. Trans. Am. Math. Soc. 77(2), 202–215 (1954)

  22. Myers, E.W., Miller, W.: Approximate matching of regular expressions. Bull. Math. Biol. 51(1), 5–37 (1989)

  23. Sim, J.S., Iliopoulos, C.S., Park, K., Smyth, W.F.: Approximate periods of strings. In: Crochemore, M., Paterson, M. (eds.) CPM 1999. LNCS, vol. 1645, pp. 123–133. Springer, Heidelberg (1999)

Corresponding author

Correspondence to Mika Amit.

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Amir, A., Amit, M., Landau, G.M., Sokol, D. (2016). Period Recovery over the Hamming and Edit Distances. In: Kranakis, E., Navarro, G., Chávez, E. (eds) LATIN 2016: Theoretical Informatics. LATIN 2016. Lecture Notes in Computer Science, vol 9644. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49529-2_5

  • DOI: https://doi.org/10.1007/978-3-662-49529-2_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-49528-5

  • Online ISBN: 978-3-662-49529-2

  • eBook Packages: Computer Science, Computer Science (R0)
