Repetition Complexity of Words

  • Conference paper
Computing and Combinatorics (COCOON 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2387)

Abstract

Drawing on ideas from data compression and combinatorics on words, we introduce a complexity measure for words, called repetition complexity, which quantifies the amount of repetition in a word. The repetition complexity of w, r(w), is defined as the smallest amount of space needed to store w when it is reduced by repeatedly applying the following procedure: n consecutive occurrences uu...u of the same subword u of w are stored as (u, n). Repetition complexity has interesting relations with well-known complexity measures, such as subword complexity, sub, and Lempel-Ziv complexity, lz. We always have r(w) ≥ lz(w), and it may even happen that the former is linear while the latter is only logarithmic; e.g., this happens for prefixes of certain infinite words obtained by iterated morphisms. An infinite word α being ultimately periodic is equivalent to: (i) \( \mathrm{sub}(\mathrm{pref}_n(\alpha)) = \mathcal{O}(n) \), (ii) \( \mathrm{lz}(\mathrm{pref}_n(\alpha)) = \mathcal{O}(1) \), and (iii) \( r(\mathrm{pref}_n(\alpha)) = \lg n + \mathcal{O}(1) \). De Bruijn words, well known for their high subword complexity, are shown to have almost the highest repetition complexity; the precise complexity remains open. r(w) can be computed in time \( \mathcal{O}(n^3 (\log n)^2) \); finding substantially faster algorithms is open and probably very difficult.
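The reduction described in the abstract can be illustrated with a small interval dynamic program. This is only a sketch under a stated assumption: the abstract does not say how the stored size is measured, so the code assumes that every letter costs 1 and that an exponent n costs its number of decimal digits (the hypothetical helper `exponent_cost`). It is a straightforward cubic-style search over factors, not the authors' algorithm.

```python
def exponent_cost(n: int) -> int:
    """Assumed cost of storing the exponent n: its number of decimal digits."""
    return len(str(n))


def repetition_complexity(w: str) -> int:
    """Smallest size of a nested (u, n) encoding of w under the assumed cost model."""
    n = len(w)
    # best[i][j] holds the smallest assumed size of an encoding of the factor w[i:j].
    best = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Option 1: store the factor letter by letter.
            cost = length
            # Option 2: split the factor and encode both parts independently.
            for k in range(i + 1, j):
                cost = min(cost, best[i][k] + best[k][j])
            # Option 3: the factor is a power u^m with m >= 2; store (u, m),
            # where u itself may already be reduced.
            for p in range(1, length // 2 + 1):
                m = length // p
                if length % p == 0 and w[i:j] == w[i:i + p] * m:
                    cost = min(cost, best[i][i + p] + exponent_cost(m))
            best[i][j] = cost
    return best[0][n]
```

Under this cost model, "abababab" is stored as (ab, 4) with size 3, and a block of n identical letters costs 1 plus the number of digits of n, which is in line with the logarithmic growth stated in (iii) for ultimately periodic words.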

Research partially supported by NSERC grant R3143A01.

Research partially supported by NSERC grant OGP0041630.

Research partially supported by NSERC grant OGP0046373.



Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ilie, L., Yu, S., Zhang, K. (2002). Repetition Complexity of Words. In: Ibarra, O.H., Zhang, L. (eds) Computing and Combinatorics. COCOON 2002. Lecture Notes in Computer Science, vol 2387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45655-4_35

  • DOI: https://doi.org/10.1007/3-540-45655-4_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43996-7

  • Online ISBN: 978-3-540-45655-1

  • eBook Packages: Springer Book Archive
