Repetition Complexity of Words

Ilie, Lucian; Yu, Sheng; Zhang, Kaizhong

doi:10.1007/3-540-45655-4_35


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2387)

Included in the following conference series:

International Computing and Combinatorics Conference


Abstract

With ideas from data compression and combinatorics on words, we introduce a complexity measure for words, called repetition complexity, which quantifies the amount of repetition in a word. The repetition complexity of w, r(w), is defined as the smallest amount of space needed to store w when reduced by repeatedly applying the following procedure: n consecutive occurrences uu...u of the same subword u of w are stored as (u, n). The repetition complexity has interesting relations with well-known complexity measures, such as subword complexity, sub, and Lempel-Ziv complexity, lz. We always have r(w) ≥ lz(w), and it may even happen that the former is linear while the latter is only logarithmic; e.g., this happens for prefixes of certain infinite words obtained by iterated morphisms. An infinite word α being ultimately periodic is equivalent to: (i) sub(pref_n(α)) = \( \mathcal{O}(n) \), (ii) lz(pref_n(α)) = \( \mathcal{O}(1) \), and (iii) r(pref_n(α)) = lg n + \( \mathcal{O}(1) \). De Bruijn words, well known for their high subword complexity, are shown to have almost the highest repetition complexity; the precise complexity remains open. r(w) can be computed in time \( \mathcal{O}(n^3 (\log n)^2) \), and finding very fast algorithms is open and probably very difficult.
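
To make the reduction u...u → (u, n) concrete, here is a minimal Python sketch that computes the cheapest nested encoding of a short word by dynamic programming over its factors. It rests on an assumed cost model that the abstract does not spell out: each letter costs 1 and each exponent n costs the number of its decimal digits, with parentheses free. The function name `repetition_cost` is hypothetical, and this is a straightforward interval recursion for illustration, not the \( \mathcal{O}(n^3 (\log n)^2) \) algorithm mentioned above.

```python
from functools import lru_cache


def repetition_cost(w: str) -> int:
    """Cheapest encoding of w using letters and powers (u, n).

    ASSUMED cost model (not taken from the paper): a letter costs 1,
    an exponent n costs len(str(n)), parentheses cost nothing.
    """

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> int:
        # Minimum cost of encoding the factor w[i:j].
        length = j - i
        if length <= 1:
            return length
        # Option 1: encode w[i:j] as a concatenation of two shorter factors.
        result = min(best(i, k) + best(k, j) for k in range(i + 1, j))
        # Option 2: w[i:j] is a power u^n with |u| = p; store it as (u, n).
        for p in range(1, length // 2 + 1):
            if length % p == 0:
                n = length // p
                if w[i:i + p] * n == w[i:j]:
                    result = min(result, best(i, i + p) + len(str(n)))
        return result

    return best(0, len(w))


if __name__ == "__main__":
    print(repetition_cost("aaaaaaaa"))   # (a, 8): one letter + one digit -> 2
    print(repetition_cost("abababab"))   # (ab, 4): two letters + one digit -> 3
    print(repetition_cost("abc"))        # no repetition to exploit -> 3
```

Under this assumed model a word a^n costs about 1 plus the number of digits of n, which matches the logarithmic growth stated in (iii) for ultimately periodic words. The recursion is only practical for short inputs.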

Research partially supported by NSERC grant R3143A01.

Research partially supported by NSERC grant OGP0041630.

Research partially supported by NSERC grant OGP0046373.



Author information

Authors and Affiliations

Department of Computer Science, University of Western Ontario, N6A 5B7, London, Ontario, Canada
Lucian Ilie, Sheng Yu & Kaizhong Zhang


Editor information

Editors and Affiliations

Department of Computer Science, University of California, Santa Barbara, California, 93106, USA
Oscar H. Ibarra
Department of Mathematics, National University of Singapore, Singapore, Singapore, 117543
Louxin Zhang


About this paper

Cite this paper

Ilie, L., Yu, S., Zhang, K. (2002). Repetition Complexity of Words. In: Ibarra, O.H., Zhang, L. (eds) Computing and Combinatorics. COCOON 2002. Lecture Notes in Computer Science, vol 2387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45655-4_35


DOI: https://doi.org/10.1007/3-540-45655-4_35
Published: 29 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43996-7
Online ISBN: 978-3-540-45655-1
eBook Packages: Springer Book Archive
