Abstract
Using ideas from data compression and combinatorics on words, we introduce a complexity measure for words, called repetition complexity, which quantifies the amount of repetition in a word. The repetition complexity of w, r(w), is defined as the smallest amount of space needed to store w when reduced by repeatedly applying the following procedure: n consecutive occurrences uu...u of the same subword u of w are stored as (u, n). The repetition complexity has interesting relations with well-known complexity measures, such as subword complexity, sub, and Lempel-Ziv complexity, lz. We always have r(w) ≥ lz(w), and it may even happen that the former is linear while the latter is only logarithmic; e.g., this happens for prefixes of certain infinite words obtained by iterated morphisms. An infinite word α being ultimately periodic is equivalent to: (i) \( \mathrm{sub}(\mathrm{pref}_n(\alpha)) = \mathcal{O}(n) \), (ii) \( \mathrm{lz}(\mathrm{pref}_n(\alpha)) = \mathcal{O}(1) \), and (iii) \( r(\mathrm{pref}_n(\alpha)) = \lg n + \mathcal{O}(1) \). De Bruijn words, well known for their high subword complexity, are shown to have almost highest repetition complexity; the precise complexity remains open. The complexity r(w) can be computed in time \( \mathcal{O}(n^3 (\log n)^2) \), where n is the length of w, and it is open, and probably very difficult, to find very fast algorithms.
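To make the reduction procedure concrete, the following is a minimal sketch of one plausible reading of the definition, not the authors' algorithm. It rests on two assumptions that the abstract does not state: each stored letter costs 1, and storing a block as (u, n) costs the cost of u plus the number of decimal digits of n. The function name `repetition_cost` is hypothetical, and the plain interval dynamic program below makes no attempt to match the running time quoted above.

```python
from functools import lru_cache


def repetition_cost(w: str) -> int:
    """Cheapest storage cost of w under the ASSUMED cost model:
    a letter costs 1; a block stored as (u, n) costs cost(u) + digits(n)."""

    @lru_cache(maxsize=None)
    def cost(i: int, j: int) -> int:
        # Cost of the factor w[i:j].
        length = j - i
        if length <= 1:
            return length
        # Fallback: store every letter verbatim.
        best = length
        # Option 1: cut the factor in two and store the parts independently.
        for k in range(i + 1, j):
            best = min(best, cost(i, k) + cost(k, j))
        # Option 2: the whole factor is a power u^n; store it as (u, n).
        for p in range(1, length // 2 + 1):
            if length % p == 0:
                n = length // p
                if w[i:j] == w[i:i + p] * n:
                    best = min(best, cost(i, i + p) + len(str(n)))
        return best

    return cost(0, len(w))
```

Under these assumptions, a word with no repetition costs its own length, while `"abababab"` is stored as (ab, 4) and costs 3. The recursion is intended for short words only; it visits every factor of w and is not optimised.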
Research partially supported by NSERC grant R3143A01.
Research partially supported by NSERC grant OGP0041630.
Research partially supported by NSERC grant OGP0046373.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ilie, L., Yu, S., Zhang, K. (2002). Repetition Complexity of Words. In: Ibarra, O.H., Zhang, L. (eds) Computing and Combinatorics. COCOON 2002. Lecture Notes in Computer Science, vol 2387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45655-4_35
DOI: https://doi.org/10.1007/3-540-45655-4_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43996-7
Online ISBN: 978-3-540-45655-1
eBook Packages: Springer Book Archive