Keywords

1 Introduction and Related Work

A fundamental concept of repeating patterns or regularities is that of periods (also known as powers). A period of order k is defined by a concatenation of k identical blocks of symbols. The study of periods can be traced to as far back as the early 1900s with the work of [9], who researched a set of strings that do not contain any substrings that are periods. Periods in diverse forms gained prominence, when they became key structures in computational biology, where they are associated with various regulatory mechanisms and play an important role in genomic fingerprinting [6].

In regularities in strings, one of the most general notions is related to period or power, for instance, given a string x of length n, a period k of a string x is a sub string of x, if it can be decomposed into equal-length blocks of symbols, such that x = \(u^k u'\), where \(u'\) is a prefix of u. However, for simplicity we will discard \(u'\) and only consider \(u^k\).

So far, regularities in strings related with periods and powers, which have been extensively studied, [3,4,5, 8] and algorithms for their efficient computation have broad applications. In this paper, we study cyclic factors of strings. The motivation of cyclic factors comes from viruses. The viruses are circular strings, for example Escherichia coli (E.coli) has 154 bases and it is circular [12] (Fig. 1). Formally, the viruses break up at any point of the circle, for example, that can appear in the DNA sequence as \(x_\delta \dots x_nx_1 \dots x_{\delta -1}\) breaking up at position \(\delta \) (Fig. 2). Now, we propose several efficient algorithms for computing: (i) cyclic periodicity; (ii) all cyclic periodicity; (iii) maximal local cyclic periodicity; (iv) cyclic covers.

Fig. 1.
figure 1

The E. coli K-12 MG1655 chromosome (outer black ring) was used as a reference map to visualise the locus position of 30 chromosome-borne CU fimbrial types. Types highlighted in blue are present in E. coli K-12 MG1655, types in red are absent in this strain. Fimbrial types associated with PAIs are indicated by an asterisk. A number of PAI associated fimbrial gene clusters occupy different locus positions relative to the MG1655 genome. tRNA sites that flank CU-containing PAIs are indicated on the inner blue ring [12]. (Color figure online)

Fig. 2.
figure 2

Circular pattern.

2 Preliminaries

A string x of length \(|x|=n\) over an alphabet \(\varSigma \) of size \(\sigma \) can be denoted as \(x[1 \mathinner {.\,.}n]=x[1]x[2] \dots x[i] \dots x[n]\), where \(1 \le i \le n\) and the i-th letter of x is denoted by x[i] \(\in \) \(\varSigma \). The empty string \(\epsilon \) is the string of length 0. The string \(x^R\) is the reverse of string x. And \(x[i \mathinner {.\,.}j]\), \(1 \le i \le j \le n\), denotes the contiguous substring (or factor) of letters, such as \(x[i]x[i+1]x[i+2] \dots x[j]\). A substring \(x[i \mathinner {.\,.}j]\) is a suffix of x, if \(j=n\) and a substring \(x[i \mathinner {.\,.}j]\) is a prefix of x, if \(i=1\). Given a cyclic factor u of length k, \(1 \le k \le n\), we denote by c(u), for instance, \(u=ababc\), c(u) is one of the following rotations: ababc, babca, abcab, bcaba, cabab. Moreover, we say \(u=u_1 u_2 \dots u_n\), \(c_{\delta }(u)=u_{\delta } \dots u_n u_1 \dots u_{\delta -1}\).

In this paper, suffix trees are used extensively as computation tools. For a general introduction to suffix trees, see [2, 7, 10, 11].

figure a

Example 1

Consider a string \(x=aaabaabaabaabaaa=:{u_1}{u_2}{u_3}{u_4}\), where \(u_1=aaab\), \(u_2=aaba\), \(u_3=abaa\), \(u_4=baaa\) and \(k=4\), \(\ell =4\). Therefore x has a period of length \(\ell \).

Definition 1

A cyclic periodic array A of a string x of length n is defined to be as follows: \(A[i]:=\ell \), \(1 \le i \le n\), if and only if \(x[1 \mathinner {.\,.}i]\) has cyclic periodicity \(\ell \) by a string u and there no \(u'\), with \(|u'|\le |u|\) that is a cyclic period of \(x[1 \mathinner {.\,.}i]\).

Example 2

Consider a string \(x=aababa\) of length 6, a cyclic periodic array A as follows:

\(x[1]=a \implies A[1]:=1\)                \(x[1 \mathinner {.\,.}4]=aaba \implies A[4]:=1\)

\(x[1 \mathinner {.\,.}2]=a\) \(a \implies A[2]:=2\)          \(x[1 \mathinner {.\,.}5]=aabab \implies A[5]:=1\)

\(x[1 \mathinner {.\,.}3]=aab \implies A[3]:=1\)          \(x[1 \mathinner {.\,.}6]=aab\) \(aba \implies A[6]:=2\)

Definition 2

We define maximal local k-cyclic periodicity of a string x, if a substring y is cyclic periodic and y is not a substring of another cyclic periodic strings.

Example 3

Consider a string \(x=aaaababaaab\), \(\varSigma =\{a,b\}\), and a substring \(y=aabababaa\) is 3-cyclic periodic and substring \(y\alpha =aabababaaa\), \(\alpha \in \varSigma \), is not cyclic periodic, and substring \(\beta y=aaabababaa\), \(\beta \in \varSigma \) is not cyclic periodic. Therefore, the substring \(y=aabababaa\) is maximal local 3-cyclic periodic in string \(x=aaaabababaaa\).

Definition 3

We say that a string x of length n is cyclic-coverable by a string u of length \(k'\), if and only if, for every position i of x, the following condition holds \(x[\beta \mathinner {.\,.}\gamma ]=c(u)\), \(1 \le \beta \le i \le \gamma \le n\).

Example 4

Consider a string \(x=aababaa={u_1}{u_2}\), \(u_1=aaba\), \(u_2=abaa\), \(k'=4\), \(\gamma =2\), is cyclic coverable by a string u, for every position i of x, \(x[1 \mathinner {.\,.}4]=x[4 \mathinner {.\,.}7]=c(u)\).

Definition 4

Compute all cyclic covers of a given string x, that is for all possible length cyclic covers.

Example 5

Consider a string \(x=ababbaba\), then ab, abab, ababb, ababbab are cyclic covers of x.

3 Computing k-cyclic Periodicity

Theorem 1

Given a string x of length n and an integer k, \(1 \le k \le n\), test whether it is k-cyclic periodic; this can be determined in \(\mathcal {O}(n/k)\) time and \(\mathcal {O}(n)\) space.

Proof

We construct the suffix tree of x (see [7, 10, 11]). We let \(u=x[1\dots k]\), then let \(\ell _m\) denote the depth of the lowest common ancestor of \(x[1 \mathinner {.\,.}n]\) (see [1]), and \(x[i_m \mathinner {.\,.}n]\). We compute the LCA \(\ell _m\) of \(x[1 \mathinner {.\,.}n]\) and \(x[i_m \mathinner {.\,.}n]\) for \(i_m=2k, 3k, \dots \), and \(\ell _k=n-k\), if \(\ell _m=1\) for some m, then x is not k-cyclic periodic string. Now consider \(C^{rignt}_m=(u_{\ell _m+1} \dots u_k)^R\), compute the \(\ell _m^{'}\) the LCA of \(u^R\) and \(C^{right}_m\). If \(\ell _m^{'} \ge \ell _m\) for all m, then x is k-cyclic periodic.    \(\square \)

4 Computing All Cyclic Periodicities

Theorem 2

Given a string x of length n, test whether it is k-cyclic periodicity for all \(1 \le k \le n\), this can be determined in \(\mathcal {O}(n \log n)\) time and \(\mathcal {O}(n)\) space.

Proof

We apply the algorithm of Theorem 1 for \(k=1,2,\dots ,n\) and we test all cyclic periods of length k. The construction of the suffix tree of string x and \(x^R\) is done once costing \(\mathcal {O}(n)\). The total cost is

$$\mathcal {O}(n)+\mathcal {O}(\sum _{k=1}^{n} n/k)=\mathcal {O}(n\log n)$$

   \(\square \)

Lemma 1

Compute the cyclic period of x.

Proof

The smallest cyclic-period of x is the cyclic-period of x.    \(\square \)

5 Computing Maximal Local k-cyclic Periodicity

Theorem 3

We can compute all k-cyclic periodicity of x in \(\mathcal {O}(n \log n)\) time.

Proof

We apply the algorithm for \(k=1,2,\dots ,n\) and in this case, extend it to cyclic periods of length \(k+1\), where |y| = m is cyclic periodic and \(y\alpha \) = \(m+1\) is not cyclic periodic. Next, we perform this algorithm on string \(x^R\) as \(\mathcal {T}(x^R)\), where |y| = m, again is cyclic periodic and \(\beta y\) = \(m+1\) is not cyclic periodic.

The construction of the suffix tree of string x is done once. The total cost is

$$\mathcal {O}(\sum _{k=1}^{n} n/k)=\mathcal {O}(n\log n)$$

   \(\square \)

Lemma 2

Compute maximal local k-cyclic periodicity of x.

Proof

We compute and merge the arrays for \(y\alpha \) and \(\beta y\) of x. That is the maximal local k-cyclic periodicity of x.    \(\square \)

6 Computing \(k'\)-cyclic Coverability

Theorem 4

Given a string x of length n and an integer \(k'\), \(1 \le k' \le n\), test whether it is \(k'\)-cyclic coverable, this can be determined in \(\mathcal {O}(n)\) time and \(\mathcal {O}(n)\) space.

Proof

We compute the suffix tree of string x as \(\mathcal {T}(x)\), and also we compute the suffix tree of string \(x^R\) as \(\mathcal {T}(x^R)\).

Then we check \(x[1,k']\) with each one of \(x[n-k'+1,n]\), \(x[n-k',n-1]\), \(x[n-k'-1,n-2]\) ...\(x[2,k'+1]\), together with the reverse pairs in \(T(x^R)\). This way we build a collection of cyclic covers if there is one.

The construction of the suffix tree costs \(\mathcal {O}(n)\); checking of equality costs \(\mathcal {O}(1)\) and there are n factors. The total time is \(\mathcal {O}(n)\).    \(\square \)

7 Computing All Cyclic Coverability

Theorem 5

Given a string x of length n, test whether it is \(k'\)-cyclic coverable for \(1 \le k' \le n\), this can be determined in \(\mathcal {O}(n^2)\) time and \(\mathcal {O}(n)\) space.

Proof

We apply the algorithm for \(k'=1,2,\dots ,n\) and we compare all cyclic coverable of length \(k'\). The construction of the suffix tree of string x is done once. The total cost is

$$\mathcal {O}(\sum _{k'=1}^{n} n)=\mathcal {O}(n^2)$$

   \(\square \)

Lemma 3

Compute the cyclic coverability of x.

Proof

The smallest cyclic coverable of x is the all the cyclic coverable of x.    \(\square \)

8 Conclusions and Open Problems

In this paper, we defined k-cyclic periodicity, we presented several efficient algorithms for computing: (i) cyclic periodicity; (ii) all cyclic periodicity; (iii) maximal local cyclic periodicity; (iv) cyclic covers.

Future work will be focused on computing the cyclic-periodic array, that is the cyclic periodicity of every prefix of string x and computing the cyclic-coverability array, that is testing each prefix of x, for cyclic-coverability. Finally, we will extend the cyclic periodicity to cover the case \(u_1u_2u_2\dots u_k u^1\), where \(u_i\)=\(c(u_j)\) \(\forall \) ij and \(u^1\) is a substring of some \(u_i\).