Abstract
In this work, we propose constructions that correct duplications of multiple consecutive symbols. We consider two types of such errors: tandem duplications, where a sequence of symbols is repeated, and palindromic duplications, where a sequence is repeated in reversed order. We compare the redundancies of these constructions with upper bounds on the code size that are obtained from sphere packing arguments. We prove that an upper bound on the cardinality of codes correcting tandem deletions is also an upper bound for codes correcting tandem duplications, and we derive the bounds from the tandem deletion error because this yields tighter bounds. Our upper bounds on the cardinality directly imply lower bounds on the redundancy, which we compare with the redundancy of the best known construction correcting arbitrary burst insertions. Our results indicate that correcting palindromic duplications requires more redundancy than correcting tandem duplications, and that both require significantly less redundancy than correcting arbitrary burst insertions.
References
Dolecek L., Anantharam V.: Repetition error correcting sets: explicit constructions and prefixing methods. SIAM J. Discret. Math. 23(4), 2120–2146 (2010).
Fazeli A., Vardy A., Yaakobi E.: Generalized sphere packing bound. IEEE Trans. Inf. Theory 61(5), 2313–2334 (2015).
Hansen P.: Studies on Graphs and Discrete Programming, vol. 11. North Holland, New York (1981).
Jain S., Farnoud F., Schwartz M., Bruck J.: Duplication-correcting codes for data storage in the DNA of living organisms. In: IEEE International Symposium on Information Theory (ISIT), Barcelona, pp. 1028–1032 (2016).
Kulkarni A.A., Kiyavash N.: Nonasymptotic upper bounds for deletion correcting codes. IEEE Trans. Inf. Theory 59(8), 5115–5130 (2013).
Kurmaev O.F.: Constant-weight and constant-charge binary run-length limited codes. IEEE Trans. Inf. Theory 57(7), 4497–4515 (2011).
Lenz A., Wachter-Zeh A., Yaakobi E.: Bounds on codes correcting tandem and palindromic duplications. In: Workshop on Coding and Cryptography (WCC) (2017).
Levenshtein V.: Binary codes capable of correcting spurious insertions and deletions of ones. Probl. Pereda. Inform. 1(1), 12–25 (1965).
Levenshtein V.: Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Dokl. 10, 707–710 (1966).
Mahdavifar H., Vardy A.: Asymptotically optimal sticky-insertion-correcting codes with efficient encoding and decoding. In: IEEE International Symposium on Information Theory (ISIT), Aachen, pp. 2688–2692 (2017).
Roth R., Siegel P.: Lee-metric BCH codes and their application to constrained and partial-response channels. IEEE Trans. Inf. Theory 40(4), 1083–1096 (1994).
Schoeny C., Wachter-Zeh A., Gabrys R., Yaakobi E.: Codes correcting a burst of deletions or insertions. IEEE Trans. Inf. Theory 63(4), 1971–1985 (2017).
Varshamov R.R., Tenengolts G.M.: Codes which correct single asymmetric errors. Autom. Remote Control 26(2), 286–290 (1965).
Acknowledgements
This work was supported by the Institute for Advanced Study (IAS), Technische Universität München (TUM), with funds from the German Excellence Initiative and the European Union’s Seventh Framework Program (FP7) under Grant Agreement No. 291763. Parts of this work have been presented at the 2017 Workshop on Coding and Cryptography (WCC), St. Petersburg [7].
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This is one of several papers published in Designs, Codes and Cryptography comprising the “Special Issue on Coding and Cryptography”.
Appendices
Appendix A: Sphere sizes for tandem and palindromic duplications and deletions
In the following we derive the size of the spheres \(S^{\epsilon }_t(\varvec{x})\), as defined in (1), for tandem and palindromic duplication and deletion errors. For the subsequent two lemmas we denote the \(\ell \)-step derivative by \(\phi _\ell (\varvec{x})=(\varvec{u}_x, \varvec{v}_x)\), according to the definition from Sect. 2.1.
Lemma 10
The sphere size for tandem duplications of length \(\ell \) is given as
$$|S^{\tau _\ell }_t(\varvec{x})| = \binom{wt_{\mathrm {H}}(\varvec{v}_x)+t}{t},$$
where \(\phi _\ell (\varvec{x})=(\varvec{u}_x, \varvec{v}_x)\) is the \(\ell \)-step derivative of \(\varvec{x}\).
Proof
Recall that a tandem duplication error corresponds to increasing one entry of the \(\ell \)-zero signature \(\sigma _\ell (\varvec{v}_x)\) by one. Hence, the duplication sphere size equals the number of vectors \(\varvec{y} \in \mathbb {N}_0^{|\sigma _\ell (\varvec{v}_x)|}\) with \(\varvec{y} \ge \sigma _\ell (\varvec{v}_x)\) and \(|\varvec{y}|_1=|\sigma _\ell (\varvec{v}_x)|_1 + t\). The number of such vectors is given by \(\binom{|\sigma _\ell (\varvec{v}_x)|+t-1}{t} = \binom{wt_{\mathrm {H}}(\varvec{v}_x)+t}{t}\), since \(|\sigma _\ell (\varvec{v}_x)| = wt_{\mathrm {H}}(\varvec{v}_x)+1\). \(\square \)
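The binomial count in this proof can be checked by exhaustive enumeration for short words. The following is a minimal sketch, not part of the original derivation. It assumes 0-based positions, words over \(\mathbb {Z}_q\) stored as tuples, and the derivative entries \(v_i = x_{i+\ell } - x_i \bmod q\); the function names are illustrative only.

```python
from itertools import product
from math import comb


def tandem_dup(x, i, l):
    """Repeat the block x[i:i+l] directly after itself (0-based position i)."""
    return x[:i + l] + x[i:i + l] + x[i + l:]


def dup_sphere(x, l, t):
    """All words reachable from x by exactly t tandem duplications of length l."""
    words = {x}
    for _ in range(t):
        words = {tandem_dup(w, i, l) for w in words for i in range(len(w) - l + 1)}
    return words


def derivative_weight(x, l, q):
    """Hamming weight of v_x, assuming v_x[i] = (x[i+l] - x[i]) mod q."""
    return sum((x[i + l] - x[i]) % q != 0 for i in range(len(x) - l))


# Compare the brute-force sphere size with binom(wt_H(v_x) + t, t).
q, n = 3, 5
for l, t in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    for x in product(range(q), repeat=n):
        assert len(dup_sphere(x, l, t)) == comb(derivative_weight(x, l, q) + t, t)
```

The check covers all ternary words of length 5 only; it illustrates the formula and does not replace the proof.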
Lemma 11
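The sphere size for tandem deletions of length \(\ell \) is given as
$$|S^{\tau _\ell ^D}_t(\varvec{x})| = \left| \left\{ \varvec{y} \in \mathbb {N}_0^{|\sigma _\ell (\varvec{v}_x)|} : \varvec{y} \le \sigma _\ell (\varvec{v}_x), \ |\varvec{y}|_1 = |\sigma _\ell (\varvec{v}_x)|_1 - t \right\} \right| ,$$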
where \(\phi _\ell (\varvec{x})=(\varvec{u}_x, \varvec{v}_x)\) is the \(\ell \)-step derivative of \(\varvec{x}\).
Proof
A tandem deletion corresponds to decreasing one entry of the \(\ell \)-zero signature \(\sigma _\ell (\varvec{v}_x)\) by one. It is only possible to delete a tandem duplication at positions where the \(\ell \)-zero signature has positive entries. \(\square \)
Note that, by this lemma, \(S^{\tau _\ell ^D}_t(\varvec{x}) = \emptyset \) if \(|\sigma _\ell (\varvec{v}_x)|_1 < t\).
Corollary 3
The sphere size for single tandem deletions of length \(\ell \) is
$$|S^{\tau _\ell ^D}_1(\varvec{x})| = wt_{\mathrm {H}}(\sigma _\ell (\varvec{v}_x)),$$
where \(\phi _\ell (\varvec{x})=(\varvec{u}_x, \varvec{v}_x)\) is the \(\ell \)-step derivative of \(\varvec{x}\).
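For single tandem deletions, the count of positive signature entries can be compared with a brute-force enumeration. The sketch below rests on assumptions that are not restated in this appendix: \(v_i = x_{i+\ell } - x_i \bmod q\), and the \(\ell \)-zero signature holds \(\lfloor m/\ell \rfloor \) for every maximal zero run of length m in \(\varvec{v}_x\), so that an entry is positive exactly when \(m \ge \ell \). All function names are hypothetical.

```python
from itertools import product


def tandem_deletions(x, l):
    """Words obtained from x by removing one copy of a block x[i:i+l] that is
    immediately followed by an identical block."""
    return {x[:i + l] + x[i + 2 * l:]
            for i in range(len(x) - 2 * l + 1)
            if x[i:i + l] == x[i + l:i + 2 * l]}


def positive_signature_entries(x, l, q):
    """Number of maximal zero runs of length >= l in v_x.

    Assumes v_x[i] = (x[i+l] - x[i]) mod q and that the l-zero signature
    stores floor(m / l) for every maximal zero run of length m in v_x.
    """
    v = [(x[i + l] - x[i]) % q for i in range(len(x) - l)]
    count, run = 0, 0
    for s in v + [1]:  # a nonzero sentinel closes the final zero run
        if s == 0:
            run += 1
        else:
            count += int(run >= l)
            run = 0
    return count


# Exhaustive comparison over all ternary words of length 6.
q, n = 3, 6
for l in (1, 2, 3):
    for x in product(range(q), repeat=n):
        assert len(tandem_deletions(x, l)) == positive_signature_entries(x, l, q)
```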
We continue with deriving the palindromic duplication sphere size for the cases \(\ell =1\) and \(\ell =2\). For \(\ell =1\), a palindromic duplication is a single duplication. Therefore, the sphere size is
$$|S^{\rho _1}_1(\varvec{x})| = r(\varvec{x}),$$
as duplications in the same run yield the same outcome.
Lemma 12
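The size of the palindromic duplication sphere \(|S^{\rho _2}_1(\varvec{x})|\) for palindromic duplications of length 2 is
$$|S^{\rho _2}_1(\varvec{x})| = 2r(\varvec{x}) - r^{(1)}(\varvec{x}) - 1,$$
where \(r(\varvec{x})\) is the number of runs of \(\varvec{x}\) and \(r^{(1)}(\varvec{x})\) is the number of runs of length 1.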
Proof
We start with the observation that there are \(n-1\) possible positions \(i \in \{0,1, \ldots , n-2\}\) for palindromic duplications. Now, for \(\ell =2\), the conditions \(\rho _{\ell }(\varvec{x},i) = \rho _{\ell }(\varvec{x},i+j)\) (10a)–(10c) and (11a)–(11c) become \(x_1 = x_2 = \dots = x_{2+j} \, \forall \, j>0\). We therefore deduce that two palindromic duplications of length 2 in \(\varvec{x}\) result in the same vector \(\varvec{y}=\rho _{\ell }(\varvec{x},i) = \rho _{\ell }(\varvec{x},i+j)\) iff they appear in the same run in \(\varvec{x}\). Further, two palindromic duplications at two different positions i and \(i+j, j>0\) can only duplicate symbols from the same run if this run has length at least 3. Thus, every symbol added to a run of length at least 2 does not increase the duplication sphere size, so a run of length \(i \ge 3\) accounts for \(i-2\) positions that have to be subtracted from the \(n-1\) possible positions. Using \(\sum _{i=1}^n i r^{(i)}(\varvec{x}) = n\) and \(\sum _{i=1}^n r^{(i)}(\varvec{x}) = r(\varvec{x})\) yields the statement. \(\square \)
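The counting argument of this proof can be checked numerically. The sketch below is an illustration under assumed conventions (0-based positions, words as tuples, hypothetical function names). It compares the brute-force sphere size with the count \(n-1\) minus \(i-2\) for every run of length \(i \ge 3\), and with the simplification \(2r(\varvec{x}) - r^{(1)}(\varvec{x}) - 1\) that follows from the two run-length identities.

```python
from itertools import groupby, product


def pal_dup(x, i, l):
    """Insert the reversed block x[i:i+l] directly after the block itself."""
    return x[:i + l] + x[i:i + l][::-1] + x[i + l:]


def run_lengths(x):
    """Lengths of the maximal runs of equal symbols in x, from left to right."""
    return [len(list(g)) for _, g in groupby(x)]


def pal_dup_sphere_size_l2(x):
    """n - 1 positions minus the surplus r - 2 of every run of length r >= 3."""
    return len(x) - 1 - sum(r - 2 for r in run_lengths(x) if r >= 3)


# Exhaustive comparison over short binary and ternary words.
for q, n in [(2, 8), (3, 6)]:
    for x in product(range(q), repeat=n):
        sphere = {pal_dup(x, i, 2) for i in range(n - 1)}
        runs = run_lengths(x)
        assert len(sphere) == pal_dup_sphere_size_l2(x)
        assert len(sphere) == 2 * len(runs) - runs.count(1) - 1
```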
For \(\ell \ge 3\) and \(j \ge 2\), (10a)–(10c) and (11a)–(11c) do not imply \(x_1=x_2 = \dots =x_{\ell +j}\). For example, consider \(\ell =3\) and the word \(\varvec{x} = (010010)\). Then, \(\rho _{3}(\varvec{x},0) = \rho _{3}(\varvec{x},3) = (010010010)\). However, it is possible to find an upper bound on the size of the palindromic duplication sphere. For \(j = 1\), (10a)–(10c) become \(x_1 = x_2 = \dots = x_{\ell +1}\). Therefore two neighboring palindromic duplications can only result in the same word if they appear in one run.
Lemma 13
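The size of the palindromic duplication sphere \(S^{\rho _\ell }_1(\varvec{x})\) is upper bounded by
$$|S^{\rho _\ell }_1(\varvec{x})| \le n-\ell +1 - \sum _{i=\ell +1}^{n} (i-\ell ) \, r^{(i)}(\varvec{x}).$$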
Proof
There are \(n-\ell +1\) possible positions for palindromic duplications of length \(\ell \). Now, as seen before, duplications in the same run result in the same descendant. We therefore subtract the additional \(i-\ell \) entries of runs with length at least \(\ell +1\) from the number of possible positions for duplications to obtain an upper bound on the duplication sphere. \(\square \)
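The bound can be illustrated with the word \(\varvec{x} = (010010)\) discussed before Lemma 13, for which it is not tight. The following sketch uses assumed conventions (0-based positions, hypothetical function names). It evaluates the bound as \(n-\ell +1\) minus \(i-\ell \) for each run of length \(i > \ell \) and compares it with the brute-force sphere size.

```python
from itertools import groupby, product


def pal_dup(x, i, l):
    """Insert the reversed block x[i:i+l] directly after the block itself."""
    return x[:i + l] + x[i:i + l][::-1] + x[i + l:]


def pal_dup_sphere(x, l):
    """All words obtained from x by one palindromic duplication of length l."""
    return {pal_dup(x, i, l) for i in range(len(x) - l + 1)}


def pal_dup_bound(x, l):
    """n - l + 1 positions minus the surplus r - l of every run of length r > l."""
    runs = [len(list(g)) for _, g in groupby(x)]
    return len(x) - l + 1 - sum(r - l for r in runs if r > l)


# The bound holds for all binary words of length 8.
for l in (2, 3):
    for x in product(range(2), repeat=8):
        assert len(pal_dup_sphere(x, l)) <= pal_dup_bound(x, l)

# For x = (010010) and l = 3, positions 0 and 3 yield the same word.
x = (0, 1, 0, 0, 1, 0)
assert pal_dup(x, 0, 3) == pal_dup(x, 3, 3) == (0, 1, 0, 0, 1, 0, 0, 1, 0)
print(len(pal_dup_sphere(x, 3)), pal_dup_bound(x, 3))  # prints: 3 4
```

The gap between 3 and 4 arises because the coinciding positions 0 and 3 do not lie in a common run, a case the bound does not account for.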
Similar to the previous discussion, we start with deriving the size of the palindromic deletion spheres for \(\ell =1\) and \(\ell =2\). For \(\ell =1\), a palindromic deletion is a de-duplication of one symbol. Therefore, the size of the error sphere becomes
$$|S^{\rho ^D_1}_1(\varvec{x})| = r^{(\ge 2)}(\varvec{x}),$$
where \(r^{(\ge 2)}(\varvec{x})\) is the number of runs of length at least 2. Further, we derive the following lemma for binary words.
Lemma 14
The size of the palindromic deletion sphere \(|S^{\rho ^D_2}_1(\varvec{x})|\) for \(q=2\) is
$$|S^{\rho ^D_2}_1(\varvec{x})| = r^{(2)}_{\mathcal {I}}(\varvec{x}) + r^{(\ge 4)}(\varvec{x}),$$
where \(r^{(2)}_{\mathcal {I}}(\varvec{x})\) is the number of runs of length 2 that are located in the interior of \(\varvec{x}\), i.e., between \(x_2\) and \(x_{n-1}\), and \(r^{(\ge 4)}(\varvec{x})\) denotes the number of runs of length at least 4 in \(\varvec{x}\).
Proof
There are 4 possible patterns, (0000), (1111), (0110) and (1001), at which palindromic deletions of length 2 can occur. Recall that, as we have seen in the proof of Lemma 12, two palindromic deletions of length 2 at two distinct positions in a word \(\varvec{x}\) can only result in the same outcome if they appear in the same run. Every run of length at least 4 contains one of the patterns (0000), (1111) and therefore contributes one element to the palindromic deletion sphere. The patterns (0110), (1001) contain a run of length exactly 2 that is located in the interior of \(\varvec{x}\), such that there is at least one symbol to the left and to the right of the run. Thus, every run of length 2 that is located in the interior of \(\varvec{x}\) also contributes one unique element to the palindromic deletion sphere. Therefore, the total size of the deletion sphere is \(r^{(2)}_{\mathcal {I}}(\varvec{x}) + r^{(\ge 4)}(\varvec{x})\). \(\square \)
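The run-based count of Lemma 14 can be compared with an exhaustive enumeration of binary words. The sketch below is only an illustration. It assumes that a palindromic deletion at 0-based position i removes the block \(x_{i+\ell }, \dots , x_{i+2\ell -1}\) whenever it equals the reversal of the preceding block; the function names are hypothetical.

```python
from itertools import groupby, product


def pal_deletions(x, l):
    """Words obtained from x by deleting a block that is the reversal of the
    l symbols directly in front of it."""
    return {x[:i + l] + x[i + 2 * l:]
            for i in range(len(x) - 2 * l + 1)
            if x[i + l:i + 2 * l] == x[i:i + l][::-1]}


def lemma14_count(x):
    """Interior runs of length exactly 2 plus runs of length at least 4."""
    count, start = 0, 0
    for _, g in groupby(x):
        r = len(list(g))
        interior = start > 0 and start + r < len(x)  # a symbol on both sides
        count += int((r == 2 and interior) or r >= 4)
        start += r
    return count


# Exhaustive comparison over all binary words of lengths 4 to 10.
for n in range(4, 11):
    for x in product(range(2), repeat=n):
        assert len(pal_deletions(x, 2)) == lemma14_count(x)
```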
Let us define the matrix \(\varvec{A}^{\rho _\ell }(\varvec{x}) \in \mathbb {Z}_q^{\ell \times n-2\ell +1}\) to be
With this definition it is directly possible to establish the following upper bound on the size of the palindromic deletion spheres for arbitrary deletion length \(\ell \).
Lemma 15
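The size of the palindromic deletion sphere \(S^{\rho ^D_\ell }_1(\varvec{x})\) is upper bounded by
$$|S^{\rho ^D_\ell }_1(\varvec{x})| \le r^{(0)} \left( \varvec{A}^{\rho _\ell }(\varvec{x})\right) ,$$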
where \(r^{(0)} \left( \varvec{A}^{\rho _\ell }(\varvec{x})\right) \) is the number of runs of all zero columns in \(\left( \varvec{A}^{\rho _\ell }(\varvec{x})\right) \).
Proof
A palindrome of length \(\ell \) in the word \(\varvec{x}\) corresponds to a zero column in the matrix \(\varvec{A}^{\rho _\ell }(\varvec{x})\). Therefore, palindromic deletions are only possible at positions i where \(\varvec{A}^{\rho _\ell }(\varvec{x})\) has a zero column. Further, it can be shown that two neighboring zero columns are only possible if \(x_{i+1} = x_{i+2} = \dots = x_{i+2\ell +1}\), i.e., for a run of length \(2\ell +1\). However, two palindromic deletions inside the same run result in the same word. Therefore, every run of all-zero columns in \(\varvec{A}^{\rho _\ell }(\varvec{x})\) contributes one unique element to \(S^{\rho ^D_\ell }_1(\varvec{x})\). \(\square \)
Example 8
Consider the word \(\varvec{x} = (21011012210) \in \mathbb {Z}_3^{11}\). The palindromic deletion sphere for deletions of length 3 is given by \(S^{\rho ^D_3}_1(\varvec{x}) = \{ (21012210), (21011012) \}\). The matrix \(\varvec{A}^{\rho _3}(\varvec{x})\) is given by
Applying Lemma 15 yields \(|S^{\rho ^D_3}_1(\varvec{x})| \le 2\).
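Example 8 can be reproduced with a few lines of code. The sketch below does not construct the matrix \(\varvec{A}^{\rho _3}(\varvec{x})\) itself. It only marks, for each of the \(n-2\ell +1\) column positions, whether a palindromic deletion is possible there, which is how the proof of Lemma 15 characterizes zero columns. The 0-based indexing and the function names are assumptions made for illustration.

```python
from itertools import groupby


def pal_deletions(x, l):
    """Words obtained from x by deleting a block that is the reversal of the
    l symbols directly in front of it."""
    return {x[:i + l] + x[i + 2 * l:]
            for i in range(len(x) - 2 * l + 1)
            if x[i + l:i + 2 * l] == x[i:i + l][::-1]}


def zero_column_runs(x, l):
    """Number of runs of consecutive positions that admit a palindromic deletion."""
    flags = [x[i + l:i + 2 * l] == x[i:i + l][::-1]
             for i in range(len(x) - 2 * l + 1)]
    return sum(1 for is_zero, _ in groupby(flags) if is_zero)


x = (2, 1, 0, 1, 1, 0, 1, 2, 2, 1, 0)
sphere = pal_deletions(x, 3)

# The sphere matches the two words listed in Example 8.
assert sphere == {(2, 1, 0, 1, 2, 2, 1, 0), (2, 1, 0, 1, 1, 0, 1, 2)}
# Two isolated positions admit a deletion, so the bound of Lemma 15 equals 2.
assert zero_column_runs(x, 3) == 2
```

In this example the bound is met with equality, since the two deletion positions are not adjacent and lead to different words.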
Appendix B: Equivalence of palindromic duplication errors in one word
In this section we derive conditions under which two palindromic duplications, respectively deletions, at two different positions i and \(i+j\) with \(j > 0\) result in the same word, i.e., \(\rho _{\ell }(\varvec{x},i) = \rho _{\ell }(\varvec{x},i+j)\) for palindromic duplications, respectively \(\rho ^D_{\ell }(\varvec{x},i) = \rho ^D_{\ell }(\varvec{x},i+j)\) for palindromic deletions. For \(j < \ell \) the condition \(\rho _{\ell }(\varvec{x},i) = \rho _{\ell }(\varvec{x},i+j)\) can be expressed as (the left-hand side of the equations corresponds to \(\rho _{\ell }(\varvec{x},i+j)\) and the right-hand side to \(\rho _{\ell }(\varvec{x},i)\))
For \(j \ge \ell \) these conditions are
The conditions \(\rho ^D_{\ell }(\varvec{x},i) = \rho ^D_{\ell }(\varvec{x},i+j)\) for \(j>0\) are
Appendix C: Equivalence of palindromic duplications in two words
In this section we derive conditions under which two palindromic duplications in two words \(\varvec{x}\) and \(\varvec{y}\), at two different positions i and \(i+j\) with \(j > 0\), result in the same word, i.e., \(\rho _{\ell }(\varvec{x},i) = \rho _{\ell }(\varvec{y},i+j)\). For \(j < \ell \) the condition \(\rho _{\ell }(\varvec{x},i) = \rho _{\ell }(\varvec{y},i+j)\) can be expressed as
For \(j \ge \ell \) these conditions are
The conditions \(\rho ^D_{\ell }(\varvec{x},i) = \rho ^D_{\ell }(\varvec{y},i+j)\) for \(j>0\) are
Cite this article
Lenz, A., Wachter-Zeh, A. & Yaakobi, E. Duplication-correcting codes. Des. Codes Cryptogr. 87, 277–298 (2019). https://doi.org/10.1007/s10623-018-0523-0
DOI: https://doi.org/10.1007/s10623-018-0523-0
Keywords
- Error-correcting codes
- Duplication errors
- Generalized sphere packing bound
- DNA storage
- Combinatorial channel
- Burst insertions/deletions