Duplication-correcting codes

Designs, Codes and Cryptography

Abstract

In this work, we propose constructions that correct duplications of multiple consecutive symbols. These errors are known as tandem duplications, where a sequence of symbols is repeated, and as palindromic duplications, where a sequence is repeated in reversed order. We compare the redundancies of these constructions with upper bounds on the code size that are obtained from sphere packing arguments. We prove that an upper bound on the cardinality of codes correcting tandem deletions is also an upper bound for codes correcting tandem duplications, and we derive our bounds from the tandem deletion error because this yields tighter bounds. Our upper bounds on the cardinality directly imply lower bounds on the redundancy, which we compare with the redundancy of the best known construction correcting arbitrary burst insertions. Our results indicate that correcting palindromic duplications requires more redundancy than correcting tandem duplications, and that both require significantly less redundancy than correcting arbitrary burst insertions.


References

  1. Dolecek L., Anantharam V.: Repetition error correcting sets: explicit constructions and prefixing methods. SIAM J. Discret. Math. 23(4), 2120–2146 (2010).

  2. Fazeli A., Vardy A., Yaakobi E.: Generalized sphere packing bound. IEEE Trans. Inf. Theory 61(5), 2313–2334 (2015).

  3. Hansen P.: Studies on Graphs and Discrete Programming, vol. 11. North Holland, New York (1981).

  4. Jain S., Farnoud F., Schwartz M., Bruck J.: Duplication-correcting codes for data storage in the DNA of living organisms. In: IEEE International Symposium on Information Theory (ISIT), Barcelona, pp. 1028–1032 (2016).

  5. Kulkarni A.A., Kiyavash N.: Nonasymptotic upper bounds for deletion correcting codes. IEEE Trans. Inf. Theory 59(8), 5115–5130 (2013).

  6. Kurmaev O.F.: Constant-weight and constant-charge binary run-length limited codes. IEEE Trans. Inf. Theory 57(7), 4497–4515 (2011).

  7. Lenz A., Wachter-Zeh A., Yaakobi E.: Bounds on codes correcting tandem and palindromic duplications. In: Workshop on Coding and Cryptography (WCC) (2017).

  8. Levenshtein V.: Binary codes capable of correcting spurious insertions and deletions of ones. Probl. Pereda. Inform. 1(1), 12–25 (1965).

  9. Levenshtein V.: Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Dokl. 10, 707–710 (1966).

  10. Mahdavifar H., Vardy A.: Asymptotically optimal sticky-insertion-correcting codes with efficient encoding and decoding. In: IEEE International Symposium on Information Theory (ISIT), Aachen, pp. 2688–2692 (2017).

  11. Roth R., Siegel P.: Lee-metric BCH codes and their application to constrained and partial-response channels. IEEE Trans. Inf. Theory 40(4), 1083–1096 (1994).

  12. Schoeny C., Wachter-Zeh A., Gabrys R., Yaakobi E.: Codes correcting a burst of deletions or insertions. IEEE Trans. Inf. Theory 63(4), 1971–1985 (2017).

  13. Varshamov R.R., Tenengolts G.M.: Codes which correct single asymmetric errors. Autom. Remote Control 26(2), 286–290 (1965).

Acknowledgements

This work was supported by the Institute for Advanced Study (IAS), Technische Universität München (TUM), with funds from the German Excellence Initiative and the European Union’s Seventh Framework Program (FP7) under Grant Agreement No. 291763. Parts of this work have been presented at the 2017 Workshop on Coding and Cryptography (WCC), St. Petersburg [7].

Author information

Corresponding author

Correspondence to Andreas Lenz.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This is one of several papers published in Designs, Codes and Cryptography comprising the “Special Issue on Coding and Cryptography”.

Appendices

Appendix A: Sphere sizes for tandem and palindromic duplications and deletions

In the following we derive the size of the spheres \(S^{\epsilon }_t(\varvec{x})\), as defined in (1), for tandem and palindromic duplication and deletion errors. For the subsequent two lemmas we denote the \(\ell \)-step derivative by \(\phi _\ell (\varvec{x})=(\varvec{u}_x, \varvec{v}_x)\), according to the definition from Sect. 2.1.

Lemma 10

The sphere size for tandem duplications of length \(\ell \) is given as

$$\begin{aligned} \bigg |S^{\tau _\ell }_{t}(\varvec{x})\bigg | = \left( {\begin{array}{c}wt_{\mathrm {H}}(\varvec{v}_x)+t\\ t\end{array}}\right) , \end{aligned}$$

where \(\phi _\ell (\varvec{x})=(\varvec{u}_x, \varvec{v}_x)\) is the \(\ell \)-step derivative of \(\varvec{x}\).

Proof

Recall that a tandem duplication error corresponds to increasing one entry of the \(\ell \)-zero signature \(\sigma _\ell (\varvec{v}_x)\) by one. Then, the duplication sphere size equals the number of vectors \(\varvec{y} \in \mathbb {N}_0^{|\sigma _\ell (\varvec{v}_x)|}\) with \(\varvec{y} \ge \sigma _\ell (\varvec{v}_x)\) and \(|\varvec{y}|_1=|\sigma _\ell (\varvec{v}_x)|_1 + t\). The number of such vectors is given by \(\left( {\begin{array}{c}|\sigma _\ell (\varvec{v}_x)|+t-1\\ t\end{array}}\right) = \left( {\begin{array}{c}wt_{\mathrm {H}}(\varvec{v}_x)+t\\ t\end{array}}\right) \), where the equality uses that the \(\ell \)-zero signature has \(|\sigma _\ell (\varvec{v}_x)| = wt_{\mathrm {H}}(\varvec{v}_x)+1\) entries. \(\square \)
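The count in Lemma 10 can be checked numerically on small words. The following Python sketch is an illustration and not part of the original derivation. It assumes that a tandem duplication of length \(\ell \) at position i maps \(\varvec{x}\) to \((x_1, \dots , x_{i+\ell }, x_{i+1}, \dots , x_{i+\ell }, x_{i+\ell +1}, \dots , x_n)\), that \(\varvec{v}_x\) has the entries \(x_{i+\ell }-x_i \bmod q\), and that the sphere collects all words reachable by exactly t duplications. All function names are hypothetical.

```python
from math import comb

def tandem_dup(x, i, l):
    """Repeat the block x[i:i+l] directly after itself (0-based position i)."""
    return x[:i + l] + x[i:i + l] + x[i + l:]

def tandem_dup_sphere(x, l, t):
    """All words reachable from x by exactly t tandem duplications of length l."""
    words = {tuple(x)}
    for _ in range(t):
        words = {tandem_dup(w, i, l)
                 for w in words for i in range(len(w) - l + 1)}
    return words

def derivative_weight(x, l, q):
    """Hamming weight of v_x, assuming v_x[i] = (x[i+l] - x[i]) mod q."""
    return sum((x[i + l] - x[i]) % q != 0 for i in range(len(x) - l))

x, q = (0, 1, 1, 0, 2), 3  # arbitrary ternary example word
for l in (1, 2):
    w = derivative_weight(x, l, q)
    for t in (1, 2, 3):
        # brute-force sphere size next to the closed form of Lemma 10
        print(l, t, len(tandem_dup_sphere(x, l, t)), comb(w + t, t))
```

The brute-force enumeration grows quickly with t, so it is only meant as a sanity check for short words.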

Lemma 11

The sphere size for tandem deletions of length \(\ell \) is given as

$$\begin{aligned} \bigg |S^{\tau _\ell ^D}_t(\varvec{x})\bigg |&= \bigg |\bigg \{ \varvec{s} \in \mathbb {N}_0^{|\sigma _\ell (\varvec{v}_x)|} : \varvec{s} \le \sigma _\ell (\varvec{v}_x) \wedge |\varvec{s}|_1 = |\sigma _\ell (\varvec{v}_x)|_1 - t \bigg \}\bigg | \\&= \bigg |\bigg \{ \varvec{s} \in \mathbb {N}_0^{|\sigma _\ell (\varvec{v}_x)|} : \varvec{s} \le \sigma _\ell (\varvec{v}_x) \wedge |\varvec{s}|_1 = t \bigg \}\bigg |, \end{aligned}$$

where \(\phi _\ell (\varvec{x})=(\varvec{u}_x, \varvec{v}_x)\) is the \(\ell \)-step derivative of \(\varvec{x}\).

Proof

A tandem deletion corresponds to decreasing one entry of the \(\ell \)-zero signature \(\sigma _\ell (\varvec{v}_x)\) by one. It is only possible to delete a tandem duplication at positions where the \(\ell \)-zero signature has positive entries. \(\square \)

Note that by this lemma, \(S^{\tau _\ell ^D}_t(\varvec{x}) = \emptyset \) if \(|\sigma _\ell (\varvec{v}_x)|_1 < t\).

Corollary 3

The sphere size for single tandem deletions of length \(\ell \) is

$$\begin{aligned} \bigg |S^{\tau _\ell ^D}_1(\varvec{x})\bigg | = wt_{\mathrm {H}}(\sigma _\ell (\varvec{v}_x)), \end{aligned}$$

where \(\phi _\ell (\varvec{x})=(\varvec{u}_x, \varvec{v}_x)\) is the \(\ell \)-step derivative of \(\varvec{x}\).
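Lemma 11 and Corollary 3 can be illustrated with a short Python sketch. It is not taken from the paper and rests on two assumptions: the \(\ell \)-zero signature is taken to be \((\lfloor m_0/\ell \rfloor , \dots , \lfloor m_w/\ell \rfloor )\), where \(m_0, \dots , m_w\) are the lengths of the zero runs of \(\varvec{v}_x\) delimited by its nonzero entries, and a tandem deletion removes the second copy of a block that appears twice in a row. The function names are hypothetical.

```python
from itertools import product

def zero_signature(v, l):
    """Assumed definition: floor(m / l) for every zero run of length m in v,
    where the runs are delimited by the nonzero entries of v."""
    runs, m = [], 0
    for s in v:
        if s == 0:
            m += 1
        else:
            runs.append(m)
            m = 0
    runs.append(m)
    return tuple(m // l for m in runs)

def lemma11_count(sigma, t):
    """Number of integer vectors s with 0 <= s <= sigma entrywise and sum t."""
    return sum(sum(s) == t for s in product(*(range(e + 1) for e in sigma)))

def tandem_del_sphere(x, l, t):
    """Words reachable from x by exactly t tandem deletions of length l."""
    words = {tuple(x)}
    for _ in range(t):
        words = {w[:i + l] + w[i + 2 * l:]
                 for w in words for i in range(len(w) - 2 * l + 1)
                 if w[i:i + l] == w[i + l:i + 2 * l]}
    return words

x, q, l = (0, 1, 0, 1, 0, 1, 1), 2, 2
v = tuple((x[i + l] - x[i]) % q for i in range(len(x) - l))
sigma = zero_signature(v, l)
print(sigma)
for t in (1, 2, 3):
    # brute-force sphere size next to the count from Lemma 11
    print(t, len(tandem_del_sphere(x, l, t)), lemma11_count(sigma, t))
```

For \(t=1\) the count reduces to the number of positive entries of the signature, which is the statement of Corollary 3.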

We continue with deriving the palindromic duplication sphere size for the cases \(\ell =1\) and \(\ell =2\). For \(\ell =1\), a palindromic duplication is a single duplication. Therefore, the sphere size is

$$\begin{aligned} \bigg |S^{\rho _1}_1(\varvec{x})\bigg | = r(\varvec{x}), \end{aligned}$$

as duplications in the same run yield the same outcome.

Lemma 12

The size of the palindromic duplication sphere \(|S^{\rho _2}_1(\varvec{x})|\) for palindromic duplications of length 2 is

$$\begin{aligned} \bigg |S^{\rho _2}_1(\varvec{x})\bigg | = n-1 - \sum _{i=3}^n (i-2) r^{(i)}(\varvec{x}) = 2 r(\varvec{x}) - r^{(1)}(\varvec{x})-1. \end{aligned}$$

Proof

We start with the observation that there are \(n-1\) possible positions \(i \in \{0,1, \ldots , n-2\}\) for palindromic duplications. Now, for \(\ell =2\), the conditions \(\rho _{\ell }(\varvec{x},i) = \rho _{\ell }(\varvec{x},i+j)\) (10a)–(10c) and (11a)–(11c) become \(x_1 = x_2 = \dots = x_{2+j} \, \forall \, j>0\). We therefore deduce that two palindromic duplications of length 2 in \(\varvec{x}\) result in the same vector \(\varvec{y}=\rho _{\ell }(\varvec{x},i) = \rho _{\ell }(\varvec{x},i+j)\) iff they appear in the same run in \(\varvec{x}\). Further, two palindromic duplications at two different positions i and \(i+j, j>0\) can only duplicate symbols from the same run if this run has length at least 3. Thus, every symbol that extends a run beyond length 2 does not increase the duplication sphere size and has to be subtracted from the number of possible positions, which gives the first expression. Using \(\sum _{i=1}^n i r^{(i)}(\varvec{x}) = n\) and \(\sum _{i=1}^n r^{(i)}(\varvec{x}) = r(\varvec{x})\) yields the second expression. \(\square \)
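As a sanity check of Lemma 12, the following Python sketch compares both closed forms with a brute-force enumeration. It is an illustration under the assumption that a palindromic duplication of length \(\ell \) at position i inserts the reversed block \((x_{i+\ell }, \dots , x_{i+1})\) directly after \(x_{i+\ell }\). The function names are hypothetical.

```python
from itertools import groupby, product

def pal_dup(x, i, l):
    """Insert the reversal of x[i:i+l] directly after that block (0-based i)."""
    return x[:i + l] + x[i:i + l][::-1] + x[i + l:]

def pal_dup_sphere(x, l):
    """All distinct words obtained from x by one palindromic duplication."""
    return {pal_dup(x, i, l) for i in range(len(x) - l + 1)}

def lemma12_formulas(x):
    """Both closed forms of Lemma 12, computed from the run lengths of x."""
    runs = [len(list(g)) for _, g in groupby(x)]
    first = len(x) - 1 - sum(r - 2 for r in runs if r >= 3)
    second = 2 * len(runs) - runs.count(1) - 1
    return first, second

x = (0, 1, 1, 0)
print(len(pal_dup_sphere(x, 2)), lemma12_formulas(x))  # prints: 3 (3, 3)

# Exhaustive comparison over all binary words of length 2..8.
all_match = all(
    (len(pal_dup_sphere(w, 2)),) * 2 == lemma12_formulas(w)
    for n in range(2, 9) for w in product((0, 1), repeat=n)
)
print(all_match)
```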

For \(\ell \ge 3\) and \(j \ge 2\), (10a)–(10c) and (11a)–(11c) do not imply \(x_1=x_2 = \dots =x_{\ell +j}\). For example, consider \(\ell =3\) and the word \(\varvec{x} = (010010)\). Then, \(\rho _{3}(\varvec{x},0) = \rho _{3}(\varvec{x},3) = (010010010)\). However, it is possible to find an upper bound on the size of the palindromic duplication sphere. For \(j = 1\), (10a)–(10c) become \(x_1 = x_2 = \dots = x_{\ell +1}\). Therefore two neighboring palindromic duplications can only result in the same word if they appear in one run.

Lemma 13

The size of the palindromic duplication sphere \(S^{\rho _\ell }_1(\varvec{x})\) is upper bounded by

$$\begin{aligned} \bigg |S^{\rho _\ell }_1(\varvec{x})\bigg | \le n - \ell +1 - \sum _{i=\ell +1}^{n} (i-\ell )r^{(i)}(\varvec{x}). \end{aligned}$$

Proof

There are \(n-\ell +1\) possible positions for palindromic duplications of length \(\ell \). Now, as seen before, duplications in the same run result in the same descendant. We therefore subtract the additional \(i-\ell \) entries of runs with length at least \(\ell +1\) from the number of possible positions for duplications to obtain an upper bound on the duplication sphere. \(\square \)
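The bound of Lemma 13 is in general not tight, as the word \(\varvec{x} = (010010)\) with \(\ell =3\) from the discussion above shows. The following Python sketch evaluates the bound and the exact sphere size by enumeration. It assumes that a palindromic duplication at position i inserts the reversed block \((x_{i+\ell }, \dots , x_{i+1})\) after \(x_{i+\ell }\), and the function names are hypothetical.

```python
from itertools import groupby, product

def pal_dup(x, i, l):
    """Insert the reversal of x[i:i+l] directly after that block (0-based i)."""
    return x[:i + l] + x[i:i + l][::-1] + x[i + l:]

def lemma13_bound(x, l):
    """n - l + 1 minus the surplus (r - l) of every run of length r > l."""
    runs = [len(list(g)) for _, g in groupby(x)]
    return len(x) - l + 1 - sum(r - l for r in runs if r > l)

x, l = (0, 1, 0, 0, 1, 0), 3
sphere = {pal_dup(x, i, l) for i in range(len(x) - l + 1)}
print(len(sphere), lemma13_bound(x, l))  # prints: 3 4

# The bound holds for every binary word of length up to 8 and every l <= n.
bound_holds = all(
    len({pal_dup(w, i, k) for i in range(n - k + 1)}) <= lemma13_bound(w, k)
    for n in range(1, 9)
    for w in product((0, 1), repeat=n)
    for k in range(1, n + 1)
)
print(bound_holds)
```

In the example, positions 0 and 3 produce the same word although they lie in different runs, which is why the sphere has only 3 elements while the bound equals 4.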

Similar to the previous discussion, we start by deriving the size of the palindromic deletion spheres for \(\ell =1\) and \(\ell =2\). For \(\ell =1\), a palindromic deletion is a de-duplication of one symbol. Therefore, the size of the error sphere becomes

$$\begin{aligned} \bigg |S^{\rho _1^D}_1(\varvec{x})\bigg | = r^{(\ge 2)}(\varvec{x}) , \end{aligned}$$
(8)

where \(r^{(\ge 2)}(\varvec{x})\) is the number of runs of length at least 2. Further, we derive the following lemma for binary words.

Lemma 14

The size of the palindromic deletion sphere \(|S^{\rho ^D_2}_1(\varvec{x})|\) for \(q=2\) is

$$\begin{aligned} \bigg |S^{\rho ^D_2}_1(\varvec{x})\bigg | = r^{(2)}_{\mathcal {I}}(\varvec{x}) + r^{(\ge 4)}(\varvec{x}), \end{aligned}$$

where \(r^{(2)}_{\mathcal {I}}(\varvec{x})\) is the number of runs of length 2 that are located in the interior of \(\varvec{x}\), i.e., between \(x_2\) and \(x_{n-1}\), and \(r^{(\ge 4)}(\varvec{x})\) denotes the number of runs of length at least 4 in \(\varvec{x}\).

Proof

There are 4 possible patterns, (0000), (1111), (0110) and (1001), at which palindromic deletions of length 2 can occur. Recall that, as we have seen in the proof of Lemma 12, two palindromic deletions of length 2 at two distinct positions in a word \(\varvec{x}\) can only result in the same outcome if they appear in the same run. Every run of length at least 4 contains one of the patterns (0000), (1111) and therefore contributes one element to the palindromic deletion sphere. The patterns (0110), (1001) contain a run of length exactly 2 that is located in the interior of \(\varvec{x}\), such that there is at least one symbol to the left and to the right of the run. Thus, every run of length 2 that is located in the interior of \(\varvec{x}\) also contributes one unique element to the palindromic deletion sphere. Therefore, the total size of the deletion sphere is \(r^{(2)}_{\mathcal {I}}(\varvec{x}) + r^{(\ge 4)}(\varvec{x})\). \(\square \)
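The count in Lemma 14 can be compared with a brute-force enumeration for short binary words. The Python sketch below is an illustration under the assumption that a palindromic deletion of length \(\ell \) at position i removes the block \((x_{i+\ell +1}, \dots , x_{i+2\ell })\) whenever it equals the reversal of \((x_{i+1}, \dots , x_{i+\ell })\). The function names are hypothetical.

```python
from itertools import groupby, product

def pal_del_sphere(x, l):
    """Words obtained by removing the second half of a window (u, reversed u)
    with |u| = l, for every position where such a window occurs."""
    return {x[:i + l] + x[i + 2 * l:]
            for i in range(len(x) - 2 * l + 1)
            if x[i + l:i + 2 * l] == x[i:i + l][::-1]}

def lemma14_count(x):
    """Interior runs of length exactly 2 plus runs of length at least 4."""
    count, start = 0, 0
    for _, g in groupby(x):
        length = len(list(g))
        end = start + length  # the run occupies x[start:end]
        interior = start >= 1 and end <= len(x) - 1
        if length >= 4 or (length == 2 and interior):
            count += 1
        start = end
    return count

x = (0, 1, 1, 0, 0, 1, 1, 0)
print(len(pal_del_sphere(x, 2)), lemma14_count(x))  # prints: 3 3

# Exhaustive comparison over all binary words of length 1..10.
all_match = all(
    len(pal_del_sphere(w, 2)) == lemma14_count(w)
    for n in range(1, 11) for w in product((0, 1), repeat=n)
)
print(all_match)
```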

Let us define the matrix \(\varvec{A}^{\rho _\ell }(\varvec{x}) \in \mathbb {Z}_q^{\ell \times (n-2\ell +1)}\) to be

$$\begin{aligned} \varvec{A}^{\rho _\ell }(\varvec{x}) = \begin{pmatrix} x_{2\ell }-x_1 &{}\quad x_{2\ell +1}-x_2 &{}\quad \dots &{}\quad x_n-x_{n-2\ell +1} \\ x_{2\ell -1}-x_2 &{}\quad x_{2\ell }-x_3 &{}\quad \dots &{}\quad x_{n-1}-x_{n-2\ell +2} \\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots \\ x_{\ell +1}-x_\ell &{}\quad x_{\ell +2}-x_{\ell +1} &{}\quad \dots &{}\quad x_{n-\ell +1}-x_{n-\ell } \end{pmatrix}. \end{aligned}$$
(9)

With this definition it is directly possible to establish the following upper bound on the size of the palindromic deletion spheres for arbitrary deletion length \(\ell \).

Lemma 15

The size of the palindromic deletion sphere \(S^{\rho ^D_\ell }_1(\varvec{x})\) is upper bounded by

$$\begin{aligned} \bigg |S^{\rho ^D_\ell }_1(\varvec{x})\bigg | \le r^{(0)} \bigg (\varvec{A}^{\rho _\ell }(\varvec{x})\bigg ), \end{aligned}$$

where \(r^{(0)} \left( \varvec{A}^{\rho _\ell }(\varvec{x})\right) \) is the number of runs of all-zero columns in \(\varvec{A}^{\rho _\ell }(\varvec{x})\).

Proof

A palindromic deletion of length \(\ell \) at position i requires that the \(2\ell \) symbols \(x_{i+1}, \dots , x_{i+2\ell }\) form a palindrome, which corresponds to a zero column in the matrix \(\varvec{A}^{\rho _\ell }(\varvec{x})\). Therefore, palindromic deletions are only possible at positions i where \(\varvec{A}^{\rho _\ell }(\varvec{x})\) has a zero column. Further, it can be shown that two neighboring zero columns are only possible if \(x_{i+1} = x_{i+2} = \dots = x_{i+2\ell +1}\), i.e., for a run of length \(2\ell +1\). However, two palindromic deletions inside the same run result in the same word. Therefore, every run of all-zero columns in \(\varvec{A}^{\rho _\ell }(\varvec{x})\) contributes at most one element to \(S^{\rho ^D_\ell }_1(\varvec{x})\). \(\square \)

Example 8

Consider the word \(\varvec{x} = (21011012210) \in \mathbb {Z}_3^{11}\). The palindromic deletion sphere for deletions of length 3 is given by \(S^{\rho ^D_3}_1(\varvec{x}) = \{ (21012210), (21011012) \}\). The matrix \(\varvec{A}^{\rho _3}(\varvec{x})\) is given by

$$\begin{aligned} \varvec{A}^{\rho _3}(\varvec{x}) = \begin{pmatrix} 1 &{}\quad 0 &{}\quad 2 &{}\quad 1 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 1 &{}\quad 2 &{}\quad 0 \\ 1 &{}\quad 0 &{}\quad 2 &{}\quad 1 &{}\quad 1 &{}\quad 0 \end{pmatrix}. \end{aligned}$$

Applying Lemma 15 yields \(|S^{\rho ^D_3}_1(\varvec{x})| \le 2\), which is attained here since the sphere contains exactly two words.
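Example 8 can be reproduced with a few lines of Python. The sketch below builds the matrix from (9), counts the runs of all-zero columns, and enumerates the deletion sphere directly. It assumes that a palindromic deletion at position i removes \((x_{i+\ell +1}, \dots , x_{i+2\ell })\) when this block is the reversal of \((x_{i+1}, \dots , x_{i+\ell })\). The function names are hypothetical.

```python
from itertools import groupby

def pal_matrix(x, l, q):
    """Matrix (9): entry in row k, column c (both 1-based) is
    x_{c+2l-k} - x_{c+k-1} mod q."""
    n = len(x)
    return [[(x[c + 2 * l - k - 1] - x[c + k - 2]) % q
             for c in range(1, n - 2 * l + 2)]
            for k in range(1, l + 1)]

def zero_column_runs(A):
    """Number of maximal runs of consecutive all-zero columns of A."""
    cols = [all(row[c] == 0 for row in A) for c in range(len(A[0]))]
    return sum(1 for is_zero, _ in groupby(cols) if is_zero)

def pal_del_sphere(x, l):
    """Words obtained by one palindromic deletion of length l."""
    return {x[:i + l] + x[i + 2 * l:]
            for i in range(len(x) - 2 * l + 1)
            if x[i + l:i + 2 * l] == x[i:i + l][::-1]}

x, l, q = (2, 1, 0, 1, 1, 0, 1, 2, 2, 1, 0), 3, 3
A = pal_matrix(x, l, q)
for row in A:
    print(row)
sphere = pal_del_sphere(x, l)
print(zero_column_runs(A), sorted(sphere))
```

The second and sixth columns are the only all-zero columns, and each of them accounts for one of the two words in the sphere.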

Appendix B: Equivalence of palindromic duplication errors in one word

In this section we derive conditions under which two palindromic duplications, respectively deletions, at two different positions i and \(i+j\) with \(j > 0\) result in the same word, i.e., \(\rho _{\ell }(\varvec{x},i) = \rho _{\ell }(\varvec{x},i+j)\), respectively \(\rho ^D_{\ell }(\varvec{x},i) = \rho ^D_{\ell }(\varvec{x},i+j)\) for palindromic deletions. For \(j < \ell \) the condition \(\rho _{\ell }(\varvec{x},i) = \rho _{\ell }(\varvec{x},i+j)\) can be expressed as follows, where the left-hand side of each equation corresponds to \(\rho _{\ell }(\varvec{x},i+j)\) and the right-hand side to \(\rho _{\ell }(\varvec{x},i)\):

$$\begin{aligned} x_{i+\ell +1+m}&= x_{i+\ell -m},&\quad m&\in \{0, \dots , j-1\}, \end{aligned}$$
(10a)
$$\begin{aligned} x_{i+\ell +2j-m}&= x_{i+\ell -m},&\quad m&\in \{j, \dots , \ell -1\}, \end{aligned}$$
(10b)
$$\begin{aligned} x_{i+\ell +2j-m}&= x_{i+1+m},&\quad m&\in \{\ell , \dots , \ell +j-1\}. \end{aligned}$$
(10c)

For \(j \ge \ell \) these conditions are

$$\begin{aligned} x_{i+\ell +1+m}&= x_{i+\ell -m},&\quad m&\in \{0, \dots , \ell -1\}, \end{aligned}$$
(11a)
$$\begin{aligned} x_{i+\ell +1+m}&= x_{i+1+m},&\quad m&\in \{\ell , \dots , j-1\}, \end{aligned}$$
(11b)
$$\begin{aligned} x_{i+\ell +2j-m}&= x_{i+1+m},&\quad m&\in \{j, \dots , \ell +j-1\}. \end{aligned}$$
(11c)

The conditions \(\rho ^D_{\ell }(\varvec{x},i) = \rho ^D_{\ell }(\varvec{x},i+j)\) for \(j>0\) are

$$\begin{aligned} x_{i+\ell +1+m}&= x_{i+\ell -m}, \quad&m&\in \{0, \dots , \ell -1\}, \end{aligned}$$
(12a)
$$\begin{aligned} x_{i+\ell +j+1+m}&= x_{i+\ell +j-m}, \quad&m&\in \{0, \dots , \ell -1\}, \end{aligned}$$
(12b)
$$\begin{aligned} x_{i+2\ell +1+m}&= x_{i+\ell +1+m}, \quad&m&\in \{0, \dots , j-1\}. \end{aligned}$$
(12c)
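The duplication conditions (10a)–(10c) and (11a)–(11c) can be checked mechanically against the definition. The following Python sketch is an illustration that assumes \(\rho _\ell (\varvec{x},i)\) inserts the reversed block \((x_{i+\ell }, \dots , x_{i+1})\) after \(x_{i+\ell }\) and that \(i+j+\ell \le n\). The function names are hypothetical.

```python
from itertools import product

def pal_dup(x, i, l):
    """Insert the reversal of x[i:i+l] directly after that block (0-based i)."""
    return x[:i + l] + x[i:i + l][::-1] + x[i + l:]

def conditions_10_11(x, i, j, l):
    """Evaluate (10a)-(10c) for j < l and (11a)-(11c) for j >= l."""
    X = (None, *x)  # 1-based access, matching the indices in the equations
    if j < l:
        a = all(X[i + l + 1 + m] == X[i + l - m] for m in range(0, j))
        b = all(X[i + l + 2 * j - m] == X[i + l - m] for m in range(j, l))
        c = all(X[i + l + 2 * j - m] == X[i + 1 + m] for m in range(l, l + j))
    else:
        a = all(X[i + l + 1 + m] == X[i + l - m] for m in range(0, l))
        b = all(X[i + l + 1 + m] == X[i + 1 + m] for m in range(l, j))
        c = all(X[i + l + 2 * j - m] == X[i + 1 + m] for m in range(j, l + j))
    return a and b and c

x = (0, 1, 0, 0, 1, 0)
print(conditions_10_11(x, 0, 3, 3), pal_dup(x, 0, 3) == pal_dup(x, 3, 3))

# Count disagreements between the conditions and a direct comparison
# over all binary words of length 2..7 and all admissible i, j, l.
mismatches = sum(
    conditions_10_11(w, i, j, l) != (pal_dup(w, i, l) == pal_dup(w, i + j, l))
    for n in range(2, 8)
    for w in product((0, 1), repeat=n)
    for l in range(1, 4)
    for i in range(0, n - l)
    for j in range(1, n - l - i + 1)
)
print(mismatches)
```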

Appendix C: Equivalence of palindromic duplications in two words

In this section we derive conditions under which two palindromic duplications in two words \(\varvec{x}\) and \(\varvec{y}\) at two different positions i and \(i+j\) with \(j > 0\) result in the same word, i.e., \(\rho _{\ell }(\varvec{x},i) = \rho _{\ell }(\varvec{y},i+j)\). For \(j < \ell \) the condition \(\rho _{\ell }(\varvec{x},i) = \rho _{\ell }(\varvec{y},i+j)\) can be expressed as

$$\begin{aligned} x_{m}&= y_{m},&\quad m&\in \{1, \dots , i+\ell \}\cup \{ i+j+\ell +1, \dots , n \}, \end{aligned}$$
(13a)
$$\begin{aligned} x_{i+\ell -m}&= y_{i+\ell +1+m},&\quad m&\in \{0, \dots , j-1\}, \end{aligned}$$
(13b)
$$\begin{aligned} x_{i+1+m}&= y_{i+2j+1+m},&\quad m&\in \{0, \dots , \ell -j-1\}, \end{aligned}$$
(13c)
$$\begin{aligned} x_{i+\ell +1+m}&= y_{i+2j-m},&\quad m&\in \{0, \dots , j-1\}. \end{aligned}$$
(13d)

For \(j \ge \ell \) these conditions are

$$\begin{aligned} x_{m}&= y_{m},&\quad m&\in \{1, \dots , i+\ell \}\cup \{ i+j+\ell +1, \dots , n \}, \end{aligned}$$
(14a)
$$\begin{aligned} x_{i+\ell -m}&= y_{i+\ell +1+m},&\quad m&\in \{0, \dots , \ell -1\}, \end{aligned}$$
(14b)
$$\begin{aligned} x_{i+\ell +1+m}&= y_{i+2\ell +1+m},&\quad m&\in \{0, \dots , j-\ell -1\}, \end{aligned}$$
(14c)
$$\begin{aligned} x_{i+j+1+m}&= y_{i+j+\ell -m},&\quad m&\in \{0, \dots , \ell -1\}. \end{aligned}$$
(14d)

The conditions \(\rho ^D_{\ell }(\varvec{x},i) = \rho ^D_{\ell }(\varvec{y},i+j)\) for \(j>0\) are

$$\begin{aligned} x_{m}&= y_{m},&\quad m&\in \{1, \dots , i+\ell \}\cup \{ i+j+2\ell +1, \dots , n \}, \end{aligned}$$
(15a)
$$\begin{aligned} x_{i+\ell -m}&= x_{i+\ell +1+m},&\quad m&\in \{0, \dots , \ell -1\}, \end{aligned}$$
(15b)
$$\begin{aligned} y_{i+j+\ell -m}&= y_{i+j+\ell +1+m},&\quad m&\in \{0, \dots , \ell -1\}, \end{aligned}$$
(15c)
$$\begin{aligned} x_{i+2\ell +1+m}&= y_{i+\ell +1+m},&\quad m&\in \{0, \dots , j-1\}. \end{aligned}$$
(15d)
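The two-word duplication conditions (13a)–(13d) and (14a)–(14d) can be checked in the same mechanical way. The Python sketch below is an illustration that assumes \(\rho _\ell (\varvec{x},i)\) inserts the reversed block \((x_{i+\ell }, \dots , x_{i+1})\) after \(x_{i+\ell }\), that both words have length n, and that \(i+j+\ell \le n\). The function names are hypothetical.

```python
from itertools import chain, product

def pal_dup(x, i, l):
    """Insert the reversal of x[i:i+l] directly after that block (0-based i)."""
    return x[:i + l] + x[i:i + l][::-1] + x[i + l:]

def conditions_13_14(x, y, i, j, l):
    """Evaluate (13a)-(13d) for j < l and (14a)-(14d) for j >= l."""
    n = len(x)
    X, Y = (None, *x), (None, *y)  # 1-based access as in the equations
    outer = all(X[m] == Y[m]
                for m in chain(range(1, i + l + 1), range(i + j + l + 1, n + 1)))
    if j < l:
        b = all(X[i + l - m] == Y[i + l + 1 + m] for m in range(j))
        c = all(X[i + 1 + m] == Y[i + 2 * j + 1 + m] for m in range(l - j))
        d = all(X[i + l + 1 + m] == Y[i + 2 * j - m] for m in range(j))
    else:
        b = all(X[i + l - m] == Y[i + l + 1 + m] for m in range(l))
        c = all(X[i + l + 1 + m] == Y[i + 2 * l + 1 + m] for m in range(j - l))
        d = all(X[i + j + 1 + m] == Y[i + j + l - m] for m in range(l))
    return outer and b and c and d

x, y = (0, 1, 0, 1), (0, 1, 1, 0)  # two different words with a common descendant
print(conditions_13_14(x, y, 0, 2, 2), pal_dup(x, 0, 2) == pal_dup(y, 2, 2))

# Count disagreements over all pairs of binary words of length 5.
n = 5
mismatches = sum(
    conditions_13_14(u, v, i, j, l) != (pal_dup(u, i, l) == pal_dup(v, i + j, l))
    for u in product((0, 1), repeat=n)
    for v in product((0, 1), repeat=n)
    for l in range(1, 4)
    for i in range(0, n - l)
    for j in range(1, n - l - i + 1)
)
print(mismatches)
```

The example pair shows that two distinct words can share a descendant, which is the situation a code correcting palindromic duplications has to exclude.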

Cite this article

Lenz, A., Wachter-Zeh, A. & Yaakobi, E. Duplication-correcting codes. Des. Codes Cryptogr. 87, 277–298 (2019). https://doi.org/10.1007/s10623-018-0523-0

  • DOI: https://doi.org/10.1007/s10623-018-0523-0
