Duplication in DNA Sequences

  • Masami Ito
  • Lila Kari
  • Zachary Kincaid
  • Shinnosuke Seki
Part of the Natural Computing Series book series (NCS)


The duplication and repeat-deletion operations are the basis of a formal language theoretic model of errors that can occur during DNA replication. During DNA replication, subsequences of a strand of DNA may be copied several times (resulting in duplications) or skipped (resulting in repeat-deletions). As formal language operations, iterated duplication and repeat-deletion of words and languages have been well studied in the literature. However, little is known about single-step duplications and repeat-deletions. In this paper, we investigate several properties of these operations, including closure properties of language families in the Chomsky hierarchy and equations involving these operations. We also make progress toward a characterization of regular languages that are generated by duplicating a regular language.
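The single-step operations described above can be illustrated with a short sketch. It assumes the usual formalization, in which duplication rewrites a word xyz to xyyz for a nonempty factor y, and repeat-deletion rewrites xyyz back to xyz. The function names below are illustrative and do not come from the chapter.

```python
from __future__ import annotations


def duplication(word: str) -> set[str]:
    """All words obtained from `word` by one duplication step.

    Assumed definition: word = xyz with y nonempty yields xyyz.
    """
    n = len(word)
    # x = word[:i], y = word[i:j], z = word[j:]
    return {
        word[:j] + word[i:j] + word[j:]
        for i in range(n)
        for j in range(i + 1, n + 1)
    }


def repeat_deletion(word: str) -> set[str]:
    """All words obtained from `word` by one repeat-deletion step.

    Assumed definition: word = xyyz with y nonempty yields xyz.
    """
    n = len(word)
    results = set()
    for i in range(n):
        # k is the length of the repeated factor y starting at position i
        for k in range(1, (n - i) // 2 + 1):
            if word[i:i + k] == word[i + k:i + 2 * k]:
                results.add(word[:i + k] + word[i + 2 * k:])
    return results


def lift(operation, language: set[str]) -> set[str]:
    """Apply a word operation to every word of a finite language."""
    return set().union(*(operation(w) for w in language))


if __name__ == "__main__":
    print(sorted(duplication("ab")))
    print(sorted(repeat_deletion("aab")))
    print(sorted(lift(duplication, {"a", "ab"})))
```

Under these assumptions the two operations are inverse to each other in the sense that v is a repeat-deletion of w exactly when w is a duplication of v. The sketch handles only finite sets of words; the closure questions studied in the chapter concern infinite language families and are not captured by it.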


Keywords: Regular Language; Closure Property; Language Family; Formal Language Theory; Language Equation





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Masami Ito (1)
  • Lila Kari (2)
  • Zachary Kincaid (2, 3)
  • Shinnosuke Seki (2)

  1. Department of Mathematics, Faculty of Science, Kyoto Sangyo University, Kyoto, Japan
  2. Department of Computer Science, University of Western Ontario, London, Canada
  3. Department of Mathematics, University of Western Ontario, London, Canada
