Application of the Burrows-Wheeler Transform for Searching for Approximate Tandem Repeats

  • Agnieszka Danek
  • Rafał Pokrzywa
  • Izabela Makałowska
  • Andrzej Polański
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7632)


Tandem repeats (TRs) are contiguous copies of a repeating pattern, which may be exact or approximate. Approximate tandem repeats (ATRs) in genomic sequences are adjacent copies of a repeating pattern of nucleotides that are similar rather than identical, where similarity is defined by a suitable measure. Both TRs and ATRs are used in forensic analysis, DNA mapping, testing for inherited diseases, and many evolutionary studies. Their functions and roles are not yet well defined and remain a subject of ongoing investigation. However, growing biological databases, together with tools for finding such repeats, may lead to a better understanding of their behavior. This paper presents our method for searching for ATRs, defined on the basis of a model of substitution mutations, and compares it with two other tools. The capabilities and limitations of the methods are analyzed, and the results obtained with each tool are investigated.
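
To make the definition above concrete, here is a minimal Python sketch of an ATR test under a substitution-only model, using the Hamming distance as the similarity measure. It is an illustration only and not the method presented in the paper: the function names, the choice of the first copy as the reference pattern, and the `max_mismatches` threshold are all assumptions introduced here, since the abstract does not state the exact similarity rule.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def is_approximate_tandem_repeat(seq: str, period: int, max_mismatches: int) -> bool:
    """Check whether seq consists of at least two adjacent copies of length
    `period`, each within `max_mismatches` substitutions of the first copy.

    Hypothetical criterion for illustration; comparing every copy against
    the first one is an assumption, not the rule used in the paper.
    """
    if period <= 0 or len(seq) < 2 * period or len(seq) % period != 0:
        return False
    pattern = seq[:period]
    # Cut the sequence into consecutive, non-overlapping copies.
    copies = [seq[i:i + period] for i in range(0, len(seq), period)]
    return all(hamming(pattern, copy) <= max_mismatches for copy in copies)
```

With `max_mismatches` set to 0 the check reduces to an exact tandem repeat test; allowing one mismatch accepts a sequence such as `ACGTACCTACGT` with period 4, where the middle copy carries a single substitution.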


Keywords: approximate tandem repeats, Burrows-Wheeler transform, suffix array, Hamming distance



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Agnieszka Danek (1)
  • Rafał Pokrzywa (1)
  • Izabela Makałowska (2)
  • Andrzej Polański (1)
  1. Institute of Informatics, Silesian University of Technology, Gliwice, Poland
  2. Laboratory of Bioinformatics, Faculty of Biology, Adam Mickiewicz University, Poznań, Poland
