Finding and Characterizing Repeats in Plant Genomes

  • Protocol
Plant Bioinformatics

Part of the book series: Methods in Molecular Biology (MIMB, volume 1374)

Abstract

Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter offers a guided tour of the available software that can help biologists search for these repeats and test hypothetical models of their structure. Since transposable elements are a major source of repeats in plants, many methods have been used or developed for this large class of sequences. These methods are representative of the range of tools available for other classes of repeats, so we devote a whole section to this topic, together with a selection of the main existing software. To understand how these tools work and how repeats can be found efficiently in genomes, it is necessary to look at the technical issues involved in searching for these structures at large scale. Because it can be hard to keep up with the profusion of proposals in this dynamic field, the rest of the chapter is devoted to the foundations of the search for repeats and more complex patterns. The second section introduces the key concepts needed to understand the current state of the art in word-based analysis of genomic sequences. This can be seen as the first stage of a very general approach, linguistic analysis, which deals with the analysis of natural or artificial texts. Words, the lexical level, correspond to simple repeated entities in texts or strings. Biologists, however, need to represent more complex entities, in which a repeat family is built on more abstract structures, including small direct or inverted repeats, motifs, and composition constraints, as well as ordering and distance constraints between these elementary blocks. In linguistic terms, this corresponds to the syntactic level of a language. The last section introduces concepts and practical tools that can be used to reach this syntactic level in biological sequence analysis.
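As a rough illustration of the lexical level mentioned in the abstract, the following Python sketch lists exact repeated words of a fixed length in a DNA string, along with pairs of positions that form small inverted repeats (a word and its reverse complement). It is not taken from the chapter: the function names, the fixed word length `k`, and the restriction to exact matches over the alphabet A, C, G, T are all assumptions made for illustration. The tools the chapter surveys rely on indexed data structures and tolerate mismatches.

```python
from collections import defaultdict

# Assumption: the sequence uses only the upper-case letters A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def word_positions(seq, k):
    """Index every word of length k by the list of its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def repeated_words(seq, k):
    """Exact direct repeats: words of length k that occur at least twice."""
    return {w: p for w, p in word_positions(seq, k).items() if len(p) > 1}


def inverted_pairs(seq, k):
    """Pairs (i, j), i < j, where the word at j is the reverse
    complement of the word at i (a small exact inverted repeat)."""
    positions = word_positions(seq, k)
    pairs = []
    for word, starts in positions.items():
        # .get() avoids adding empty entries to the defaultdict.
        partners = positions.get(reverse_complement(word), [])
        for i in starts:
            for j in partners:
                if i < j:
                    pairs.append((i, j))
    return sorted(pairs)
```

A dictionary of word positions keeps the sketch short, but it stores every position of every word in memory, so it is only practical for small sequences.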

Corresponding author

Correspondence to Jacques Nicolas.

Copyright information

© 2016 Springer Science+Business Media New York

About this protocol

Cite this protocol

Nicolas, J., Peterlongo, P., Tempel, S. (2016). Finding and Characterizing Repeats in Plant Genomes. In: Edwards, D. (eds) Plant Bioinformatics. Methods in Molecular Biology, vol 1374. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3167-5_17

  • DOI: https://doi.org/10.1007/978-1-4939-3167-5_17

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-3166-8

  • Online ISBN: 978-1-4939-3167-5

  • eBook Packages: Springer Protocols
