Pattern Discovery in RNA Secondary Structure Using Affix Trees

  • Giancarlo Mauri
  • Giulio Pavesi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2676)


We present an algorithm for finding common secondary structure motifs in a set of unaligned RNA sequences. The basic version of the algorithm takes as input a set of strings representing the secondary structure of the sequences, enumerates a set of candidate secondary structure patterns, and reports all patterns that appear, possibly with variations, in all or most of the sequences of the set. Because it considers structural information only, the algorithm can be applied to sequences that share no significant sequence similarity. However, sequence information can be added to the algorithm at different levels. Patterns describing RNA secondary structure elements have a peculiar symmetric layout, which makes affix trees a suitable indexing structure: they significantly accelerate the search by permitting bidirectional search from the middle of a pattern outward. When the secondary structure of the input sequences is not available, we show how the algorithm can deal with the uncertainty deriving from prediction methods, or can predict the structure by itself on the fly while searching for patterns, again taking advantage of the information contained in the affix tree built for the sequences. Finally, we present case studies in which the algorithm detected experimentally known RNA stem-loop motifs, either from predicted structures or by folding the sequences itself.
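The core search described above (enumerate candidate patterns, grow them from the middle outward, keep those found in all or most sequences) can be sketched as follows. This is a minimal sketch under assumptions the abstract does not state: structures are written in dot-bracket notation, "variations" are modeled as symbol substitutions (Hamming distance), the extension moves in the hypothetical `EXTENSIONS` table and all function and parameter names are illustrative, and a brute-force substring scan replaces the affix tree, so the sketch shows the search logic but none of the speed-up.

```python
"""Sketch: grow stem-loop patterns from the loop outward over
dot-bracket structure strings, keeping those that meet a quorum."""

# Assumption: structures use dot-bracket notation ("(" and ")" for paired
# bases, "." for unpaired ones). The paper's own alphabet may differ.

# Hypothetical extension moves: (left, right) symbols wrapped around the
# current pattern. "(" + ")" adds a base pair; the others add unpaired
# bases on one or both sides (bulges and internal loops).
EXTENSIONS = [("(", ")"), (".", ""), ("", "."), (".", ".")]


def occurs(pattern: str, structure: str, max_mismatches: int) -> bool:
    """True if `pattern` matches some substring of `structure` with at most
    `max_mismatches` substitutions. This brute-force scan stands in for the
    affix-tree search used in the paper."""
    m = len(pattern)
    for start in range(len(structure) - m + 1):
        mismatches = 0
        for a, b in zip(pattern, structure[start:start + m]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:  # no break: this window is within the mismatch budget
            return True
    return False


def support(pattern: str, structures: list[str], max_mismatches: int) -> int:
    """Number of input structures containing an occurrence of `pattern`."""
    return sum(occurs(pattern, s, max_mismatches) for s in structures)


def find_stem_loops(structures: list[str], quorum: int, min_loop: int = 3,
                    max_loop: int = 6, min_pairs: int = 3, max_len: int = 20,
                    max_mismatches: int = 0) -> list[str]:
    """Enumerate patterns outward from a central loop; report those with at
    least `min_pairs` base pairs that occur in >= `quorum` structures."""
    reported: set[str] = set()
    seen: set[str] = set()
    stack = ["." * n for n in range(min_loop, max_loop + 1)]  # loop seeds
    while stack:
        pattern = stack.pop()
        if pattern in seen:
            continue
        seen.add(pattern)
        # Extending a pattern can never increase its support, so a pattern
        # below the quorum is pruned together with all of its extensions.
        if support(pattern, structures, max_mismatches) < quorum:
            continue
        n_pairs = pattern.count("(")
        if (n_pairs >= min_pairs and pattern.startswith("(")
                and pattern.endswith(")")):
            reported.add(pattern)
        for left, right in EXTENSIONS:
            if n_pairs == 0 and left != "(":
                continue  # the loop itself only comes from the seeds
            if len(left) + len(pattern) + len(right) <= max_len:
                stack.append(left + pattern + right)
    return sorted(reported)


if __name__ == "__main__":
    structures = ["..((((....))))...", "(((((....)))))", "...((....))....."]
    print(find_stem_loops(structures, quorum=2))
    # prints ['((((....))))', '(((....)))']
```

Pruning on the quorum is safe because any approximate occurrence of an extended pattern contains an approximate occurrence of the inner pattern with no more mismatches. In the paper, the affix tree plays the role of `occurs`, letting the search extend candidate matches in both directions from the middle instead of rescanning every string at each step.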







Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Giancarlo Mauri (1)
  • Giulio Pavesi (1)
  1. Dept. of Computer Science, Systems and Communication, University of Milan-Bicocca, Milan, Italy
