Sequence and Structural Analyses for Functional Non-coding RNAs

  • Yasubumi Sakakibara
  • Kengo Sato
Part of the Natural Computing Series book series (NCS)


Analysis and detection of functional RNAs are currently important topics in both molecular biology and bioinformatics research. Several computational methods based on stochastic context-free grammars (SCFGs) have been developed for modeling and analyzing functional RNA sequences. These grammatical methods have succeeded in modeling typical secondary structures of RNAs and are used for structural alignment of RNA sequences. Such stochastic models, however, are not sufficient to discriminate member sequences of an RNA family from non-members, and hence to detect non-coding RNA regions in genome sequences. Recently, support vector machines (SVMs) and kernel function techniques have been actively studied and proposed as solutions to various problems in bioinformatics. SVMs are trained on positive and negative samples and discriminate accurately, which makes them better suited to such discrimination tasks. A few kernel functions that extend the string kernel to measure the similarity of two RNA sequences from the viewpoint of secondary structure have been proposed. In this article, we give an overview of recent progress in SCFG-based methods for RNA sequence analysis and of novel kernel functions that are tailored to measure the similarity of two RNA sequences and developed for use with SVMs in discriminating members of an RNA family from non-members.


Keywords: Support Vector Machine; Structural Alignment; tRNA Sequence; String Kernel; Typical Secondary Structure
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  1. Department of Biosciences and Informatics, Keio University, Yokohama, Japan
  2. Japan Biological Informatics Consortium, Tokyo, Japan
