Computational Resources for Studying Recoding

  • Andrew E. Firth
  • Michaël Bekaert
  • Pavel V. Baranov
Part of the Nucleic Acids and Molecular Biology book series (NUCLEIC, volume 24)


The rapid growth in available sequence data has made efficient computational tools for its analysis a necessity. Substantial progress has been made in developing tools that identify and predict genes expressed via standard decoding. However, because recoded genes constitute only a minority of all genes and their prediction requires different approaches, they are frequently neglected; as a result, they are often mis-annotated in public databases or missed entirely during annotation. This chapter describes the available computer tools designed for the identification and analysis of recoded genes, along with the public databases that collect information related to recoding. We also discuss how standard sequence analysis tools can be used for these purposes.


Keywords: SECIS Element; Ribosomal Frameshifting; Waterman Algorithm; Ungapped Alignment; Stop Codon Readthrough



We are grateful to Drs. Sergi Castellano and Kyungsook Han for careful reading of the manuscript and useful comments. This work was supported by funds from Science Foundation Ireland.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Andrew E. Firth (2)
  • Michaël Bekaert (3)
  • Pavel V. Baranov (1)

  1. Biochemistry Department, University College Cork, Cork, Ireland
  2. Biosciences Institute, University College Cork, Cork, Ireland
  3. School of Biology and Environmental Science, University College Dublin, Dublin, Ireland
