PicXAAR: Efficient structural alignment of multiple RNA sequences using a greedy approach
Abstract
Background
Accurate and efficient structural alignment of noncoding RNAs (ncRNAs) has attracted increasing attention as recent studies have unveiled the significance of ncRNAs in living organisms. Because Sankoff-style structural alignment algorithms cannot efficiently handle multiple sequences, progressive schemes are mostly used to reduce the complexity. However, this approach tends to propagate early-stage errors throughout the entire process, thereby degrading the quality of the final alignment. For multiple protein sequence alignment, we have recently proposed PicXAA, which constructs an accurate alignment in a nonprogressive fashion.
Results
Here, we propose PicXAAR as an extension to PicXAA for greedy structural alignment of ncRNAs. PicXAAR efficiently captures both the folding information within each sequence and the local similarities between sequences. It uses a set of probabilistic consistency transformations to improve the posterior basepairing and base alignment probabilities using the information of all sequences in the alignment. Using a graph-based scheme, we greedily build up the structural alignment from sequence regions with high basepairing and base alignment probabilities.
Conclusions
Several experiments on datasets with different characteristics confirm that PicXAAR is one of the fastest algorithms for structural alignment of multiple RNAs and it consistently yields accurate alignment results, especially for datasets with locally similar sequences. PicXAAR source code is freely available at: http://www.ece.tamu.edu/~bjyoon/picxaa/.
Keywords
Structural Alignment; Pairwise Alignment; Matthews Correlation Coefficient; Alignment Graph; Alignment Probability
Background
The increasing number of newly discovered noncoding RNAs (ncRNAs) with a wide variety of functions has revealed the substantial role that RNAs play in living organisms [1, 2, 3]. The function of ncRNAs is largely ascribed to their folding structure, which is often better conserved than their primary sequence. Therefore, it is important to consider this structural aspect in the comparative analysis of RNAs, and an accurate structural alignment algorithm can be helpful in decoding the function of ncRNAs and discovering novel ncRNA candidates.
To accurately align RNA sequences, one should take their secondary structure similarities into account, in addition to their sequence homologies. Simultaneous inference of both the consensus secondary structure and the alignment of RNA sequences is a computationally demanding task. Sankoff [4] proposed an algorithm for structural alignment of a set of unaligned RNA sequences. However, its high complexity of O(L^{3N}) in time and O(L^{2N}) in memory for N sequences of length L makes this algorithm impractical even for a small number of sequences. Hence, several studies have proposed various approximations to the Sankoff algorithm [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]. Algorithms such as Foldalign [5, 6, 7], Dynalign [8, 9], and Stemloc [10] employ several heuristics to impose constraints on the size or shape of substructures, thereby reducing the search space. Murlet [12], RAF [13], PARTS [14], STRAL [15], LocARNA [16], CentroidAlign [17], and PMcomp [18] exploit probabilistic approaches by implementing basepairing probabilities in a restricted Sankoff-style framework or employing the Needleman-Wunsch algorithm with structural scores. Although these variants of Sankoff’s algorithm significantly reduce the time and memory complexities, they still cannot directly find the structural alignment of multiple sequences. Instead, these algorithms build up the multiple sequence alignment (MSA) by progressively combining pairwise structural alignments along a guide tree.
In addition to these Sankoff-style algorithms, several studies have recently investigated fast techniques to find the common structure of long RNA sequences. For example, MXSCARNA [20] progressively computes the pairwise structural alignment of a pair of stem candidates obtained from the basepairing probability matrices. RCoffee [21, 22] uses a library of input alignments to progressively compute the alignment by incorporating secondary structure information. LARA [23] and MARNA [24] employ two different heuristic approaches to compute all pairwise structure alignments and pass this information, as a primary library, to TCOFFEE [25], a progressive alignment technique. MAFFTxinsi [26] uses a four-way consistency objective function to progressively build a structural alignment by combining pairwise alignments predicted by an external program.
Despite its computational efficiency, the progressive structural alignment approach tends to propagate the errors made in the early stages throughout the entire process, which may significantly degrade the quality of the final alignment. Even with the incorporation of additional heuristics, such as iterative refinement and consistency transformation, the fundamental shortcoming of progressive technique remains. A number of nonprogressive structural alignment schemes have been proposed to address this problem [27, 28, 29].
RNA Sampler [27] predicts the common structure of multiple RNA sequences by probabilistically sampling aligned stems based on the stem conservation score. MASTR [28], another sampling approach, iteratively improves both sequence alignment and structure prediction by making small local changes using simulated annealing. StemlocAMA [29] employs sequence annealing to construct the multiple RNA alignment using the base alignment probabilities estimated by the Sankoff algorithm with structural considerations.
Recently, several studies have highlighted the effectiveness of the Maximum Expected Accuracy (MEA) approach for aligning biological sequences [30, 31, 32, 33, 34, 35, 36] and for predicting the consensus secondary structure of RNAs [12, 17, 20, 29, 37, 38, 39]. MEA tries to maximize the expected number of correctly aligned bases. This is especially useful for handling sequence analysis problems when the probability of the optimal alignment is low.
In this paper, we introduce PicXAAR (probabilistic maximum accuracy alignment of RNA sequences), a novel nonprogressive algorithm that efficiently finds the maximum expected accuracy structural alignment of multiple RNA sequences. PicXAAR greedily builds up the structural alignment from sequence regions with high local similarities and high basepairing probabilities. To simultaneously consider both the local similarities among sequences and their conserved secondary structural information, we incorporate three types of probabilistic consistency transformations. These transformations modify both the intersequence pairwise base alignment probabilities and the intrasequence basepairing probabilities using the information from other sequences in the alignment. For a fast and accurate construction of the alignment, we propose an efficient two-step graph-based alignment scheme. In the first step, we greedily insert the most probable alignments of basepairs with high basepairing probability. In this way, we build up the skeleton of the alignment using the structure information of the RNA sequences. Next, we successively insert the most probable pairwise base alignments into the multiple structural alignment, as in PicXAA [34], a multiple protein sequence alignment algorithm that we have recently proposed. This step can effectively capture the local sequence similarities among the RNAs. Finally, we use a discriminative refinement step to improve the overall alignment quality in sequence regions with low alignment probability. Extensive experiments on several local alignment benchmarks clearly show that PicXAAR is one of the fastest algorithms for structural alignment of multiple RNAs and that it consistently yields accurate results in comparison with several well-known structural RNA alignment algorithms.
Methods
PicXAAR extends the idea of PicXAA, the multiple sequence alignment algorithm that maximizes the expected number of correctly aligned bases, to the structural alignment of RNA sequences. PicXAAR uses a greedy approach that builds up the alignment from sequence regions with high local similarities and high basepairing probabilities. Thus, it avoids the propagation of early stage alignment errors, usually observed in progressive techniques. The algorithm employs a probabilistic framework by utilizing both the intersequence base alignment probabilities and the intrasequence basepairing probabilities. The following subsections provide an overview of the proposed algorithm.
Preliminary
To align m RNA sequences in a set S = {s_{1}, ⋯ , s_{m}}, we need to compute the following probabilities.

P_{a}(x_{i} ~ y_{j} | x, y): For each pair of sequences x, y ∈ S, P_{a}(x_{i} ~ y_{j} | x, y) is the probability that bases x_{i} ∈ x and y_{j} ∈ y are matched in the true (unknown) alignment. We can compute the posterior pairwise alignment probabilities using the pair hidden Markov model (PHMM) [40].

P_{b}(x_{i} ~ x_{j} | x): For each sequence x ∈ S, P_{b}(x_{i} ~ x_{j} | x) is the probability that two bases x_{i}, x_{j} ∈ x form a basepair. We can exploit different approaches, such as the McCaskill algorithm [41] or the CONTRAfold model [39], to compute the basepairing probabilities.
We use these probabilities in the following probabilistic structural alignment scheme.
Consistency transformation
Here, we use three types of probabilistic consistency transformations to modify the pairwise base alignment probabilities and basepairing probabilities using the information from other sequences in the alignment. This modification makes these posterior probabilities suitable for constructing a consistent and accurate structural alignment.
Intersequence probabilistic consistency transformation for base alignment probabilities
In the first consistency transformation, we incorporate the information from other sequences in the alignment to improve the estimation of pairwise base alignment probabilities. The motivation of this transformation is that all the pairwise alignments induced from a given MSA should be consistent with each other. This means that if position x_{ i } (∈ x) aligns with position z_{ k } (∈ z) in the x – z alignment, and if z_{ k } aligns with position y_{ j } (∈ y) in the z – y alignment, then x_{ i } must align with y_{ j } in the x – y alignment. We can thus utilize the “intermediate” sequence z to improve the x – y alignment by making it consistent with the alignments x – z and z – y.
where ā is the optimal pairwise alignment of x and z.
This transformation improves the consistency of the x – y alignment with other pairwise alignments in the MSA, by incorporating information only from homologous sequences. In this way, we can obtain a more probabilistically consistent estimate of the posterior alignment probabilities, which helps enhance the quality of the final MSA.
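The idea behind this transformation can be illustrated with a minimal sketch. The code below implements the plain, unweighted ProbCons-style form, in which every intermediate sequence z contributes equally; it is an assumption for illustration and does not reproduce the similarity weighting based on the optimal pairwise alignment ā that the text refers to. The function names and the dictionary layout are hypothetical.

```python
def matmul(A, B):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def intersequence_transform(P, m):
    """Unweighted consistency transformation (illustrative assumption).

    P[(x, y)] holds the matrix of P_a(x_i ~ y_j | x, y) for every ordered
    pair of distinct sequence indices x, y in range(m).
    """
    new_P = {}
    for x in range(m):
        for y in range(m):
            if x == y:
                continue
            # The terms z = x and z = y each contribute the original matrix once.
            acc = [[2.0 * p for p in row] for row in P[(x, y)]]
            for z in range(m):
                if z in (x, y):
                    continue
                # Sum over positions z_k of P(x_i ~ z_k) * P(z_k ~ y_j).
                prod = matmul(P[(x, z)], P[(z, y)])
                acc = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(acc, prod)]
            new_P[(x, y)] = [[v / m for v in row] for row in acc]
    return new_P
```

A sparse representation would be needed to reach the complexity quoted later in this section; dense lists are used here only to keep the sketch short.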
Intrasequence probabilistic consistency transformation for basepairing probabilities
In the second transformation, we incorporate the pairwise alignment information into the structural formation of the sequences. This transformation exploits the observation that the basepairings in each sequence should be consistent with the pairwise base alignments induced from a given structural alignment. This means that if positions y_{j} ~ y_{j′} form a basepair in y, where x_{i} (∈ x) aligns with y_{j} (∈ y) and x_{i′} (∈ x) aligns with y_{j′} (∈ y), then x_{i} ~ x_{i′} must form a basepair in x. Thus, we can utilize the base alignment information to improve the estimation of the x_{i} ~ x_{i′} basepairing probability.
where α ∈ [0, 1] is a weight parameter that balances the target sequence x against the rest of the sequences. This transformation assumes that all sequences y ∈ S – {x} are homologous to the given sequence x. However, when S contains distantly related sequences, this assumption does not necessarily hold. To address this problem, we modify the transformation so that the basepairing probability is improved using information only from the sequences that are closely related to x. Therefore, as in the intersequence consistency transformation, we explicitly consider the relative significance of each sequence y ∈ S – {x} in improving the basepairing probabilities in x.
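As a rough illustration, the unweighted version of this transformation (the one that treats all other sequences as equally homologous) can be sketched as follows. The sketch assumes that α weights the target sequence's own basepairing probability and that the remaining mass is averaged over the other sequences; both the exact form and the function names are assumptions, and the similarity-based weighting described above is not included.

```python
def matmul(A, B):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def intrasequence_transform(Pb, Pa, x, others, alpha):
    """Illustrative intrasequence consistency transformation.

    Pb[s]       : square matrix of basepairing probabilities of sequence s
    Pa[(x, y)]  : matrix of base alignment probabilities between x and y
    others      : the sequences y in S - {x} used to support x
    """
    n = len(Pb[x])
    acc = [[0.0] * n for _ in range(n)]
    for y in others:
        A = Pa[(x, y)]
        # Project the basepairs of y onto x through the x-y alignment:
        # sum over j, j' of P_a(x_i ~ y_j) * P_b(y_j ~ y_j') * P_a(x_i' ~ y_j').
        proj = matmul(matmul(A, Pb[y]), transpose(A))
        for i in range(n):
            for k in range(n):
                acc[i][k] += proj[i][k]
    w = (1.0 - alpha) / len(others)
    return [[alpha * Pb[x][i][k] + w * acc[i][k] for k in range(n)] for i in range(n)]
```

With a perfect alignment between x and a single supporting sequence y, the basepairs of y are copied onto x with weight 1 − α, which is the behaviour the consistency argument above calls for.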
Probabilistic four-way consistency transformation for base alignment probabilities
In the third consistency transformation, we incorporate the structural information into the pairwise alignments. This transformation is based on the same observation that motivated the intrasequence consistency transformation; that is, the pairwise base alignments induced from a given structural alignment should be consistent with the basepairings in the corresponding pair of sequences. However, this time, we utilize the basepairing information to improve the x – y alignment.
where β ∈ [0, 1] is a weight parameter.
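As a rough illustration of the third transformation, one plausible form is sketched below. It is an assumption consistent with the description above rather than the source's exact equation: the primed symbol and the unnormalized double sum are illustrative choices. The second term rewards aligning x_i with y_j when their respective pairing partners x_{i'} and y_{j'} are themselves likely to be aligned.

```latex
P'_a(x_i \sim y_j \mid x, y)
  = \beta \, P_a(x_i \sim y_j \mid x, y)
  + (1 - \beta) \sum_{i'} \sum_{j'}
      P_b(x_i \sim x_{i'} \mid x) \,
      P_a(x_{i'} \sim y_{j'} \mid x, y) \,
      P_b(y_j \sim y_{j'} \mid y)
```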
Using the sparsity of alignment and pairing probability matrices, we can efficiently implement these three transformations successively. The intersequence consistency transformation has a complexity of O(µ^{2}Lm^{3}), the intrasequence transformation has a complexity of O(µ^{3}Lm^{2}), and the four-way consistency transformation has a computational complexity of O(µ^{4}Lm^{2}), where µ is the average number of nonzero elements per row (typically 1 ≤ µ ≤ 5 in real examples), m is the number of sequences, and L is the length of each sequence.
Constructing the structural alignment
To find a valid structural alignment of a set of RNA sequences, we propose a two-step greedy approach that builds up the alignment starting from those regions with higher basepairing and base alignment probabilities. The proposed greedy scheme extends the idea of PicXAA [34] to multiple RNA alignments. In PicXAA, we construct the multiple protein sequence alignment by successively inserting the most probable pairwise residue alignment into the final alignment. In the proposed algorithm, we add another step before the greedy graph construction step of PicXAA to better incorporate the secondary structure information in RNAs. This two-step alignment construction approach, along with the intrasequence consistency transformation and the four-way consistency transformation described in the previous subsection, helps PicXAAR to effectively integrate both sequence and structural similarities to construct the final alignment. The proposed structural alignment approach is described below.
The greedy alignment approach we proposed in PicXAA [34] is conceptually similar to the one used in sequence annealing algorithms [29, 35, 36]. However, it should be noted that unlike sequence annealing, which greedily merges pairs of columns, we always add a single pairwise base alignment at a time, based on the consistency-transformed posterior alignment probabilities.
We represent the structural alignment as a directed acyclic graph G = (V, E), where V is the set of vertices and E is the set of directed edges. Each vertex c^{(i)} ∈ V corresponds to a column in the final alignment, and each directed edge e = (c^{(i)}, c^{(j)}) ∈ E implies that column c^{(i)} precedes column c^{(j)} in the given alignment. Each column c^{(i)} ∈ V consists of positions from different sequences that will appear in the same column in the final alignment.
When inserting a new pairwise base alignment, we should consider the following requirements to obtain a legitimate multiple RNA alignment:

(Avoid Cycles) The alignment graph G should remain acyclic.

(Left-Right Compatibility) In the first greedy step, where we use structural information, we should consider left-right compatibility. That is, for any paired columns (c, c′), if column c appears in the left part of the stem in the final structure, then for each base x_{i} ∈ c that pairs with some x_{i′} ∈ c′ of the same sequence x, we should have i < i′.
Thus, while we build up the alignment graph, we satisfy the structural constraints and alignment constraints by verifying whether the newly inserted pairwise base alignment keeps the graph acyclic and left-right compatible.
The two-step alignment construction approach is as follows:
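The two requirements above can be checked with very little code. The following is a minimal sketch, not the authors' implementation: the alignment graph is assumed to be stored as a dictionary mapping each column to the set of columns that directly follow it, and the function names are hypothetical.

```python
def has_path(succ, src, dst):
    """Depth-first search: is dst reachable from src along directed edges?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(succ.get(node, ()))
    return False


def keeps_acyclic(succ, u, v):
    """Adding the edge u -> v creates a cycle exactly when v already reaches u."""
    return u != v and not has_path(succ, v, u)


def left_right_compatible(pairs):
    """pairs holds (i, i_prime) index pairs, one per sequence, for bases in the
    left column c paired with bases in the right column c'."""
    return all(i < i_prime for i, i_prime in pairs)
```

A candidate pairwise base alignment would be accepted only if every precedence edge it induces passes `keeps_acyclic` and, in the structural step, the affected paired columns pass `left_right_compatible`.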
Step 1: Structural skeleton construction
Upon inserting a new pair p* = (x_{i}, y_{j}) into G, three scenarios may occur: (1) New column addition; (2) Extension of an existing column; or (3) Merging of two columns. The detailed description of the procedures needed for each case can be found in [34]. Later in this section, we provide a summary of those procedures. By successively inserting the most probable alignment for confident basepairs, we construct the skeleton of the alignment enriched by structural information. Next, we complete this skeleton by greedily inserting highly probable base alignments.
Step 2: Inserting highly probable local alignments
In this step, we update the skeleton alignment obtained in the previous step by successively inserting the most probable pairwise base alignments into the multiple structural alignment, as in PicXAA [34]. Thus, we sort all remaining pairwise alignments (x_{i}, y_{j}) according to their transformed alignment probability in an ordered set A. We greedily build up G by repeatedly picking the most probable pair in A that has not been processed yet, provided that it is compatible with the current alignment. Again, insertion of any pair p* = (x_{i}, y_{j}) into G will result in one of the scenarios of new column addition, extension of an existing column, or merging of two columns.
 1. New column addition: We insert a new compatible vertex c* = {x_{i}, y_{j}} in G if neither x_{i} nor y_{j} belongs to an existing column in G. Figure 1B illustrates this process.
 2. Extending an existing column: If only one of the bases in p*, say x_{i}, belongs to some vertex c ∈ V, we add the other base y_{j} to the same vertex c. Figure 1C illustrates this process.
 3. Merging two vertices: When x_{i} ∈ c_{1} and y_{j} ∈ c_{2} belong to two different vertices c_{1}, c_{2} ∈ V, we merge the vertices c_{1} and c_{2}. Figure 1D illustrates this process.
After updating the graph as described above, we prune G to avoid redundant edges, thereby improving the computational efficiency of the construction process.
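The three insertion scenarios can be sketched with simple column bookkeeping. This is a minimal illustration under stated assumptions, not the procedure of [34]: bases are represented as hypothetical `(sequence_id, position)` tuples, the function name and return labels are invented for the example, and the acyclicity test and edge pruning on G are deliberately left out. The only validity rule enforced here is that a column never holds two bases from the same sequence.

```python
def insert_pair(columns, column_of, a, b):
    """Insert the aligned pair (a, b) and report which scenario applied.

    columns   : dict mapping column id -> set of bases in that column
    column_of : dict mapping base -> id of the column that contains it
    """
    ca, cb = column_of.get(a), column_of.get(b)
    if ca is None and cb is None:
        # Scenario 1: neither base is placed yet, so open a new column.
        cid = max(columns, default=-1) + 1
        columns[cid] = {a, b}
        column_of[a] = column_of[b] = cid
        return "new"
    if ca is None or cb is None:
        # Scenario 2: one base is placed, so the other joins its column.
        cid, base = (cb, a) if ca is None else (ca, b)
        if any(seq == base[0] for seq, _ in columns[cid]):
            return "reject"
        columns[cid].add(base)
        column_of[base] = cid
        return "extend"
    if ca == cb:
        return "present"
    # Scenario 3: both bases are placed in different columns, so merge them.
    if {s for s, _ in columns[ca]} & {s for s, _ in columns[cb]}:
        return "reject"
    for base in columns[cb]:
        column_of[base] = ca
    columns[ca] |= columns.pop(cb)
    return "merge"
```

In a greedy loop, candidate pairs would be visited in decreasing order of their transformed alignment probability and passed to a function like this one after the compatibility checks succeed.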
Discriminative refinement
 1. Find S_{x} ⊂ S, the set of sequences similar to x, using k-means clustering.
 2. Align x with the profile of sequences in S_{x}.
 3. Perform the profile-profile alignment of the profile obtained in the previous step and S – S_{x}.
This refinement strategy takes advantage of both the intrafamily similarity and the interfamily similarity, thereby improving the alignment quality in low similarity regions without breaking the confidently aligned bases.
Results and discussion
We use four different benchmark datasets: BRAliBase 2.1 [43], Murlet [12], BraliSub [44], and LocalExtR [44] to assess the performance of PicXAAR on different alignment conditions. The first two are general datasets not specially designed for local RNA alignment testing while the last two datasets are designed to verify the alignment accuracy for locally similar RNAs.
We compared PicXAAR with several well-known RNA sequence alignment algorithms: ProbConsRNA 1.10 [30], MXSCARNA 2.1 [20], CentroidAlign [17], and MAFFTxinsi 6.717 [26]. Among these techniques, ProbConsRNA uses only the sequence level information while the others take advantage of structural information. We picked these methods as they are among the fastest structural RNA aligners that yield high accuracy. There exist several other aligners, such as RAF 1.00 [13], Murlet [12], StemlocAMA [29], LARA 1.3.2 [23], MLocARNA [16], and RCoffee [21], which have much higher complexity than MAFFTxinsi (in some cases they are nearly 60 times slower) while their accuracy is usually worse than or at best comparable to that of MAFFTxinsi. Thus, the most complex algorithm that we compare our algorithm with is the state-of-the-art technique, MAFFTxinsi.
All the experiments have been performed on a 2.2GHz Intel Core2Duo system with 4GB memory. On all datasets we use two measurements to evaluate the performance of each alignment scheme: (1) the sum-of-pairs score (SPS), which represents the percentage of correctly aligned bases; (2) the structure conservation index (SCI) [45], which measures the degree of conservation of the consensus secondary structure for a multiple alignment. The SCI score is defined as SCI = E_{A}/Ē, where E_{A} is the minimum free energy of the consensus MSA as computed by RNAalifold [46] and Ē is the average minimum free energy of all single sequences in the alignment as computed by RNAfold [47].
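To make the SPS measure concrete, here is a minimal sketch that computes it from gapped alignment rows. It assumes the common definition in which SPS is the percentage of aligned base pairs in the reference alignment that are reproduced by the test alignment, with `-` as the gap character; the function names are hypothetical and this is not the evaluation script used for the tables.

```python
def aligned_pairs(alignment):
    """Return the set of aligned position pairs implied by an alignment.

    alignment is a list of equal-length gapped strings, one per sequence,
    given in the same sequence order for every alignment being compared.
    """
    pairs = set()
    counters = [0] * len(alignment)  # next ungapped position per sequence
    for column in zip(*alignment):
        present = []
        for seq_idx, ch in enumerate(column):
            if ch != "-":
                present.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        for a in range(len(present)):
            for b in range(a + 1, len(present)):
                pairs.add((present[a], present[b]))
    return pairs


def sps(test, reference):
    """Percentage of reference pairs that the test alignment recovers."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return 100.0 * len(aligned_pairs(test) & ref) / len(ref)
```

For example, with the reference rows `AC-G` / `ACUG` and the test rows `ACG-` / `ACUG`, two of the three reference pairs are recovered.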
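To assess the predicted consensus structure, we additionally report the sensitivity SEN = TP/(TP + FN), the positive predictive value PPV = TP/(TP + FP), and the Matthews correlation coefficient MCC = (TP·TN − FP·FN)/√((TP + FP)(TP + FN)(TN + FP)(TN + FN)), where true positive (TP) indicates the number of correctly predicted basepairs, true negative (TN) is the number of basepairs correctly predicted as unpaired, false negative (FN) is the number of true basepairs that are not predicted, and false positive (FP) is the number of incorrectly predicted basepairs.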
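Given the four counts defined above, the structure prediction measures reported as SEN, PPV, and MCC can be computed with their standard definitions. The sketch below is illustrative: the function name is hypothetical, and returning 0.0 when a denominator vanishes is a convention chosen here, not something stated in the source.

```python
import math


def structure_metrics(tp, tn, fp, fn):
    """Return (SEN, PPV, MCC) from basepair prediction counts."""
    sen = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sen, ppv, mcc
```

The tables report these quantities as percentages, so the returned fractions would be multiplied by 100 for comparison.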
In each table the total computational time for each algorithm is also reported in seconds.
Throughout the experiments we use the parameter setting α = 0.4, β = 0.1, and T_{b} = 0.5. These parameters were tuned manually using small datasets. In addition, we use the McCaskill algorithm [41] to compute the basepairing probabilities and RNAalifold [46] to find the induced consensus structure of the computed alignment.
Results on BRAliBase 2.1
First, we evaluated the accuracy of PicXAAR using the BRAliBase 2.1 alignment benchmark. Wilm et al. [43] developed BRAliBase 2.1 based on hand-curated seed alignments of 36 RNA families taken from the Rfam 7.0 database [48]. BRAliBase 2.1 contains a total of 18,990 aligned sets of sequences, each consisting of 2, 3, 5, 7, 10, or 15 sequences (categorized into k2, k3, k5, k7, k10, and k15 reference sets), with average pairwise sequence identities ranging from 20% to 95%.
Performance evaluation on BRAliBase 2.1
Method  k2 (SPS/SCI)  k3 (SPS/SCI)  k5 (SPS/SCI)  k7 (SPS/SCI)  k10 (SPS/SCI)  k15 (SPS/SCI)  TIME (s) 
PicXAAR  84.27 / 85.86  86.59 / 83.35  88.78 / 83.20  90.04 / 81.72  90.97 / 79.95  92.17 / 79.73  6502 
ProbConsRNA  83.58 / 82.46  85.46 / 76.54  87.90 / 75.85  88.99 / 74.91  89.90 / 73.25  90.76 / 71.92  1444 
MXSCARNA  85.02 / 90.67  86.57 / 85.56  88.43 / 83.44  89.40 / 80.89  90.17 / 78.34  91.26 / 77.18  6024 
CentroidAlign  85.55 / 88.64  87.06 / 83.77  88.93 / 82.40  89.99 / 81.23  90.96 / 80.22  91.65 / 79.34  6443 
MAFFTxinsi  85.66 / 90.77  87.76 / 87.11  90.27 / 86.70  91.36 / 85.70  92.26 / 84.73  93.22 / 85.38  12386 
Results on BraliSub and LocExtR
The BraliBase 2.1 benchmark is not designed for local alignment testing and has reference alignments with only up to 15 sequences. Thus, Wang et al. [44] designed two types of datasets to verify the potential of RNA sequence aligners in dealing with local similarities in the alignment set: (1) BraliSub, the subsets of BraliBase 2.1 with high variability (containing 232 reference alignments); (2) LocalExtR, an extension of BraliBase 2.1 consisting of a total of 90 large-scale reference alignments categorized into k20, k40, k60, and k80 reference sets, with 20, 40, 60, and 80 sequences in each alignment, respectively.
Performance evaluation on BraliSub
Method  k5 (SPS/SCI)  k7 (SPS/SCI)  k10 (SPS/SCI)  k15 (SPS/SCI)  TIME (s) 
PicXAAR  73.90 / 51.39  75.06 / 42.37  74.02 / 35.75  75.43 / 31.29  101 
ProbConsRNA  70.59 / 34.94  70.18 / 28.45  68.73 / 24.03  66.53 / 18.29  35 
MXSCARNA  70.77 / 46.30  69.93 / 35.95  68.58 / 27.91  69.75 / 17.79  84 
CentroidAlign  74.23 / 47.26  74.39 / 39.13  74.51 / 35.59  72.92 / 29.14  106 
MAFFTxinsi  78.28 / 57.60  78.56 / 52.10  78.48 / 44.75  79.23 / 38.79  261 
Performance evaluation on LocExtR
Method  k20 (SPS/SCI)  k40 (SPS/SCI)  k60 (SPS/SCI)  k80 (SPS/SCI)  TIME (s) 
PicXAAR  71.46 / 17.43  77.52 / 16.08  80.19 / 11.00  82.51 / 10.73  999 
ProbConsRNA  64.97 / 10.13  69.08 / 8.12  72.11 / 5.80  74.46 / 6.87  676 
MXSCARNA  65.52 / 9.67  68.30 / 8.44  69.45 / 9.15  71.16 / 8.93  662 
CentroidAlign  71.68 / 18.63  74.48 / 15.56  77.55 / 11.90  79.32 / 10.07  1359 
MAFFTxinsi  77.02 / 26.30  80.48 / 20.84  81.96 / 16.70  83.52 / 14.00  3791 
These results confirm that PicXAAR can efficiently yield an accurate structural alignment for a large set of locally similar RNAs.
Results on Murlet dataset
Performance evaluation on Murlet dataset
Method  SPS  SCI  SEN  PPV  MCC  TIME (s) 
PicXAAR  77.90  48.15  66.08  72.71  68.29  139 
ProbConsRNA  76.26  37.47  56.79  78.12  65.10  40 
MXSCARNA  74.67  44.28  64.06  74.58  68.37  120 
CentroidAlign  77.99  47.80  63.08  74.88  67.48  146 
MAFFTxinsi  78.72  52.94  67.04  74.56  69.64  307 
Computational complexity analysis
Conclusions
In this paper, we proposed PicXAAR, a probabilistic structural RNA alignment technique based on a greedy algorithm. Using a set of probabilistic consistency transformations, including a novel intrasequence consistency transformation, we incorporate the folding and alignment information of all sequences to enhance both the posterior basepairing and base alignment probabilities. We utilize these enhanced probabilities as the building blocks of the two-step greedy scheme, which builds up the alignment starting from sequence regions with high local similarity and high basepairing probability. As shown in several experiments, PicXAAR can efficiently yield highly accurate structural alignment of ncRNAs. This advantage is most pronounced for datasets consisting of sequences with local similarities and low pairwise identities. To the best of our knowledge, PicXAAR is the fastest structural alignment algorithm after MXSCARNA among all the current RNA aligners, while it significantly outperforms MXSCARNA on local datasets like BraliSub and LocExtR. The speed of PicXAAR, together with its accuracy, makes it a practical tool for structural alignment of a large number of ncRNAs with low sequence identity, which is very helpful for novel ncRNA prediction.
Notes
Acknowledgements
This work was supported in part by Texas A&M faculty startup fund.
This article has been published as part of BMC Bioinformatics Volume 12 Supplement 1, 2011: Selected articles from the Ninth Asia Pacific Bioinformatics Conference (APBC 2011). The full contents of the supplement are available online at http://www.biomedcentral.com/14712105/12?issue=S1.
References
 1.Eddy SR: Noncoding RNA genes and the modern RNA world. Nat. Rev. Genet. 2001, 2: 919–929. 10.1038/35103511CrossRefPubMedGoogle Scholar
 2.Storz G: An expanding universe of noncoding RNAs. Science 2002, 296: 1260–1263. 10.1126/science.1072249CrossRefPubMedGoogle Scholar
 3.Costa FF: Noncoding RNAs: lost in translation? Gene 2007, 386: 1–10. 10.1016/j.gene.2006.09.028CrossRefPubMedGoogle Scholar
 4.Sankoff D: Simultaneous Solution of the RNA Folding, Alignment and Protosequence Problems. SIAM Journal on Applied Mathematics 1985, 45(5):810–825. 10.1137/0145048CrossRefGoogle Scholar
 5.Gorodkin J, Stricklin SL, Stormo GD: Discovering common stemloop motifs in unaligned RNA sequences. Nucleic Acids Res. 2001, 29: 2135–2144. 10.1093/nar/29.10.2135PubMedCentralCrossRefPubMedGoogle Scholar
 6.Havgaard JH, Lyngso RB, Stormo GD, Gorodkin J: Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%. Bioinformatics 2005, 21: 1815–1824. 10.1093/bioinformatics/bti279CrossRefPubMedGoogle Scholar
 7.Havgaard JH, Torarinsson E, Gorodkin J: Fast pairwise structural RNA alignments by pruning of the dynamical programming matrix. PLoS Comput. Biol. 2007, 3: 1896–1908. 10.1371/journal.pcbi.0030193CrossRefPubMedGoogle Scholar
 8.Mathews DH, Turner DH: Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. J. Mol. Biol. 2002, 317: 191–203. 10.1006/jmbi.2001.5351CrossRefPubMedGoogle Scholar
 9.Mathews DH: Predicting a set of minimal free energy RNA secondary structures common to two sequences. Bioinformatics 2005, 21: 2246–2253. 10.1093/bioinformatics/bti349CrossRefPubMedGoogle Scholar
 10.Holmes I: Accelerated probabilistic inference of RNA structure evolution. BMC Bioinformatics 2005, 6: 73. 10.1186/14712105673PubMedCentralCrossRefPubMedGoogle Scholar
 11.Dowell RD, Eddy SR: Evaluation of several lightweight stochastic contextfree grammars for RNA secondary structure prediction. BMC Bioinformatics 2004, 5: 71. 10.1186/14712105571PubMedCentralCrossRefPubMedGoogle Scholar
 12.Kiryu H, Tabei Y, Kin T, Asai K: Murlet: a practical multiple alignment tool for structural RNA sequences. Bioinformatics 2007, 23: 1588–1598. 10.1093/bioinformatics/btm146CrossRefPubMedGoogle Scholar
 13.Do CB, Foo CS, Batzoglou S: A maxmargin model for efficient simultaneous alignment and folding of RNA sequences. Bioinformatics 2008, 24: 68–76. 10.1093/bioinformatics/btn177CrossRefGoogle Scholar
 14.Harmanci AO, Sharma G, Mathews DH: PARTS: probabilistic alignment for RNA joinT secondary structure prediction. Nucleic Acids Res. 2008, 36: 2406–2417. 10.1093/nar/gkn043PubMedCentralCrossRefPubMedGoogle Scholar
 15.Dalli D, Wilm A, Mainz I, Steger G: STRAL: progressive alignment of noncoding RNA using base pairing probability vectors in quadratic time. Bioinformatics 2006, 22: 1593–1599. 10.1093/bioinformatics/btl142CrossRefPubMedGoogle Scholar
 16.Will S, Reiche K, Hofacker IL, Stadler PF, Backofen R: Inferring noncoding RNA families and classes by means of genomescale structurebased clustering. PLoS Comput. Biol 2007, 3: e65. 10.1371/journal.pcbi.0030065PubMedCentralCrossRefPubMedGoogle Scholar
 17.Hamada M, Sato K, Kiryu H, Mituyama T, Asai K: CentroidAlign: fast and accurate aligner for structured RNAs by maximizing expected sumofpairs score. Bioinformatics 2009, 25: 3236–3243. 10.1093/bioinformatics/btp580CrossRefPubMedGoogle Scholar
 18.Hofacker IL, Bernhart SH, Stadler PF: Alignment of RNA base pairing probability matrices. Bioinformatics 2004, 20: 2222–2227. 10.1093/bioinformatics/bth229CrossRefPubMedGoogle Scholar
 19.Anwar M, Nguyen T, Turcotte M: Identification of consensus RNA secondary structures using suffix arrays. BMC Bioinformatics 2006, 7: 244. 10.1186/147121057244PubMedCentralCrossRefPubMedGoogle Scholar
 20.Tabei Y, Kiryu H, Kin T, Asai K: A fast structural multiple alignment method for long RNA sequences. BMC Bioinformatics 2008, 9: 33. 10.1186/14712105933PubMedCentralCrossRefPubMedGoogle Scholar
 21.Wilm A, Higgins DG, Notredame C: RCoffee: a method for multiple alignment of noncoding RNA. Nucleic Acids Res. 2008, 36: e52. 10.1093/nar/gkn174PubMedCentralCrossRefPubMedGoogle Scholar
 22.Moretti S, Wilm A, Higgins DG, Xenarios I, Notredame C: RCoffee: a web server for accurately aligning noncoding RNA sequences. Nucleic Acids Res. 2008, 36: W10–13. 10.1093/nar/gkn278PubMedCentralCrossRefPubMedGoogle Scholar
 23.Bauer M, Klau GW, Reinert K: Accurate multiple sequencestructure alignment of RNA sequences using combinatorial optimization. BMC Bioinformatics 2007, 8: 271. 10.1186/147121058271PubMedCentralCrossRefPubMedGoogle Scholar
 24.Siebert S, Backofen R: MARNA: multiple alignment and consensus structure prediction of RNAs based on sequence structure comparisons. Bioinformatics 2005, 21: 3352–3359. 10.1093/bioinformatics/bti550CrossRefPubMedGoogle Scholar
 25.Notredame C, Higgins DG, Heringa J: TCoffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 2000, 302: 205–217. 10.1006/jmbi.2000.4042CrossRefPubMedGoogle Scholar
 26.Katoh K, Toh H: Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFTbased framework. BMC Bioinformatics 2008, 9: 212. 10.1186/147121059212PubMedCentralCrossRefPubMedGoogle Scholar
 27.Xu X, Ji Y, Stormo GD: RNA Sampler: a new sampling based algorithm for common RNA secondary structure prediction and structural alignment. Bioinformatics 2007, 23: 1883–1891. 10.1093/bioinformatics/btm272CrossRefPubMedGoogle Scholar
 28. Lindgreen S, Gardner PP, Krogh A: MASTR: multiple alignment and structure prediction of noncoding RNAs using simulated annealing. Bioinformatics 2007, 23: 3304–3311. 10.1093/bioinformatics/btm525
 29. Bradley RK, Pachter L, Holmes I: Specific alignment of structured RNA: stochastic grammars and sequence annealing. Bioinformatics 2008, 24: 2677–2683. 10.1093/bioinformatics/btn495
 30. Do CB, Mahabhashyam MS, Brudno M, Batzoglou S: ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res. 2005, 15: 330–340. 10.1101/gr.2821705
 31. Roshan U, Livesay DR: Probalign: multiple sequence alignment using partition function posterior probabilities. Bioinformatics 2006, 22: 2715–2721. 10.1093/bioinformatics/btl472
 32. Paten B, Herrero J, Beal K, Birney E: Sequence progressive alignment, a framework for practical large-scale probabilistic consistency alignment. Bioinformatics 2009, 25: 295–301. 10.1093/bioinformatics/btn630
 33. Do C, Gross S, Batzoglou S: CONTRAlign: Discriminative Training for Protein Sequence Alignment. Proceedings of the Tenth Annual International Conference on Computational Molecular Biology (RECOMB): 2–5 April 2006; Venice, Italy 2006, 160–174.
 34. Sahraeian SM, Yoon BJ: PicXAA: greedy probabilistic construction of maximum expected accuracy alignment of multiple sequences. Nucleic Acids Res. 2010, 38: 4917–4928. 10.1093/nar/gkq255
 35. Schwartz AS, Pachter L: Multiple alignment by sequence annealing. Bioinformatics 2007, 23: e24–29. 10.1093/bioinformatics/btl311
 36. Bradley RK, Roberts A, Smoot M, Juvekar S, Do J, Dewey C, Holmes I, Pachter L: Fast statistical alignment. PLoS Comput. Biol. 2009, 5: e1000392. 10.1371/journal.pcbi.1000392
 37. Lu ZJ, Gloor JW, Mathews DH: Improved RNA secondary structure prediction by maximizing expected pair accuracy. RNA 2009, 15: 1805–1813. 10.1261/rna.1643609
 38. Kiryu H, Kin T, Asai K: Robust prediction of consensus secondary structures using averaged base pairing probability matrices. Bioinformatics 2007, 23: 434–441. 10.1093/bioinformatics/btl636
 39. Do CB, Woods DA, Batzoglou S: CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics 2006, 22: e90–98. 10.1093/bioinformatics/btl246
 40. Durbin R, Eddy SR, Krogh A, Mitchison G: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press; 1998.
 41. McCaskill JS: The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 1990, 29: 1105–1119. 10.1002/bip.360290621
 42. Hamada M, Sato K, Kiryu H, Mituyama T, Asai K: Predictions of RNA secondary structure by combining homologous sequence information. Bioinformatics 2009, 25: i330–338. 10.1093/bioinformatics/btp228
 43. Wilm A, Mainz I, Steger G: An enhanced RNA alignment benchmark for sequence alignment programs. Algorithms Mol Biol 2006, 1: 19. 10.1186/1748-7188-1-19
 44. Wang S, Gutell RR, Miranker DP: Biclustering as a method for RNA local multiple sequence alignment. Bioinformatics 2007, 23: 3289–3296. 10.1093/bioinformatics/btm485
 45. Washietl S, Hofacker IL, Stadler PF: Fast and reliable prediction of noncoding RNAs. Proc. Natl. Acad. Sci. U.S.A. 2005, 102: 2454–2459. 10.1073/pnas.0409169102
 46. Hofacker IL, Fekete M, Stadler PF: Secondary structure prediction for aligned RNA sequences. J. Mol. Biol. 2002, 319: 1059–1066. 10.1016/S0022-2836(02)00308-X
 47. Hofacker IL: Vienna RNA secondary structure server. Nucleic Acids Res. 2003, 31: 3429–3431. 10.1093/nar/gkg599
 48. Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A: Rfam: annotating noncoding RNAs in complete genomes. Nucleic Acids Res. 2005, 33: D121–124. 10.1093/nar/gki081
Copyright information
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.