Abstract
Covariance models (CMs) are very effective for finding new members of non-coding RNA (ncRNA) sequence families in genomic data. However, the computational burden of applying CM-based search algorithms can be prohibitive. When annotating the genome of a newly sequenced organism, it is usually desirable to search the sequence data with a large number of ncRNA families. The computational burden can be reduced if the families are clustered into statistically similar models and a single cluster-average representative model is produced for each cluster. The database is then searched with the representative model of each cluster at a relatively low detection threshold, and the hits that pass this pre-filter are then processed with the individual family models of the cluster. A base-pair conflict metric has previously been proposed for use in model clustering. In this work, an alternative metric is proposed that uses standard alignment algorithms and a special mixed primary-secondary structure scoring matrix.
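The idea of a model-to-model metric built from a standard alignment over mixed primary-secondary structure symbols can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: each consensus column is represented as a (base, structure) pair, the scoring weights `w_seq`, `w_struct`, `mismatch` and `gap` are arbitrary, the function names are hypothetical, and the score is turned into a dissimilarity by a simple self-score normalization. The alignment itself is ordinary Needleman-Wunsch global alignment.

```python
def mixed_columns(seq, struct):
    """Pair each base with its structure symbol, e.g. ('G', '(')."""
    if len(seq) != len(struct):
        raise ValueError("sequence and structure must have equal length")
    return list(zip(seq, struct))


def mixed_score(a, b, w_seq=1.0, w_struct=2.0, mismatch=-1.0):
    """Score two mixed columns; the weights are illustrative assumptions."""
    base_a, ss_a = a
    base_b, ss_b = b
    score = w_seq if base_a == base_b else mismatch
    score += w_struct if ss_a == ss_b else mismatch
    return score


def global_align_score(x, y, gap=-2.0):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * gap] + [0.0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = max(
                prev[j - 1] + mixed_score(x[i - 1], y[j - 1]),  # align columns
                prev[j] + gap,                                  # gap in y
                cur[j - 1] + gap,                               # gap in x
            )
        prev = cur
    return prev[len(y)]


def mixed_distance(x, y):
    """Dissimilarity: 0 for identical inputs, larger for worse alignments."""
    self_score = min(global_align_score(x, x), global_align_score(y, y))
    return 1.0 - global_align_score(x, y) / self_score


# Two hypothetical hairpin consensus sequences with the same structure.
fam_a = mixed_columns("GGGAAACCC", "(((...)))")
fam_b = mixed_columns("GCGAAAGCC", "(((...)))")
print(mixed_distance(fam_a, fam_b))
```

A pairwise matrix of such distances could then be passed to any standard clustering routine to group families before a representative model is built for each cluster. Weighting structure agreement above base identity is a design choice of this sketch only.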
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Smith, J.A. (2013). Non-coding RNA Covariance Model Combination Using Mixed Primary-Secondary Structure Alignment. In: Peterson, L.E., Masulli, F., Russo, G. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2012. Lecture Notes in Computer Science, vol 7845. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38342-7_9
DOI: https://doi.org/10.1007/978-3-642-38342-7_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38341-0
Online ISBN: 978-3-642-38342-7
eBook Packages: Computer Science, Computer Science (R0)