Abstract
This paper presents algorithms for flexibly identifying structural objects in nucleic acid sequences: palindromes, mirror repeats, pseudoknots and triple helices. We further explore the idea of a model against which the words in a sequence are compared in order to find these structural objects [17]. Here, models are words over the alphabet of nucleotides that have both direct and inverse occurrences in the sequence. Errors (substitutions, deletions and insertions) are allowed between a model and its inverse occurrences, so helix stems may present bulges or interior loops, and mirror repeats need not be exact. Reasonably efficient performance comes from two facts: the parts composing the structures are kept separate until the end, and filtering for valid occurrences (occurrences that may form part of such a structure) can be done in $O(n)$ time, where $n$ is the length of the sequence. The time complexity of the searching phase (that is, before the structural parts are put together at the end) of both algorithms presented here (one for palindromes and mirror repeats, the other for pseudoknots and triple helices) is then $O(nk(e+1)(1+\min\{d_{\max}-d_{\min}+1+e,\ k^{e}|\Sigma|^{e}\}))$, where $d_{\max}$ and $d_{\min}$ are, respectively, the maximal and minimal length of a hairpin loop; $k$ is the maximum length $k_{\max}$ of a model, a fixed length, or the maximum value of a range of lengths; $e$ is the maximum number of errors allowed (substitutions, deletions and insertions); and $|\Sigma|$ is the size of the alphabet of nucleotides.
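To make the objects described above concrete, the following is a minimal brute-force sketch, not the authors' algorithm: it treats every word of length k as a model and looks downstream for an inverse occurrence within e edit errors, separated by a loop of between d_min and d_max nucleotides. The function names (`find_stems`, `edit_distance`, `reverse_complement`), the DNA alphabet, and the output tuple layout are assumptions made for illustration. Its running time is far worse than the bound stated in the abstract, since it does no filtering and recomputes an edit distance for every candidate pair.

```python
from typing import Iterator, Tuple

# Assumption: DNA alphabet with Watson-Crick pairing only.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(word: str) -> str:
    """Return the reverse complement of a DNA word."""
    return "".join(COMPLEMENT[base] for base in reversed(word))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, deletions and insertions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


def find_stems(seq: str, k: int, e: int, d_min: int, d_max: int,
               mirror: bool = False) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (i, j, length, errors) for each candidate structure.

    seq[i:i+k] is the model (direct occurrence). seq[j:j+length] is an
    inverse occurrence: within e errors of the model's reverse complement
    (palindrome) or of its plain reversal (mirror repeat, mirror=True).
    The gap j - (i + k) lies between d_min and d_max.
    """
    n = len(seq)
    for i in range(n - k + 1):
        model = seq[i:i + k]
        target = model[::-1] if mirror else reverse_complement(model)
        for d in range(d_min, d_max + 1):
            j = i + k + d
            # Insertions and deletions let the partner length vary by up to e.
            for length in range(max(1, k - e), k + e + 1):
                if j + length > n:
                    break
                errors = edit_distance(target, seq[j:j + length])
                if errors <= e:
                    yield (i, j, length, errors)


# Exact hairpin: stem GGAC, loop AAAA, partner GTCC.
print(list(find_stems("GGACAAAAGTCC", k=4, e=0, d_min=3, d_max=5)))
# → [(0, 8, 4, 0)]
```

Setting `mirror=True` switches from palindromes to mirror repeats, and raising `e` admits stems with bulges or interior loops, at the cost of reporting overlapping candidates that a real implementation would need to merge or rank.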
References
J. P. Abrahams, M. v. d. Berg, E. v. Batenburg, and C. Pleij. Prediction of RNA secondary structure, including pseudoknotting, by computer simulation. Comput. Appli. Biosci., 8:243–248, 1992.
B. Billoud, M. Kontic, and A. Viari. Palingol: a declarative programming language to describe nucleic acids' secondary structures and to scan sequence databases. Nucleic Acids Res., 24:1395–1403, 1996.
D. Bouthinon, H. Soldano, and B. Billoud. Apprentissage d'un concept commun à un ensemble d'objets dont la description est hypothétique: application à la découverte de structures secondaires d'ARN. In 11èmes Journées Françaises d'Apprentissage, 1996.
M. Brown and C. Wilson. RNA pseudoknot modeling using intersections of stochastic context free grammars with applications to database search. manuscript — University of California, Santa Cruz, Oct. 1995, 1995.
J.-H. Chen, S.-Y. Le, and J. V. Maizel. A procedure for RNA pseudoknot prediction. Comput. Appli. Biosci., 8:243–248, 1992.
D. J. Galas, M. Eggert, and M. S. Waterman. Rigorous pattern-recognition methods for DNA sequences. Analysis of promoter sequences from Escherichia coli. J. Mol. Biol., 186:117–128, 1985.
I. Tinoco Jr., P. W. Davis, C. C. Hardin, J. D. Puglisi, G. T. Walker, and J. Wyatt. RNA structures from A to Z. In Cold Spring Harbor Symposia on Quantitative Biology, volume LII, pages 135–146. Cold Spring Harbor Laboratory, 1987.
N. A. Kolchanov, I. I. Titov, I. E. Vlassova, and V. V. Vlassov. Chemical and computer probing of RNA structure. In W. E. Cohn and K. Moldave, editors, Progress in Nucleic Acid Research and Molecular Biology, pages 131–196. Academic Press, 1996.
M. Kontic. Palingol. Langage pour la description et la recherche de structures secondaires dans les séquences nucléotidiques, 1993. DEA d'Intelligence Artificielle, Université de Paris Nord.
F. Lefebvre. An optimized parsing algorithm well suited for RNA folding. In Proceedings First International Conference on Intelligent Systems for Molecular Biology, Cambridge, England, 1995.
B. Lewin. Genes V. Oxford University Press, 1994.
H. M. Martinez. An efficient method for finding repeats in molecular sequences. Nucleic Acids Res., 11:4629–4634, 1983.
H. M. Martinez. Detecting pseudoknots and other local base-pairing structures in RNA sequences. 183:306–317, 1990.
S. M. Mirkin, V. I. Lyamichev, K. N. Drushlyak, V. N. Dobrynin, S. A. Filippov, and M. D. Frank-Kamenetskii. DNA H form requires a homopurine-homopyrimidine mirror repeat. Nature, 330:495–497, 1987.
E. W. Myers. A sublinear algorithm for approximate keyword searching. Algorithmica, 12:345–374, 1994.
C. W. A. Pleij and L. Bosch. RNA pseudoknots: structure, detection, and prediction. 180:289–303, 1989.
M.-F. Sagot, V. Escalier, A. Viari, and H. Soldano. Searching for repeated words in a text allowing for mismatches and gaps. pages 87–100, Viña del Mar, Chile, 1995. Second South American Workshop on String Processing.
M.-F. Sagot and A. Viari. A double combinatorial approach to discovering patterns in biological sequences. In D. Hirschberg and G. Myers, editors, Combinatorial Pattern Matching, volume 1075 of Lecture Notes in Computer Science, pages 186–208. Springer-Verlag, 1996.
M.-F. Sagot, A. Viari, and H. Soldano. A distance-based block searching algorithm. pages 322–331, Cambridge, England, 1995. Third International Symposium on Intelligent Systems for Molecular Biology.
M.-F. Sagot, A. Viari, and H. Soldano. Multiple comparison: a peptide matching approach. In Z. Galil and E. Ukkonen, editors, Combinatorial Pattern Matching, volume 937 of Lecture Notes in Computer Science, pages 366–385. Springer-Verlag, 1995. to appear in Theoret. Comput. Sci.
Y. Sakakibara, M. Brown, R. Hughey, I. S. Mian, K. Sjolander, R. C. Underwood, and D. Haussler. Stochastic context-free grammars for tRNA modeling. Nucleic Acids Res., 22:5112–5120, 1994.
D. Searls. The linguistics of DNA. American Scientist, 80:579–591, 1992.
M. S. Waterman. Consensus methods for folding single-stranded nucleic acids. In M. S. Waterman, editor, Mathematical Methods for DNA Sequences, pages 185–224. CRC Press, 1989.
S. Wu, U. Manber, and E. W. Myers. An O(NP) sequence comparison algorithm. Inf. Proc. Letters, 35:317–323, 1990.
M. Zuker and D. Sankoff. RNA secondary structures and their prediction. Bull. Math. Biol., 46:591–621, 1984.
M. Zuker and P. Stiegler. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res., 9:133–148, 1981.
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sagot, MF., Viari, A. (1997). Flexible identification of structural objects in nucleic acid sequences: Palindromes, mirror repeats, pseudoknots and triple helices. In: Apostolico, A., Hein, J. (eds) Combinatorial Pattern Matching. CPM 1997. Lecture Notes in Computer Science, vol 1264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63220-4_62
DOI: https://doi.org/10.1007/3-540-63220-4_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63220-7
Online ISBN: 978-3-540-69214-0
eBook Packages: Springer Book Archive