Flexible identification of structural objects in nucleic acid sequences: Palindromes, mirror repeats, pseudoknots and triple helices

Sagot, Marie-France; Viari, Alain

doi:10.1007/3-540-63220-4_62

Marie-France Sagot¹,² & Alain Viari²

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1264)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

This paper presents algorithms for flexibly identifying structural objects in nucleic acid sequences: palindromes, mirror repeats, pseudoknots and triple helices. We further explore the idea of a model against which the words in a sequence are compared in order to find these structural objects [17]. In the present case, models are words defined over the alphabet of nucleotides that have both direct and inverse occurrences in the sequence. Moreover, errors (substitutions, deletions and insertions) are allowed between a model and its inverse occurrences. Helix stems may therefore present bulges or interior loops, and mirror repeats need not be exact. Reasonably efficient performance comes from two facts: the parts composing the structures are kept separate until the end, and filtering for valid occurrences (occurrences that may form part of such a structure) can be done in $O(n)$ time, where $n$ is the length of the sequence. Two algorithms are presented, one for palindromes and mirror repeats, the other for pseudoknots and triple helices. The time complexity of the searching phase of both algorithms (that is, before the structural parts are put together at the end) is then $O\bigl(nk(e+1)\,(1+\min\{d_{\max}-d_{\min}+1+e,\; k^{e}|\Sigma|^{e}\})\bigr)$, where $n$ is the length of the sequence, $d_{\max}$ and $d_{\min}$ are, respectively, the maximal and minimal length of a hairpin loop, $k$ is the length of a model when that length is fixed, or the maximum length $k_{\max}$ when a range of lengths is allowed, $e$ is the maximum number of errors allowed (substitutions, deletions and insertions) and $|\Sigma|$ is the size of the alphabet of nucleotides.
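To make the objects in the abstract concrete, the following is a minimal brute-force sketch, not the authors' algorithm and far slower than the bound quoted above. It takes each word of length k as a candidate model, builds its exact inverse (reversed for a mirror repeat, reverse-complemented for a palindrome) and looks for an occurrence within e edit errors after a loop of d_min to d_max nucleotides. The function names, the DNA-only complement table and the tuple returned are assumptions made for illustration.

```python
from typing import Iterator, Tuple

# Assumption: plain DNA alphabet with Watson-Crick pairs only.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def inverse(word: str, kind: str) -> str:
    """Exact inverse occurrence of a model for the given structure kind."""
    rev = word[::-1]
    if kind == "mirror":
        return rev
    if kind == "palindrome":
        return "".join(COMPLEMENT[c] for c in rev)
    raise ValueError(f"unknown kind: {kind}")


def find_structures(seq: str, k: int, e: int, d_min: int, d_max: int,
                    kind: str) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (start, loop_len, right_len, errors) for every hit.

    seq[start:start + k] is the direct occurrence of the model; the right
    part begins loop_len positions after it, has length right_len, and is
    within `errors` <= e edits of the model's exact inverse.
    """
    n = len(seq)
    for i in range(n - k + 1):
        target = inverse(seq[i:i + k], kind)
        for d in range(d_min, d_max + 1):
            j = i + k + d
            # Insertions and deletions let the right part vary in length.
            for m in range(max(1, k - e), k + e + 1):
                if j + m > n:
                    break
                errs = edit_distance(target, seq[j:j + m])
                if errs <= e:
                    yield (i, d, m, errs)


# Exact hairpin: GGGAC, a 4-nucleotide loop, then its reverse complement GTCCC.
print(list(find_structures("GGGACTTTTGTCCC", 5, 0, 3, 5, "palindrome")))
# [(0, 4, 5, 0)]
```

Because every right-part length from k - e to k + e is tried, one structure may be reported several times with different lengths when e > 0; a real tool would keep only the best hit per position.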

References

J. P. Abrahams, M. v. d. Berg, E. v. Batenburg, and C. Pleij. Prediction of RNA secondary structure, including pseudoknotting, by computer simulation. Comput. Appl. Biosci., 8:243–248, 1992.
B. Billoud, M. Kontic, and A. Viari. Palingol: a declarative programming language to describe nucleic acids' secondary structures and to scan sequence databases. Nucleic Acids Res., 24:1395–1403, 1996.
D. Bouthinon, H. Soldano, and B. Billoud. Apprentissage d'un concept commun à un ensemble d'objets dont la description est hypothétique: application à la découverte de structures secondaires d'ARN. In 11èmes Journées Françaises d'Apprentissage, 1996.
M. Brown and C. Wilson. RNA pseudoknot modeling using intersections of stochastic context free grammars with applications to database search. Manuscript, University of California, Santa Cruz, Oct. 1995.
J.-H. Chen, S.-Y. Le, and J. V. Maizel. A procedure for RNA pseudoknot prediction. Comput. Appl. Biosci., 8:243–248, 1992.
D. J. Galas, M. Eggert, and M. S. Waterman. Rigorous pattern-recognition methods for DNA sequences. Analysis of promoter sequences from Escherichia coli. J. Mol. Biol., 186:117–128, 1985.
I. Tinoco Jr., P. W. Davis, C. C. Hardin, J. D. Puglisi, G. T. Walker, and J. Wyatt. RNA structures from A to Z. In Cold Spring Harbor Symposia on Quantitative Biology, volume LII, pages 135–146. Cold Spring Harbor Laboratory, 1987.
N. A. Kolchanov, I. I. Titov, I. E. Vlassova, and V. V. Vlassov. Chemical and computer probing of RNA structure. In W. E. Cohn and K. Moldave, editors, Progress in Nucleic Acid Research and Molecular Biology, pages 131–196. Academic Press, 1996.
M. Kontic. Palingol. Langage pour la description et la recherche de structures secondaires dans les séquences nucléotidiques, 1993. DEA d'Intelligence Artificielle, Université de Paris Nord.
F. Lefebvre. An optimized parsing algorithm well suited for RNA folding. In Proceedings First International Conference on Intelligent Systems for Molecular Biology, Cambridge, England, 1995.
B. Lewin. Genes V. Oxford University Press, 1994.
H. M. Martinez. An efficient method for finding repeats in molecular sequences. Nucleic Acids Res., 11:4629–4634, 1983.
H. M. Martinez. Detecting pseudoknots and other local base-pairing structures in RNA sequences. 183:306–317, 1990.
S. M. Mirkin, V. I. Lyamichev, K. N. Drushlyak, V. N. Dobrynin, S. A. Filippov, and M. D. Frank-Kamenetskii. DNA H form requires a homopurine-homopyrimidine mirror repeat. Nature, 330:495–497, 1987.
E. W. Myers. A sublinear algorithm for approximate keyword searching. Algorithmica, 12:345–374, 1994.
C. W. A. Pleij and L. Bosch. RNA pseudoknots: structure, detection, and prediction. 180:289–303, 1989.
M.-F. Sagot, V. Escalier, A. Viari, and H. Soldano. Searching for repeated words in a text allowing for mismatches and gaps. pages 87–100, Viña del Mar, Chile, 1995. Second South American Workshop on String Processing.
M.-F. Sagot and A. Viari. A double combinatorial approach to discovering patterns in biological sequences. In D. Hirschberg and G. Myers, editors, Combinatorial Pattern Matching, volume 1075 of Lecture Notes in Computer Science, pages 186–208. Springer-Verlag, 1996.
M.-F. Sagot, A. Viari, and H. Soldano. A distance-based block searching algorithm. pages 322–331, Cambridge, England, 1995. Third International Symposium on Intelligent Systems for Molecular Biology.
M.-F. Sagot, A. Viari, and H. Soldano. Multiple comparison: a peptide matching approach. In Z. Galil and E. Ukkonen, editors, Combinatorial Pattern Matching, volume 937 of Lecture Notes in Computer Science, pages 366–385. Springer-Verlag, 1995. to appear in Theoret. Comput. Sci.
Y. Sakakibara, M. Brown, R. Hughey, I. S. Mian, K. Sjolander, R. C. Underwood, and D. Haussler. Stochastic context-free grammars for tRNA modeling. Nucleic Acids Res., 22:5112–5120, 1994.
D. Searls. The linguistics of DNA. American Scientist, 80:579–591, 1992.
M. S. Waterman. Consensus methods for folding single-stranded nucleic acids. In M. S. Waterman, editor, Mathematical Methods for DNA Sequences, pages 185–224. CRC Press, 1989.
S. Wu, U. Manber, and E. W. Myers. An O(NP) sequence comparison algorithm. Inf. Proc. Letters, 35:317–323, 1990.
M. Zuker and D. Sankoff. RNA secondary structures and their prediction. Bull. Math. Biol., 46:591–621, 1984.
M. Zuker and P. Stiegler. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res., 9:133–148, 1981.

Author information

Authors and Affiliations

Institut Gaspard Monge, Université de Marne-la-Vallée, 2, rue de la Butte Verte, 93160, Noisy-le-Grand
Marie-France Sagot
Atelier de BioInformatique, Université de Paris 6, 12, rue Cuvier, 75005, Paris
Marie-France Sagot & Alain Viari

Editor information

Alberto Apostolico, Jotun Hein

Cite this paper

Sagot, MF., Viari, A. (1997). Flexible identification of structural objects in nucleic acid sequences: Palindromes, mirror repeats, pseudoknots and triple helices. In: Apostolico, A., Hein, J. (eds) Combinatorial Pattern Matching. CPM 1997. Lecture Notes in Computer Science, vol 1264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63220-4_62

DOI: https://doi.org/10.1007/3-540-63220-4_62
Published: 08 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63220-7
Online ISBN: 978-3-540-69214-0
eBook Packages: Springer Book Archive
