Flexible identification of structural objects in nucleic acid sequences: Palindromes, mirror repeats, pseudoknots and triple helices

  • Conference paper
Combinatorial Pattern Matching (CPM 1997)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1264))

Abstract

This paper presents algorithms for flexibly identifying structural objects in nucleic acid sequences: palindromes, mirror repeats, pseudoknots and triple helices. We further explore the idea of a model against which the words in a sequence are compared in order to find these structural objects [17]. In the present case, models are words defined over the alphabet of nucleotides that have both direct and inverse occurrences in the sequence. Moreover, errors (substitutions, deletions and insertions) are allowed between a model and its inverse occurrences. Helix stems may therefore present bulges or interior loops, and mirror repeats need not be exact. Reasonably efficient performance comes from the fact that the parts composing the structures are kept separate until the end, and that filtering for valid occurrences (occurrences that may form part of such a structure) can be done in $O(n)$ time, where $n$ is the length of the sequence. The time complexity of the searching phase (that is, before the structural parts are put together at the end) of both algorithms presented here (one for palindromes and mirror repeats, the other for pseudoknots and triple helices) is then $O(nk(e+1)(1+\min\{d_{max}-d_{min}+1+e,\ k^{e}|\Sigma|^{e}\}))$, where $d_{max}$ and $d_{min}$ are, respectively, the maximal and minimal length of a hairpin loop, $k$ is either the maximum length $k_{max}$ of a model, a fixed length, or the maximum value of a range of lengths, $e$ is the maximum number of errors allowed (substitutions, deletions and insertions) and $|\Sigma|$ is the size of the alphabet of nucleotides.
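
To make the notions of a model and its inverse occurrence concrete, here is a minimal brute-force sketch in Python. It is not the algorithm of the paper and does not reach the stated complexity; it only illustrates the definitions. It assumes a DNA alphabet, and the names `inverse`, `edit_distance` and `find_stems` are hypothetical. A model is taken to be the word of length `k` starting at each position, and its inverse occurrence is searched for after a loop of length between `d_min` and `d_max`, with at most `e` errors.

```python
from typing import Iterator, Tuple

# Assumption: DNA alphabet; an RNA variant would map A<->U instead of A<->T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def inverse(word: str, complement: bool = True) -> str:
    """Reverse a word; complement it too when looking for palindromes (stems)."""
    rev = word[::-1]
    return rev.translate(COMPLEMENT) if complement else rev

def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # match or substitution
        prev = cur
    return prev[-1]

def find_stems(seq: str, k: int, e: int, d_min: int, d_max: int,
               complement: bool = True) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (model_start, occurrence_start, occurrence_length, errors).

    complement=True looks for palindromes (reverse complement);
    complement=False looks for mirror repeats (plain reversal).
    """
    n = len(seq)
    for i in range(n - k + 1):
        target = inverse(seq[i:i + k], complement)
        for d in range(d_min, d_max + 1):        # candidate loop lengths
            start = i + k + d
            # With up to e insertions or deletions, the occurrence length varies.
            for length in range(max(1, k - e), k + e + 1):
                if start + length > n:
                    break
                errors = edit_distance(target, seq[start:start + length])
                if errors <= e:
                    yield (i, start, length, errors)

exact = list(find_stems("GGGAAACCC", k=3, e=0, d_min=3, d_max=3))
mirror = list(find_stems("AGGTTTGGA", k=3, e=0, d_min=3, d_max=3, complement=False))
approx = list(find_stems("GGGAAACTC", k=3, e=1, d_min=3, d_max=3))
```

In the first call the model `GGG` at position 0 pairs with `CCC` at position 6 across a loop of length 3. The third call tolerates one substitution, so `CTC` is accepted as an inverse occurrence of `GGG`; this is the kind of imperfection that the abstract associates with bulges or interior loops in a helix stem.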

References

  1. J. P. Abrahams, M. v. d. Berg, E. v. Batenburg, and C. Pleij. Prediction of RNA secondary structure, including pseudoknotting, by computer simulation. Comput. Appli. Biosci., 8:243–248, 1992.

  2. B. Billoud, M. Kontic, and A. Viari. Palingol: a declarative programming language to describe nucleic acids' secondary structures and to scan sequence databases. Nucleic Acids Res., 24:1395–1403, 1996.

  3. D. Bouthinon, H. Soldano, and B. Billoud. Apprentissage d'un concept commun à un ensemble d'objets dont la description est hypothétique: application à la découverte de structures secondaires d'ARN. In 11èmes Journées Françaises d'Apprentissage, 1996.

  4. M. Brown and C. Wilson. RNA pseudoknot modeling using intersections of stochastic context free grammars with applications to database search. Manuscript, University of California, Santa Cruz, Oct. 1995.

  5. J.-H. Chen, S.-Y. Le, and J. V. Maizel. A procedure for RNA pseudoknot prediction. Comput. Appli. Biosci., 8:243–248, 1992.

  6. D. J. Galas, M. Eggert, and M. S. Waterman. Rigorous pattern-recognition methods for DNA sequences. Analysis of promoter sequences from Escherichia coli. J. Mol. Biol., 186:117–128, 1985.

  7. I. Tinoco Jr., P. W. Davis, C. C. Hardin, J. D. Puglisi, G. T. Walker, and J. Wyatt. RNA structures from A to Z. In Cold Spring Harbor Symposia on Quantitative Biology, volume LII, pages 135–146. Cold Spring Harbor Laboratory, 1987.

  8. N. A. Kolchanov, I. I. Titov, I. E. Vlassova, and V. V. Vlassov. Chemical and computer probing of RNA structure. In W. E. Cohn and K. Moldave, editors, Progress in Nucleic Acid Research and Molecular Biology, pages 131–196. Academic Press, 1996.

  9. M. Kontic. Palingol. Langage pour la description et la recherche de structures secondaires dans les séquences nucléotidiques, 1993. DEA d'Intelligence Artificielle, Université de Paris Nord.

  10. F. Lefebvre. An optimized parsing algorithm well suited for RNA folding. In Proceedings First International Conference on Intelligent Systems for Molecular Biology, Cambridge, England, 1995.

  11. B. Lewin. Genes V. Oxford University Press, 1994.

  12. H. M. Martinez. An efficient method for finding repeats in molecular sequences. Nucleic Acids Res., 11:4629–4634, 1983.

  13. H. M. Martinez. Detecting pseudoknots and other local base-pairing structures in RNA sequences. 183:306–317, 1990.

  14. S. M. Mirkin, V. I. Lyamichev, K. N. Drushlyak, V. N. Dobrynin, S. A. Filippov, and M. D. Frank-Kamenetskii. DNA H form requires a homopurine-homopyrimidine mirror repeat. Nature, 330:495–497, 1987.

  15. E. W. Myers. A sublinear algorithm for approximate keyword searching. Algorithmica, 12:345–374, 1994.

  16. C. W. A. Pleij and L. Bosch. RNA pseudoknots: structure, detection, and prediction. 180:289–303, 1989.

  17. M.-F. Sagot, V. Escalier, A. Viari, and H. Soldano. Searching for repeated words in a text allowing for mismatches and gaps. In Second South American Workshop on String Processing, pages 87–100, Viña del Mar, Chile, 1995.

  18. M.-F. Sagot and A. Viari. A double combinatorial approach to discovering patterns in biological sequences. In D. Hirschberg and G. Myers, editors, Combinatorial Pattern Matching, volume 1075 of Lecture Notes in Computer Science, pages 186–208. Springer-Verlag, 1996.

  19. M.-F. Sagot, A. Viari, and H. Soldano. A distance-based block searching algorithm. In Third International Symposium on Intelligent Systems for Molecular Biology, pages 322–331, Cambridge, England, 1995.

  20. M.-F. Sagot, A. Viari, and H. Soldano. Multiple comparison: a peptide matching approach. In Z. Galil and E. Ukkonen, editors, Combinatorial Pattern Matching, volume 937 of Lecture Notes in Computer Science, pages 366–385. Springer-Verlag, 1995. To appear in Theoret. Comput. Sci.

  21. Y. Sakakibara, M. Brown, R. Hughey, I. S. Mian, K. Sjolander, R. C. Underwood, and D. Haussler. Stochastic context-free grammars for tRNA modeling. Nucleic Acids Res., 22:5112–5120, 1994.

  22. D. Searls. The linguistics of DNA. American Scientist, 80:579–591, 1992.

  23. M. S. Waterman. Consensus methods for folding single-stranded nucleic acids. In M. S. Waterman, editor, Mathematical Methods for DNA Sequences, pages 185–224. CRC Press, 1989.

  24. S. Wu, U. Manber, and E. W. Myers. An O(NP) sequence comparison algorithm. Inf. Proc. Letters, 35:317–323, 1990.

  25. M. Zuker and D. Sankoff. RNA secondary structures and their prediction. Bull. Math. Biol., 46:591–621, 1984.

  26. M. Zuker and P. Stiegler. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res., 9:133–148, 1981.

Editor information

Alberto Apostolico, Jotun Hein

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sagot, MF., Viari, A. (1997). Flexible identification of structural objects in nucleic acid sequences: Palindromes, mirror repeats, pseudoknots and triple helices. In: Apostolico, A., Hein, J. (eds) Combinatorial Pattern Matching. CPM 1997. Lecture Notes in Computer Science, vol 1264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63220-4_62

  • DOI: https://doi.org/10.1007/3-540-63220-4_62

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63220-7

  • Online ISBN: 978-3-540-69214-0

  • eBook Packages: Springer Book Archive
