Abstract
Most current pattern matching tools focus on finding efficient ways to compare sequences. This is useful but insufficient: as knowledge and understanding of some functional or structural aspects of living systems improve, analysts in molecular biology are progressively shifting from mere classification tasks to modeling tasks. They need to be able to express global sequence architectures and to test various hypotheses about the way their sequences are structured. Generic tools are therefore needed for this task, ones that allow building more expressive models of biological sequence families on the basis of their content and structure.
This article introduces Logol, a new application designed to perform pattern matching in possibly large sequences using customized biological patterns. Logol consists of both a language for describing patterns and the associated parser for searching sequences (RNA, DNA or protein) with such patterns. The Logol language, based on a high-level grammatical formalism, makes it possible to express flexible patterns (with mispairings and indels) composed of both sequential elements (such as motifs) and structural elements (such as repeats or pseudoknots). Its expressive power is presented through an application that uses the main components of the language: the identification of -1 programmed ribosomal frameshifting (PRF) events in messenger RNA sequences.
Logol allows the design of sophisticated patterns and their search in large nucleic acid or amino acid sequences. It is available on the GenOuest bioinformatics platform at http://logol.genouest.org. The core of the suite is a command-line application, available for different operating systems. The Logol suite also includes interfaces, e.g. an interface for drawing the pattern graphically.
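To make the notion of a flexible pattern more concrete, the following minimal Python sketch illustrates the two kinds of elements mentioned in the abstract: a sequential element (a motif matched with a bounded number of substitutions) and a structural element (a stem-loop whose two arms pair up to a bounded number of mispairings). This is not Logol syntax and not the Logol parser. The function names, the parameters, and the restriction to Watson-Crick pairs over the alphabet A, C, G, U are assumptions made purely for illustration, and indels are not handled.

```python
from typing import List, Tuple

# Assumption: plain RNA alphabet with Watson-Crick pairs only (no G-U wobble).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_motif(seq: str, motif: str, max_mismatches: int = 0) -> List[Tuple[int, int]]:
    """Return (start, mismatches) for every window within the mismatch budget."""
    hits = []
    m = len(motif)
    for i in range(len(seq) - m + 1):
        d = hamming(seq[i:i + m], motif)
        if d <= max_mismatches:
            hits.append((i, d))
    return hits


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string over A, C, G, U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_stem_loops(seq: str, stem_len: int, min_loop: int, max_loop: int,
                    max_mispairs: int = 0) -> List[Tuple[int, int, int]]:
    """Return (start, loop_length, mispairs) for each candidate stem-loop.

    The left arm is seq[start:start + stem_len]; the right arm must match
    the reverse complement of the left arm, up to max_mispairs positions.
    """
    hits = []
    n = len(seq)
    for i in range(n - 2 * stem_len - min_loop + 1):
        target = reverse_complement(seq[i:i + stem_len])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop  # start of the right arm
            if j + stem_len > n:
                break
            d = hamming(seq[j:j + stem_len], target)
            if d <= max_mispairs:
                hits.append((i, loop, d))
    return hits
```

A pattern language such as the one described above lets the analyst state constraints of this kind declaratively and combine them into one model, instead of hand-writing a nested search loop for each new hypothesis.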
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Belleannée, C., Sallou, O., Nicolas, J. (2014). Logol: Expressive Pattern Matching in Sequences. Application to Ribosomal Frameshift Modeling. In: Comin, M., Käll, L., Marchiori, E., Ngom, A., Rajapakse, J. (eds) Pattern Recognition in Bioinformatics. PRIB 2014. Lecture Notes in Computer Science, vol 8626. Springer, Cham. https://doi.org/10.1007/978-3-319-09192-1_4
DOI: https://doi.org/10.1007/978-3-319-09192-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09191-4
Online ISBN: 978-3-319-09192-1
eBook Packages: Computer Science (R0)