Abstract
Pattern discovery has many applications in finding functionally or structurally important regions in biological sequences (binding sites, regulatory sites, protein signatures etc.). In this paper we present a new pattern discovery algorithm, which has the following features:
it finds, in exactly the same manner and without any prior specification, both contiguous patterns and patterns with fixed-length gaps (i.e. runs of one or several consecutive wild-cards);
it allows the use of any pairwise score function, thus offering multiple ways to define or constrain the type of patterns sought; in particular, one can use substitution matrices (PAM, BLOSUM) to compare amino acids, exact matching to compare nucleotides, or equivalency sets in both cases.
We describe the algorithm, compare it to other algorithms, and report test results on discovering binding sites for DNA-binding proteins (ArgR, LexA, PurR, and TyrR) in E. coli, and promoter sites in a set of dicot plants.
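To make the two features above concrete, here is a minimal Python sketch of how a pattern with wild-cards might be scored against a sequence under interchangeable pairwise score functions. It is not the algorithm presented in the paper: it only scores a given pattern at each position and performs no discovery. All names (`exact_score`, `equivalence_score`, `matrix_score`, `find_occurrences`), the `.` wild-card symbol, and the toy matrix values are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A pairwise score function maps two symbols to a number (higher = more similar).
Score = Callable[[str, str], float]

WILDCARD = "."  # hypothetical wild-card symbol; matches anything, contributes no score


def exact_score(a: str, b: str) -> float:
    """Exact matching, e.g. for nucleotides."""
    return 1.0 if a == b else 0.0


def equivalence_score(classes: Sequence[str]) -> Score:
    """Equivalency sets: two symbols match if they are equal or share a class."""
    def score(a: str, b: str) -> float:
        if a == b or any(a in c and b in c for c in classes):
            return 1.0
        return 0.0
    return score


def matrix_score(matrix: Dict[Tuple[str, str], float]) -> Score:
    """Look up a symmetric substitution table (toy values, not real PAM/BLOSUM)."""
    def score(a: str, b: str) -> float:
        return matrix.get((a, b), matrix.get((b, a), 0.0))
    return score


def find_occurrences(pattern: str, sequence: str, score: Score,
                     threshold: float) -> List[Tuple[int, float]]:
    """Return (position, score) for every window scoring at least `threshold`.

    Wild-card positions are skipped, so gapped and contiguous patterns
    are handled by the same loop.
    """
    hits = []
    width = len(pattern)
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        total = sum(score(p, w) for p, w in zip(pattern, window) if p != WILDCARD)
        if total >= threshold:
            hits.append((i, total))
    return hits


# Gapped pattern with exact matching on nucleotides.
print(find_occurrences("A..T", "ACGTAGGTAAAC", exact_score, 2.0))

# Same search routine, different score function: purine/pyrimidine classes.
print(find_occurrences("A.T", "GCC", equivalence_score(["AG", "CT"]), 2.0))
```

Swapping the score function changes which windows count as occurrences without touching the search loop, which is the kind of flexibility the abstract attributes to the use of arbitrary pairwise score functions.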
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mancheron, A., Rusu, I. (2003). Pattern Discovery Allowing Wild-Cards, Substitution Matrices, and Multiple Score Functions. In: Benson, G., Page, R.D.M. (eds) Algorithms in Bioinformatics. WABI 2003. Lecture Notes in Computer Science, vol 2812. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39763-2_10
DOI: https://doi.org/10.1007/978-3-540-39763-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20076-5
Online ISBN: 978-3-540-39763-2
eBook Packages: Springer Book Archive