Pattern Discovery Allowing Wild-Cards, Substitution Matrices, and Multiple Score Functions

  • Conference paper
Algorithms in Bioinformatics (WABI 2003)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 2812)

Abstract

Pattern discovery has many applications in finding functionally or structurally important regions in biological sequences (binding sites, regulatory sites, protein signatures etc.). In this paper we present a new pattern discovery algorithm, which has the following features:

  • it finds, in exactly the same manner and without any prior specification, both contiguous patterns and patterns with fixed-length gaps (i.e. runs of one or more consecutive wild-cards);

  • it accepts any pairwise score function, which offers multiple ways to define or constrain the type of pattern searched for; in particular, one can use substitution matrices (PAM, BLOSUM) to compare amino acids, exact matching to compare nucleotides, or equivalency sets in either case.
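To make the two features concrete, the following is a minimal illustrative sketch and not the algorithm of the paper, which is not reproduced on this page. It only shows how a fixed-length pattern containing wild-cards can be scored against sequence windows with an interchangeable pairwise score function. The names `exact_score`, `equivalence_score`, `matrix_score`, `pattern_score` and `occurrences`, the `.` wild-card symbol, and the threshold convention are all assumptions made for illustration; the matrix entries used in the note below are toy values, not taken from PAM or BLOSUM.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# A pairwise score function maps two symbols to a number (assumed convention).
Score = Callable[[str, str], float]

WILDCARD = "."  # hypothetical wild-card symbol: matches anything, contributes no score


def exact_score(a: str, b: str) -> float:
    """Exact matching, e.g. for nucleotides: 1 for identical symbols, else 0."""
    return 1.0 if a == b else 0.0


def equivalence_score(classes: List[FrozenSet[str]]) -> Score:
    """Equivalency sets: two symbols score 1 if identical or in a common class."""
    def score(a: str, b: str) -> float:
        if a == b or any(a in c and b in c for c in classes):
            return 1.0
        return 0.0
    return score


def matrix_score(matrix: Dict[Tuple[str, str], float]) -> Score:
    """Substitution-matrix lookup; the matrix is treated as symmetric.

    Pairs missing from the matrix score 0 (an assumption of this sketch).
    """
    def score(a: str, b: str) -> float:
        if (a, b) in matrix:
            return matrix[(a, b)]
        return matrix.get((b, a), 0.0)
    return score


def pattern_score(pattern: str, window: str, score: Score) -> float:
    """Sum the pairwise scores over the non-wild-card positions of the pattern."""
    if len(pattern) != len(window):
        raise ValueError("pattern and window must have the same length")
    return sum(score(p, w) for p, w in zip(pattern, window) if p != WILDCARD)


def occurrences(pattern: str, sequence: str, score: Score,
                threshold: float) -> List[int]:
    """Start positions of windows whose pattern score reaches the threshold."""
    k = len(pattern)
    return [i for i in range(len(sequence) - k + 1)
            if pattern_score(pattern, sequence[i:i + k], score) >= threshold]
```

With exact matching, `occurrences("A..T", "ACGTAGGTACCA", exact_score, 2.0)` returns `[0, 4]`. Replacing the score function with `equivalence_score([frozenset("AG"), frozenset("CT")])` (purines and pyrimidines) additionally accepts the window starting at position 6, giving `[0, 4, 6]`. The pattern and the search loop are unchanged; only the score function differs, which is the kind of flexibility the abstract describes.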

We describe the algorithm, compare it with other algorithms, and report test results on discovering binding sites for four DNA-binding proteins (ArgR, LexA, PurR and TyrR) in E. coli, and promoter sites in a set of dicot plants.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mancheron, A., Rusu, I. (2003). Pattern Discovery Allowing Wild-Cards, Substitution Matrices, and Multiple Score Functions. In: Benson, G., Page, R.D.M. (eds) Algorithms in Bioinformatics. WABI 2003. Lecture Notes in Computer Science, vol 2812. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39763-2_10

  • DOI: https://doi.org/10.1007/978-3-540-39763-2_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20076-5

  • Online ISBN: 978-3-540-39763-2

  • eBook Packages: Springer Book Archive
