Discovering Frequent Structured Patterns from String Databases: An Application to Biological Sequences

  • Conference paper
Discovery Science (DS 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2534)

Abstract

In recent years, the completion of human genome sequencing has raised a wide range of challenging new issues in raw data analysis. In particular, the discovery of information implicitly encoded in biological sequences is assuming a prominent role in identifying genetic diseases and in deciphering biological mechanisms. This information is usually represented by patterns that occur frequently in the sequences. Biological observations have made one specific class of patterns particularly interesting: frequent structured patterns. In this respect, it is biologically meaningful to look at both “exact” and “approximate” repetitions of the patterns within the available sequences. This paper contributes to this setting by providing algorithms that discover frequent structured patterns, in either “exact” or “approximate” form, in a collection of input biological sequences.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Palopoli, L., Terracina, G. (2002). Discovering Frequent Structured Patterns from String Databases: An Application to Biological Sequences. In: Lange, S., Satoh, K., Smith, C.H. (eds) Discovery Science. DS 2002. Lecture Notes in Computer Science, vol 2534. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36182-0_6

  • DOI: https://doi.org/10.1007/3-540-36182-0_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00188-1

  • Online ISBN: 978-3-540-36182-4

  • eBook Packages: Springer Book Archive
