Discovering Frequent Structured Patterns from String Databases: An Application to Biological Sequences

Palopoli, Luigi; Terracina, Giorgio

doi:10.1007/3-540-36182-0_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2534)

Included in the following conference series:

International Conference on Discovery Science

Abstract

In recent years, the completion of human genome sequencing has raised a wide range of challenging new issues in raw data analysis. In particular, discovering the information implicitly encoded in biological sequences is assuming a prominent role in identifying genetic diseases and in deciphering biological mechanisms. This information is usually represented by patterns that occur frequently in the sequences. Biological observations have made one specific class of patterns particularly interesting: frequent structured patterns. In this respect, it is biologically meaningful to look at both “exact” and “approximate” repetitions of the patterns within the available sequences. This paper contributes to this setting by providing algorithms that discover frequent structured patterns, in either “exact” or “approximate” form, in a collection of input biological sequences.
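
The abstract does not define a structured pattern or describe the algorithms, so the following is only an illustrative sketch and not the authors' method. It assumes a structured pattern is two fixed-length boxes separated by a gap of bounded length, that an "approximate" occurrence allows a bounded number of mismatches (Hamming distance) per box, and that "frequent" means occurring in at least a given number of input sequences. All function and parameter names (`occurs`, `frequent_structured_patterns`, `min_support`, `max_err`) are hypothetical, and the search is plain brute-force enumeration.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs(seq, box1, box2, min_gap, max_gap, max_err=0):
    """Return True if seq contains box1, then min_gap..max_gap symbols, then box2.

    Each box may differ from the text in at most max_err positions
    (max_err=0 gives the "exact" case). This is an assumed definition.
    """
    n1, n2 = len(box1), len(box2)
    for i in range(len(seq) - n1 + 1):
        if hamming(seq[i:i + n1], box1) > max_err:
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + n1 + gap
            if j + n2 > len(seq):
                break  # larger gaps run past the end of seq
            if hamming(seq[j:j + n2], box2) <= max_err:
                return True
    return False


def frequent_structured_patterns(seqs, box_len, min_gap, max_gap,
                                 min_support, max_err=0, alphabet="ACGT"):
    """Enumerate all box pairs and keep those found in >= min_support sequences.

    Brute force over |alphabet|^(2 * box_len) candidates, so only suitable
    for very short boxes; returns {(box1, box2): support}.
    """
    boxes = ["".join(p) for p in product(alphabet, repeat=box_len)]
    frequent = {}
    for b1 in boxes:
        for b2 in boxes:
            support = sum(
                occurs(s, b1, b2, min_gap, max_gap, max_err) for s in seqs
            )
            if support >= min_support:
                frequent[(b1, b2)] = support
    return frequent


if __name__ == "__main__":
    seqs = ["AATTGCCCTATGG", "TTGAATAT", "CCCCCCCC"]
    found = frequent_structured_patterns(
        seqs, box_len=3, min_gap=2, max_gap=3, min_support=2
    )
    print(found.get(("TTG", "TAT")))
```

Setting `max_err=0` corresponds to the "exact" case and `max_err>0` to the "approximate" one under these assumptions. A practical method would need an index over the sequences rather than exhaustive candidate enumeration.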

Author information

Authors and Affiliations

DIMET - Università di Reggio Calabria, Località Feo di Vito, 89100, Reggio Calabria, Italy
Luigi Palopoli & Giorgio Terracina

Editor information

Editors and Affiliations

Deutsches Forschungszentrum für Künstliche Intelligenz, Stuhlsatzenhausweg 3, 66123, Saarbrücken, Germany
Steffen Lange
National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, 101-8430, Tokyo, Japan
Ken Satoh
Department of Computer Science, University of Maryland, College Park, 20742, Maryland, MD, USA
Carl H. Smith

Cite this paper

Palopoli, L., Terracina, G. (2002). Discovering Frequent Structured Patterns from String Databases: An Application to Biological Sequences. In: Lange, S., Satoh, K., Smith, C.H. (eds) Discovery Science. DS 2002. Lecture Notes in Computer Science, vol 2534. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36182-0_6

DOI: https://doi.org/10.1007/3-540-36182-0_6
Published: 08 November 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00188-1
Online ISBN: 978-3-540-36182-4
eBook Packages: Springer Book Archive
