Motif Extraction from Weighted Sequences

Iliopoulos, Costas S.; Perdikuri, Katerina; Theodoridis, Evangelos; Tsakalidis, Athanasios; Tsichlas, Kostas

doi:10.1007/978-3-540-30213-1_41


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3246)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval


Abstract

In this paper we present three algorithms. The first extracts repeated motifs from a weighted sequence: words that occur at least q times in the sequence, within Hamming distance e, each time with probability ≥ 1/k, where k is a small constant. The second algorithm extracts common motifs, within Hamming distance e, from a set of N ≥ 2 weighted sequences; in this case a motif must occur twice with probability ≥ 1/k in q distinct sequences of the set, where 1 ≤ q ≤ N. The third algorithm extracts maximal pairs from a weighted sequence, where a pair is two occurrences of the same substring. The algorithms presented also slightly improve on previous work on these problems.
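The repeated-motif condition can be checked directly, if slowly, by brute force. The sketch below is not the authors' method; it only illustrates the definition under stated assumptions. It assumes a weighted sequence is a list of per-position dictionaries mapping symbols to probabilities, that a motif "occurs" at a position when some solid word within Hamming distance e of the motif has probability ≥ 1/k there, and that overlapping occurrences count separately. The names `best_occurrence_prob`, `occurrences`, and `is_repeated_motif` are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List

# Assumed representation: one dict per position, symbol -> probability.
WeightedSeq = List[Dict[str, Fraction]]


def best_occurrence_prob(seq: WeightedSeq, motif: str, start: int, e: int) -> Fraction:
    """Highest probability of any solid word u placed at `start`
    with Hamming distance d(u, motif) <= e (assumed occurrence model)."""
    # best[t]: highest probability of a realised prefix using exactly t mismatches
    best = [Fraction(1)] + [Fraction(0)] * e
    for j, ch in enumerate(motif):
        column = seq[start + j]
        p_match = column.get(ch, Fraction(0))
        # Most likely symbol other than the motif's own symbol at this position
        p_mismatch = max((p for c, p in column.items() if c != ch), default=Fraction(0))
        new = [Fraction(0)] * (e + 1)
        for t in range(e + 1):
            new[t] = best[t] * p_match
            if t > 0:
                new[t] = max(new[t], best[t - 1] * p_mismatch)
        best = new
    return max(best)


def occurrences(seq: WeightedSeq, motif: str, e: int, k: int) -> List[int]:
    """Start positions where the motif occurs with probability >= 1/k."""
    threshold = Fraction(1, k)
    return [i for i in range(len(seq) - len(motif) + 1)
            if best_occurrence_prob(seq, motif, i, e) >= threshold]


def is_repeated_motif(seq: WeightedSeq, motif: str, q: int, e: int, k: int) -> bool:
    """True if the motif has at least q (possibly overlapping) occurrences."""
    return len(occurrences(seq, motif, e, k)) >= q


# Hypothetical example: positions 2 and 4 are uncertain.
example: WeightedSeq = [
    {"A": Fraction(1)},
    {"C": Fraction(1)},
    {"A": Fraction(1, 2), "T": Fraction(1, 2)},
    {"C": Fraction(1)},
    {"A": Fraction(1, 4), "G": Fraction(3, 4)},
    {"C": Fraction(1)},
]

print(occurrences(example, "AC", e=0, k=4))  # [0, 2, 4]
print(occurrences(example, "AC", e=0, k=2))  # [0, 2]
```

Raising k lowers the threshold 1/k, so the low-probability occurrence at position 4 (probability 1/4) is accepted for k = 4 but rejected for k = 2. Each position costs O(|motif| · e) time, which is why index-based algorithms are preferable on real data.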



Author information

Authors and Affiliations

Department of Computer Science, King’s College London, London, WC2R 2LS, England
Costas S. Iliopoulos & Kostas Tsichlas
Computer Engineering & Informatics Department, University of Patras, 26500, Patras, Greece
Katerina Perdikuri, Evangelos Theodoridis & Athanasios Tsakalidis
Research Academic Computer Technology Institute (RACTI), 61 Riga Feraiou St., 26221, Patras, Greece
Katerina Perdikuri, Evangelos Theodoridis & Athanasios Tsakalidis


Editor information

Editors and Affiliations

Georgia Institute of Technology and Università di Padova
Alberto Apostolico
Department of Information Engineering, University of Padova
Massimo Melucci


About this paper

Cite this paper

Iliopoulos, C.S., Perdikuri, K., Theodoridis, E., Tsakalidis, A., Tsichlas, K. (2004). Motif Extraction from Weighted Sequences. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_41


DOI: https://doi.org/10.1007/978-3-540-30213-1_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23210-0
Online ISBN: 978-3-540-30213-1
eBook Packages: Springer Book Archive
