Abstract
We present three algorithms. The first extracts repeated motifs from a weighted sequence. These motifs are words that occur at least q times with Hamming distance e in the weighted sequence, each time with probability ≥ 1/k, where k is a small constant. The second algorithm extracts common motifs with Hamming distance e from a set of N ≥ 2 weighted sequences. In this case, the motifs must occur twice with probability ≥ 1/k in q distinct sequences of the set, where 1 ≤ q ≤ N. The third algorithm extracts maximal pairs from a weighted sequence, where a pair is two occurrences of the same substring in a sequence. The algorithms presented here also improve slightly on previous work on these problems.
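To make the first problem statement concrete, the following brute-force sketch checks the repeated-motif condition directly from its definition. It is not the algorithm of the paper. It assumes that a weighted sequence is a list of per-position symbol distributions, that the probability of a word at a position is the product of its symbol probabilities, and that "Hamming distance e" means at most e mismatches. The names `best_probability`, `occurrences`, and `is_repeated_motif` are hypothetical.

```python
from typing import Dict, List

# Assumed representation: one {symbol: probability} mapping per position.
WeightedSequence = List[Dict[str, float]]


def best_probability(ws: WeightedSequence, motif: str, start: int, e: int) -> float:
    """Highest probability of any word at ws[start:] that differs from
    motif in at most e positions (assumed reading of 'Hamming distance e')."""
    # best[m] = best probability of a prefix using exactly m mismatches
    best = [1.0] + [0.0] * e
    for offset, symbol in enumerate(motif):
        column = ws[start + offset]
        p_match = column.get(symbol, 0.0)
        # Most probable symbol at this position other than the motif symbol.
        p_other = max((p for s, p in column.items() if s != symbol), default=0.0)
        nxt = [0.0] * (e + 1)
        for m in range(e + 1):
            nxt[m] = best[m] * p_match
            if m > 0:
                nxt[m] = max(nxt[m], best[m - 1] * p_other)
        best = nxt
    return max(best)


def occurrences(ws: WeightedSequence, motif: str, e: int, k: int) -> List[int]:
    """Start positions where the motif occurs with probability >= 1/k."""
    threshold = 1.0 / k
    last_start = len(ws) - len(motif)
    return [i for i in range(last_start + 1)
            if best_probability(ws, motif, i, e) >= threshold]


def is_repeated_motif(ws: WeightedSequence, motif: str, q: int, e: int, k: int) -> bool:
    """True if the motif has at least q qualifying occurrences."""
    return len(occurrences(ws, motif, e, k)) >= q


# Illustrative data only: position 1 is ambiguous between C and G.
example = [{"A": 1.0}, {"C": 0.5, "G": 0.5}, {"A": 1.0},
           {"C": 1.0}, {"A": 1.0}, {"G": 1.0}]
print(occurrences(example, "AC", e=0, k=2))
print(occurrences(example, "AC", e=1, k=2))
```

The sketch runs in time proportional to the sequence length times the motif length times e for a single candidate motif, and it counts overlapping occurrences. It only illustrates the definition and does not enumerate candidate motifs.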
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Iliopoulos, C.S., Perdikuri, K., Theodoridis, E., Tsakalidis, A., Tsichlas, K. (2004). Motif Extraction from Weighted Sequences. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_41
DOI: https://doi.org/10.1007/978-3-540-30213-1_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23210-0
Online ISBN: 978-3-540-30213-1
eBook Packages: Springer Book Archive