Motif Extraction from Weighted Sequences

  • Conference paper
String Processing and Information Retrieval (SPIRE 2004)

Abstract

In this paper we present three algorithms. The first extracts repeated motifs from a weighted sequence. These motifs are words that occur at least q times in the weighted sequence, within Hamming distance e, each time with probability ≥ 1/k, where k is a small constant. The second algorithm extracts common motifs, again within Hamming distance e, from a set of N ≥ 2 weighted sequences. In this case a motif must occur twice with probability ≥ 1/k in q distinct sequences of the set, where 1 ≤ q ≤ N. The third algorithm extracts maximal pairs from a weighted sequence, where a pair is two occurrences of the same substring. The algorithms presented in this paper also improve slightly on previous work on these problems.
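The first problem can be made concrete with a small brute-force sketch. It rests on several assumptions that the abstract does not spell out. A weighted sequence is represented as one symbol-to-probability map per position. A word occurs at position i with probability equal to the product of its per-position symbol probabilities. An occurrence "within Hamming distance e" means that some word differing from the motif in at most e positions occurs there with probability ≥ 1/k. The function names (`best_approx_prob`, `occurrences`, `repeated_motifs`) are hypothetical. The exhaustive enumeration of all words of a given length is only suitable for tiny inputs and is not the paper's algorithm.

```python
from fractions import Fraction
from itertools import product

# Assumed representation: one dict per position, mapping each symbol to its
# probability at that position (the probabilities at a position sum to 1).
WeightedSeq = list[dict[str, Fraction]]


def best_approx_prob(ws: WeightedSeq, i: int, word: str, e: int) -> Fraction:
    """Highest probability, over all words v with Hamming(word, v) <= e,
    that v occurs in ws starting at position i (0-based)."""
    if i + len(word) > len(ws):
        return Fraction(0)
    # best[m] = highest probability of a prefix of v with exactly m mismatches.
    best = [Fraction(1)] + [Fraction(0)] * e
    for j, c in enumerate(word):
        col = ws[i + j]
        match = col.get(c, Fraction(0))  # keep word[j] at this position
        # Most probable substitute symbol at this position, if any.
        other = max((p for s, p in col.items() if s != c), default=Fraction(0))
        best = [best[0] * match] + [
            max(best[m] * match, best[m - 1] * other) for m in range(1, e + 1)
        ]
    return max(best)


def occurrences(ws: WeightedSeq, word: str, e: int, k: int) -> list[int]:
    """Start positions where word occurs within Hamming distance e
    with probability >= 1/k."""
    threshold = Fraction(1, k)
    return [
        i
        for i in range(len(ws) - len(word) + 1)
        if best_approx_prob(ws, i, word, e) >= threshold
    ]


def repeated_motifs(ws: WeightedSeq, length: int, q: int, e: int, k: int) -> dict[str, list[int]]:
    """Brute force: every word of the given length with at least q occurrences."""
    alphabet = sorted({s for col in ws for s in col})
    result = {}
    for letters in product(alphabet, repeat=length):
        word = "".join(letters)
        occ = occurrences(ws, word, e, k)
        if len(occ) >= q:
            result[word] = occ
    return result


# Toy weighted sequence: position 1 is C or G with equal probability.
example: WeightedSeq = [
    {"A": Fraction(1)},
    {"C": Fraction(1, 2), "G": Fraction(1, 2)},
    {"A": Fraction(1)},
    {"C": Fraction(1)},
]

print(repeated_motifs(example, length=2, q=2, e=0, k=2))
```

On this toy input, with exact matching (e = 0), q = 2 and k = 2, only `AC` qualifies. It occurs at positions 0 (probability 1/2) and 2 (probability 1). The per-position dynamic program over the mismatch count is exact under the stated assumptions, because the occurrence probability factorizes over positions. `Fraction` is used so that the ≥ 1/k comparison involves no rounding.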

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Iliopoulos, C.S., Perdikuri, K., Theodoridis, E., Tsakalidis, A., Tsichlas, K. (2004). Motif Extraction from Weighted Sequences. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_41

  • DOI: https://doi.org/10.1007/978-3-540-30213-1_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23210-0

  • Online ISBN: 978-3-540-30213-1

  • eBook Packages: Springer Book Archive
