Using Suffix Trees for Gapped Motif Discovery

  • Emily Rocke
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1848)

Abstract

Gibbs sampling is a local search method that can be used to find novel motifs in a text string. In previous work [8], we proposed a modified Gibbs sampler that can discover novel gapped motifs of varying lengths and occurrence rates in DNA or protein sequences. The Gibbs sampling method requires repeated searching of the text for the best match to a constantly evolving collection of aligned strings, and each search pass previously required Θ(nl) time, where l is the length of the motif and n is the length of the original sequence. This paper presents a novel method for using suffix trees to greatly improve the performance of the Gibbs sampling approach.
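
To make the Θ(nl) cost of a single search pass concrete, the following is a minimal sketch of a naive scan, not the paper's algorithm. It assumes an ungapped motif scored against a simple column-frequency profile (the paper itself handles gapped motifs), and the names build_profile and best_match, the DNA alphabet, and the pseudocount are all illustrative assumptions.

```python
from collections import Counter


def build_profile(aligned, alphabet="ACGT", pseudocount=1.0):
    """Column-wise letter frequencies for equal-length aligned strings.

    Hypothetical helper: the pseudocount keeps unseen letters from
    scoring exactly zero.
    """
    length = len(aligned[0])
    total = len(aligned) + pseudocount * len(alphabet)
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned)
        profile.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return profile


def best_match(text, profile):
    """Return (position, score) of the best-scoring window in text.

    There are n - l + 1 windows and each costs l multiplications,
    so one pass takes time proportional to n * l.
    Every character of text must belong to the profile's alphabet.
    """
    l = len(profile)
    best_pos, best_score = -1, 0.0
    for start in range(len(text) - l + 1):
        score = 1.0
        for offset in range(l):
            score *= profile[offset][text[start + offset]]
        if score > best_score:  # strict comparison keeps the leftmost best window
            best_pos, best_score = start, score
    return best_pos, best_score
```

For example, with the aligned strings ACGT, ACGT, and ACGA and the text TTTACGTTT, best_match reports position 3, the window that spells ACGT. A sampler repeats a scan like this each time the aligned collection changes, which is the repeated cost the suffix tree method aims to reduce.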

Keywords

Tree Search; Gibbs Sampling; Edit Distance; Suffix Tree; Gibbs Sampling Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Chang, W.I., Lampe, J., Theoretical and Empirical Comparisons of Approximate String Matching Algorithms. Proc. 3rd Symp. on Combinatorial Pattern Matching, Springer LNCS 644, 175–84, 1992.
  2. Fickett, J.W., Fast Optimal Alignment. Nucl. Acids Res., 12:175–80, 1984.
  3. Gusfield, D., Algorithms on Strings, Trees, and Sequences. Cambridge University Press, New York, 1997.
  4. Landau, G.M., Vishkin, U., Efficient String Matching with k Mismatches. Theor. Comput. Sci., 43:239–49, 1986.
  5. Lawrence, C., Altschul, S., Boguski, M., Liu, J., Neuwald, A., Wootton, J., Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment. Science, 262:208–14, 8 October 1993.
  6. Marsan, L., Sagot, M.F., Extracting Structured Motifs Using a Suffix Tree—Algorithms and Application to Promoter Consensus Identification. To appear in Proceedings of RECOMB 2000.
  7. Myers, E.W., An O(ND) Difference Algorithm and Its Variations. Algorithmica, 1:251–66, 1986.
  8. Rocke, E., Tompa, M., An Algorithm for Finding Novel Gapped Motifs in DNA Sequences. Proceedings of the Second Annual International Conference on Computational Molecular Biology, 228–33, New York, NY, March 1998.
  9. Sagot, M.F., Spelling Approximate Repeated or Common Motifs Using a Suffix Tree. Proceedings of LATIN, 374–90, 1998.
  10. Smith, T.F., Waterman, M.S., Identification of Common Molecular Subsequences. J. Mol. Biol., 147:195–97, 1981.
  11. Ukkonen, E., Approximate String-Matching Over Suffix Trees. Proc. 4th Symp. on Combinatorial Pattern Matching, Springer LNCS 684, 228–42, 1993.
  12. Ukkonen, E., On-line Construction of Suffix-Trees. Algorithmica, 14:249–60, 1995.
  13. Ukkonen, E., Algorithms for Approximate String Matching. Information and Control, 64:100–18, 1985.
  14. Weiner, P., Linear Pattern Matching Algorithms. Proc. of the 14th IEEE Symp. on Switching and Automata Theory, 1–11, 1973.

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Emily Rocke (1)
  1. CSE Department, University of Washington, Seattle