Trie-based data structures for sequence assembly

  • Ting Chen
  • Steven S. Skiena
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1264)

Abstract

We investigate the application of two trie-based data structures, suffix trees and suffix arrays, to the problem of overlap detection in fragment assembly. Both data structures are analyzed theoretically and experimentally with respect to speed and space. By using heuristics, we greatly reduce the number of calls to time-consuming dynamic programming, and have improved the speed of overlap detection by up to 1,000 times while retaining high accuracy in our collaborative DNA sequencing project with Brookhaven National Laboratory. We also study the problem of approximating the maximum space savings in trie structures for unification factoring in logic programming; this problem is proved to be hard.
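
The general idea of filtering before dynamic programming can be illustrated with a small sketch. It is not the authors' implementation: the function names, the exact-match length k, the scoring values, and the naive suffix-array construction are all assumptions chosen for brevity. Suffixes of all fragments are sorted, fragment pairs that share an exact match of length k are collected as candidates, and only those pairs are passed to a dynamic-programming overlap score.

```python
def build_suffix_array(fragments):
    """Return (fragment index, offset) pairs sorted by the suffix they start.

    Naive construction by sorting whole suffixes; suitable only for a
    small illustration, not for real sequencing data.
    """
    entries = [(i, j) for i, frag in enumerate(fragments) for j in range(len(frag))]
    entries.sort(key=lambda e: fragments[e[0]][e[1]:])
    return entries


def candidate_pairs(fragments, k):
    """Pairs of distinct fragments sharing an exact substring of length k.

    Suffixes with the same first k characters are adjacent in the sorted
    order, so one linear scan over the suffix array finds them.
    """
    pairs = set()
    run_key = None      # the k-character prefix shared by the current run
    run_ids = set()     # fragments seen so far in the current run
    for i, j in build_suffix_array(fragments):
        key = fragments[i][j:j + k]
        if len(key) < k:
            continue    # suffix too short to hold a length-k match
        if key != run_key:
            run_key, run_ids = key, set()
        for other in run_ids:
            if other != i:
                pairs.add((min(i, other), max(i, other)))
        run_ids.add(i)
    return pairs


def overlap_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score aligning some suffix of a with some prefix of b.

    Dynamic programming over a (len(a)+1) x (len(b)+1) table, kept one
    row at a time. Starting anywhere in a is free; the alignment must
    begin at the start of b and end at the end of a.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return max(prev)


if __name__ == "__main__":
    frags = ["ACGTACGTGA", "GTGACCTTAG", "TTTTTTTTTT"]
    for x, y in sorted(candidate_pairs(frags, k=4)):
        # Only candidate pairs reach the quadratic-time alignment step.
        print(x, y, overlap_score(frags[x], frags[y]), overlap_score(frags[y], frags[x]))
```

With three fragments there are three possible pairs, but only the pair sharing the exact match "GTGA" is aligned here; on larger inputs this kind of filter is what keeps most pairs away from the quadratic-time step.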

Keywords

Logic Programming; Exact Match; Suffix Tree; Suffix Array; Polynomial Factor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Ting Chen (1)
  • Steven S. Skiena (1)
  1. Department of Computer Science, State University of New York, Stony Brook
