Improved length bounds for the shortest superstring problem

Extended abstract

  • Invited Presentation
  • Conference paper
Algorithms and Data Structures (WADS 1995)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 955)

Abstract

Given a collection of strings $S = \{s_1, \ldots, s_n\}$ over an alphabet $\Sigma$, a superstring $\alpha$ of $S$ is a string containing each $s_i$ as a substring; that is, for each $i$, $1 \le i \le n$, $\alpha$ contains a block of $|s_i|$ consecutive characters that match $s_i$ exactly. The shortest superstring problem is the problem of finding a superstring $\alpha$ of minimum length. This problem is NP-hard [6] and has applications in computational biology and data compression. The first $O(1)$-approximation algorithms were given in [2]. We describe our $2\frac{3}{4}$-approximation algorithm, which is the best known. While our algorithm is not complex, our analysis requires some novel machinery to describe overlapping periodic strings. We then show how to combine our result with that of [11] to obtain a ratio of $2\frac{50}{69} \approx 2.725$. We describe an implementation of our algorithm which runs in $O(|S| + n^3)$ time; this matches the running time of previous $O(1)$-approximations.
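To make the definitions above concrete, the following is a minimal Python sketch. It is not the approximation algorithm described in this paper, whose details are not given in the abstract. It shows a superstring check, an exhaustive search that finds a true shortest superstring on tiny inputs, and the classic greedy merging heuristic as one example of a polynomial-time approach. All function names (`is_superstring`, `overlap`, `shortest_superstring_bruteforce`, `greedy_superstring`) are illustrative assumptions and do not come from the paper.

```python
from itertools import permutations


def is_superstring(alpha, strings):
    """True if every string occurs in alpha as a block of consecutive characters."""
    return all(s in alpha for s in strings)


def overlap(a, b):
    """Length of the longest proper suffix of a that is also a proper prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def _substring_free(strings):
    """Drop duplicates and any string contained in another (they add no length)."""
    unique = sorted(set(strings))
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def shortest_superstring_bruteforce(strings):
    """Exact answer by trying every ordering; exponential, so tiny inputs only."""
    core = _substring_free(strings)
    if not core:
        return ""
    best = None
    for order in permutations(core):
        merged = order[0]
        for nxt in order[1:]:
            merged += nxt[overlap(merged, nxt):]
        if best is None or len(merged) < len(best):
            best = merged
    return best


def greedy_superstring(strings):
    """Greedy heuristic: repeatedly merge the pair with the largest overlap."""
    pool = _substring_free(strings)
    while len(pool) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = pool[best_i] + pool[best_j][best_k:]
        pool = [s for idx, s in enumerate(pool) if idx not in (best_i, best_j)]
        pool.append(merged)
    return pool[0] if pool else ""


words = ["abc", "bcd", "cde"]
print(shortest_superstring_bruteforce(words))  # -> abcde
print(is_superstring(greedy_superstring(words), words))  # -> True
```

The exhaustive search is only there to show what "minimum length" means on a small instance; since the problem is NP-hard, practical methods rely on approximation, and the greedy routine here is just the simplest such heuristic, recomputing overlaps naively rather than aiming for the running time quoted above.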

This work was done while the author was at Dartmouth College.

Research partly supported by NSF Award CCR-9308701, a Walter Burke Research Initiation Award and a Dartmouth College Research Initiation Award.

References

  1. C. Armen and C. Stein. Short superstrings and the structure of overlapping strings. To appear in J. of Computational Biology, 1995.

  2. A. Blum, T. Jiang, M. Li, J. Tromp, and M. Yannakakis. Linear approximation of shortest superstrings. Journal of the ACM, 41(4):630–647, July 1994.

  3. A. Czumaj, L. Gasieniec, M. Piotrow, and W. Rytter. Parallel and sequential approximations of shortest superstrings. In Proceedings of Fourth Scandinavian Workshop on Algorithm Theory, pages 95–106, 1994.

  4. A. Lesk, editor. Computational Molecular Biology, Sources and Methods for Sequence Analysis. Oxford University Press, 1988.

  5. A.M. Frieze, G. Galbiati, and F. Maffioli. On the worst case performance of some algorithms for the asymmetric travelling salesman problem. Networks, 12:23–39, 1982.

  6. J. Gallant, D. Maier, and J. Storer. On finding minimal length superstrings. Journal of Computer and System Sciences, 20:50–58, 1980.

  7. D. Gusfield. Faster implementation of a shortest superstring approximation. Information Processing Letters, 51:271–274, 1994.

  8. D. Gusfield, G. Landau, and B. Schieber. An efficient algorithm for the all pairs suffix-prefix problem. Information Processing Letters, 41:181–185, March 1992.

  9. John D. Kececioglu. Exact and approximation algorithms for DNA sequence reconstruction. PhD thesis, University of Arizona, 1991.

  10. D.E. Knuth, J.H. Morris, and V.R. Pratt. Fast pattern matching in strings. SIAM Journal on Computing, 6:189–195, 1977.

  11. R. Kosaraju, J. Park, and C. Stein. Long tours and short superstrings. In FOCS, November 1994.

  12. M. Li. Towards a DNA sequencing theory (learning a string). In FOCS, pages 125–134, 1990.

  13. L.J. Cummings. Strongly qth power-free strings. Annals of Discrete Mathematics, 17:247–252, 1983.

  14. Christos H. Papadimitriou and Kenneth Steiglitz. Combinatorial Optimization, Algorithms and Complexity. Prentice-Hall, Englewood Cliffs, NJ, 1982.

  15. H. Peltola, H. Soderlund, J. Tarhio, and E. Ukkonen. Algorithms for some string matching problems arising in molecular genetics. In Proceedings of the IFIP Congress, pages 53–64, 1983.

  16. Graham A. Stephen. String searching algorithms. World Scientific, 1994.

  17. J. Storer. Data compression: methods and theory. Computer Science Press, 1988.

  18. J. Tarhio and E. Ukkonen. A greedy approximation algorithm for constructing shortest common superstrings. Theoretical Computer Science, 57:131–145, 1988.

  19. Shang-Hua Teng and Frances Yao. Approximating shortest superstrings. In Proceedings of the 34th Annual Symposium on Foundations of Computer Science, pages 158–165, November 1993.

  20. J. Turner. Approximation algorithms for the shortest common superstring problem. Information and Computation, 83:1–20, 1989.

Editor information

Selim G. Akl, Frank Dehne, Jörg-Rüdiger Sack, Nicola Santoro

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Armen, C., Stein, C. (1995). Improved length bounds for the shortest superstring problem. In: Akl, S.G., Dehne, F., Sack, J.-R., Santoro, N. (eds) Algorithms and Data Structures. WADS 1995. Lecture Notes in Computer Science, vol 955. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60220-8_88

  • DOI: https://doi.org/10.1007/3-540-60220-8_88

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60220-0

  • Online ISBN: 978-3-540-44747-4

  • eBook Packages: Springer Book Archive
