Greedy algorithms for the shortest common superstring that are asymptotically optimal

  • Conference paper
Algorithms — ESA '96 (ESA 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1136)

Abstract

There has recently been a resurgence of interest in the shortest common superstring problem due to its important applications in molecular biology (e.g., recombination of DNA) and data compression. The problem is NP-hard, but it has been known for some time that greedy algorithms work well for it. More precisely, it was proved in a recent sequence of papers that, in the worst case, a greedy algorithm produces a superstring that is at most β times (2 ≤ β ≤ 4) worse than optimal. We analyze the problem in a probabilistic framework, and consider the optimal total overlap \(O_n^{opt}\) and the overlap \(O_n^{gr}\) produced by various greedy algorithms. These turn out to be asymptotically equivalent. We show that in several cases, with high probability, \(\lim_{n \to \infty} \frac{O_n^{opt}}{n \log n} = \lim_{n \to \infty} \frac{O_n^{gr}}{n \log n} = \frac{1}{H}\), where n is the number of original strings and H is the entropy of the underlying alphabet. Our results hold under the condition that none of the strings is too short. Finally, we provide several generalizations and extensions of our basic result.
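As an illustration of the quantities in the abstract, here is a minimal Python sketch of one common greedy strategy: repeatedly merge the ordered pair of strings with the largest suffix-prefix overlap. The abstract refers to "various greedy algorithms" without specifying them, so this particular variant, the function names `overlap`, `greedy_superstring`, and `predicted_overlap`, and the tie-breaking rule are assumptions made for illustration, not the authors' method. `predicted_overlap` simply evaluates the limiting value n log n / H, assuming a memoryless source with the given symbol probabilities.

```python
from itertools import permutations
from math import log


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Greedy merge by maximum overlap (illustrative variant, not from the paper).

    Returns (superstring, total_overlap), where total_overlap is the sum of
    the overlaps used in the merges.
    """
    # Drop duplicates and strings that occur inside another input string.
    pool = sorted(
        s for s in set(strings)
        if not any(s != t and s in t for t in strings)
    )
    if not pool:
        return "", 0
    total = 0
    while len(pool) > 1:
        best = (-1, 0, 1)  # (overlap, left index, right index)
        for i, j in permutations(range(len(pool)), 2):
            k = overlap(pool[i], pool[j])
            if k > best[0]:  # first maximum wins, so ties break by index
                best = (k, i, j)
        k, i, j = best
        merged = pool[i] + pool[j][k:]
        pool = [s for idx, s in enumerate(pool) if idx not in (i, j)]
        pool.append(merged)
        total += k
    return pool[0], total


def predicted_overlap(n: int, probs) -> float:
    """Evaluate n * log(n) / H for a memoryless source (assumed model).

    H is the entropy in nats, so the natural logarithm is used for log(n);
    the ratio log(n) / H does not depend on the base as long as both agree.
    """
    entropy = -sum(p * log(p) for p in probs if p > 0)
    return n * log(n) / entropy
```

For the inputs `["abcd", "cdef", "efgh"]`, `greedy_superstring` returns the superstring `abcdefgh` with total overlap 4: the three strings have combined length 12, and the superstring has length 12 - 4 = 8. The quadratic search for the best pair is chosen for readability, not efficiency.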

This work was supported by CCR-9225008.

This research was supported in part by NSF Grants CCR-9201078, NCR-9206315 and NCR-9415491, and in part by NATO Collaborative Grant CGR.950060.

Editor information

Josep Diaz, Maria Serna

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Frieze, A., Szpankowski, W. (1996). Greedy algorithms for the shortest common superstring that are asymptotically optimal. In: Diaz, J., Serna, M. (eds) Algorithms — ESA '96. ESA 1996. Lecture Notes in Computer Science, vol 1136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61680-2_56

  • DOI: https://doi.org/10.1007/3-540-61680-2_56

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61680-1

  • Online ISBN: 978-3-540-70667-0

  • eBook Packages: Springer Book Archive
