Why Greed Works for Shortest Common Superstring Problem

Part of the Lecture Notes in Computer Science book series (LNCS, volume 5029)


The shortest common superstring problem (SCS) has been widely studied for its applications in string compression and DNA sequence assembly. Although it is known to be Max-SNP hard, the simple greedy algorithm works extremely well in practice. Previous researchers have proved that the greedy algorithm is asymptotically optimal on random instances. Unfortunately, the practical instances in DNA sequence assembly are very different from random instances.
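For readers unfamiliar with it, the greedy algorithm usually studied for SCS repeatedly merges the two strings with the largest overlap until a single string remains. The following is a minimal sketch of that idea, not code from the paper; the function names `overlap` and `greedy_superstring` are hypothetical, ties are broken by scan order (an assumption), and the quadratic pair scan is chosen for clarity rather than speed.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Greedy heuristic: keep merging the pair with maximum overlap."""
    # Drop duplicates, then drop strings contained in another string;
    # they are covered automatically by any superstring of the rest.
    items = list(dict.fromkeys(strings))
    items = [s for s in items if not any(s != t and s in t for t in items)]

    while len(items) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left, index of right)
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:  # ties: first pair found wins (assumption)
                        best = (k, i, j)
        k, i, j = best
        merged = items[i] + items[j][k:]
        items = [s for idx, s in enumerate(items) if idx not in (i, j)]
        items.append(merged)

    return items[0] if items else ""


print(greedy_superstring(["abc", "bcd", "cde"]))
```

On this toy input the three strings chain together with overlaps of length 2, so the result contains each input as a substring.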

In this paper we explain the good performance of the greedy algorithm using smoothed analysis. We show that, for any given instance I of SCS, the average approximation ratio of the greedy algorithm on a small random perturbation of I is 1 + o(1). The perturbation defined in the paper is small and naturally represents the mutations of a DNA sequence during evolution.
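One way to write the smoothed statement is sketched below. The notation is assumed here for illustration and is not taken from the paper: $\mathcal{P}(I)$ stands for the distribution of randomly perturbed copies of $I$, $\mathrm{Greedy}(I')$ for the superstring the greedy algorithm returns on $I'$, and $\mathrm{OPT}(I')$ for a shortest common superstring of $I'$.

```latex
\[
  \mathbb{E}_{I' \sim \mathcal{P}(I)}
  \left[ \frac{|\mathrm{Greedy}(I')|}{|\mathrm{OPT}(I')|} \right]
  = 1 + o(1)
  \qquad \text{for every instance } I \text{ of SCS}.
\]
```

The expectation is taken over the perturbation only, with the instance I fixed in advance, which is what separates this from an average-case result over fully random instances.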

Because the output of a DNA sequencing machine contains uncertain nucleotides, we also propose the shortest common superstring with wildcards problem (SCSW). We prove that in the worst case SCSW cannot be approximated within ratio $n^{1/7-\epsilon}$, while the greedy algorithm still has a 1 + o(1) smoothed approximation ratio.







Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Bin Ma (Department of Computer Science, University of Western Ontario, London, Canada)
