Why Greed Works for Shortest Common Superstring Problem
The shortest common superstring problem (SCS) has been widely studied for its applications in string compression and DNA sequence assembly. Although it is known to be Max-SNP hard, the simple greedy algorithm works extremely well in practice. Previous researchers have proved that the greedy algorithm is asymptotically optimal on random instances. Unfortunately, the practical instances in DNA sequence assembly are very different from random instances.
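For readers unfamiliar with the greedy algorithm mentioned above, the following is a minimal sketch of its usual textbook form: repeatedly merge the two strings with the largest suffix-prefix overlap until one string remains. The function names `overlap` and `greedy_superstring` are illustrative and not taken from the paper, and the quadratic pair search is chosen for clarity rather than speed.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Greedy heuristic for SCS: merge the best-overlapping pair until one string is left."""
    # Remove duplicates, then remove strings already contained in another string.
    items = list(dict.fromkeys(strings))
    items = [s for s in items if not any(s != t and s in t for t in items)]

    while len(items) > 1:
        best_k, best_i, best_j = -1, 0, 1
        # Find the ordered pair (i, j) with the largest overlap; ties keep the first pair found.
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the pair, writing the overlapping part only once.
        merged = items[best_i] + items[best_j][best_k:]
        items = [s for idx, s in enumerate(items) if idx not in (best_i, best_j)]
        items.append(merged)

    return items[0] if items else ""


if __name__ == "__main__":
    reads = ["abc", "bcd", "cde"]
    print(greedy_superstring(reads))
```

The output is always a common superstring of the inputs, but it is not guaranteed to be a shortest one; the approximation ratio is the length of this output divided by the optimal length.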
In this paper we explain the good performance of the greedy algorithm using smoothed analysis. We show that, for any given instance I of SCS, the average approximation ratio of the greedy algorithm on a small random perturbation of I is 1 + o(1). The perturbation defined in the paper is small and naturally represents the mutations of DNA sequences during evolution.
Because the output of a DNA sequencing machine contains uncertain nucleotides, we also propose the shortest common superstring with wildcards problem (SCSW). We prove that, in the worst case, SCSW cannot be approximated within ratio n^{1/7 − ε}, while the greedy algorithm still has a 1 + o(1) smoothed approximation ratio.
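To make the idea of a small random perturbation concrete, here is a rough sketch of one possible mutation model. The abstract does not spell out the paper's perturbation, so everything here is an assumption for illustration: each character is independently replaced by a different letter with probability `p`, the alphabet is taken to be `ACGT`, and the name `perturb` is hypothetical.

```python
import random


def perturb(strings, p, alphabet="ACGT", seed=None):
    """Return a copy of the strings with each character mutated independently with probability p.

    This substitution model is an assumption for illustration only;
    it is not claimed to be the perturbation defined in the paper.
    """
    rng = random.Random(seed)
    perturbed = []
    for s in strings:
        chars = []
        for c in s:
            if rng.random() < p:
                # Substitute a letter that differs from the original one.
                chars.append(rng.choice([x for x in alphabet if x != c]))
            else:
                chars.append(c)
        perturbed.append("".join(chars))
    return perturbed


if __name__ == "__main__":
    instance = ["ACGTAC", "GTACGG", "ACGGTT"]
    print(perturb(instance, p=0.05, seed=1))
```

A smoothed analysis in this style would average the greedy algorithm's approximation ratio over many such perturbed copies of a fixed instance, rather than over fully random instances.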
Keywords: Greedy Algorithm; Greed Work; Input String; Short String; Perturbation Probability