Journal of Combinatorial Optimization, Volume 29, Issue 4, pp 859–883

A greedy randomized adaptive search procedure with path relinking for the shortest superstring problem



The shortest superstring problem (SSP) is an \(NP\)-hard combinatorial optimization problem that has attracted the interest of many researchers because of its applications in computational molecular biology, such as DNA sequencing, and in computer science, such as string compression. This paper presents a new heuristic algorithm for solving large-scale instances of the SSP, which outperforms the natural greedy algorithm on the majority of the tested instances. The proposed method can provide multiple near-optimal solutions and admits a natural parallel implementation. Extensive computational experiments on a set of SSP instances with known optimal solutions indicate that the new method finds the optimal solution in most cases and that its average error relative to the optimum is close to zero.
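For orientation, the natural greedy algorithm used as the baseline above can be sketched as follows: repeatedly merge the two strings with the largest suffix-prefix overlap until one string remains. This is a minimal illustration of the classical greedy heuristic only, not the GRASP with path relinking method proposed in the paper. The function names `overlap` and `greedy_superstring`, the tie-breaking rule (first pair found), and the quadratic pair scan are assumptions made for this sketch.

```python
from __future__ import annotations

from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings: list[str]) -> str:
    """Classical greedy heuristic for the SSP (illustrative, unoptimized)."""
    # Remove duplicates and strings already contained in another string.
    pool = list(dict.fromkeys(strings))
    pool = [s for s in pool if not any(s != t and s in t for t in pool)]

    while len(pool) > 1:
        # Find the ordered pair (i, j) with maximum overlap;
        # ties are broken by taking the first pair found (an assumption).
        best_k, best_i, best_j = -1, 0, 1
        for i, j in permutations(range(len(pool)), 2):
            k = overlap(pool[i], pool[j])
            if k > best_k:
                best_k, best_i, best_j = k, i, j

        # Merge the pair, writing the overlapping part only once.
        merged = pool[best_i] + pool[best_j][best_k:]
        pool = [s for idx, s in enumerate(pool) if idx not in (best_i, best_j)]
        pool.append(merged)

    return pool[0] if pool else ""


if __name__ == "__main__":
    print(greedy_superstring(["abc", "bcd", "cde"]))  # prints "abcde"
```

On the toy input `["abc", "bcd", "cde"]` the sketch returns `abcde`, which contains every input string. The greedy choice is deterministic up to tie-breaking, so it yields a single solution per run.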


Combinatorial optimization, DNA sequencing, Data compression, Heuristics, GRASP, Path relinking



This research has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF)—Research Funding Program: Heraclitus II. Investing in knowledge society through the European Social Fund.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Department of Mathematical, Physical and Computational Sciences, Aristotle University of Thessaloniki, Thessaloniki, Greece
