Journal of Combinatorial Optimization, Volume 28, Issue 1, pp 25–37

Recognition of overlap graphs

  • Theodoros P. Gevezes
  • Leonidas S. Pitsoulis


Overlap graphs occur in computational biology and computer science, and have applications in genome sequencing, string compression, and machine scheduling. Given two strings \(s_{i}\) and \(s_{j}\), their overlap string is defined as the longest string \(v\) such that \(s_{i} = uv\) and \(s_{j} = vw\) for some non-empty strings \(u, w\), and its length is called the overlap between these two strings. A weighted directed graph is an overlap graph if there exists a set of strings in one-to-one correspondence with the vertices of the graph, such that each arc weight in the graph equals the overlap between the corresponding strings. In this paper, we characterize the class of overlap graphs, and we present a polynomial-time recognition algorithm as a direct consequence. Given a weighted directed graph \(G\), the algorithm constructs a set of strings that has \(G\) as its overlap graph, or decides that this is not possible.
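The definitions above can be illustrated with a minimal Python sketch. It is not the recognition algorithm of the paper, which the abstract does not describe. It only computes the overlap by direct comparison and checks whether a given set of strings realizes a given weighted directed graph. The function names `overlap`, `overlap_graph`, and `realizes` are hypothetical, and the sketch assumes that arcs join distinct vertices only (no self-loops) and that the graph is given as a dictionary from ordered vertex pairs to weights.

```python
from itertools import permutations


def overlap(s_i: str, s_j: str) -> int:
    """Length of the longest v with s_i = u + v and s_j = v + w, u and w non-empty."""
    # Because u and w must be non-empty, v is strictly shorter than both strings.
    max_len = min(len(s_i), len(s_j)) - 1
    # Try candidate lengths from longest to shortest; the first match is the overlap.
    for k in range(max_len, 0, -1):
        if s_i[-k:] == s_j[:k]:
            return k
    # The empty string always qualifies, so the overlap is at least 0.
    return 0


def overlap_graph(strings: list[str]) -> dict[tuple[int, int], int]:
    """Arc weights of the overlap graph on vertices 0..n-1 (assumption: no self-loops)."""
    return {
        (i, j): overlap(strings[i], strings[j])
        for i, j in permutations(range(len(strings)), 2)
    }


def realizes(strings: list[str], weights: dict[tuple[int, int], int]) -> bool:
    """Check whether `strings` has the weighted directed graph `weights` as its overlap graph."""
    return overlap_graph(strings) == weights


if __name__ == "__main__":
    example = ["abc", "bcd", "cda"]
    for arc, weight in sorted(overlap_graph(example).items()):
        print(arc, weight)
```

For the strings `abc`, `bcd`, `cda`, the arc from vertex 0 to vertex 1 has weight 2 (overlap string `bc`), while the reverse arc has weight 0. The comparison loop takes quadratic time per pair in the worst case, which is adequate for illustration only.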


Strings, Shortest superstring problem, Overlap graphs, Recognition algorithm



This research has been co-financed by the European Union (European Social Fund—ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF)—Research Funding Program: Heracleitus II. Investing in knowledge society through the European Social Fund.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Department of Mathematical, Physical and Computational Sciences, Aristotle University of Thessaloniki, Thessaloniki, Greece
