Abstract
Integer linear programming (ILP) is a powerful and versatile technique for framing and solving hard optimization problems of many types. In the last several years, ILP has become widely used in computational biology, although predominantly by computationally and mathematically trained researchers, such as Bernard Moret. In an effort to reach a broader set of researchers, this chapter begins with an introduction to ILP, illustrated by the phenomena of cliques and independent sets in biological graphs. The focus then shifts to new research results on using compact ILP formulations to solve traveling salesman problems. Such formulations have largely been declared useless in the optimization literature. However, in this chapter, I argue that the right compact formulation can be very effective for problems of the size and structure that arise in computational biology. These empirical results, together with some additional arguments, call into question the relevance of the concept of strength of an ILP formulation as a predictor of how quickly it will be solved.
Notes
- 1.
And the other Bernard (the bookish nerd) in Death of a Salesman.
- 2.
The introduction is related to, and partly derived from, several sections in [20].
- 3.
Terminology note: We use “TS” as an abbreviation for “traveling salesman”, which is sometimes followed by “tour” or “path”, as appropriate.
- 4.
This is a practice I recommend for all empirical, computation-based papers.
- 5.
Another well-regarded noncommercial ILP solver (which is not itself an LP solver) is SCIP, but I have not had much experience with it.
- 6.
A binary variable can only take the value 0 or 1.
- 7.
The introductory material on TSP, and the descriptions of TSP formulations, are extracted from [20]. The research results and conclusions are new.
- 8.
Moreover, a freely available, highly engineered program called Concorde mixes many techniques and tricks to solve very large TS problems in practice.
- 9.
This remains true in 2018, according to people who teach courses on ILP.
- 10.
Certainly, if one states the TS problem to students without including that constraint, an alert student will ask about it.
- 11.
Including one that I came up with, which had nearly the worst performance of all.
- 12.
Certainly, even my laptop is faster than the machine used in 2003 by the author of [40]. But the increased machine speed does not account for the difference between today's observations and the understanding in 2003. It should be noted, however, that my experience with MTZ on br17 also contradicts the statement in [40], since the MTZ formulation for br17 was solved in 0.54 s.
- 13.
On the instance where MTZ took over 1 h, the GG formulation took 2.38 s to solve (with Gurobi 8), and the DFJ formulation with separation took 0.02 s. So DFJ with separation is unquestionably dominant, but the speed of GG here supports the new understanding that the right compact TSP formulation is practical, while the instability of MTZ makes it much less reliable.
- 14.
A later attempt with Gurobi 8 suffered the same fate.
- 15.
For example, for the 130-city benchmark instance ch130, the ILP optimum is 6110 and the LP-opt for GG is 5608, but the assignment optimum is only 4377.
- 16.
However, I have never seen this stated in the literature.
- 17.
In fact, I put that prohibition into my first ILP implementations without even thinking about it. Then, when I looked at the empirical results and saw cases where the LP results violated (correct) mathematical theory, I was perplexed until I realized that the theory is only established for the pure formulations.
- 18.
But remember that the path computation for the input graph G is actually a tour computation on the derived graph \(G'\). Hence, what we learn from these computations concerns TSP tours and ILP formulations for the TS tour problem.
- 19.
Note, however, what looks like a contradiction in the case of DFJ with separation. For ch130, the reported LP-opt is 5582. However, the LP-opt for the assignment ILP for ch130 is only 4377, and the two values should be the same if LP-opt was computed exactly as discussed in Sect. 15.7.1. A possible explanation is that the Gurobi program implementing DFJ with separation added subtour-elimination constraints even before the LP-opt was reported. So the Gurobi code implementing the separation approach seems not to follow the description given in Sect. 15.7.1 exactly. However, in all experiments that did not use the DFJ formulation, the LP-opt value was identical to the value obtained by solving the LP-relaxation of the ILP formulation.
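Note 19 refers to DFJ with separation, in which subtour-elimination constraints are added only as they are needed. The sketch below illustrates only the simplest separation step and is not the Gurobi program used in the experiments; the function names are hypothetical. It assumes an integer solution of the assignment constraints, encoded as a successor list (`succ[i] = j` means \(x_{ij} = 1\), with `succ[i] != i`), splits it into its cycles, and emits one violated DFJ constraint \(\sum_{i,j \in S} x_{ij} \le |S| - 1\) per subtour \(S\). Separating fractional LP solutions requires a minimum-cut computation instead and is not shown.

```python
def find_cycles(succ):
    """Split an integer assignment solution into its directed cycles.

    succ[i] = j means x_ij = 1. Every node has exactly one successor and one
    predecessor, so the chosen arcs decompose into disjoint cycles."""
    seen = set()
    cycles = []
    for start in range(len(succ)):
        if start in seen:
            continue
        cycle = []
        node = start
        # Because succ is a permutation, this walk returns to `start`.
        while node not in seen:
            seen.add(node)
            cycle.append(node)
            node = succ[node]
        cycles.append(cycle)
    return cycles


def violated_subtour_constraints(succ):
    """Return one DFJ cut per subtour S as (arcs inside S, rhs = |S| - 1).

    Returns an empty list when succ is already a single Hamiltonian tour."""
    cycles = find_cycles(succ)
    if len(cycles) == 1:
        return []
    cuts = []
    for S in cycles:
        arcs = [(i, j) for i in S for j in S if i != j]
        # The current solution uses |S| arcs inside S, so the cut is violated.
        cuts.append((arcs, len(S) - 1))
    return cuts


# Usage: two subtours, {0, 1} and {2, 3, 4}.
succ = [1, 0, 3, 4, 2]
for arcs, rhs in violated_subtour_constraints(succ):
    print(f"sum of x over {arcs} <= {rhs}")
```

In a solver, a step like this would typically run inside a callback that adds the returned constraints as lazy constraints, so the solver never needs the full exponential family of DFJ inequalities.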
References
[1] Agarwala, R., Applegate, D.L., Maglott, D., Schuler, G.D., Schäffer, A.A.: A fast and scalable radiation hybrid map construction and integration strategy. Genome Res. 10(3), 350–364 (2000)
[2] Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows: Theory, Algorithms, and Applications. Prentice Hall (1993)
[3] Alizadeh, F., Karp, R.M., Weisser, D., Zweig, G.: Physical mapping of chromosomes using unique probes. J. Comput. Biol. 2, 159–184 (1995)
[4] Althaus, E., Klau, G.W., Kohlbacher, O., Lenhof, H.P., Reinert, K.: Integer linear programming in computational biology. In: Festschrift Mehlhorn, LNCS 5760, pp. 199–218. Springer (2009)
[5] Álvarez-Miranda, E., Ljubić, I., Mutzel, P.: The maximum weight connected subgraph problem. In: Jünger, M., Reinelt, G. (eds.) Facets of Combinatorial Optimization, pp. 245–270. Springer (2013)
[6] Bertsimas, D., Weismantel, R.: Optimization Over Integers, vol. 13. Dynamic Ideas, Belmont (MA) (2005)
[7] Blanchette, M., Bourque, G., Sankoff, D.: Breakpoint phylogenies. In: Miyano, S., Takagi, T. (eds.) Genome Informatics, pp. 25–34. University Academy Press (1997)
[8] Blum, C., Festa, P.: Metaheuristics for String Problems in Bio-informatics. Wiley (2016)
[9] Chimani, M., Rahmann, S., Böcker, S.: Exact ILP solutions for phylogenetic minimum flip problems. In: Proceedings of the First ACM-BCB Conference, pp. 147–153 (2010)
[10] Claus, A.: A new formulation for the travelling salesman problem. SIAM J. Algebr. Discr. Methods 5, 21–25 (1984)
[11] Conforti, M., Cornuéjols, G., Zambelli, G.: Integer Programming. Springer (2014)
[12] Dantzig, G.B., Fulkerson, D.R., Johnson, S.M.: Solution of a large-scale travelling-salesman problem. Oper. Res. 2, 393–410 (1954)
[13] Felsenstein, J.: Inferring Phylogenies. Sinauer (2004)
[14] Forrester, R., Greenberg, H.J.: Quadratic binary programming models in computational biology. Alg. Oper. Res. 3, 110–129 (2008)
[15] Fox, K., Gavish, B., Graves, S.: An n-constraint formulation of the (time-dependent) traveling salesman problem. Oper. Res. 28, 1018–1021 (1980)
[16] Frumkin, J.P., Patra, B.N., Sevold, A., Ganguly, K., Patel, C., Yoon, S., Schmid, M.B., Ray, A.: The interplay between chromosome stability and cell cycle control explored through gene-gene interaction and computational simulation. Nucleic Acids Res. 44, 8073–8085 (2016)
[17] Gavish, B., Graves, S.: The travelling salesman problem and related problems. Working Paper OR 078-78. Technical Report. MIT, Operations Research Center (1978)
[18] Gouveia, L., Voß, S.: A classification of formulations for the (time-dependent) traveling salesman problem. Europ. J. Oper. Res. 83, 69–82 (1995)
[19] Gusfield, D.: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press (1997)
[20] Gusfield, D.: Integer linear programming in computational and systems biology: an entry-level text and course. Cambridge University Press (2019)
[21] Gusfield, D., Frid, Y., Brown, D.: Integer programming formulations and computations solving phylogenetic and population genetic problems with missing or genotypic data. In: Proceedings of 13th Annual International Conference on Combinatorics and Computing, pp. 51–64. LNCS 4598, Springer (2007)
[22] Huttlin, E.L., Ting, L., Bruckner, R.J., Gebreab, F., Gygi, M.P., Szpyt, J., Tam, S., Zarraga, G., Colby, G., Baltier, K., Dong, R., Guarani, V., Vaites, L.P., Ordureau, A., Rad, R., Erickson, B.K., Wühr, M., Chick, J., Zhai, B., Kolippakkam, D., Mintseris, J., Obar, R.A., Harris, T., Artavanis-Tsakonas, S., Sowa, M.E., Camilli, P.D., Paulo, J.A., Harper, J.W., Gygi, S.P.: The BioPlex network: a systematic exploration of the human interactome. Cell 162, 425–440 (2015)
[23] Johnson, M., Hummer, G.: Interface-resolved network of protein-protein interactions. PLoS Comput. Biol. 9, e1003065 (2013)
[24] Johnson, O., Liu, J.: A traveling salesman approach for predicting protein functions. Source Code Biol. Med. 1, (2006)
[25] Kingsford, C.L., Chazelle, B., Singh, M.: Solving and analyzing side-chain positioning problems using linear and integer programming. Bioinformatics 21, 1028–1036 (2005)
[26] Korostensky, C., Gonnet, G.: Near optimal multiple sequence alignments using a traveling salesman problem approach. In: Proceedings of String Processing and Information Retrieval Symposium, p. 105. IEEE (1999)
[27] Korostensky, C., Gonnet, G.: Using traveling salesman problem algorithms for evolutionary tree construction. Bioinformatics 16, 619–627 (2000)
[28] Lancia, G.: Integer programming models for computational biology problems. J. Comp. Sci. Tech. 19, 60–77 (2004)
[29] Lancia, G.: Mathematical programming in computational biology: an annotated bibliography. Algorithms 1, 100–129 (2008)
[30] Langevin, A., Soumis, F., Desrosiers, J.: Classification of travelling salesman problem formulations. Oper. Res. Lett. 9, 127–132 (1990)
[31] Lorenzo, E., Camacho-Caceres, K., Ropelewski, A.J., Rosas, J., Ortiz-Mojer, M., Perez-Marty, L., Irizarry, J., Gonzalez, V., Rodríguez, J.A., Cabrera-Rios, M., Isaza, C.: An optimization-driven analysis pipeline to uncover biomarkers and signaling paths: cervix cancer. Microarrays 4(2), 287–310 (2015)
[32] Mazza, A., Klockmeier, K., Wanker, E., Sharan, R.: An integer programming framework for inferring disease complexes from network data. Bioinformatics 32, i271–i277 (2016)
[33] Miller, C., Tucker, R., Zemlin, R.: Integer programming formulation of traveling salesman problems. J. Assoc. Comput. Mach. pp. 326–329 (1960)
[34] Moret, B., Bader, D.A., Warnow, T.: High-performance algorithm engineering for computational phylogenetics. J. Supercomput. 22, 99–111 (2002)
[35] Öncan, T., Altınel, I., Laporte, G.: A comparative analysis of several asymmetric traveling salesman problem formulations. Comp. Oper. Res. 36, 637–654 (2009)
[36] Orman, A., Williams, H.: A survey of different integer programming formulations of the travelling salesman problem. Technical Report, Department of Operational Research, London School of Economics and Political Science (2004)
[37] Orman, A., Williams, H.P.: A survey of different integer programming formulations of the travelling salesman problem. In: Kontoghiorghes, E., Gatu, C. (eds.) Optimisation, Econometric and Financial Analysis, vol. 9, pp. 91–104. Springer, Berlin, Heidelberg (2007)
[38] Padberg, M., Sung, T.Y.: An analytical comparison of different formulations of the travelling salesman problem. Math. Prog. 52, 315–357 (1991)
[39] Pataki, G.: The bad and the good-and-ugly. Technical Report, Columbia University, IEOR (2000). CORC 2000-1
[40] Pataki, G.: Teaching integer programming formulations using the traveling salesman problem. SIAM Rev. 65, 116–123 (2003)
[41] Reinelt, G.: TSPLIB – a traveling salesman problem library. ORSA J. Comp. 3, 376–384 (1991)
[42] Reiter, J., Makohon-Moore, A., Gerold, J., Bozic, I., Chatterjee, K., Iacobuzio-Donahue, C., Vogelstein, B., Nowak, M.: Reconstructing metastatic seeding patterns of human cancers. Nat. Commun. 8, (2017)
[43] Sankoff, D., Blanchette, M.: Multiple genome rearrangement and breakpoint phylogeny. J. Comp. Biol. 5, 555–570 (1998)
[44] Sawik, T.: A note on the Miller-Tucker-Zemlin model for the asymmetric traveling salesman problem. Bull. Polish Acad. Sci. Tech. Sci. 64, 517–520 (2016)
[45] Shao, M., Lin, Y., Moret, B.M.: An exact algorithm to compute the DCJ distance for genomes with duplicate genes. J. Comput. Biol. 22(5), 425–435 (2015)
[46] Shao, M., Moret, B.M.E.: Comparing genomes with rearrangements and segmental duplications. Bioinformatics 31(12), i329–i338 (2015)
[47] Shao, M., Moret, B.M.E.: A fast and exact algorithm for the exemplar breakpoint distance. J. Comput. Biol. 23(5), 337–346 (2016)
[48] Shao, M., Moret, B.M.E.: On computing breakpoint distances for genomes with duplicate genes. J. Comput. Biol. 24(6), 571–580 (2017)
[49] Wong, R.: Integer programming formulations of the traveling salesman problem. In: Rabbat, G. (ed.) Proceedings of ICCC 80, IEEE Conference on Circuits and Computing, pp. 149–152 (1980)
Acknowledgements
This research was supported by NSF grant 1528234. The research was done partly while on sabbatical at the Simons Institute for the Theory of Computing, UC Berkeley. I would also like to thank Thong Le for help in understanding proofs about strength, and Jim Orlin, T. L. Magnanti, and David Shmoys for helpful communications. Finally, I thank Tandy Warnow, Mohammed El-Kebir, and the anonymous reviewers, who provided many helpful suggestions.
Appendices
Appendix 1: Data for Random Graphs
See Table 15.1 for results of experiments with different compact TSP formulations on a range of random graphs that differ in their number of nodes and their edge density.
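For orientation, the textbook forms of the formulations named in the notes and tables are sketched below for a tour on a complete directed graph with nodes \(1,\dots,n\), arc costs \(c_{ij}\), and binary arc variables \(x_{ij}\). This is an assumed, standard presentation (MTZ following [33], GG presumably following [17], DFJ following [12]), not necessarily the exact variants used in the experiments, which are stated for paths and may include further inequalities. The auxiliary variables \(u_i\) (MTZ) and \(y_{ij}\) (GG) are the usual ones from those formulations.

```latex
\begin{align*}
\min\;& \textstyle\sum_{i \ne j} c_{ij}\, x_{ij} \\
\text{Assignment:}\;& \textstyle\sum_{j \ne i} x_{ij} = 1 \;\; \forall i, \qquad \sum_{i \ne j} x_{ij} = 1 \;\; \forall j, \qquad x_{ij} \in \{0,1\} \\
\text{MTZ:}\;& u_i - u_j + n\, x_{ij} \le n - 1 \;\; (2 \le i \ne j \le n), \qquad 1 \le u_i \le n - 1 \\
\text{GG:}\;& \textstyle\sum_{j \ne 1} y_{1j} = n - 1, \qquad \sum_{i \ne k} y_{ik} - \sum_{j \ne k} y_{kj} = 1 \;\; (k \ne 1), \qquad 0 \le y_{ij} \le (n - 1)\, x_{ij} \\
\text{DFJ:}\;& \textstyle\sum_{i, j \in S,\ i \ne j} x_{ij} \le |S| - 1 \;\; (S \subset \{1,\dots,n\},\ 2 \le |S| \le n - 1) \\
\text{Both directions:}\;& x_{ij} + x_{ji} \le 1 \;\; (i < j)
\end{align*}
```

MTZ, GG, and DFJ are alternative ways of eliminating subtours on top of the assignment constraints, while the last line only rules out 2-cycles. MTZ adds \(n - 1\) continuous variables and \(O(n^2)\) constraints, and GG adds \(O(n^2)\) flow variables, so both are compact; DFJ has exponentially many constraints, which is why it is used with separation.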
Appendix 2: Data from Benchmark Tests
Experiments on several well-known TSP benchmark test sets covering a range of sizes are shown in Table 15.2. All the formulations, except FGG4, have inequalities that prohibit an edge from being traversed in both directions. The numerals in each ID give the number of cities, from 17 to 229. The letter ‘A’ or ‘S’ indicates whether the problem is for a directed (asymmetric) or an undirected (symmetric) graph. Both the optimal tour cost and the optimal path cost (with no designated start or stop nodes) were computed and written next to the problem ID. Each of the ILP formulations is for an optimal TS path, unless “tour” is indicated (Note 18).
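Note 18 says that the path computations for G are really tour computations on a derived graph \(G'\). One common way to build \(G'\), assumed here because the construction is not spelled out in this section, is to add a dummy node joined to every original node by zero-cost arcs in both directions: deleting the dummy node from an optimal tour of \(G'\) leaves an optimal path in G with the same cost and free start and end nodes. The brute-force sketch below (usable only on tiny instances; all function names are hypothetical) checks that equivalence on a small asymmetric cost matrix.

```python
from itertools import permutations
import math


def add_dummy_node(cost):
    """Cost matrix of G': G plus a dummy node (index n) joined to every
    original node by zero-cost arcs in both directions (assumed construction)."""
    n = len(cost)
    extended = [row[:] + [0] for row in cost]  # arc i -> dummy costs 0
    extended.append([0] * (n + 1))             # arc dummy -> j costs 0
    return extended


def best_path_cost(cost):
    """Cheapest Hamiltonian path in G with free start and end (brute force)."""
    n = len(cost)
    best = math.inf
    for order in permutations(range(n)):
        total = sum(cost[order[k]][order[k + 1]] for k in range(n - 1))
        best = min(best, total)
    return best


def best_tour_cost(cost):
    """Cheapest Hamiltonian tour (brute force, with node 0 fixed as the start)."""
    n = len(cost)
    best = math.inf
    for rest in permutations(range(1, n)):
        order = (0,) + rest
        total = sum(cost[order[k]][order[(k + 1) % n]] for k in range(n))
        best = min(best, total)
    return best


# Usage: an asymmetric 4-node instance; both numbers should agree.
cost = [[0, 4, 9, 3],
        [2, 0, 6, 8],
        [7, 1, 0, 5],
        [6, 9, 2, 0]]
print(best_path_cost(cost), best_tour_cost(add_dummy_node(cost)))
```

The two zero-cost arcs at the dummy node are what make the start and end of the path free, so an ILP formulation for tours can be reused unchanged for the path problem.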
The entry in the “gap” column is empty if the computation ran to completion; otherwise it is the gap at the time the computation was terminated. An entry for “Time” is the time at completion or termination of the ILP computation, and an entry for “LP-Opt” gives the optimal cost of the LP-relaxation of the TS problem, as reported by Gurobi. The LP-Opt cost can be compared to the cost indicated next to the problem ID as a measure of the strength of the ILP formulation (Note 19). These LP-Opt values can also be compared to each other to validate known theory about the strength of ILP formulations, or to question the relevance of that theory. This is discussed in Sect. 15.11.
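As a worked instance of comparing LP-Opt with the optimal cost, the sketch below computes the relative gap (ILP-opt − LP-opt) / ILP-opt for the ch130 values quoted in Note 15. The helper name is hypothetical, and this root-LP gap is a different quantity from the “gap” column, which is the solver's gap at termination.

```python
def relative_lp_gap(ilp_opt, lp_opt):
    """Relative gap (ILP-opt - LP-opt) / ILP-opt for a minimization problem.

    The LP-relaxation optimum is a lower bound on the ILP optimum, so with
    nonnegative costs the gap lies in [0, 1]; smaller means a stronger formulation."""
    return (ilp_opt - lp_opt) / ilp_opt


# Values for ch130 as quoted in Note 15.
ILP_OPT_CH130 = 6110
lp_opt_ch130 = {"assignment": 4377, "GG": 5608}

for name, lp_opt in lp_opt_ch130.items():
    print(f"{name}: LP-opt {lp_opt}, relative gap {relative_lp_gap(ILP_OPT_CH130, lp_opt):.1%}")
# prints:
# assignment: LP-opt 4377, relative gap 28.4%
# GG: LP-opt 5608, relative gap 8.2%
```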
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Gusfield, D. (2019). Integer Linear Programming in Computational Biology: Overview of ILP, and New Results for Traveling Salesman Problems in Biology. In: Warnow, T. (eds) Bioinformatics and Phylogenetics. Computational Biology, vol 29. Springer, Cham. https://doi.org/10.1007/978-3-030-10837-3_15
DOI: https://doi.org/10.1007/978-3-030-10837-3_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-10836-6
Online ISBN: 978-3-030-10837-3
eBook Packages: Computer Science, Computer Science (R0)