String Selection Problems

  • Elisa Pappalardo
  • Panos M. Pardalos
  • Giovanni Stracquadanio
Part of the SpringerBriefs in Optimization book series (BRIEFSOPTI)


The increasing amount of genomic data and the ability to synthesize artificial DNA constructs pose a series of challenging problems involving the identification and design of sequences with specific properties. We address the identification of such sequences; many of these problems are challenging at both the biological and the computational level. In this chapter, we introduce the main string selection problems and present the theoretical and experimental results for their most important instances.





Copyright information

© Elisa Pappalardo, Panos M. Pardalos, Giovanni Stracquadanio 2013

Authors and Affiliations

  • Elisa Pappalardo (1)
  • Panos M. Pardalos (2, 3)
  • Giovanni Stracquadanio (1)

  1. Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
  2. Center for Applied Optimization, Department of Industrial and Systems Engineering, University of Florida, Gainesville, USA
  3. Laboratory of Algorithms and Technologies for Networks Analysis (LATNA), Higher School of Economics, National Research University, Moscow, Russia
