Efficient Algorithms for the Closest String and Distinguishing String Selection Problems

  • Conference paper
Frontiers in Algorithmics (FAW 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5598)

Abstract

In this paper, we study three related problems: the closest string problem, the farthest string problem, and the distinguishing string selection problem. These problems have applications in motif detection, binding site location, genetic drug target identification, genetic probe design, and universal PCR primer design, and they have been extensively studied in recent years.

The problems are defined as follows:

The closest string problem: given a group of strings \({\cal B}=\{s_1, s_2, \ldots, s_n\}\), each of length L, and an integer d, the problem is to compute a center string s of length L such that the Hamming distance \(d(s, s_i)\leq d\) for all \(s_i\in {\cal B}\).
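The definition above can be made concrete with a small sketch. It is not the paper's fixed-parameter algorithm: it only checks the center-string condition and, for tiny inputs, tries every candidate by exhaustive search. The function names (`hamming`, `is_center`, `brute_force_closest_string`) are illustrative assumptions.

```python
from itertools import product
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def is_center(s: str, strings: List[str], d: int) -> bool:
    """True if s is within Hamming distance d of every input string."""
    return all(hamming(s, t) <= d for t in strings)


def brute_force_closest_string(
    strings: List[str], d: int, alphabet: str
) -> Optional[str]:
    """Exhaustive search over all |alphabet|^L candidates (tiny inputs only).

    This is a naive baseline for illustration, not the algorithm of the paper.
    """
    length = len(strings[0])
    for candidate in product(alphabet, repeat=length):
        s = "".join(candidate)
        if is_center(s, strings, d):
            return s
    return None
```

For example, with the inputs `["0011", "0101", "1001"]` and d = 1 over the binary alphabet, the string `0001` differs from each input in exactly one position and is therefore a valid center.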

The farthest string problem: given a group of strings \({\cal G}=\{g_1,g_2,\ldots,g_{n_2}\}\), with all strings of the same length L, and an integer \(d_b\), the farthest string problem is to compute a center string s of length L such that the Hamming distance \(d(s,g_j)\geq L-d_b\) for all \(g_j\in {\cal G}\).
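A short sketch of the farthest-string condition follows. It also illustrates a standard observation for binary strings, which is not stated in the abstract itself: flipping every bit of s turns a distance of k into L − k, so "at least L − d_b away from g" is the same as "the complement of s is within d_b of g". The helper names are illustrative assumptions.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def is_farthest_center(s: str, strings: List[str], d_b: int) -> bool:
    """True if s is at Hamming distance >= L - d_b from every input string."""
    return all(hamming(s, g) >= len(s) - d_b for g in strings)


def complement(s: str) -> str:
    """Flip every bit of a binary string over the alphabet {0, 1}."""
    return s.translate(str.maketrans("01", "10"))


def is_farthest_center_via_complement(s: str, strings: List[str], d_b: int) -> bool:
    """Equivalent test for binary strings: the complement of s is close to all g."""
    s_bar = complement(s)
    return all(hamming(s_bar, g) <= d_b for g in strings)
```

The two predicates agree on every binary input, which is one way to see why the binary farthest string problem and the binary closest string problem are so closely related.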

The distinguishing string selection problem: given two groups of strings \({\cal B}\) (bad genes) and \({\cal G}\) (good genes), \({\cal B}=\{s_1,s_2,\ldots,s_{n_1}\}\) and \({\cal G}=\{g_{n_1+1},g_{n_1+2},\ldots,g_{n_2}\}\), with all strings of the same length L, and two integers \(d_b\) and \(d_g\) with \(d_g\geq L-d_b\), the distinguishing string selection problem is to compute a center string s of length L such that the Hamming distance \(d(s,s_i)\leq d_b\) for all \(s_i\in{\cal B}\) and the Hamming distance \(d(s,g_j)\geq d_g\) for all \(g_j\in {\cal G}\).
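The two-sided condition can be checked directly. The sketch below is only a verifier for a proposed center string, under the definition given above; it does not search for one. The function name `is_distinguishing` and the DNA example strings are illustrative assumptions.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def is_distinguishing(
    s: str, bad: List[str], good: List[str], d_b: int, d_g: int
) -> bool:
    """True if s is close to every bad string and far from every good string."""
    if d_g < len(s) - d_b:
        # The problem statement requires d_g >= L - d_b.
        raise ValueError("expected d_g >= L - d_b")
    close_to_bad = all(hamming(s, b) <= d_b for b in bad)
    far_from_good = all(hamming(s, g) >= d_g for g in good)
    return close_to_bad and far_from_good
```

With bad strings `["AAAA", "AAAT"]`, good strings `["TTTT", "CTTT"]`, d_b = 1 and d_g = 3, the string `AAAA` qualifies: it is within distance 1 of both bad strings and at distance 4 from both good strings.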

Our results: We design an \(O(Ln+nd(|\Sigma|-1)^d 2^{3.25d})\) time fixed parameter algorithm for the closest string problem which improves upon the best known \(O(Ln+nd\,2^{4d}(|\Sigma|-1)^d)\) algorithm in [14], where \(|\Sigma|\) is the size of the alphabet. We also design fixed parameter algorithms for both the farthest string problem and the distinguishing string selection problem. Both algorithms run in time \(O(Ln+nd2^{3.25d_b})\) when the input strings are binary strings over Σ = {0, 1}.
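To get a feel for the size of the improvement, one can compare only the parameter-dependent terms of the two closest string bounds, ignoring the additive Ln term and any hidden constants (so this is a rough indication, not a runtime prediction):

\[
\frac{nd\,2^{4d}\,(|\Sigma|-1)^d}{nd\,(|\Sigma|-1)^d\,2^{3.25d}} = 2^{0.75d},
\qquad
2^{4} = 16 \quad\text{versus}\quad 2^{3.25} \approx 9.51 .
\]

For instance, at d = 8 the ratio is \(2^{6} = 64\), and at d = 16 it is \(2^{12} = 4096\).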

References

  1. Lucas, K., Busch, M., Mössinger, S., Thompson, J.A.: An improved microcomputer program for finding gene- or gene family-specific oligonucleotides suitable as primers for polymerase chain reactions or as probes. CABIOS 7, 525–529 (1991)

  2. Lanctot, K., Li, M., Ma, B., Wang, S., Zhang, L.: Distinguishing string selection problems. In: Proc. 10th ACM-SIAM Symp. on Discrete Algorithms, pp. 633–642 (1999); also Information and Computation 185(1), 41–55 (2003)

  3. Dopazo, J., Rodríguez, A., Sáiz, J.C., Sobrino, F.: Design of primers for PCR amplification of highly variable genomes. CABIOS 9, 123–125 (1993)

  4. Proutski, V., Holme, E.C.: Primer master: A new program for the design and analysis of PCR primers. CABIOS 12, 253–255 (1996)

  5. Gramm, J., Hüffner, F., Niedermeier, R.: Closest strings, primer design, and motif search. In: Currents in Computational Molecular Biology, poster abstracts of RECOMB 2002, pp. 74–75 (2002)

  6. Wang, Y., Chen, W., Li, X., Cheng, B.: Degenerated primer design to amplify the heavy chain variable region from immunoglobulin cDNA. BMC Bioinformatics 7(suppl. 4), S9 (2006)

  7. Deng, X., Li, G., Li, Z., Ma, B., Wang, L.: Genetic design of drugs without side-effects. SIAM Journal on Computing 32(4), 1073–1090 (2003)

  8. Ben-Dor, A., Lancia, G., Perone, J., Ravi, R.: Banishing bias from consensus sequences. In: Hein, J., Apostolico, A. (eds.) CPM 1997. LNCS, vol. 1264, pp. 247–261. Springer, Heidelberg (1997)

  9. Wang, L., Dong, L.: Randomized algorithms for motif detection. Journal of Bioinformatics and Computational Biology 3(5), 1039–1052 (2005)

  10. Davila, J., Balla, S., Rajasekaran, S.: Space and time efficient algorithms for planted motif search. In: Proceedings of International Conference on Computational Science, vol. (2), pp. 822–829 (2006)

  11. Fellows, M.R., Gramm, J., Niedermeier, R.: On the parameterized intractability of motif search problems. Combinatorica 26(2), 141–167 (2006)

  12. Frances, M., Litman, A.: On covering problems of codes. Theoretical Computer Science 30, 113–119 (1997)

  13. Li, M., Ma, B., Wang, L.: On the closest string and substring problems. J. ACM 49(2), 157–171 (2002)

  14. Ma, B., Sun, X.: More Efficient Algorithms for Closest String and Substring Problems. In: Vingron, M., Wong, L. (eds.) RECOMB 2008. LNCS (LNBI), vol. 4955, pp. 396–409. Springer, Heidelberg (2008)

  15. Marx, D.: Closest Substring Problems with Small Distance. SIAM J. Comput. 38(4), 1382–1410 (2008)

  16. Gramm, J., Guo, J., Niedermeier, R.: On exact and approximation algorithms for distinguishing substring selection. In: Lingas, A., Nilsson, B.J. (eds.) FCT 2003. LNCS, vol. 2751, pp. 195–209. Springer, Heidelberg (2003)

  17. Gramm, J., Niedermeier, R., Rossmanith, P.: Fixed-Parameter Algorithms for CLOSEST STRING and Related Problems. Algorithmica 37(1), 25–42 (2003)

  18. Stojanovic, N., Berman, P., Gumucio, D., Hardison, R., Miller, W.: A linear-time algorithm for the 1-mismatch problem. In: Proceedings of the 5th International Workshop on Algorithms and Data Structures, pp. 126–135 (1997)

  19. Li, M., Ma, B., Wang, L.: Finding Similar Regions in Many Strings. In: Proceedings of the Thirty-first Annual ACM Symposium on Theory of Computing, Atlanta, pp. 473–482 (1999)

  20. Li, M., Ma, B., Wang, L.: Finding Similar Regions in Many Sequences. J. Comput. Syst. Sci. 65(1-2), 111–132 (2002); special issue for Thirty-first Annual ACM Symposium on Theory of Computing (1999)

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, L., Zhu, B. (2009). Efficient Algorithms for the Closest String and Distinguishing String Selection Problems. In: Deng, X., Hopcroft, J.E., Xue, J. (eds) Frontiers in Algorithmics. FAW 2009. Lecture Notes in Computer Science, vol 5598. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02270-8_27

  • DOI: https://doi.org/10.1007/978-3-642-02270-8_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02269-2

  • Online ISBN: 978-3-642-02270-8

  • eBook Packages: Computer Science (R0)
