Efficient Algorithms for the Closest String and Distinguishing String Selection Problems

Wang, Lusheng; Zhu, Binhai

doi:10.1007/978-3-642-02270-8_27

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5598)

Included in the following conference series:

International Workshop on Frontiers in Algorithmics

Abstract

In this paper, we study three related problems: the closest string problem, the farthest string problem, and the distinguishing string selection problem. These problems have applications in motif detection, binding site location, genetic drug target identification, genetic probe design, and universal PCR primer design, and they have been studied extensively in recent years.

The problems are defined as follows:

The closest string problem: given a group of strings \({\cal B}=\{s_1, s_2, \ldots, s_n\}\), each of length \(L\), and an integer \(d\), the problem is to compute a center string \(s\) of length \(L\) such that the Hamming distance \(d(s, s_i)\leq d\) for all \(s_i\in {\cal B}\).
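The definition above can be illustrated with a minimal sketch. The function names (`hamming`, `is_center`, `closest_string_bruteforce`) are hypothetical, and the exhaustive search is only a reference check for tiny inputs; it is not the fixed-parameter algorithm of the paper.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_center(s: str, strings: list[str], d: int) -> bool:
    """True if s is within Hamming distance d of every input string."""
    return all(hamming(s, t) <= d for t in strings)


def closest_string_bruteforce(strings: list[str], d: int, alphabet: str):
    """Try all |alphabet|^L candidates; feasible only for very small L."""
    length = len(strings[0])
    for cand in product(alphabet, repeat=length):
        s = "".join(cand)
        if is_center(s, strings, d):
            return s
    return None  # no center string exists for this d
```

For example, `closest_string_bruteforce(["0011", "0101", "1001"], 1, "01")` returns a string that differs from each input in at most one position, while the same call with `d = 0` returns `None` because the inputs are not all identical.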

The farthest string problem: given a group of strings \({\cal G}=\{g_1,g_2,\ldots,g_{n_2}\}\), with all strings of the same length \(L\), and an integer \(d_b\), the farthest string problem is to compute a center string \(s\) of length \(L\) such that the Hamming distance \(d(s,g_j)\geq L-d_b\) for all \(g_j\in {\cal G}\).
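A small sketch of the farthest string condition follows. The names `is_far_center` and `complement` are hypothetical. The complement check relies on a standard observation that holds for binary strings only and is not stated in the abstract: since \(d(s,g)+d(s,\bar g)=L\), the condition \(d(s,g)\geq L-d_b\) is the same as \(d(s,\bar g)\leq d_b\).

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_far_center(s: str, strings: list[str], d_b: int) -> bool:
    """True if s is at Hamming distance >= L - d_b from every input string."""
    length = len(s)
    return all(hamming(s, g) >= length - d_b for g in strings)


def complement(g: str) -> str:
    """Flip every bit of a binary string (assumes the alphabet {0, 1})."""
    return g.translate(str.maketrans("01", "10"))


# Binary case: being far from g is the same as being close to complement(g).
G = ["0000", "0011"]
s = "1101"
far = is_far_center(s, G, 1)
close_to_complements = all(hamming(s, complement(g)) <= 1 for g in G)
```

In this instance both `far` and `close_to_complements` evaluate to the same truth value, as the identity predicts.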

The distinguishing string selection problem: given two groups of strings \({\cal B}\) (bad genes) and \({\cal G}\) (good genes), \({\cal B}=\{s_1,s_2,\ldots,s_{n_1}\}\) and \({\cal G}=\{g_{n_1+1},g_{n_1+2},\ldots,g_{n_2}\}\), with all strings of the same length \(L\), and two integers \(d_b\) and \(d_g\) with \(d_g\geq L-d_b\), the distinguishing string selection problem is to compute a center string \(s\) of length \(L\) such that the Hamming distance \(d(s,s_i)\leq d_b\) for all \(s_i\in{\cal B}\) and the Hamming distance \(d(s,g_j)\geq d_g\) for all \(g_j\in {\cal G}\).
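The two-sided condition can be checked directly, as in the sketch below. The names `is_distinguishing` and `find_distinguishing` are hypothetical, and the exhaustive search over binary strings is an illustrative baseline for tiny instances, not the algorithm designed in the paper.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_distinguishing(s, bad, good, d_b, d_g) -> bool:
    """True if s is within d_b of every bad string and at least d_g from every good string."""
    near_bad = all(hamming(s, b) <= d_b for b in bad)
    far_good = all(hamming(s, g) >= d_g for g in good)
    return near_bad and far_good


def find_distinguishing(bad, good, d_b, d_g, alphabet="01"):
    """Exhaustive search over all strings of length L; exponential in L."""
    length = len(bad[0])
    if d_g < length - d_b:
        raise ValueError("the problem statement requires d_g >= L - d_b")
    for cand in product(alphabet, repeat=length):
        s = "".join(cand)
        if is_distinguishing(s, bad, good, d_b, d_g):
            return s
    return None
```

With `bad = ["0000", "0001"]`, `good = ["1111", "1110"]`, `d_b = 1`, and `d_g = 3`, the string `"0000"` satisfies both conditions, whereas `"1111"` does not.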

Our results: We design an \(O(Ln+nd(|\Sigma|-1)^d 2^{3.25d})\) time fixed-parameter algorithm for the closest string problem, which improves upon the best known \(O(Ln+nd\,2^{4d}(|\Sigma|-1)^d)\) algorithm in [14], where \(|\Sigma|\) is the size of the alphabet. We also design fixed-parameter algorithms for both the farthest string problem and the distinguishing string selection problem. Both algorithms run in time \(O(Ln+nd2^{3.25d_b})\) when the input strings are binary strings over \(\Sigma=\{0,1\}\).
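To make the size of the improvement concrete, the exponential factors in the two closest string bounds can be compared directly. This is plain arithmetic on the stated bounds and ignores constants hidden by the \(O\)-notation:

\[
2^{3.25d} = \left(2^{3.25}\right)^d \approx 9.51^d, \qquad
2^{4d} = 16^d, \qquad
\frac{2^{4d}}{2^{3.25d}} = 2^{0.75d} \approx 1.68^d .
\]

For \(d=10\), for instance, the ratio is \(2^{7.5}\approx 181\), so the exponential term of the new bound is smaller by roughly two orders of magnitude at that parameter value.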

References

[1] Lucas, K., Busch, M., Mössinger, S., Thompson, J.A.: An improved microcomputer program for finding gene- or gene family-specific oligonucleotides suitable as primers for polymerase chain reactions or as probes. CABIOS 7, 525–529 (1991)
[2] Lanctot, K., Li, M., Ma, B., Wang, S., Zhang, L.: Distinguishing string selection problems. In: Proc. 10th ACM-SIAM Symp. on Discrete Algorithms, pp. 633–642 (1999); also Information and Computation 185(1), 41–55 (2003)
[3] Dopazo, J., Rodríguez, A., Sáiz, J.C., Sobrino, F.: Design of primers for PCR amplification of highly variable genomes. CABIOS 9, 123–125 (1993)
[4] Proutski, V., Holme, E.C.: Primer master: A new program for the design and analysis of PCR primers. CABIOS 12, 253–255 (1996)
[5] Gramm, J., Huffner, F., Niedermeier, R.: Closest strings, primer design, and motif search. In: Currents in Computational Molecular Biology, poster abstracts of RECOMB 2002, pp. 74–75 (2002)
[6] Wang, Y., Chen, W., Li, X., Cheng, B.: Degenerated primer design to amplify the heavy chain variable region from immunoglobulin cDNA. BMC Bioinformatics 7(suppl. 4), S9 (2006)
[7] Deng, X., Li, G., Li, Z., Ma, B., Wang, L.: Genetic design of drugs without side-effects. SIAM Journal on Computing 32(4), 1073–1090 (2003)
[8] Ben-Dor, A., Lancia, G., Perone, J., Ravi, R.: Banishing bias from consensus sequences. In: Hein, J., Apostolico, A. (eds.) CPM 1997. LNCS, vol. 1264, pp. 247–261. Springer, Heidelberg (1997)
[9] Wang, L., Dong, L.: Randomized algorithms for motif detection. Journal of Bioinformatics and Computational Biology 3(5), 1039–1052 (2005)
[10] Davila, J., Balla, S., Rajasekaran, S.: Space and time efficient algorithms for planted motif search. In: Proceedings of International Conference on Computational Science, vol. (2), pp. 822–829 (2006)
[11] Fellows, M.R., Gramm, J., Niedermeier, R.: On the parameterized intractability of motif search problems. Combinatorica 26(2), 141–167 (2006)
[12] Frances, M., Litman, A.: On covering problems of codes. Theoretical Computer Science 30, 113–119 (1997)
[13] Li, M., Ma, B., Wang, L.: On the closest string and substring problems. J. ACM 49(2), 157–171 (2002)
[14] Ma, B., Sun, X.: More Efficient Algorithms for Closest String and Substring Problems. In: Vingron, M., Wong, L. (eds.) RECOMB 2008. LNCS (LNBI), vol. 4955, pp. 396–409. Springer, Heidelberg (2008)
[15] Marx, D.: Closest Substring Problems with Small Distance. SIAM J. Comput. 38(4), 1382–1410 (2008)
[16] Gramm, J., Guo, J., Niedermeier, R.: On exact and approximation algorithms for distinguishing substring selection. In: Lingas, A., Nilsson, B.J. (eds.) FCT 2003. LNCS, vol. 2751, pp. 195–209. Springer, Heidelberg (2003)
[17] Gramm, J., Niedermeier, R., Rossmanith, P.: Fixed-Parameter Algorithms for CLOSEST STRING and Related Problems. Algorithmica 37(1), 25–42 (2003)
[18] Stojanovic, N., Berman, P., Gumucio, D., Hardison, R., Miller, W.: A linear-time algorithm for the 1-mismatch problem. In: Proceedings of the 5th International Workshop on Algorithms and Data Structures, pp. 126–135 (1997)
[19] Li, M., Ma, B., Wang, L.: Finding Similar Regions in Many Strings. In: Proceedings of the Thirty-first Annual ACM Symposium on Theory of Computing, Atlanta, pp. 473–482 (1999)
[20] Li, M., Ma, B., Wang, L.: Finding Similar Regions in Many Sequences. J. Comput. Syst. Sci. 65(1-2), 111–132 (2002); special issue for Thirty-first Annual ACM Symposium on Theory of Computing (1999)

Author information

Authors and Affiliations

Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
Lusheng Wang
Department of Computer Science, Montana State University, Bozeman, MT 59717, USA
Binhai Zhu

Editor information

Editors and Affiliations

Department of Computer Science, City University of Hong Kong, No. 83 Tat Chee Avenue, Kowloon Tong, Hong Kong, China
Xiaotie Deng
Computer Science Department, Cornell University, 5144 Upson Hall, Ithaca, NY 14853, USA
John E. Hopcroft
Provincial Key Laboratory of High-Performance Computing, Jiangxi Normal University, 330027, Nanchang, China
Jinyun Xue

About this paper

Cite this paper

Wang, L., Zhu, B. (2009). Efficient Algorithms for the Closest String and Distinguishing String Selection Problems. In: Deng, X., Hopcroft, J.E., Xue, J. (eds) Frontiers in Algorithmics. FAW 2009. Lecture Notes in Computer Science, vol 5598. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02270-8_27

DOI: https://doi.org/10.1007/978-3-642-02270-8_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02269-2
Online ISBN: 978-3-642-02270-8