CPM 2000: Combinatorial Pattern Matching, pp. 108-118

# Approximation Algorithms for Hamming Clustering Problems

• Leszek Gąsieniec
• Jesper Jansson
• Andrzej Lingas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1848)

## Abstract

We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by ϱ. The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S.
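The two objective functions defined above can be made concrete with a short sketch. It only evaluates a given candidate solution (a list of centers for HRC, a partition for HDC) and does not search for an optimal one. The function names `hamming`, `radius_cost`, and `diameter_cost` and the sample strings are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def radius_cost(S, centers):
    """HRC objective for given centers: the largest distance from a
    string in S to its closest center. Centers need not belong to S."""
    return max(min(hamming(s, c) for c in centers) for s in S)


def diameter_cost(groups):
    """HDC objective for a given partition: the largest pairwise
    Hamming distance inside any group (0 if every group is a singleton)."""
    return max(
        (hamming(a, b) for g in groups for a, b in combinations(g, 2)),
        default=0,
    )


# Hypothetical instance: k = 4 strings of length n = 4, p = 2.
S = ["0000", "0011", "1111", "1100"]
print(radius_cost(S, ["0001", "1101"]))                       # 1
print(diameter_cost([["0000", "0011"], ["1111", "1100"]]))    # 2
```

In this instance the centers `0001` and `1101` are not members of S, which illustrates that HRC allows arbitrary binary strings of length n as centers.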

First, we provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k and p are constant. We also observe that HDC admits straightforward polynomial-time solutions when k = O(log n) or p = 2. Next, by reduction from the corresponding geometric p-clustering problems in the plane under the $L_1$ metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P=NP. We also prove that for any $\varepsilon > 0$ it is NP-hard to split S into at most $p k^{1/7-\varepsilon}$ clusters whose Hamming diameter doesn’t exceed the p-diameter. Furthermore, we note that by adapting Gonzalez’ farthest-point clustering algorithm [6], HRC and HDC can be approximated within a factor of two in time O(pkn). Next, we describe a $2^{O(p\varrho/\varepsilon)} k^{O(p/\varepsilon)} n^{2}$-time (1 + ε)-approximation algorithm for HRC. In particular, it runs in polynomial time when p = O(1) and ϱ = O(log(k + n)). Finally, we show how to find in $O((n/\varepsilon + kn \log n + k^{2} \log n)(2^{\varrho} k)^{2/\varepsilon})$ time a set L of O(p log k) strings of length n such that for each string in S there is at least one string in L within distance (1 + ε)ϱ, for any constant 0 < ε < 1.
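As an illustration of the factor-two result, here is a minimal sketch of the standard farthest-point heuristic applied to binary strings under Hamming distance. It is not the paper's exact adaptation, which the abstract does not spell out. The name `farthest_point_clustering`, the choice of the first input string as the initial center, and the sample instance are assumptions. Each of the at most p rounds compares one new center against all k strings of length n, which matches the O(pkn) bound quoted above.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def farthest_point_clustering(S, p):
    """Pick up to p centers from S greedily.

    Returns (centers, labels, radius), where labels[i] is the index of
    the center closest to S[i] and radius is the largest such distance.
    """
    if not S or p < 1:
        raise ValueError("need a non-empty S and p >= 1")
    centers = [S[0]]  # assumption: start from the first input string
    dist = [hamming(s, S[0]) for s in S]  # distance to nearest center
    labels = [0] * len(S)
    while len(centers) < min(p, len(S)):
        # The string farthest from all current centers becomes a center.
        far = max(range(len(S)), key=dist.__getitem__)
        if dist[far] == 0:
            break  # every string already coincides with some center
        centers.append(S[far])
        new_label = len(centers) - 1
        for i, s in enumerate(S):
            d = hamming(s, S[far])
            if d < dist[i]:
                dist[i] = d
                labels[i] = new_label
    return centers, labels, max(dist)


# Hypothetical instance: k = 4, n = 4, p = 2.
S = ["0000", "0001", "1111", "1110"]
centers, labels, radius = farthest_point_clustering(S, 2)
print(centers, labels, radius)  # ['0000', '1111'] [0, 0, 1, 1] 1
```

The centers chosen this way are always members of S, whereas an optimal HRC solution may use arbitrary strings of length n; the factor of two accounts for that restriction.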

## Keywords

Approximation Algorithm, Polynomial Time, Planar Graph, Vertex Cover, Binary String

## References

1. M. Bellare, O. Goldreich, and M. Sudan. Free Bits, PCPs, and Non-Approximability-Towards Tight Results. SIAM Journal on Computing 27(3), 1998, pp. 804–915.
2. M. Chrobak and T.H. Payne. A linear-time algorithm for drawing a planar graph on a grid. Information Processing Letters 54, 1995, pp. 241–246.
3. T. Feder and D. Greene. Optimal Algorithms for Approximate Clustering. Proceedings of the 20th Annual ACM Symposium on Theory of Computing (STOC’88), 1988, pp. 434–444.
4. M. Frances and A. Litman. On Covering Problems of Codes. Theory of Computing Systems 30, 1997, pp. 113–119.
5. L. Gąsieniec, J. Jansson, and A. Lingas. Efficient Approximation Algorithms for the Hamming Center Problem. Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’99), 1999, pp. S905–S906.
6. T. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science 38, 1985, pp. 293–306.
7. D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
8. D.S. Hochbaum (editor). Approximation Algorithms for NP-Hard Problems. PWS Publishing Company, Boston, 1997.
9. D.S. Hochbaum and D.B. Shmoys. A best possible heuristic for the k-center problem. Mathematics of Operations Research 10(2), 1985, pp. 180–184.
10. D.S. Hochbaum and D.B. Shmoys. A Unified Approach to Approximation Algorithms for Bottleneck Problems. Journal of the Association for Computing Machinery 33(3), 1986, pp. 533–550.
11. B. Kolman, R. Busby, and S. Ross. Discrete Mathematical Structures [3rd ed.]. Prentice Hall, New Jersey, 1996.
12. J.K. Lanctot, M. Li, B. Ma, S. Wang, and L. Zhang. Distinguishing String Selection Problems. Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’99), 1999, pp. 633–642.
13. M. Li, B. Ma, and L. Wang. Finding Similar Regions in Many Strings. Proceedings of the 31st Annual ACM Symposium on Theory of Computing (STOC’99), 1999, pp. 473–482.
14. C. Papadimitriou. On the Complexity of Integer Programming. Journal of the ACM 28(4), 1981, pp. 765–768.
15. S. Vishwanathan. An O(log* n) Approximation Algorithm for the Asymmetric p-Center Problem. Proceedings of the 7th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’96), 1996, pp. 1–5.

## Authors and Affiliations

• Leszek Gąsieniec (1)
• Jesper Jansson (2)
• Andrzej Lingas (2)

1. Dept. of Computer Science, University of Liverpool, UK
2. Dept. of Computer Science, Lund University, Lund, Sweden