Abstract
The clustering problem has received much attention in various fields of computer science. In many applications, however, noisy data poses a major challenge for clustering. As one way to handle noisy data, clustering with penalties has been studied extensively, for example the k-median problem with penalties and the facility location problem with penalties. As far as we know, there is only one approximation algorithm for the k-means problem with penalties, with ratio \(25+\epsilon\). All previous results for clustering problems with penalties were based on local search, LP-rounding, or primal-dual techniques, which cannot be applied directly to the k-means problem with penalties to obtain an approximation ratio better than \(25+\epsilon\). In this paper, we apply the primal-dual technique to the k-means problem with penalties using a different rounding method: a deterministic rounding algorithm instead of the randomized rounding algorithm used in previous approximation schemes. Based on this method, we present an approximation algorithm with ratio \(19.849+\epsilon\) for the k-means problem with penalties.
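The abstract does not spell out the objective, so the following is only a minimal sketch of the commonly used formulation of k-means with penalties, assumed here for illustration: each point either pays its squared Euclidean distance to the nearest open center or pays its own penalty and is left unclustered. The function names, the sample points, and the penalty values are hypothetical. The sketch only evaluates the cost of a given set of centers; it is not the paper's primal-dual algorithm.

```python
from typing import Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def penalized_kmeans_cost(
    points: Sequence[Point],
    penalties: Sequence[float],
    centers: Sequence[Point],
) -> float:
    """Cost of a fixed center set under the assumed objective.

    For fixed centers, each point independently takes the cheaper option:
    connect to its nearest center, or pay its penalty.
    """
    total = 0.0
    for point, penalty in zip(points, penalties):
        nearest = min(squared_distance(point, c) for c in centers)
        total += min(nearest, penalty)
    return total


# Hypothetical instance: two nearby points and one far-away noisy point.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
penalties = [5.0, 5.0, 5.0]
centers = [(0.5, 0.0)]  # k = 1

cost = penalized_kmeans_cost(points, penalties, centers)
print(cost)  # 0.25 + 0.25 + 5.0 = 5.5
```

In this instance the noisy point pays its penalty of 5 rather than a squared distance of 190.25, which illustrates why penalties limit the influence of noisy data on the chosen centers.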
This work is supported by the National Natural Science Foundation of China under Grants (61672536, 61420106009, 61872450, 61828205), Hunan Provincial Science and Technology Program (2018WK4001).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Feng, Q., Zhang, Z., Shi, F., Wang, J. (2019). An Improved Approximation Algorithm for the k-Means Problem with Penalties. In: Chen, Y., Deng, X., Lu, M. (eds) Frontiers in Algorithmics. FAW 2019. Lecture Notes in Computer Science, vol 11458. Springer, Cham. https://doi.org/10.1007/978-3-030-18126-0_15
DOI: https://doi.org/10.1007/978-3-030-18126-0_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18125-3
Online ISBN: 978-3-030-18126-0
eBook Packages: Computer Science, Computer Science (R0)