Abstract
Clustering, as an unsupervised machine learning method, has broad applications in data science and natural language processing. In this paper, we use background knowledge, or side information, about the data as constraints to improve clustering accuracy. Following the representation used in [15], we first format the side information as must-link sets and cannot-link sets. We then propose a constrained k-means algorithm for clustering the data. The key idea for must-link sets is to treat each set as a single data point with large volume, that is, to assign a set of must-link data as a whole to the center closest to its mass center. In contrast, the key idea for a cannot-link set is to transform the assignment of the involved data points into the computation of a minimum weight perfect matching. Finally, we carry out numerical simulations to evaluate our algorithms for constrained k-means on UCI datasets. The experimental results demonstrate that our method outperforms the previous constrained k-means as well as the classical k-means in both clustering accuracy and runtime.
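The two assignment rules described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`assign_must_link`, `assign_cannot_link`), the use of squared Euclidean distance, and the toy data are all assumptions made for illustration, and the matching is found by brute-force enumeration as a stand-in for a proper minimum weight matching algorithm, which is only practical for very small sets.

```python
from itertools import permutations


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mass_center(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def assign_must_link(group, centers):
    """Assign a whole must-link set to the center nearest its mass center.

    Returns one center index that applies to every point in the set.
    """
    m = mass_center(group)
    return min(range(len(centers)), key=lambda j: sq_dist(m, centers[j]))


def assign_cannot_link(group, centers):
    """Give each point of a cannot-link set its own distinct center.

    Chooses the injective point-to-center assignment with minimum total
    squared distance. Brute force is an illustrative assumption here; a
    Hungarian-style matching routine would replace it in practice.
    """
    if len(group) > len(centers):
        raise ValueError("a cannot-link set cannot exceed the number of centers")
    best = min(
        permutations(range(len(centers)), len(group)),
        key=lambda perm: sum(sq_dist(p, centers[j]) for p, j in zip(group, perm)),
    )
    return list(best)


# Hypothetical toy data: two centers on a line.
centers = [(0.0, 0.0), (10.0, 0.0)]

# Mass center is (4, 0), so the whole set goes to center 0,
# even though the point (9, 0) alone would prefer center 1.
must_link = [(1.0, 0.0), (2.0, 0.0), (9.0, 0.0)]
ml_label = assign_must_link(must_link, centers)

# Both points are closer to center 0, but they must be separated;
# the cheaper split sends (4, 0) to center 1 and (3, 0) to center 0.
cannot_link = [(4.0, 0.0), (3.0, 0.0)]
cl_labels = assign_cannot_link(cannot_link, centers)

print(ml_label)   # 0
print(cl_labels)  # [1, 0]
```

In a full constrained k-means loop, these two routines would replace the per-point nearest-center step for constrained points, with the usual center update following each assignment pass.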
This work is supported by the National Natural Science Foundation of China under grant number 17702005 and the Natural Science Foundation of Fujian Province under grant number 2017J01753.
References
Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035. Society for Industrial and Applied Mathematics (2007)
Bahmani, B., Moseley, B., Vattani, A., Kumar, R., Vassilvitskii, S.: Scalable k-means++. Proc. VLDB Endow. 5(7), 622–633 (2012)
Cao, X., Zhang, C., Zhou, C., Huazhu, F., Foroosh, H.: Constrained multi-view video face clustering. IEEE Trans. Image Process. 24(11), 4381–4393 (2015)
Chehreghan, A., Abbaspour, R.A.: An improvement on the clustering of high-resolution satellite images using a hybrid algorithm. J. Indian Soc. Remote Sens. 45(4), 579–590 (2017)
Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier (2011)
Jothi, R., Mohanty, S.K., Ojha, A.: Dk-means: a deterministic k-means clustering algorithm for gene expression analysis. Pattern Anal. Appl. 22(2), 649–667 (2019)
Lai, Y., Liu, J.: Optimization study on initial center of k-means algorithm. Comput. Eng. Appl. 44(10), 147–149 (2008)
Liu, H., Shao, M., Ding, Z., Yun, F.: Structure-preserved unsupervised domain adaptation. IEEE Trans. Knowl. Data Eng. 31(4), 799–812 (2018)
MacQueen, J. et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, vol. 1, pp. 281–297 (1967)
Marroquin, J.L., Girosi, F.: Some extensions of the k-means algorithm for image segmentation and pattern classification. Technical report, Massachusetts Institute of Technology, Artificial Intelligence Laboratory, Cambridge (1993)
Mashtalir, S.V., Stolbovyi, M.I., Yakovlev, S.V.: Clustering video sequences by the method of harmonic k-means. Cybern. Syst. Anal. 55(2), 200–206 (2019)
Melnykov, V., Zhu, X.: An extension of the k-means algorithm to clustering skewed data. Comput. Stat. 34(1), 373–394 (2019)
Tang, J., Chang, Y., Aggarwal, C., Liu, H.: A survey of signed network mining in social media. ACM Comput. Surv. (CSUR) 49(3), 42 (2016)
Wagstaff, K., Cardie, C.: Clustering with instance-level constraints. In: AAAI/IAAI, vol. 1097, pp. 577–584 (2000)
Wagstaff, K., Cardie, C., Rogers, S., Schrödl, S., et al.: Constrained k-means clustering with background knowledge. In: ICML, vol. 1, pp. 577–584 (2001)
Zhang, L., Jin, M.: A constrained clustering-based blind detector for spatial modulation. IEEE Commun. Lett. 23(7), 1170–1173 (2019)
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hao, Z., Guo, L., Yao, P., Huang, P., Peng, H. (2020). Efficient Algorithms for Constrained Clustering with Side Information. In: Shen, H., Sang, Y. (eds) Parallel Architectures, Algorithms and Programming. PAAP 2019. Communications in Computer and Information Science, vol 1163. Springer, Singapore. https://doi.org/10.1007/978-981-15-2767-8_25
DOI: https://doi.org/10.1007/978-981-15-2767-8_25
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2766-1
Online ISBN: 978-981-15-2767-8
eBook Packages: Computer Science, Computer Science (R0)