Efficient Algorithms for Constrained Clustering with Side Information

Hao, Zhendong; Guo, Longkun; Yao, Pei; Huang, Peihuang; Peng, Huihong

doi:10.1007/978-981-15-2767-8_25

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1163)

Included in the following conference series:

International Symposium on Parallel Architectures, Algorithms and Programming

Abstract

Clustering, an unsupervised machine learning method, has broad applications in data science and natural language processing. In this paper, we use background knowledge, or side information, about the data as constraints to improve clustering accuracy. Following the representation in [15], we first encode the side information as must-link sets and cannot-link sets. We then propose a constrained k-means algorithm for clustering the data. The key idea for must-link sets is to treat each set as a single, heavily weighted data point, that is, to assign the whole set to the center closest to its mass center. In contrast, the key idea for cannot-link sets is to reduce the assignment of the involved data points to the computation of a minimum-weight perfect matching. Finally, we carry out numerical simulations on UCI datasets to evaluate our algorithms for constrained k-means. The experimental results show that our method outperforms both the previous constrained k-means algorithm and classical k-means in clustering accuracy and runtime.
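The two assignment rules described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, squared Euclidean distance is assumed as the matching weight, and the minimum-weight matching is found by brute force over all one-to-one assignments (a stand-in for a polynomial-time matching algorithm, practical only for very small sets). When a cannot-link set has fewer points than there are centers, the sketch simply leaves the remaining centers unmatched.

```python
from itertools import permutations
from math import dist  # Python 3.8+


def centroid(points):
    """Mass center (coordinate-wise mean) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def nearest_center(point, centers):
    """Index of the center closest to `point` in Euclidean distance."""
    return min(range(len(centers)), key=lambda j: dist(point, centers[j]))


def assign_must_link(ml_set, centers):
    """Assign a whole must-link set to the center nearest its mass center."""
    j = nearest_center(centroid(ml_set), centers)
    return [j] * len(ml_set)


def assign_cannot_link(cl_set, centers):
    """Give each point of a cannot-link set a distinct center.

    Assumption: the matching weight is the squared Euclidean distance.
    Brute force over all injective point-to-center assignments.
    """
    if len(cl_set) > len(centers):
        raise ValueError("a cannot-link set cannot be larger than k")
    best_cost, best = float("inf"), None
    for perm in permutations(range(len(centers)), len(cl_set)):
        cost = sum(dist(p, centers[j]) ** 2 for p, j in zip(cl_set, perm))
        if cost < best_cost:
            best_cost, best = cost, list(perm)
    return best


centers = [(0.0, 0.0), (10.0, 0.0)]

# Mass center is (6, 0), which is closer to center 1, so the whole set goes there.
print(assign_must_link([(1.0, 0.0), (8.0, 0.0), (9.0, 0.0)], centers))  # [1, 1, 1]

# Both points are individually nearest to center 0, but they must be separated.
print(assign_cannot_link([(1.0, 0.0), (2.0, 0.0)], centers))  # [0, 1]
```

In the second call, the cheaper of the two possible matchings keeps the point at (1, 0) with center 0 and moves the point at (2, 0) to center 1, which shows how the matching resolves a conflict that independent nearest-center assignment would ignore.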

This work is supported by the National Natural Science Foundation of China under grant number 17702005 and the Natural Science Foundation of Fujian Province under grant number 2017J01753.

References

1. Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035. Society for Industrial and Applied Mathematics (2007)
2. Bahmani, B., Moseley, B., Vattani, A., Kumar, R., Vassilvitskii, S.: Scalable k-means++. Proc. VLDB Endow. 5(7), 622–633 (2012)
3. Cao, X., Zhang, C., Zhou, C., Huazhu, F., Foroosh, H.: Constrained multi-view video face clustering. IEEE Trans. Image Process. 24(11), 4381–4393 (2015)
4. Chehreghan, A., Abbaspour, R.A.: An improvement on the clustering of high-resolution satellite images using a hybrid algorithm. J. Indian Soc. Remote Sens. 45(4), 579–590 (2017)
5. Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier (2011)
6. Jothi, R., Mohanty, S.K., Ojha, A.: Dk-means: a deterministic k-means clustering algorithm for gene expression analysis. Pattern Anal. Appl. 22(2), 649–667 (2019)
7. Lai, Y., Liu, J.: Optimization study on initial center of k-means algorithm. Comput. Eng. Appl. 44(10), 147–149 (2008)
8. Liu, H., Shao, M., Ding, Z., Yun, F.: Structure-preserved unsupervised domain adaptation. IEEE Trans. Knowl. Data Eng. 31(4), 799–812 (2018)
9. MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, vol. 1, pp. 281–297 (1967)
10. Marroquin, J.L., Girosi, F.: Some extensions of the k-means algorithm for image segmentation and pattern classification. Technical report, Massachusetts Institute of Technology, Cambridge, Artificial Intelligence Lab (1993)
11. Mashtalir, S.V., Stolbovyi, M.I., Yakovlev, S.V.: Clustering video sequences by the method of harmonic k-means. Cybern. Syst. Anal. 55(2), 200–206 (2019)
12. Melnykov, V., Zhu, X.: An extension of the k-means algorithm to clustering skewed data. Comput. Stat. 34(1), 373–394 (2019)
13. Tang, J., Chang, Y., Aggarwal, C., Liu, H.: A survey of signed network mining in social media. ACM Comput. Surv. (CSUR) 49(3), 42 (2016)
14. Wagstaff, K., Cardie, C.: Clustering with instance-level constraints. In: AAAI/IAAI, vol. 1097, pp. 577–584 (2000)
15. Wagstaff, K., Cardie, C., Rogers, S., Schrödl, S., et al.: Constrained k-means clustering with background knowledge. In: ICML, vol. 1, pp. 577–584 (2001)
16. Zhang, L., Jin, M.: A constrained clustering-based blind detector for spatial modulation. IEEE Commun. Lett. 23(7), 1170–1173 (2019)

Author information

Authors and Affiliations

College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China
Zhendong Hao, Longkun Guo, Pei Yao & Huihong Peng
College of Mathematics and Data Science, Minjiang University, Fuzhou, China
Peihuang Huang

Corresponding author

Correspondence to Peihuang Huang.

Editor information

Editors and Affiliations

Sun Yat-sen University, Guangzhou, China
Hong Shen
Sun Yat-sen University, Guangzhou, China
Yingpeng Sang

About this paper

Cite this paper

Hao, Z., Guo, L., Yao, P., Huang, P., Peng, H. (2020). Efficient Algorithms for Constrained Clustering with Side Information. In: Shen, H., Sang, Y. (eds) Parallel Architectures, Algorithms and Programming. PAAP 2019. Communications in Computer and Information Science, vol 1163. Springer, Singapore. https://doi.org/10.1007/978-981-15-2767-8_25

DOI: https://doi.org/10.1007/978-981-15-2767-8_25
Published: 26 January 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2766-1
Online ISBN: 978-981-15-2767-8
eBook Packages: Computer Science, Computer Science (R0)