
Efficient Algorithms for Constrained Clustering with Side Information

  • Conference paper
Parallel Architectures, Algorithms and Programming (PAAP 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1163)

Abstract

Clustering, as an unsupervised machine learning method, has broad applications in data science and natural language processing. In this paper, we use background knowledge, or side information, about the data as constraints to improve clustering accuracy. Following the representation in [15], we first encode the side information as must-link sets and cannot-link sets. We then propose a constrained k-means algorithm for clustering the data. The key idea of our algorithm for must-link sets is to treat each set as a single data point with large volume, that is, to assign all points of a must-link set as a whole to the center closest to the set's mass center. For cannot-link sets, in contrast, the key is to reduce the assignment of the involved data points to the computation of a minimum-weight perfect matching. Finally, we evaluate our constrained k-means algorithms through numerical experiments on UCI datasets. The experimental results demonstrate that our method outperforms both the previous constrained k-means algorithm and classical k-means in clustering accuracy and runtime.
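The two assignment rules described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes squared Euclidean distance, points given as tuples, and constraints supplied as small groups, and all function names (`assign_must_link`, `assign_cannot_link`, `mass_center`, `sq_dist`) are hypothetical. The cannot-link step finds a minimum-weight matching of the group's points to distinct centers by brute force over all injective assignments, which is only practical for small groups; the Hungarian algorithm is the usual polynomial-time substitute, and padding the group with zero-cost dummy points turns this into a perfect matching on a square cost matrix.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed metric for this sketch)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mass_center(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty group of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def assign_must_link(group: List[Point], centers: List[Point]) -> int:
    """Hypothetical helper: assign a whole must-link group to the
    center nearest its mass center; returns that center's index."""
    mu = mass_center(group)
    return min(range(len(centers)), key=lambda j: sq_dist(mu, centers[j]))


def assign_cannot_link(group: List[Point], centers: List[Point]) -> List[int]:
    """Hypothetical helper: give each point of a cannot-link group a
    distinct center, minimizing the total squared distance.

    Brute force over injective point-to-center assignments, so only
    suitable for small groups and small k."""
    if len(group) > len(centers):
        raise ValueError("a cannot-link group larger than k has no feasible assignment")
    best_cost, best = math.inf, []
    for perm in itertools.permutations(range(len(centers)), len(group)):
        cost = sum(sq_dist(p, centers[j]) for p, j in zip(group, perm))
        if cost < best_cost:
            best_cost, best = cost, list(perm)
    return best


if __name__ == "__main__":
    centers = [(0.0, 0.0), (10.0, 0.0)]
    print(assign_must_link([(1.0, 0.0), (3.0, 0.0), (8.0, 0.0)], centers))
    print(assign_cannot_link([(1.0, 0.0), (2.0, 0.0)], centers))
```

For example, with centers `(0, 0)` and `(10, 0)`, the must-link group `[(1, 0), (3, 0), (8, 0)]` has mass center `(4, 0)` and goes entirely to center 0, even though `(8, 0)` on its own is closer to center 1. The cannot-link pair `[(1, 0), (2, 0)]` is split as `[0, 1]` at total cost 65 instead of both points sharing center 0. Using the mass center is exact for squared Euclidean distance: for a group S with mean mu, the sum over x in S of ||x - c||^2 equals |S| * ||mu - c||^2 plus a term that does not depend on the center c, so the center nearest mu also minimizes the group's total cost.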

This work is supported by the National Natural Science Foundation of China under grant number 17702005 and by the Natural Science Foundation of Fujian Province under grant number 2017J01753.

References

  1. Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035. Society for Industrial and Applied Mathematics (2007)

  2. Bahmani, B., Moseley, B., Vattani, A., Kumar, R., Vassilvitskii, S.: Scalable k-means++. Proc. VLDB Endow. 5(7), 622–633 (2012)

  3. Cao, X., Zhang, C., Zhou, C., Fu, H., Foroosh, H.: Constrained multi-view video face clustering. IEEE Trans. Image Process. 24(11), 4381–4393 (2015)

  4. Chehreghan, A., Abbaspour, R.A.: An improvement on the clustering of high-resolution satellite images using a hybrid algorithm. J. Indian Soc. Remote Sens. 45(4), 579–590 (2017)

  5. Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier (2011)

  6. Jothi, R., Mohanty, S.K., Ojha, A.: Dk-means: a deterministic k-means clustering algorithm for gene expression analysis. Pattern Anal. Appl. 22(2), 649–667 (2019)

  7. Lai, Y., Liu, J.: Optimization study on initial center of k-means algorithm. Comput. Eng. Appl. 44(10), 147–149 (2008)

  8. Liu, H., Shao, M., Ding, Z., Fu, Y.: Structure-preserved unsupervised domain adaptation. IEEE Trans. Knowl. Data Eng. 31(4), 799–812 (2018)

  9. MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, vol. 1, pp. 281–297 (1967)

  10. Marroquin, J.L., Girosi, F.: Some extensions of the k-means algorithm for image segmentation and pattern classification. Technical report, Massachusetts Institute of Technology, Artificial Intelligence Laboratory, Cambridge (1993)

  11. Mashtalir, S.V., Stolbovyi, M.I., Yakovlev, S.V.: Clustering video sequences by the method of harmonic k-means. Cybern. Syst. Anal. 55(2), 200–206 (2019)

  12. Melnykov, V., Zhu, X.: An extension of the k-means algorithm to clustering skewed data. Comput. Stat. 34(1), 373–394 (2019)

  13. Tang, J., Chang, Y., Aggarwal, C., Liu, H.: A survey of signed network mining in social media. ACM Comput. Surv. (CSUR) 49(3), 42 (2016)

  14. Wagstaff, K., Cardie, C.: Clustering with instance-level constraints. In: AAAI/IAAI, vol. 1097, pp. 577–584 (2000)

  15. Wagstaff, K., Cardie, C., Rogers, S., Schrödl, S., et al.: Constrained k-means clustering with background knowledge. In: ICML, vol. 1, pp. 577–584 (2001)

  16. Zhang, L., Jin, M.: A constrained clustering-based blind detector for spatial modulation. IEEE Commun. Lett. 23(7), 1170–1173 (2019)

Author information

Correspondence to Peihuang Huang.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Hao, Z., Guo, L., Yao, P., Huang, P., Peng, H. (2020). Efficient Algorithms for Constrained Clustering with Side Information. In: Shen, H., Sang, Y. (eds) Parallel Architectures, Algorithms and Programming. PAAP 2019. Communications in Computer and Information Science, vol 1163. Springer, Singapore. https://doi.org/10.1007/978-981-15-2767-8_25

  • DOI: https://doi.org/10.1007/978-981-15-2767-8_25

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-2766-1

  • Online ISBN: 978-981-15-2767-8

  • eBook Packages: Computer Science, Computer Science (R0)