Semi–supervised K-Means Clustering by Optimizing Initial Cluster Centers

Wang, Xin; Wang, Chaofei; Shen, Junyi

doi:10.1007/978-3-642-23982-3_23

Xin Wang,
Chaofei Wang &
Junyi Shen

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6988)

Abstract

Semi-supervised clustering uses a small amount of labeled data to aid and bias the clustering of unlabeled data. This paper explores the use of labeled data to generate and optimize initial cluster centers for the k-means algorithm. It proposes a max-distance search approach to find suitable initial cluster centers from unlabeled data, especially when the labeled data cannot provide enough initial cluster centers. Experimental results demonstrate the advantages of this method over standard random selection and over partial random selection, in which some initial cluster centers come from labeled data while the others are drawn at random from unlabeled data.
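
The abstract gives only the outline of the seeding idea, so the following is a minimal sketch of one plausible reading, not the authors' procedure. It assumes that each labeled class contributes its mean as a seed and that "max-distance search" means repeatedly picking the unlabeled point farthest from all centers chosen so far. The function name `seed_centers` and its parameters are hypothetical.

```python
import numpy as np


def seed_centers(X_lab, y_lab, X_unl, k):
    """Return k initial centers (hypothetical helper, not from the paper).

    Assumption: labeled classes seed centers with their means; any
    remaining centers are chosen from unlabeled data by a farthest-point
    (max-min distance) search.
    """
    # One seed per labeled class, capped at k.
    centers = [X_lab[y_lab == c].mean(axis=0) for c in np.unique(y_lab)][:k]
    if not centers:
        raise ValueError("need at least one labeled class")

    # Distance from every unlabeled point to its nearest current center.
    diffs = X_unl[:, None, :] - np.asarray(centers)[None, :, :]
    nearest = np.min(np.linalg.norm(diffs, axis=2), axis=1)

    # Fill the missing centers one at a time.
    while len(centers) < k:
        idx = int(np.argmax(nearest))  # point farthest from all centers
        centers.append(X_unl[idx])
        # Update nearest-center distances with the newly added center.
        nearest = np.minimum(nearest, np.linalg.norm(X_unl - X_unl[idx], axis=1))

    return np.asarray(centers)


# Toy data: one labeled class, three clusters wanted.
X_lab = np.array([[0.0, 0.0], [0.0, 2.0]])
y_lab = np.array([0, 0])
X_unl = np.array([[0.0, 1.0], [10.0, 1.0], [5.0, 1.0], [10.0, 11.0]])
centers = seed_centers(X_lab, y_lab, X_unl, k=3)
```

The resulting array can then be used as the starting centers of any k-means implementation that accepts explicit initial centers, in place of random selection.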

Author information

Authors and Affiliations

Department of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710049, China
Xin Wang & Junyi Shen
China Defense Science and Technology Information Center, Beijing, 100142, China
Chaofei Wang

Editor information

Editors and Affiliations

Department of Computer and Information Science, University of Macau, Av. Padre Tomás Pereira, Taipa, Macau, China
Zhiguo Gong
School of Computer, Shanghai University, 200444, Shanghai, China
Xiangfeng Luo
College of Computer and Software, Taiyuan University of Technology, 030024, Taiyuan, China
Junjie Chen
School of Computer and Information Engineering, Shanghai University of Electric Power, 200090, Shanghai, China
Jingsheng Lei
Department of Business Administration, Caritas Institute of Higher Education, 18 Chui Ling Road, Tseung Kwan O, Hong Kong, China
Fu Lee Wang

About this paper

Cite this paper

Wang, X., Wang, C., Shen, J. (2011). Semi–supervised K-Means Clustering by Optimizing Initial Cluster Centers. In: Gong, Z., Luo, X., Chen, J., Lei, J., Wang, F.L. (eds) Web Information Systems and Mining. WISM 2011. Lecture Notes in Computer Science, vol 6988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23982-3_23

DOI: https://doi.org/10.1007/978-3-642-23982-3_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23981-6
Online ISBN: 978-3-642-23982-3
eBook Packages: Computer Science, Computer Science (R0)
