A Novel Fuzzy Kernel C-Means Algorithm for Document Clustering

Yin, Yingshun; Zhang, Xiaobin; Miao, Baojun; Gao, Lili

doi:10.1007/978-3-540-68636-1_41

Yingshun Yin¹, Xiaobin Zhang¹, Baojun Miao² & Lili Gao¹

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4993)

Included in the following conference series:

Asia Information Retrieval Symposium


Abstract

The Fuzzy Kernel C-Means (FKCM) algorithm can improve accuracy significantly compared with classical Fuzzy C-Means algorithms when the data are not linearly separable, are high-dimensional, or contain clusters that overlap in the input space. Despite these advantages, several problems limit its real-world application: convergence to local optima, sensitivity to outliers, the need to assign the number of clusters c in advance, and slow convergence. To overcome these disadvantages, semi-supervised learning and a validity index are employed. Semi-supervised learning uses a limited amount of labeled data to guide the clustering of a large body of unlabeled data, which helps FKCM avoid the drawbacks listed above. The number of clusters strongly affects clustering performance, and the optimal number cannot be assumed in advance, especially for large text corpora. A validity function makes it possible to determine a suitable number of clusters during the clustering process. A sparse storage format and a Cscatter and gathering strategy save considerable storage space and computation time. Experimental results on the Reuters-21578 benchmark dataset demonstrate that the proposed algorithm is more flexible and more accurate than the state-of-the-art FKCM.
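The abstract combines three ideas: a kernel-based fuzzy c-means update, partial supervision from a few labeled documents, and a validity index for choosing the number of clusters c, all computed over sparse document vectors. The Python sketch below illustrates one way these pieces can fit together; it is not the authors' implementation. It assumes the Gaussian-kernel variant of kernel FCM that keeps prototypes in input space, with memberships proportional to (1 - K(x_k, v_i))^(-1/(m-1)); a simple, assumed form of semi-supervision in which labeled documents seed their cluster's prototype and keep crisp memberships; and a Xie-Beni-style index computed with the kernel-induced distance 2(1 - K) instead of the original Euclidean form. The function names (`kfcm`, `xie_beni`, `choose_c`), the `{term_id: weight}` dict representation, and the defaults `m=2.0`, `sigma2=1.0` are hypothetical choices for illustration. The Cscatter and gathering strategy is not modeled.

```python
import math

# Each document is a sparse vector {term_id: weight}; zero weights are not stored.


def dot(a, b):
    """Sparse dot product; loops over the vector with fewer nonzeros."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def gaussian_kernel(x, v, sigma2):
    """K(x, v) = exp(-||x - v||^2 / sigma2), with ||x - v||^2 from sparse dots."""
    sq = dot(x, x) + dot(v, v) - 2.0 * dot(x, v)
    return math.exp(-max(sq, 0.0) / sigma2)


def kfcm(docs, c, labels=None, m=2.0, sigma2=1.0, iters=50):
    """Kernel fuzzy c-means with prototypes kept in input space (hypothetical sketch).

    labels: optional {doc_index: cluster_index} for a small labeled subset.
    Assumed supervision scheme: labeled docs seed their cluster's prototype
    and keep crisp (0/1) memberships in every iteration.
    """
    labels = labels or {}
    n = len(docs)
    protos = [None] * c
    for i in range(c):  # seed prototypes from labeled class means
        members = [docs[k] for k, y in labels.items() if y == i]
        if members:
            protos[i] = {}
            for d in members:
                for t, w in d.items():
                    protos[i][t] = protos[i].get(t, 0.0) + w / len(members)
    for i in range(c):  # fill unseeded prototypes farthest-first (deterministic)
        if protos[i] is None:
            chosen = [p for p in protos if p is not None]
            protos[i] = dict(max(docs, key=lambda d: min(
                (1.0 - gaussian_kernel(d, p, sigma2) for p in chosen), default=1.0)))
    U = []
    for _ in range(iters):
        K = [[gaussian_kernel(docs[k], protos[i], sigma2) for i in range(c)]
             for k in range(n)]
        U = []
        for k in range(n):
            if k in labels:  # labeled doc: membership fixed to its label
                U.append([1.0 if i == labels[k] else 0.0 for i in range(c)])
                continue
            # u_ik is proportional to (1 - K(x_k, v_i))^(-1/(m-1))
            inv = [max(1.0 - K[k][i], 1e-12) ** (-1.0 / (m - 1.0)) for i in range(c)]
            s = sum(inv)
            U.append([x / s for x in inv])
        for i in range(c):  # v_i = sum_k u_ik^m K_ik x_k / sum_k u_ik^m K_ik
            num, den = {}, 0.0
            for k in range(n):
                w = (U[k][i] ** m) * K[k][i]
                den += w
                for t, x in docs[k].items():
                    num[t] = num.get(t, 0.0) + w * x
            if den > 0.0:
                protos[i] = {t: x / den for t, x in num.items()}
    return U, protos


def xie_beni(docs, U, protos, m=2.0, sigma2=1.0):
    """Xie-Beni-style index with kernel distance 2(1 - K) (assumed variant)."""
    n, c = len(docs), len(protos)
    comp = sum((U[k][i] ** m) * 2.0 * (1.0 - gaussian_kernel(docs[k], protos[i], sigma2))
               for k in range(n) for i in range(c))
    sep = min(2.0 * (1.0 - gaussian_kernel(protos[i], protos[j], sigma2))
              for i in range(c) for j in range(c) if i != j)
    return comp / (n * max(sep, 1e-12))


def choose_c(docs, c_values, labels=None, m=2.0, sigma2=1.0):
    """Run kfcm for each candidate c and keep the one with the lowest index."""
    need = max(2, max(labels.values()) + 1 if labels else 2)
    scores = {}
    for c in c_values:
        if c < need:  # index needs c >= 2; labels need c > largest label index
            continue
        U, protos = kfcm(docs, c, labels=labels, m=m, sigma2=sigma2)
        scores[c] = xie_beni(docs, U, protos, m=m, sigma2=sigma2)
    return min(scores, key=scores.get), scores
```

As a usage example, for six toy documents forming two term groups, `kfcm(docs, 2, labels={0: 0, 3: 1})` seeds the two prototypes from documents 0 and 3, and `choose_c(docs, [2, 3])` compares the index for c = 2 and c = 3 and returns the value with the smaller score. Storing only nonzero terms keeps each kernel evaluation proportional to the number of stored terms rather than the vocabulary size.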



Author information

Authors and Affiliations

School of Computer Science, Xi’an Polytechnic University, Shaanxi, China
Yingshun Yin, Xiaobin Zhang & Lili Gao
School of Mathematical Science, Xuchang University, Henan, China
Baojun Miao

Editor information

Hang Li, Ting Liu, Wei-Ying Ma, Tetsuya Sakai, Kam-Fai Wong, Guodong Zhou

About this paper

Cite this paper

Yin, Y., Zhang, X., Miao, B., Gao, L. (2008). A Novel Fuzzy Kernel C-Means Algorithm for Document Clustering. In: Li, H., Liu, T., Ma, WY., Sakai, T., Wong, KF., Zhou, G. (eds) Information Retrieval Technology. AIRS 2008. Lecture Notes in Computer Science, vol 4993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68636-1_41

DOI: https://doi.org/10.1007/978-3-540-68636-1_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68633-0
Online ISBN: 978-3-540-68636-1
eBook Packages: Computer Science, Computer Science (R0)