A New Fuzzy Co-clustering Algorithm for Categorization of Datasets with Overlapping Clusters

Tjhi, William-Chandra; Chen, Lihui

doi:10.1007/11811305_36

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4093)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

Fuzzy co-clustering is a method that performs simultaneous fuzzy clustering of objects and features. In this paper, we introduce a new fuzzy co-clustering algorithm for high-dimensional datasets called Cosine-Distance-based & Dual-partitioning Fuzzy Co-clustering (CODIALING FCC). Unlike many existing fuzzy co-clustering algorithms, CODIALING FCC is a dual-partitioning algorithm: it clusters the features in the same manner as it clusters the objects, that is, by partitioning them according to their natural groupings. It is also a cosine-distance-based algorithm because it uses the cosine distance to capture the belongingness of objects and features in the co-clusters. Our main purpose in introducing this new algorithm is to improve on the performance of some prominent existing fuzzy co-clustering algorithms when dealing with datasets with high overlaps. In our opinion, this is crucial, since most real-world datasets involve a significant amount of overlap in their inherent clustering structures. We discuss how this improvement can be made through the dual-partitioning formulation adopted. Experimental results on a toy problem and five large benchmark document datasets demonstrate the effectiveness of CODIALING FCC in handling overlaps better.
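As an illustration of the two ideas named in the abstract, the following Python sketch computes cosine distances and turns them into fuzzy memberships that sum to one for each object, then applies the same routine to the features (the transposed matrix). The softmax-style membership update, the `temperature` parameter, the toy matrix, and the choice of prototypes are all assumptions made for this example; they are not the update rules of CODIALING FCC, which the abstract does not state.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v); defined here as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def partition_memberships(vectors, prototypes, temperature=0.1):
    """Fuzzy memberships of each vector in each cluster, summing to 1 per vector.

    Assumption: a softmax over negative cosine distances, chosen for
    illustration only. It is NOT the CODIALING FCC update rule.
    """
    memberships = []
    for x in vectors:
        weights = [math.exp(-cosine_distance(x, p) / temperature) for p in prototypes]
        total = sum(weights)
        memberships.append([w / total for w in weights])
    return memberships


def transpose(matrix):
    """Swap rows and columns so that features become the vectors to cluster."""
    return [list(column) for column in zip(*matrix)]


# Hypothetical toy document-term counts: documents 0 and 1 use terms 0 and 1,
# document 3 uses terms 2 and 3, and document 2 overlaps both groups.
data = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 2, 3],
]

# Hypothetical prototypes: one representative row (or column) per cluster.
object_prototypes = [data[0], data[3]]
features = transpose(data)
feature_prototypes = [features[0], features[3]]

# Dual partitioning in this sketch: the same partitioning routine is applied
# to the objects (rows) and to the features (columns).
object_u = partition_memberships(data, object_prototypes)
feature_v = partition_memberships(features, feature_prototypes)
```

In this toy setup, document 2 is equally distant from both prototypes and so receives a membership of about 0.5 in each cluster, while the other documents and all four terms are assigned almost entirely to one cluster.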

Author information

Authors and Affiliations

Nanyang Technological University, Republic of Singapore
William-Chandra Tjhi & Lihui Chen

Editor information

Editors and Affiliations

School of Information Technology and Electronic Engineering, The University of Queensland, Queensland, Australia
Xue Li
University of Alberta, Canada
Osmar R. Zaïane
Northwest Polytechnical University, China
Zhanhuai Li

About this paper

Cite this paper

Tjhi, WC., Chen, L. (2006). A New Fuzzy Co-clustering Algorithm for Categorization of Datasets with Overlapping Clusters. In: Li, X., Zaïane, O.R., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2006. Lecture Notes in Computer Science, vol 4093. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11811305_36

DOI: https://doi.org/10.1007/11811305_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37025-3
Online ISBN: 978-3-540-37026-0
eBook Packages: Computer Science, Computer Science (R0)
