A New Fuzzy Co-clustering Algorithm for Categorization of Datasets with Overlapping Clusters

  • Conference paper
Advanced Data Mining and Applications (ADMA 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4093)

Abstract

Fuzzy co-clustering performs simultaneous fuzzy clustering of objects and features. In this paper, we introduce a new fuzzy co-clustering algorithm for high-dimensional datasets called Cosine-Distance-based & Dual-partitioning Fuzzy Co-clustering (CODIALING FCC). Unlike many existing fuzzy co-clustering algorithms, CODIALING FCC is a dual-partitioning algorithm: it clusters the features in the same manner as it clusters the objects, that is, by partitioning them according to their natural groupings. It is also cosine-distance-based, because it uses the cosine distance to capture how strongly objects and features belong to the co-clusters. Our main purpose in introducing this algorithm is to improve on the performance of some prominent existing fuzzy co-clustering algorithms on datasets with high overlaps. This is crucial, in our view, since most real-world datasets involve a significant amount of overlap in their inherent clustering structures. We discuss how the dual-partitioning formulation makes this improvement possible. Experimental results on a toy problem and five large benchmark document datasets demonstrate that CODIALING FCC handles overlaps more effectively.
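
The two ingredients named in the abstract, cosine distance and a partitioning constraint applied to features as well as objects, can be illustrated with a minimal sketch. This is not the CODIALING FCC update procedure, which the abstract does not specify. The toy matrix, the fixed prototypes, the exponential weighting, and the `temperature` parameter are all assumptions made only for illustration.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(u, v); treated as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

def partition_memberships(item, prototypes, temperature=0.1):
    """Fuzzy memberships of one item across clusters, normalised to sum to 1.

    The exponential weighting is an illustrative assumption, not the
    membership formula of the paper.
    """
    weights = [math.exp(-cosine_distance(item, p) / temperature)
               for p in prototypes]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical toy data: 4 documents (rows) by 4 words (columns).
# The last document overlaps both topics.
data = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [1, 1, 1, 1],
]

# Hypothetical fixed prototypes; a real algorithm would learn these.
object_prototypes = [[1, 1, 0, 0], [0, 0, 1, 1]]       # in word space
feature_prototypes = [[1, 1, 0, 0.5], [0, 0, 1, 0.5]]  # in document space

# Objects are partitioned across the clusters ...
object_memberships = [partition_memberships(row, object_prototypes)
                      for row in data]

# ... and features (columns) are partitioned in the same manner.
columns = [list(col) for col in zip(*data)]
feature_memberships = [partition_memberships(col, feature_prototypes)
                       for col in columns]

for i, m in enumerate(object_memberships):
    print("document", i, [round(x, 3) for x in m])
for j, m in enumerate(feature_memberships):
    print("word", j, [round(x, 3) for x in m])
```

In this sketch the overlapping document is equidistant from both object prototypes, so its membership is split evenly between the two clusters, while each word also receives memberships that sum to one across clusters.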

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tjhi, WC., Chen, L. (2006). A New Fuzzy Co-clustering Algorithm for Categorization of Datasets with Overlapping Clusters. In: Li, X., Zaïane, O.R., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2006. Lecture Notes in Computer Science, vol 4093. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11811305_36

  • DOI: https://doi.org/10.1007/11811305_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37025-3

  • Online ISBN: 978-3-540-37026-0

  • eBook Packages: Computer Science, Computer Science (R0)
