QC4 - A Clustering Evaluation Method
Many clustering algorithms have been developed, and researchers need to be able to compare their effectiveness. For some clustering problems, such as web page clustering, different algorithms produce clusterings with different characteristics: coarse vs fine granularity, disjoint vs overlapping, flat vs hierarchical. The lack of an evaluation method that can handle clusterings with different characteristics has led to research results that cannot be compared with one another. QC4 solves this by providing a new structure for defining general ideal clusterings and new measurements for evaluating clusterings with different characteristics with respect to a general ideal clustering. The paper describes QC4 and evaluates it within the web clustering domain by comparing it to existing evaluation measurements on synthetic test cases and on real-world web page clustering tasks. The synthetic test cases show that only QC4 copes correctly with overlapping clusters, hierarchical clusterings, and all the difficult boundary cases. In the real-world tasks, which represent simple clustering situations, QC4 is mostly consistent with the existing measurements and draws better conclusions in some cases.
Keywords: Mutual Information, Cluster Quality, Cluster Evaluation, Topic Coverage, Real World Task