
QC4 - A Clustering Evaluation Method

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4426)


Abstract

Many clustering algorithms have been developed, and researchers need to be able to compare their effectiveness. For some clustering problems, such as web page clustering, different algorithms produce clusterings with different characteristics: coarse vs. fine granularity, disjoint vs. overlapping, and flat vs. hierarchical. The lack of a clustering evaluation method that can handle clusterings with all of these characteristics has left research and results incomparable. QC4 addresses this by providing a new structure for defining general ideal clusterings, together with new measurements for evaluating clusterings with different characteristics against a general ideal clustering. The paper describes QC4 and evaluates it in the web clustering domain by comparing it with existing evaluation measurements on synthetic test cases and on real-world web page clustering tasks. The synthetic test cases show that only QC4 copes correctly with overlapping clusters, hierarchical clusterings, and all the difficult boundary cases. On the real-world tasks, which represent simple clustering situations, QC4 is mostly consistent with the existing measurements and reaches better conclusions in some cases.
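To make "evaluating a clustering against an ideal clustering" concrete, here is a minimal sketch of one conventional external measure, a class-weighted F-measure. It is not QC4, and the abstract does not state which existing measurements the paper compares against; the function name `f_measure` and the dictionary-of-sets input format are assumptions made for illustration.

```python
from typing import Dict, Hashable, Set

Cluster = Set[Hashable]


def f_measure(ideal: Dict[str, Cluster], found: Dict[str, Cluster]) -> float:
    """Hypothetical helper: a conventional clustering F-measure (not QC4).

    For each ideal class, take the best F score over all found clusters,
    then average those scores weighted by class size. The weighting
    assumes the ideal classes are flat and disjoint.
    """
    total = sum(len(c) for c in ideal.values())
    if total == 0:
        return 0.0
    score = 0.0
    for cls in ideal.values():
        best = 0.0
        for clu in found.values():
            overlap = len(cls & clu)
            if overlap == 0:
                continue  # no shared items, so F is 0 for this pair
            precision = overlap / len(clu)  # fraction of the cluster in the class
            recall = overlap / len(cls)     # fraction of the class in the cluster
            best = max(best, 2 * precision * recall / (precision + recall))
        score += len(cls) / total * best
    return score
```

For example, with ideal classes {1, 2, 3} and {4, 5} and found clusters {1, 2} and {3, 4, 5}, each class's best match has F = 0.8, so the overall score is 0.8. Because the class-size weighting assumes every item belongs to exactly one flat ideal class, a measure of this kind has no natural reading for overlapping or hierarchical clusterings, which is the gap the abstract describes QC4 as filling.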




Editor information

Zhi-Hua Zhou, Hang Li, Qiang Yang


Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Crabtree, D., Andreae, P., Gao, X. (2007). QC4 - A Clustering Evaluation Method. In: Zhou, ZH., Li, H., Yang, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71701-0_9


  • DOI: https://doi.org/10.1007/978-3-540-71701-0_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71700-3

  • Online ISBN: 978-3-540-71701-0

  • eBook Packages: Computer Science, Computer Science (R0)
