Criterion Functions for Clustering on High-Dimensional Data

Zhao, Y.; Karypis, G.

doi:10.1007/3-540-28349-8_8

Summary

In recent years, we have witnessed tremendous growth in the volume of text documents available on the Internet, digital libraries, news sources, and company-wide intranets. This has led to increased interest in developing methods that help users effectively navigate, summarize, and organize this information, with the ultimate goal of helping them find what they are looking for. Fast and high-quality document clustering algorithms play an important role toward this goal: they have been shown to provide an intuitive navigation/browsing mechanism by organizing large amounts of information into a small number of meaningful clusters, and to greatly improve retrieval performance via cluster-driven dimensionality reduction, term-weighting, or query expansion. The ever-increasing importance of document clustering and the expanded range of its applications have led to the development of a number of new algorithms with different complexity-quality trade-offs. Among them, one class of clustering algorithms with relatively low computational requirements treats the clustering problem as an optimization process that seeks to maximize or minimize a particular clustering criterion function defined over the entire clustering solution.

This chapter provides empirical and theoretical comparisons of the performance of a number of widely used criterion functions in the context of partitional clustering algorithms for high-dimensional datasets. The comparisons consist of a comprehensive experimental evaluation involving 15 different datasets, as well as an analysis of the characteristics of the various criterion functions and their effect on the clusters they produce. Our experimental results show that there is a set of criterion functions that consistently outperform the rest, and that some of the newly proposed criterion functions lead to the best overall results. Our theoretical analysis shows that the relative performance of the criterion functions depends on: (i) the degree to which they can correctly operate when the clusters are of different tightness, and (ii) the degree to which they can lead to reasonably balanced clusters.
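
To make the idea of clustering as criterion-function optimization concrete, here is a minimal sketch in Python. It is an illustration only, not the chapter's implementation. It assumes unit-length document vectors and uses a spherical k-means style objective (the sum, over clusters, of the norm of each cluster's composite vector, which equals the total cosine similarity between documents and their cluster centroids). The summary does not state which criterion functions the chapter compares, so this choice is an assumption, and the function names `normalize`, `criterion`, and `refine` are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def criterion(docs, labels, k):
    """Assumed objective: sum over clusters of the composite-vector norm.

    For unit-length documents this equals the sum of cosine similarities
    between each document and the centroid of its cluster.
    """
    dim = len(docs[0])
    composites = [[0.0] * dim for _ in range(k)]
    for doc, cluster in zip(docs, labels):
        for j, x in enumerate(doc):
            composites[cluster][j] += x
    return sum(math.sqrt(sum(x * x for x in comp)) for comp in composites)


def refine(docs, labels, k, max_passes=10):
    """Greedy refinement: move each document to the cluster that most
    increases the global criterion; stop when a full pass moves nothing."""
    labels = list(labels)
    best = criterion(docs, labels, k)
    for _ in range(max_passes):
        moved = False
        for i in range(len(docs)):
            current = labels[i]
            best_cluster, best_value = current, best
            for c in range(k):
                if c == current:
                    continue
                labels[i] = c  # tentatively move document i
                value = criterion(docs, labels, k)
                if value > best_value + 1e-12:
                    best_cluster, best_value = c, value
            labels[i] = best_cluster  # commit the best move (or stay put)
            if best_cluster != current:
                best = best_value
                moved = True
        if not moved:
            break
    return labels, best


# Toy data: two documents near each axis, deliberately mis-assigned at start.
docs = [normalize(v) for v in ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9])]
initial_labels = [0, 1, 0, 1]
final_labels, final_value = refine(docs, initial_labels, k=2)
print(final_labels, round(final_value, 4))
```

The sketch recomputes the criterion from scratch for every tentative move, which keeps the logic easy to follow but is far from efficient. A practical implementation would update the composite vectors incrementally.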

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, 55455, USA
Y. Zhao
Department of Computer Science and Engineering and Digital Technology Center and Army HPC Research Center, University of Minnesota, Minneapolis, MN, 55455, USA
G. Karypis

Editor information

Editors and Affiliations

Department of Mathematics and Statistics, University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, Maryland, 21250, USA
Jacob Kogan
Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, Maryland, 21250, USA
Jacob Kogan
Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, Maryland, 21250, USA
Charles Nicholas
School of Mathematical Sciences, Tel-Aviv University, Ramat Aviv, Tel-Aviv, 69978, Israel
Marc Teboulle

About this chapter

Cite this chapter

Zhao, Y., Karypis, G. (2006). Criterion Functions for Clustering on High-Dimensional Data. In: Kogan, J., Nicholas, C., Teboulle, M. (eds) Grouping Multidimensional Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28349-8_8

DOI: https://doi.org/10.1007/3-540-28349-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28348-5
Online ISBN: 978-3-540-28349-2
eBook Packages: Computer Science, Computer Science (R0)
