Abstract
Partitional graph clustering algorithms such as K-means and Star require a priori decisions on the number of clusters and on the threshold for the weight of edges to be considered, respectively. These decisions are difficult to make, and their impact on clustering performance is significant. We propose a family of algorithms for weighted graph clustering that requires neither a predefined number of clusters, unlike K-means, nor a threshold on the weight of edges, unlike Star. To do so, we use re-assignment of vertices as a halting criterion, as in K-means, and a metric for selecting clusters' seeds, as in Star. Pictorially, the algorithms' strategy resembles the rippling of stones thrown in a pond, hence the name 'Ricochet'. We evaluate the performance of the proposed algorithms on standard datasets, and we assess the impact of removing these constraints by comparing our algorithms with two constrained algorithms, K-means and Star, and one unconstrained algorithm, Markov clustering.
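The abstract does not spell out the algorithms, so the following Python fragment is only a loose sketch of the two ingredients it names: a metric for picking seeds and re-assignment of vertices as the test that drives the process. It is not the Ricochet family itself. The seed metric (sum of incident edge weights), the function names `build_graph`, `weighted_degree` and `ripple_cluster`, the rule that a candidate is promoted to seed only if it would re-assign at least one vertex, and the toy graph are all assumptions made for illustration. Edge weights are assumed to be positive similarities.

```python
from collections import defaultdict
from math import inf


def build_graph(vertices, edges):
    """Build a symmetric adjacency map: vertex -> {neighbour: weight}."""
    graph = {v: {} for v in vertices}
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w
    return graph


def weighted_degree(graph, v):
    # Assumed seed metric: total weight of the edges incident to v.
    return sum(graph[v].values())


def ripple_cluster(graph):
    """Hypothetical seed-and-reassign clustering with no k and no threshold."""
    # Consider candidate seeds from the highest metric value to the lowest.
    order = sorted(graph, key=lambda v: (-weighted_degree(graph, v), str(v)))
    seed_of = {}  # vertex -> seed of the cluster it currently belongs to
    tie = {}      # vertex -> weight of the edge to its current seed
    for cand in order:
        # Neighbours that are unassigned or closer to cand than to their seed.
        # Existing seeds have tie == inf, so they are never claimed.
        claimed = [u for u, w in graph[cand].items() if w > tie.get(u, 0.0)]
        if cand in seed_of and not claimed:
            continue  # no vertex would be re-assigned: cand stays a member
        seed_of[cand] = cand
        tie[cand] = inf
        for u in claimed:
            seed_of[u] = cand  # the "ripple" reaches u and re-assigns it
            tie[u] = graph[cand][u]
    clusters = defaultdict(set)
    for v, s in seed_of.items():
        clusters[s].add(v)
    return dict(clusters)


# Toy graph: two groups sharing vertex "h", plus an isolated vertex "f".
example = build_graph(
    ["a", "b", "c", "d", "e", "f", "h"],
    [("a", "b", 0.9), ("a", "c", 0.8), ("b", "c", 0.1), ("a", "h", 0.4),
     ("d", "e", 0.7), ("d", "c", 0.2), ("d", "h", 0.5)],
)
clusters = ripple_cluster(example)
print(clusters)
```

On this toy graph, vertex "h" is first claimed by seed "a" and then re-assigned to seed "d", to which it is more strongly tied. Neither a cluster count nor a weight threshold is supplied; the number of clusters falls out of which candidates manage to re-assign vertices.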
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wijaya, D.T., Bressan, S. (2009). Ricochet: A Family of Unconstrained Algorithms for Graph Clustering. In: Zhou, X., Yokota, H., Deng, K., Liu, Q. (eds) Database Systems for Advanced Applications. DASFAA 2009. Lecture Notes in Computer Science, vol 5463. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00887-0_13
DOI: https://doi.org/10.1007/978-3-642-00887-0_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00886-3
Online ISBN: 978-3-642-00887-0
eBook Packages: Computer Science, Computer Science (R0)