Abstract
Clustering in data mining is a discovery process that groups similar objects into the same cluster. Many clustering algorithms have been designed to meet the differing requirements and constraints of applications. In this paper, we study several k-medoids-based algorithms, including PAM, CLARA and CLARANS. We propose a novel and efficient approach that reduces the computational complexity of such algorithms by using the previous medoid index, triangular inequality elimination criteria and partial distance search. Experimental results on elliptic, curve and Gauss-Markov databases demonstrate that, when applied to CLARANS, the proposed algorithm can reduce the number of distance calculations by 67% to 92% while retaining the same average distance per object. In terms of running time, it can reduce computation time by 38% to 65% compared with standard CLARANS.
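The abstract names three speedups but does not spell them out. The following is a minimal Python sketch of how they are commonly combined in a nearest-medoid search, under our own assumptions: squared Euclidean distance, medoids stored as coordinate tuples, and a per-object `prev_index` standing in for the previous medoid index. The triangular inequality elimination rule used here is the standard vector-quantization form (skip medoid m_j when d(m_best, m_j) >= 2 d(x, m_best)). All function names (`partial_sq_dist`, `medoid_distance_table`, `nearest_medoid`, `assign`) are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical sketch; not the paper's code.
Point = Sequence[float]


def partial_sq_dist(x: Point, m: Point, bound: float) -> float:
    """Squared Euclidean distance with partial distance search (PDS).

    Accumulates one dimension at a time and stops as soon as the running
    sum reaches `bound`; the returned value is then >= bound, which tells
    the caller that this candidate cannot beat the current best."""
    total = 0.0
    for xi, mi in zip(x, m):
        diff = xi - mi
        total += diff * diff
        if total >= bound:
            break
    return total


def medoid_distance_table(medoids: List[Point]) -> List[List[float]]:
    """Pairwise Euclidean distances between medoids, precomputed for TIE."""
    return [[math.dist(a, b) for b in medoids] for a in medoids]


def nearest_medoid(x: Point, medoids: List[Point], table: List[List[float]],
                   prev_index: Optional[int] = None) -> Tuple[int, float]:
    """Return (index, squared distance) of the medoid nearest to x."""
    # Previous medoid index (assumed form): start from the last assignment
    # so the initial bound is usually already tight.
    start = 0 if prev_index is None else prev_index
    best, best_sq = start, partial_sq_dist(x, medoids[start], math.inf)
    for j in range(len(medoids)):
        if j == start:
            continue
        # Triangular inequality elimination: if d(m_best, m_j) >= 2 d(x, m_best),
        # then d(x, m_j) >= d(m_best, m_j) - d(x, m_best) >= d(x, m_best).
        if table[best][j] >= 2.0 * math.sqrt(best_sq):
            continue
        d_sq = partial_sq_dist(x, medoids[j], best_sq)
        if d_sq < best_sq:  # only possible when PDS ran to completion
            best, best_sq = j, d_sq
    return best, best_sq


def assign(points: List[Point], medoids: List[Point],
           prev_labels: Optional[List[int]] = None) -> Tuple[List[int], float]:
    """Assign every point to its nearest medoid; return the labels and the
    average distance per object (the quality measure named in the abstract)."""
    if not points:
        return [], 0.0
    table = medoid_distance_table(medoids)
    labels: List[int] = []
    total = 0.0
    for i, x in enumerate(points):
        prev = None if prev_labels is None else prev_labels[i]
        idx, d_sq = nearest_medoid(x, medoids, table, prev)
        labels.append(idx)
        total += math.sqrt(d_sq)
    return labels, total / len(points)


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (9.0, 0.5)]
    medoids = [points[0], points[2], points[4]]
    labels, avg = assign(points, medoids)
    print(labels)  # [0, 0, 1, 1, 2]
    # A CLARANS-style swap replaces one medoid; reuse the old labels as
    # previous medoid indices for the next assignment pass.
    medoids[1] = points[3]
    labels, avg = assign(points, medoids, prev_labels=labels)
    print(labels, round(avg, 4))
```

Because PDS only aborts once the partial sum already matches or exceeds the best squared distance, and TIE only skips medoids that provably cannot be closer, the sketch returns the same minimum distance as a brute-force search; only the number of arithmetic operations changes. Reusing the old labels after a swap stays valid even when the slot now holds a different medoid, since the previous index is merely a starting guess and every other candidate is still checked or eliminated.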
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chu, SC., Roddick, J.F., Pan, J.S. (2002). An Efficient K-Medoids-Based Algorithm Using Previous Medoid Index, Triangular Inequality Elimination Criteria, and Partial Distance Search. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2002. Lecture Notes in Computer Science, vol 2454. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46145-0_7
DOI: https://doi.org/10.1007/3-540-46145-0_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44123-6
Online ISBN: 978-3-540-46145-6
eBook Packages: Springer Book Archive