An Efficient K-Medoids-Based Algorithm Using Previous Medoid Index, Triangular Inequality Elimination Criteria, and Partial Distance Search

Chu, Shu-Chuan; Roddick, John F.; Pan, J. S.

doi:10.1007/3-540-46145-0_7

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2454)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

Clustering in data mining is a discovery process that groups similar objects into the same cluster. Many clustering algorithms have been designed to meet the differing requirements and constraints of applications. In this paper, we study several k-medoids-based algorithms, including PAM, CLARA and CLARANS. We propose a novel and efficient approach that reduces the computational complexity of such algorithms by using the previous medoid index, triangular inequality elimination criteria and partial distance search. Experimental results on elliptic, curve and Gauss-Markov databases demonstrate that the proposed algorithm, applied to CLARANS, may reduce the number of distance calculations by 67% to 92% while retaining the same average distance per object. In terms of running time, it may reduce computation time by 38% to 65% compared with the original CLARANS algorithm.
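
The three speed-ups named in the abstract can be illustrated on the step that dominates k-medoids running time: assigning an object to its nearest medoid. The Python sketch below is not the authors' procedure, since the full text is not reproduced here. It shows one plausible reading under stated assumptions: squared Euclidean distance, a precomputed table of medoid-to-medoid distances, and hypothetical names (`partial_sq_dist`, `medoid_sq_dists`, `nearest_medoid`, `prev_index`) chosen only for this example.

```python
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Full squared Euclidean distance."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def partial_sq_dist(x: Point, m: Point, bound: float) -> Optional[float]:
    """Partial distance search: accumulate the squared distance one
    dimension at a time and give up (return None) as soon as the running
    sum reaches `bound`, because m can no longer beat the current best."""
    total = 0.0
    for p, q in zip(x, m):
        total += (p - q) ** 2
        if total >= bound:
            return None
    return total


def medoid_sq_dists(medoids: List[Point]) -> List[List[float]]:
    """Table of squared distances between every pair of medoids."""
    return [[sq_dist(a, b) for b in medoids] for a in medoids]


def nearest_medoid(
    x: Point,
    medoids: List[Point],
    mm_sq: List[List[float]],
    prev_index: int = 0,
) -> Tuple[int, float, int]:
    """Return (index of nearest medoid, squared distance, number of
    distance evaluations started).

    prev_index is the medoid x was assigned to previously (assumption:
    it is often still the nearest, so it gives a tight starting bound).
    """
    best = prev_index
    best_sq = sq_dist(x, medoids[best])
    evaluated = 1
    for j in range(len(medoids)):
        if j == prev_index:
            continue
        # Triangle inequality: d(m_best, m_j) >= 2 * d(x, m_best)
        # implies d(x, m_j) >= d(x, m_best), so m_j can be skipped.
        # In squared form the factor 2 becomes 4.
        if mm_sq[best][j] >= 4.0 * best_sq:
            continue
        evaluated += 1
        d = partial_sq_dist(x, medoids[j], best_sq)
        if d is not None:
            best, best_sq = j, d
    return best, best_sq, evaluated


if __name__ == "__main__":
    medoids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    table = medoid_sq_dists(medoids)
    # Good starting guess: both other medoids are eliminated untested.
    print(nearest_medoid((1.0, 1.0), medoids, table, prev_index=0))  # (0, 2.0, 1)
    # Poor starting guess: one extra evaluation, same answer.
    print(nearest_medoid((1.0, 1.0), medoids, table, prev_index=1))  # (0, 2.0, 2)
```

In this sketch the result is always the same as an exhaustive search; only the amount of work changes. How much is saved depends on how good the starting guess is and how well separated the medoids are, which fits the abstract's report of unchanged average distance per object with fewer distance calculations.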

Author information

Authors and Affiliations

School of Informatics and Engineering, Flinders University of South Australia, PO Box 2100, 5001, Adelaide, South Australia
Shu-Chuan Chu & John F. Roddick
Department of Electronic Engineering, Kaohsiung University of Applied Sciences, 415 Chien Kung Road, Kaohsiung, Taiwan
J. S. Pan

Editor information

Editors and Affiliations

Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, 606-8501, Kyoto, Japan
Yahiko Kambayashi
Institute for Computer Science and Business Informatics, University of Vienna, Liebiggasse 4, 1010, Vienna, Austria
Werner Winiwarter
Center for Spatial Information Science (CSIS), University of Tokyo, 4-6-1, Komaba, Meguro-ku, 153-8904, Tokyo, Japan
Masatoshi Arikawa

Cite this paper

Chu, SC., Roddick, J.F., Pan, J.S. (2002). An Efficient K-Medoids-Based Algorithm Using Previous Medoid Index, Triangular Inequality Elimination Criteria, and Partial Distance Search. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2002. Lecture Notes in Computer Science, vol 2454. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46145-0_7

DOI: https://doi.org/10.1007/3-540-46145-0_7
Published: 02 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44123-6
Online ISBN: 978-3-540-46145-6
eBook Packages: Springer Book Archive
