An Efficient K-Medoids-Based Algorithm Using Previous Medoid Index, Triangular Inequality Elimination Criteria, and Partial Distance Search

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2002)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2454))

Abstract

Clustering in data mining is a discovery process that groups similar objects into the same cluster. Many clustering algorithms have been designed to meet the differing requirements and constraints of applications. In this paper, we study several k-medoids-based algorithms, including the PAM, CLARA and CLARANS algorithms. A novel and efficient approach is proposed to reduce the computational complexity of such k-medoids-based algorithms by using the previous medoid index, triangular inequality elimination criteria and partial distance search. Experimental results based on elliptic, curve and Gauss-Markov databases demonstrate that the proposed algorithm applied to CLARANS may reduce the number of distance calculations by 67% to 92% while retaining the same average distance per object. In terms of running time, the proposed algorithm may reduce computation time by 38% to 65% compared with the CLARANS algorithm.
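
The abstract names the three speed-up techniques but does not say how they are combined, so the following is only a minimal sketch of one plausible combination for the nearest-medoid search, not the authors' implementation. It assumes Euclidean distance and a precomputed table of medoid-to-medoid distances. The names `sq_dist_partial`, `nearest_medoid`, `medoid_dists` and `prev_index` are hypothetical. The search starts from the medoid the object was assigned to previously, skips any medoid m_j for which d(m_best, m_j) >= 2 d(x, m_best) (by the triangle inequality such a medoid cannot be closer than the current best), and abandons a distance computation as soon as its partial sum reaches the current best.

```python
import math


def sq_dist_partial(x, m, bound):
    """Squared Euclidean distance with early exit (partial distance search).

    Returns None as soon as the running sum reaches `bound`, because the
    candidate can then no longer beat the current best.
    """
    total = 0.0
    for a, b in zip(x, m):
        total += (a - b) ** 2
        if total >= bound:
            return None
    return total


def nearest_medoid(x, medoids, medoid_dists, prev_index=0):
    """Return (index, distance, started) for the medoid nearest to x.

    medoid_dists[i][j] is the precomputed distance between medoids i and j.
    `started` counts distance computations begun, including abandoned ones.
    This is an illustrative sketch, not the paper's exact procedure.
    """
    # Previous medoid index: the old assignment gives a tight initial bound.
    best = prev_index
    best_sq = sq_dist_partial(x, medoids[best], math.inf)
    best_d = math.sqrt(best_sq)
    started = 1

    for j in range(len(medoids)):
        if j == prev_index:
            continue
        # Triangle inequality elimination:
        # d(x, m_j) >= d(m_best, m_j) - d(x, m_best) >= d(x, m_best).
        if medoid_dists[best][j] >= 2.0 * best_d:
            continue
        started += 1
        sq = sq_dist_partial(x, medoids[j], best_sq)
        if sq is not None:
            best, best_sq, best_d = j, sq, math.sqrt(sq)
    return best, best_d, started


# Hypothetical toy data: three well-separated medoids in the plane.
medoids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
medoid_dists = [[math.dist(a, b) for b in medoids] for a in medoids]
index, distance, started = nearest_medoid((1.0, 1.0), medoids, medoid_dists, prev_index=0)
```

With a good previous assignment, as in this toy call, both other medoids are eliminated by the triangle inequality test and only one distance is computed. A poor starting index costs more computations but still returns the same nearest medoid.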

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chu, SC., Roddick, J.F., Pan, J.S. (2002). An Efficient K-Medoids-Based Algorithm Using Previous Medoid Index, Triangular Inequality Elimination Criteria, and Partial Distance Search. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2002. Lecture Notes in Computer Science, vol 2454. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46145-0_7

  • DOI: https://doi.org/10.1007/3-540-46145-0_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44123-6

  • Online ISBN: 978-3-540-46145-6

  • eBook Packages: Springer Book Archive
