An Auto-stopped Hierarchical Clustering Algorithm Integrating Outlier Detection Algorithm

  • Conference paper
Advances in Web-Age Information Management (WAIM 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3739)

Abstract

Selecting appropriate parameter values is a critical problem for clustering analysis techniques. Moreover, clustering algorithms lack an effective mechanism for detecting outliers and instead treat them as “noise”. Regarding outliers as valuable information, this paper proposes a novel hierarchical clustering algorithm that integrates a new outlier-mining method. The algorithm stops clustering according to the dissimilarity reflected by the detected outliers and needs only one parameter, whose appropriate value can be determined during the outlier-mining process. After discussing some related topics, the paper uses 5 real-life datasets to evaluate the algorithm's performance in outlier mining and clustering and to compare it with other algorithms.
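
The abstract does not give the algorithm's details, so the following is only a rough illustration of the general idea of letting detected outliers decide when agglomerative merging stops. It is not the paper's method. Everything in it is an assumption made for illustration: the k-th-nearest-neighbour outlier score, the rule that the smallest outlier score becomes the stopping threshold, single-linkage merging, and the hypothetical names `knn_distance`, `cluster_with_outliers`, `k`, and `n_outliers`.

```python
import math
from itertools import combinations


def knn_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbour (assumed outlier score)."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return dists[k - 1]


def single_link(points, c1, c2):
    """Single-linkage dissimilarity: smallest pairwise distance between two clusters."""
    return min(math.dist(points[p], points[q]) for p in c1 for q in c2)


def cluster_with_outliers(points, k=2, n_outliers=1):
    # Step 1 (assumption): score every point and flag the top-scoring ones as outliers.
    scores = [knn_distance(points, i, k) for i in range(len(points))]
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    outliers = set(ranked[:n_outliers])

    # Step 2 (assumption): the least isolated outlier sets the dissimilarity
    # at which merging should stop.
    threshold = min(scores[i] for i in outliers)

    # Step 3: agglomerative merging of the remaining points, one cluster per point.
    clusters = [[i] for i in range(len(points)) if i not in outliers]
    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_link(points, clusters[pair[0]], clusters[pair[1]]),
        )
        if single_link(points, clusters[a], clusters[b]) >= threshold:
            break  # closest clusters are as dissimilar as an outlier: stop automatically
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe because a < b
    return clusters, sorted(outliers)


# Two tight groups and one isolated point between them (made-up data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
clusters, outliers = cluster_with_outliers(points, k=2, n_outliers=1)
```

In this toy run the isolated point at index 6 is flagged as the outlier, and merging halts once the two remaining groups are farther apart than that point's score, so no target cluster count is supplied. Note that this sketch uses two parameters, whereas the paper states that its algorithm needs only one.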

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lv, Ty., Su, Tx., Wang, Zx., Zuo, Wl. (2005). An Auto-stopped Hierarchical Clustering Algorithm Integrating Outlier Detection Algorithm. In: Fan, W., Wu, Z., Yang, J. (eds) Advances in Web-Age Information Management. WAIM 2005. Lecture Notes in Computer Science, vol 3739. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563952_41

  • DOI: https://doi.org/10.1007/11563952_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29227-2

  • Online ISBN: 978-3-540-32087-6

  • eBook Packages: Computer Science, Computer Science (R0)