An Auto-stopped Hierarchical Clustering Algorithm Integrating Outlier Detection Algorithm

Lv, Tian-yang; Su, Tai-xue; Wang, Zheng-xuan; Zuo, Wan-li

doi:10.1007/11563952_41

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3739)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

Selecting appropriate parameter values is a critical problem for clustering analysis techniques. Moreover, clustering algorithms lack an effective mechanism for detecting outliers and simply treat them as “noise”. Regarding outliers as valuable information, this paper proposes a novel hierarchical clustering algorithm that integrates a new outlier-mining method. The algorithm stops clustering according to the dissimilarity reflected by the detected outliers and needs only one parameter, whose appropriate value can be determined during the outlier-mining process. After discussing some related topics, the paper uses 5 real-life datasets to evaluate the algorithm's performance in both outlier mining and clustering and compares it with other algorithms.
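
The abstract does not spell out the outlier-mining method or the stopping rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes a single hypothetical parameter `k`, scores each point by the distance to its k-th nearest neighbour, flags as outliers the points above the largest gap in the sorted scores (a stand-in heuristic), and stops single-linkage merging once the closest pair of clusters is farther apart than the largest score among the unflagged points. All function names are illustrative.

```python
import math
from itertools import combinations


def knn_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbour (needs > k points)."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return dists[k - 1]


def find_outliers(points, k):
    """Flag points above the largest gap in the sorted k-NN distances.

    Stand-in heuristic, not the paper's method. Assumes at least two points.
    Returns (outlier_indices, stop_threshold), where the threshold is the
    largest k-NN distance among the points that were not flagged.
    """
    scores = [knn_distance(points, i, k) for i in range(len(points))]
    order = sorted(range(len(points)), key=scores.__getitem__)
    # Position of the largest jump between consecutive sorted scores.
    cut = max(range(len(order) - 1),
              key=lambda t: scores[order[t + 1]] - scores[order[t]])
    if scores[order[cut + 1]] == scores[order[cut]]:
        return set(), scores[order[-1]]  # no gap at all: nothing stands out
    return set(order[cut + 1:]), scores[order[cut]]


def auto_stopped_clustering(points, k):
    """Single-linkage agglomeration that stops at the outlier-derived threshold."""
    outliers, threshold = find_outliers(points, k)
    clusters = [[i] for i in range(len(points)) if i not in outliers]

    def linkage(a, b):
        # Single linkage: distance between the closest members of a and b.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        if linkage(clusters[a], clusters[b]) > threshold:
            break  # the closest clusters are farther apart than the threshold
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe: combinations() guarantees a < b
    return clusters, sorted(outliers)
```

With two well-separated groups of points and one distant point, the sketch would be called as `auto_stopped_clustering(points, k=2)`; it returns the clusters as lists of point indices together with the indices flagged as outliers. Note that whenever the scores contain any gap, this heuristic flags at least one point, which is a limitation of the stand-in rule rather than a property claimed by the paper.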

Author information

Authors and Affiliations

College of Computer Science and Technology, Jilin University, Changchun, China
Tian-yang Lv, Tai-xue Su, Zheng-xuan Wang & Wan-li Zuo
College of Computer Science and Technology, Harbin Engineering University, Harbin, China
Tian-yang Lv

Editor information

Editors and Affiliations

University of Edinburgh & Bell Laboratories,
Wenfei Fan
College of Computer Science, Zhejiang University, 310027, Hangzhou, Zhejiang, China
Zhaohui Wu
Dept. of E. I. E, Huazhong University of Science and Technology, Wuhan, China
Jun Yang

Cite this paper

Lv, Ty., Su, Tx., Wang, Zx., Zuo, Wl. (2005). An Auto-stopped Hierarchical Clustering Algorithm Integrating Outlier Detection Algorithm. In: Fan, W., Wu, Z., Yang, J. (eds) Advances in Web-Age Information Management. WAIM 2005. Lecture Notes in Computer Science, vol 3739. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563952_41

DOI: https://doi.org/10.1007/11563952_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29227-2
Online ISBN: 978-3-540-32087-6
eBook Packages: Computer Science, Computer Science (R0)
