A Robust Seedless Algorithm for Correlation Clustering

Aziz, Mohammad S.; Reddy, Chandan K.

doi:10.1007/978-3-642-13657-3_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6118)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Finding correlation clusters in arbitrary subspaces of high-dimensional data is an important and challenging research problem. Current state-of-the-art correlation clustering approaches are sensitive to the initial set of seeds chosen and do not yield optimal results in the presence of noise. To avoid these problems, we propose the RObust SEedless Correlation Clustering (ROSECC) algorithm, which does not require the selection of an initial set of seeds. Our approach incrementally partitions the data in each iteration and applies PCA to each partition independently. ROSECC does not assume the dimensionality of a cluster beforehand; it automatically determines the appropriate dimensionality (and the corresponding subspace) of each correlation cluster. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of the proposed method. We also show the robustness of our method in the presence of significant noise levels in the data.
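
The abstract mentions applying PCA to each partition independently and deriving a cluster's dimensionality from the data. The sketch below illustrates only that general idea and is not the ROSECC algorithm itself: the function name `partition_pca`, the use of precomputed partition labels, and the 0.85 explained-variance threshold are all assumptions made for illustration, since the abstract does not say how partitions are formed or how dimensionality is chosen.

```python
import numpy as np


def partition_pca(points, labels, variance_kept=0.85):
    """Run PCA independently on each partition of `points`.

    Hypothetical helper, not taken from the paper. `labels` assigns each
    row of `points` to a partition; `variance_kept` is an assumed threshold.

    Returns {label: (mean, eigenvalues, eigenvectors, n_strong)}, where
    n_strong is the smallest number of leading components whose eigenvalues
    account for at least `variance_kept` of the partition's total variance.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    results = {}
    for label in np.unique(labels).tolist():
        part = points[labels == label]
        if part.shape[0] < 2:
            continue  # a covariance matrix needs at least two points
        mean = part.mean(axis=0)
        # Covariance of the partition (rows are observations).
        cov = np.atleast_2d(np.cov(part, rowvar=False))
        # eigh returns eigenvalues in ascending order; sort them descending.
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1]
        eigvals = np.clip(eigvals[order], 0.0, None)
        eigvecs = eigvecs[:, order]
        total = eigvals.sum()
        if total == 0.0:
            n_strong = 0  # all points coincide; no direction carries variance
        else:
            explained = np.cumsum(eigvals) / total
            first_hit = int(np.searchsorted(explained, variance_kept))
            n_strong = min(first_hit + 1, eigvals.size)
        results[label] = (mean, eigvals, eigvecs, n_strong)
    return results
```

Under these assumptions, a partition whose points lie near a line would report one strong component and a partition whose points lie near a plane would report two, which is one simple way to estimate a subspace dimensionality without fixing it in advance.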

Author information

Authors and Affiliations

Department of Computer Science, Wayne State University, Detroit, MI, USA
Mohammad S. Aziz & Chandan K. Reddy

Editor information

Editors and Affiliations

Computer Science Department, Rensselaer Polytechnic Institute, USA
Mohammed J. Zaki
The Chinese University of Hong Kong, China
Jeffrey Xu Yu
IIT Madras, Chennai, India
B. Ravindran
IIIT, Hyderabad, India
Vikram Pudi

About this paper

Cite this paper

Aziz, M.S., Reddy, C.K. (2010). A Robust Seedless Algorithm for Correlation Clustering. In: Zaki, M.J., Yu, J.X., Ravindran, B., Pudi, V. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2010. Lecture Notes in Computer Science, vol 6118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13657-3_6

DOI: https://doi.org/10.1007/978-3-642-13657-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13656-6
Online ISBN: 978-3-642-13657-3
eBook Packages: Computer Science, Computer Science (R0)
