Abstract
High-dimensional data clustering has gained attention in recent years owing to its widespread applications in domains such as social networking and biology. With advances in data gathering and storage technologies, a single data object is often described by a large number of attributes. Although more attributes may provide new insights, they can also hinder the knowledge discovery process by burying interesting relations under redundant information. Traditional notions of similarity become meaningless in high-dimensional data, so clustering methods based on similarity between objects fail to cope with increased dimensionality. A dataset with large dimensionality can often be better described in its subspaces than as a whole. Subspace clustering algorithms identify clusters existing in multiple, possibly overlapping, subspaces. Subspace clustering methods are further classified as top-down or bottom-up depending on the strategy applied to identify subspaces. Top-down algorithms compute an initial clustering over the full set of dimensions and then iterate, removing irrelevant dimensions to identify the subset of dimensions that best represents each cluster's subspace. Bottom-up algorithms start from dense regions in low-dimensional spaces and merge them into higher-dimensional subspaces using Apriori-style candidate generation. It has been observed that both the performance and the quality of results of a subspace clustering algorithm are highly dependent on the parameter values input to the algorithm. This paper gives an overview of work done in the field of subspace clustering.
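The bottom-up strategy described above can be sketched in a few lines. The following is a minimal, hypothetical illustration in the style of grid-based bottom-up algorithms such as CLIQUE: each dimension is partitioned into `xi` equal cells, 1-dimensional dense units are found first, and dense units are then joined Apriori-style into higher-dimensional candidates that are kept only if they remain dense. The function name, parameters, and the assumption that coordinates lie in [0, 1) are illustrative choices, not the method of any specific paper surveyed here.

```python
# Hedged sketch of bottom-up (CLIQUE-style) subspace clustering:
# find dense 1-D grid units, then join them level-wise (Apriori-style)
# into higher-dimensional dense units.
from itertools import combinations

def dense_units(points, dims, xi=4, tau=2):
    """Return dense grid units as frozensets of (dimension, cell) pairs.

    points : list of equal-length tuples with coordinates in [0, 1)
    dims   : number of dimensions
    xi     : number of grid cells per dimension
    tau    : density threshold (a unit is dense if it holds > tau points)
    """
    # Step 1: count points per 1-D grid cell and keep the dense ones.
    counts = {}
    for p in points:
        for d in range(dims):
            cell = (d, int(p[d] * xi))
            counts[cell] = counts.get(cell, 0) + 1
    dense = [frozenset([c]) for c, n in counts.items() if n > tau]

    result = list(dense)
    k = 1
    while dense:
        # Step 2 (Apriori join): two k-dimensional dense units that share
        # k-1 cells and cover k+1 distinct dimensions form a candidate.
        candidates = set()
        for a, b in combinations(dense, 2):
            u = a | b
            if len(u) == k + 1 and len({d for d, _ in u}) == k + 1:
                candidates.add(u)
        # Step 3: keep only candidates that are actually dense.
        dense = [u for u in candidates
                 if sum(all(int(p[d] * xi) == cell for d, cell in u)
                        for p in points) > tau]
        result.extend(dense)
        k += 1
    return result
```

For example, five points that agree on dimensions 0 and 1 but are scattered along dimension 2 yield a dense 2-dimensional unit spanning only the first two dimensions, mirroring how bottom-up methods recover clusters hidden in subspaces. The monotonicity property (a region dense in k dimensions is dense in all its (k-1)-dimensional projections) is what makes the Apriori-style pruning sound.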
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kelkar, B.A., Rodd, S.F. (2019). Subspace Clustering—A Survey. In: Balas, V., Sharma, N., Chakrabarti, A. (eds) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol 808. Springer, Singapore. https://doi.org/10.1007/978-981-13-1402-5_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1401-8
Online ISBN: 978-981-13-1402-5
eBook Packages: Engineering (R0)