
Subspace Clustering—A Survey

Conference paper in Data Management, Analytics and Innovation

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 808)

Abstract

High-dimensional data clustering has gained attention in recent years because of its wide applications in domains such as social networking and biology. Advances in data gathering and data storage technologies mean that a single data object is often represented by many attributes. Although more data may provide new insights, it can also hinder the knowledge discovery process by cluttering interesting relations with redundant information. In high-dimensional data, the traditional notion of similarity becomes meaningless: as the number of dimensions grows, the distances from an object to its nearest and farthest neighbors tend to become nearly equal. Hence, clustering methods based on similarity between objects fail to cope with the increased dimensionality of the data. A dataset with many dimensions can be better described in its subspaces than as a whole. Subspace clustering algorithms identify clusters that exist in multiple, overlapping subspaces. Depending on the strategy used to identify subspaces, these algorithms are further classified as top-down or bottom-up. Top-down algorithms start with an initial clustering over the full set of dimensions and then iterate, removing irrelevant dimensions to find the subset of dimensions that best represents each subspace. Bottom-up algorithms start in low-dimensional spaces and merge dense regions into higher-dimensional ones using Apriori-based hierarchical methods. The performance and result quality of a subspace clustering algorithm have been observed to depend heavily on the parameter values supplied to it. This paper gives an overview of work done in the field of subspace clustering.
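The abstract's claim that distance-based similarity loses meaning in high dimensions can be checked with a small experiment. The sketch below is not from the paper. It assumes points drawn uniformly from the unit hypercube and Euclidean distance, and the function name `relative_contrast` is hypothetical. It measures how much farther the farthest neighbor is than the nearest one, relative to the nearest distance.

```python
# Sketch only: hypothetical helper, assumes uniform data and Euclidean distance.
import math
import random


def relative_contrast(dim: int, n_points: int = 500, seed: int = 0) -> float:
    """(D_max - D_min) / D_min for Euclidean distances from one random query
    point to n_points points drawn uniformly from the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # Print the nearest/farthest contrast for increasing dimensionality.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

On uniform data the printed contrast typically falls sharply as `dim` grows, so "nearest" and "farthest" become hard to tell apart. Other distributions and metrics behave differently, so treat this as an illustration rather than a proof.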
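The top-down strategy described in the abstract can be illustrated with a simplified sketch loosely modeled on PROCLUS-style projected clustering. It is not the paper's method, and every name in it is hypothetical. It assumes an initial clustering over all dimensions is already given as `labels`. It uses centroids rather than medoids, keeps a fixed number `l` of dimensions per cluster (those with the smallest mean absolute deviation from the centroid), and reassigns points by the Manhattan segmental distance over each cluster's retained dimensions until the assignments stop changing.

```python
# Sketch only: a simplified, hypothetical top-down refinement (not PROCLUS itself).
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def select_dims(points: List[Point], center: List[float], l: int) -> List[int]:
    """Keep the l dimensions along which the cluster is tightest, i.e. with the
    smallest mean absolute deviation from the cluster center."""
    spread = [
        sum(abs(p[j] - center[j]) for p in points) / len(points)
        for j in range(len(center))
    ]
    return sorted(range(len(center)), key=lambda j: spread[j])[:l]


def segmental_distance(p: Point, center: List[float], dims: List[int]) -> float:
    """Manhattan segmental distance: mean absolute difference over `dims` only."""
    return sum(abs(p[j] - center[j]) for j in dims) / len(dims)


def top_down_refine(
    data: List[Point], labels: List[int], k: int, l: int, max_iter: int = 10
) -> Tuple[List[int], List[List[int]]]:
    """Start from a full-dimensional clustering (`labels`), then alternate between
    choosing l relevant dimensions per cluster and reassigning every point."""
    labels = list(labels)
    dims: List[List[int]] = []
    for _ in range(max_iter):
        centers: List[Optional[List[float]]] = []
        dims = []
        for i in range(k):
            members = [p for p, c in zip(data, labels) if c == i]
            if not members:  # an empty cluster drops out in this sketch
                centers.append(None)
                dims.append([])
                continue
            center = centroid(members)
            centers.append(center)
            dims.append(select_dims(members, center, l))
        live = [i for i in range(k) if centers[i] is not None]
        new_labels = [
            min(live, key=lambda i: segmental_distance(p, centers[i], dims[i]))
            for p in data
        ]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
    return labels, dims


# Hypothetical toy data: the first four points are tight in dims 0-1,
# the last four are tight in dims 2-3.
SAMPLE_DATA = [
    (0.0, 0.1, 1.0, 9.0), (0.1, 0.0, 8.0, 2.0), (0.2, 0.1, 5.0, 6.0), (0.1, 0.2, 3.0, 4.0),
    (9.0, 1.0, 5.0, 5.1), (2.0, 8.0, 5.1, 5.0), (6.0, 3.0, 5.2, 5.1), (4.0, 7.0, 5.1, 5.2),
]
# A deliberately imperfect starting labelling: the fourth point is misassigned.
SAMPLE_LABELS = [0, 0, 0, 1, 1, 1, 1, 1]
```

With `top_down_refine(SAMPLE_DATA, SAMPLE_LABELS, k=2, l=2)` the misassigned point moves back to the first cluster. The retained dimensions come out as {0, 1} for the first cluster and {2, 3} for the second. Fixing `l` in advance is a simplification; real top-down methods choose the subspace size in more elaborate ways.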
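The bottom-up, Apriori-based strategy can be sketched as a grid search in the style of CLIQUE. This is a minimal illustration under stated assumptions, not the paper's code, and the names are hypothetical. It assumes data normalized to [0, 1], splits each dimension into `xi` equal-width intervals, and calls a unit dense when it holds more than a fraction `tau` of all points. Dense k-dimensional units are joined into (k+1)-dimensional candidates, and any candidate with a non-dense k-dimensional projection is pruned. This is the Apriori (downward-closure) property. The sketch stops at dense units and omits the later steps that connect adjacent dense units into clusters.

```python
# Sketch only: a hypothetical CLIQUE-style bottom-up search for dense grid units.
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# A unit is a tuple of (dimension, interval index) pairs, sorted by dimension.
Unit = Tuple[Tuple[int, int], ...]


def cell_of(x: float, xi: int) -> int:
    """Index of the equal-width interval containing x, assuming x in [0, 1]."""
    return min(int(x * xi), xi - 1)


def support(data: List[Sequence[float]], unit: Unit, xi: int) -> int:
    """Number of points that fall inside `unit` on every one of its dimensions."""
    return sum(all(cell_of(p[d], xi) == i for d, i in unit) for p in data)


def dense_units(data: List[Sequence[float]], xi: int, tau: float) -> Dict[int, List[Unit]]:
    """Bottom-up search for dense grid units, level by level (1-D, 2-D, ...).

    xi  -- number of intervals per dimension
    tau -- density threshold as a fraction of all points
    """
    threshold = tau * len(data)
    n_dims = len(data[0])
    level: List[Unit] = [
        u for u in (((d, i),) for d in range(n_dims) for i in range(xi))
        if support(data, u, xi) > threshold
    ]
    result: Dict[int, List[Unit]] = {}
    k = 1
    while level:
        result[k] = level
        dense = set(level)
        candidates = set()
        for u, v in combinations(sorted(level), 2):
            # Apriori join: identical first k-1 pairs, and u's last dimension
            # lower than v's (so the joined unit stays sorted by dimension).
            if u[:-1] == v[:-1] and u[-1][0] < v[-1][0]:
                cand = u + (v[-1],)
                # Prune: every k-dimensional projection of the candidate must be dense.
                if all(sub in dense for sub in combinations(cand, k)):
                    candidates.add(cand)
        level = sorted(c for c in candidates if support(data, c, xi) > threshold)
        k += 1
    return result


# Hypothetical toy data in [0, 1]^2: six clustered points plus four scattered ones.
SAMPLE_POINTS = [
    (0.11, 0.12), (0.13, 0.15), (0.15, 0.11), (0.17, 0.18), (0.12, 0.19), (0.18, 0.13),
    (0.5, 0.9), (0.9, 0.5), (0.7, 0.3), (0.3, 0.7),
]
```

On `SAMPLE_POINTS`, `dense_units(SAMPLE_POINTS, xi=5, tau=0.3)` finds the interval [0, 0.2) dense in each dimension and their intersection dense in the 2-D subspace. With `tau=0.7` it returns an empty dict. This is a toy illustration of the parameter sensitivity noted in the abstract.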



Author information

Correspondence to Bhagyashri A. Kelkar.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Kelkar, B.A., Rodd, S.F. (2019). Subspace Clustering—A Survey. In: Balas, V., Sharma, N., Chakrabarti, A. (eds) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol 808. Springer, Singapore. https://doi.org/10.1007/978-981-13-1402-5_16


  • DOI: https://doi.org/10.1007/978-981-13-1402-5_16


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1401-8

  • Online ISBN: 978-981-13-1402-5

  • eBook Packages: Engineering, Engineering (R0)
