
Subspace Clustering—A Survey

  • Bhagyashri A. Kelkar
  • Sunil F. Rodd
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 808)

Abstract

High-dimensional data clustering has gained attention in recent years because of its widespread applications in domains such as social networking and biology. As a result of advances in data gathering and data storage technologies, a single data object is often represented by many attributes. Although more data may provide new insights, it may also hinder the knowledge discovery process by cluttering interesting relations with redundant information. In high-dimensional data, the traditional distance-based definition of similarity becomes meaningless: as the number of dimensions grows, the relative difference between the distances to a point's nearest and farthest neighbors shrinks. Hence, clustering methods based on similarity between objects fail to cope with the increased dimensionality of data. A dataset with large dimensionality can be better described in its subspaces than as a whole. Subspace clustering algorithms identify clusters that exist in multiple, possibly overlapping subspaces. Subspace clustering methods are further classified as top-down or bottom-up, depending on the strategy applied to identify subspaces. Top-down algorithms perform an initial clustering on the full set of dimensions and then iterate, removing irrelevant dimensions to find the subset of dimensions that best represents each cluster's subspace. Bottom-up algorithms start with low-dimensional subspaces and merge dense regions using Apriori-based hierarchical clustering methods. It has been observed that the performance and the quality of results of a subspace clustering algorithm are highly dependent on the parameter values input to the algorithm. This paper gives an overview of work done in the field of subspace clustering.
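To make the bottom-up strategy concrete, here is a minimal Python sketch in the spirit of grid-based methods such as CLIQUE, assuming all coordinates are scaled to [0, 1). Each dimension is split into `xi` equal-width intervals, and a unit (a cell in some subspace) is dense when it holds at least a fraction `tau` of the points. Dense one-dimensional units are found first; then, level by level, pairs of dense k-dimensional units are joined into (k+1)-dimensional candidates, and Apriori pruning discards any candidate with a non-dense k-dimensional projection before its support is counted. The names `cell` and `dense_units`, the parameters `xi` and `tau`, and the toy data are illustrative assumptions rather than the paper's code. The sketch stops at dense units: it does not connect adjacent units into clusters or prune uninteresting subspaces as a complete algorithm would.

```python
from collections import Counter
from itertools import combinations


def cell(value, xi):
    """Map a coordinate in [0, 1) to one of xi equal-width intervals."""
    return min(int(value * xi), xi - 1)


def dense_units(points, xi, tau):
    """Bottom-up, Apriori-style search for dense grid units (illustrative sketch).

    points: list of equal-length tuples with coordinates in [0, 1)
    xi:     number of equal-width intervals per dimension (assumed parameter)
    tau:    density threshold as a fraction of all points (assumed parameter)
    Returns {k: set of dense k-dimensional units}; a unit is a frozenset
    of (dimension, interval) pairs.
    """
    n, d = len(points), len(points[0])
    min_count = tau * n
    # Interval index of every point in every dimension.
    grid = [tuple(cell(p[j], xi) for j in range(d)) for p in points]

    def support(unit):
        # Number of points that fall inside every interval of the unit.
        return sum(all(g[j] == i for j, i in unit) for g in grid)

    # Level 1: dense intervals in each single dimension.
    counts = Counter((j, g[j]) for g in grid for j in range(d))
    current = {frozenset([u]) for u, c in counts.items() if c >= min_count}
    result, k = {}, 1
    while current:
        result[k] = current
        candidates = set()
        for a, b in combinations(current, 2):
            joined = a | b
            dims = {j for j, _ in joined}
            # Join two k-dim units that share k-1 intervals and add one new dimension.
            if len(joined) == k + 1 and len(dims) == k + 1:
                # Apriori pruning: every k-dim projection must already be dense.
                if all(joined - {u} in current for u in joined):
                    candidates.add(joined)
        current = {c for c in candidates if support(c) >= min_count}
        k += 1
    return result


# Toy data (hypothetical): 60 points clustered in dimensions 0 and 1 but spread
# evenly over dimension 2, plus 40 evenly spread background points.
points = [(0.41 + 0.003 * (i % 10), 0.62 + 0.003 * (i % 10), (i % 20) / 20 + 0.01)
          for i in range(60)]
points += [((i % 20) / 20 + 0.01, (i * 7 % 20) / 20 + 0.01, (i * 3 % 20) / 20 + 0.01)
           for i in range(40)]

strict = dense_units(points, xi=5, tau=0.25)
print(sorted(sorted(u) for u in strict[2]))  # [[(0, 2), (1, 3)]]
print(max(strict))                            # 2
loose = dense_units(points, xi=5, tau=0.20)
print(len(loose[1]))                          # 7: every interval of dimension 2 now counts as dense
```

With `tau=0.25` the search keeps only the intervals of dimensions 0 and 1 that contain the cluster and joins them into one dense two-dimensional unit; dimension 2, which the cluster fills uniformly, is rejected at level 1. Lowering `tau` to `0.20` lets all five intervals of dimension 2 pass the level-1 test, so more candidates must be generated and counted, a small instance of the parameter sensitivity mentioned in the abstract.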

Keywords

Clustering; Subspace clustering; High-dimensional data

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. KLS Gogte Institute of Technology, Belgaum, India
  2. Sanjay Ghodawat University, Atigre, Kolhapur, India
  3. Department of Computer Science and Engineering, Gogte Institute of Technology, Belagavi, India
