Advertisement

Efficient Density-Based Subspace Clustering in High Dimensions

  • Ira AssentEmail author
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7627)

Abstract

Density-based clustering defines clusters as dense areas in feature space separated by sparsely populated areas. It is known to successfully identify clusters of arbitrary shapes even in noisy data. Today, we face increasingly high-dimensional data, i.e. data objects described by many attributes. Effects attributed to the “curse of dimensionality” mean that in high-dimensional spaces, traditional clustering methods fail to identify meaningful clusters. In little more than a decade, the research field of subspace clustering has established methods for identifying clusters in subsets of the attributes in such high-dimensional spaces. As the number of possible subsets is exponential in the number of attributes, efficient algorithms are crucial. This short survey discusses challenges in this area, and presents models and algorithms for efficient and scalable density-based subspace clustering.

Keywords

Data mining Clustering Density-based clustering Subspace clustering High dimensional Efficiency 

Notes

Acknowledgments

This work has been supported in part by the Danish Council for Strategic Research, grant 10-092316, and by the Danish Council for Independent Research - Technology and Production Sciences (FTP), grant 10-081972.

References

  1. 1.
    Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data for data mining applications. In: SIGMOD, pp. 94–105 (1998)Google Scholar
  2. 2.
    Assent, I.: Clustering high dimensional data. Wiley Interdisc. Rev. Data Min. Knowl. Discov. 2(4), 340–350 (2012)CrossRefGoogle Scholar
  3. 3.
    Assent, I., Krieger, R., Müller, E., Seidl, T.: DUSC: dimensionality unbiased subspace clustering. In: ICDM, pp. 409–414 (2007)Google Scholar
  4. 4.
    Assent, I., Krieger, R., Müller, E., Seidl, T.: EDSC: efficient density-based subspace clustering. In: CIKM, pp. 1093–1102 (2008)Google Scholar
  5. 5.
    Assent, I., Krieger, R., Müller, E., Seidl, T.: INSCY: indexing subspace clusters with in-process-removal of redundancy. In: ICDM, pp. 719–724 (2008)Google Scholar
  6. 6.
    Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)zbMATHGoogle Scholar
  7. 7.
    Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is nearest neighbor meaningful? In: Beeri, C., Bruneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 217–235. Springer, Heidelberg (1998) CrossRefGoogle Scholar
  8. 8.
    Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases. In: KDD, pp. 226–231 (1996)Google Scholar
  9. 9.
    Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: SIGMOD, pp. 1–12 (2000)Google Scholar
  10. 10.
    Kailing, K., Kriegel, H.-P., Kröger, P.: Density-connected subspace clustering for high-dimensional data. In: SDM, pp. 246–257 (2004)Google Scholar
  11. 11.
    Kriegel, H.-P., Kröger, P., Sander, J., Zimek, A.: Density-based clustering. Wiley Interdisc. Rev. Data Min. Knowl. Disc. 1(3), 231–240 (2011)CrossRefGoogle Scholar
  12. 12.
    Kriegel, H.-P., Kröger, P., Zimek, A.: Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans. Knowl. Discov. Data 3(1), 1:1–1:58 (2009)CrossRefGoogle Scholar
  13. 13.
    Moise, G., Sander, J.: Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering. In: KDD, pp. 533–541 (2008)Google Scholar
  14. 14.
    Moise, G., Zimek, A., Kröger, P., Kriegel, H.-P., Sander, J.: Subspace and projected clustering: experimental evaluation and analysis. Knowl. Inf. Syst. 21, 299–326 (2009). doi: 10.1007/s10115-009-0226-y CrossRefGoogle Scholar
  15. 15.
    Müller, E., Assent, I., Günnemann, S., Krieger, R., Seidl, T.: Relevant subspace clustering: mining the most interesting non-redundant concepts in high dimensional data. In: ICDM, pp. 377–386 (2009)Google Scholar
  16. 16.
    Müller, E., Assent, I., Günnemann, S., Seidl, T.: Scalable density-based subspace clustering. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 1077–1086. ACM (2011)Google Scholar
  17. 17.
    Müller, E., Günnemann, S., Assent, I., Seidl, T.: Evaluating clustering in subspace projections of high dimensional data. PVLDB 2(1), 1270–1281 (2009)Google Scholar
  18. 18.
    Parsons, L., Haque, E., Liu, H.: Subspace clustering for high dimensional data: a review. SIGKDD Explor. 6(1), 90–105 (2004)CrossRefGoogle Scholar
  19. 19.
    Sim, K., Gopalkrishnan, V., Zimek, A., Cong, G.: A survey on enhanced subspace clustering. Data Min. Knowl. Discov. 26, 332–397 (2012). online firstMathSciNetCrossRefzbMATHGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. 1.Department of Computer ScienceAarhus UniversityAarhusDenmark

Personalised recommendations