Clustering Methods for Moderate-to-High Dimensionality Data

Part of the book series: SpringerBriefs in Computer Science ((BRIEFSCOMPUTER))

Abstract

Traditional clustering methods are usually inefficient and ineffective over data with more than five or so dimensions. In Sect. 2.3 of the previous chapter, we discussed the main reasons behind this fact. We also noted that dimensionality reduction alone does not solve the problem, since it captures only the global correlations in the data. Correlations local to subsets of the data cannot be identified without first identifying the clusters in which they occur. Thus, algorithms that combine dimensionality reduction and clustering into a single task have been developed to look for clusters together with the subspaces of the original space where they exist. Some of these algorithms are briefly described in this chapter. Specifically, we first present a concise survey of the existing algorithms, and then discuss three of the most relevant ones. Finally, to help evaluate and compare the algorithms, we conclude the chapter with a table linking some of the most relevant techniques to the main desirable properties that any clustering technique for moderate-to-high dimensionality data should have. The overall goal is to identify the main strategies already used to deal with the problem, as well as the key limitations of the existing techniques.
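
The limitation of global dimensionality reduction mentioned above can be illustrated with a minimal, self-contained sketch (using NumPy and scikit-learn, which are this example's assumptions, not tools discussed in the chapter). Two clusters in a 4-D space each follow a perfect linear correlation, but within different 2-D subspaces: a single global 1-D projection loses roughly half of the total variance, while a 1-D projection computed per cluster captures nearly all of each cluster's structure.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n = 200

# Cluster A varies along axes (0, 1): local correlation x0 ~ x1.
t = rng.uniform(-1, 1, n)
a = np.zeros((n, 4))
a[:, 0], a[:, 1] = t, t
a += rng.normal(0, 0.01, (n, 4))

# Cluster B varies along axes (2, 3): local correlation x2 ~ x3.
u = rng.uniform(-1, 1, n)
b = np.zeros((n, 4))
b[:, 2], b[:, 3] = u, u
b += rng.normal(0, 0.01, (n, 4))

x = np.vstack([a, b])

# A single global 1-D projection cannot serve both clusters at once:
# the two local correlation directions are orthogonal, so one axis
# explains only about half of the pooled variance.
global_var = PCA(n_components=1).fit(x).explained_variance_ratio_[0]

# A 1-D projection computed per cluster recovers each local correlation
# almost perfectly (only the small additive noise is lost).
local_var_a = PCA(n_components=1).fit(a).explained_variance_ratio_[0]
local_var_b = PCA(n_components=1).fit(b).explained_variance_ratio_[0]

print(f"global: {global_var:.2f}  per-cluster: {local_var_a:.2f}, {local_var_b:.2f}")
```

This is exactly the situation that motivates the algorithms surveyed in this chapter: since the per-cluster projections cannot be computed without already knowing the clusters, clusters and their subspaces must be searched for simultaneously.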

Notes

  1. Table 3.1 includes a summary of one table found in [23], i.e., it presents a selection of the most relevant desirable properties and the most closely related works from the original table. Table 3.1 also includes two novel desirable properties not found in [23]: linear or quasi-linear complexity and terabyte-scale data analysis.

References

  1. Achtert, E., Böhm, C., David, J., Kröger, P., Zimek, A.: Global correlation clustering based on the Hough transform. Stat. Anal. Data Min. 1, 111–127 (2008)

  2. Achtert, E., Böhm, C., Kriegel, H.P., Kröger, P., Zimek, A.: Robust, complete, and efficient correlation clustering. In: SDM, USA (2007)

  3. Agarwal, P.K., Mustafa, N.H.: k-means projective clustering. In: PODS, pp. 155–165. ACM, Paris, France (2004). http://doi.acm.org/10.1145/1055558.1055581

  4. Aggarwal, C., Yu, P.: Redefining clustering for high-dimensional applications. IEEE TKDE 14(2), 210–225 (2002). http://doi.ieeecomputersociety.org/10.1109/69.991713

  5. Aggarwal, C.C., Wolf, J.L., Yu, P.S., Procopiuc, C., Park, J.S.: Fast algorithms for projected clustering. SIGMOD Rec. 28(2), 61–72 (1999). http://doi.acm.org/10.1145/304181.304188

  6. Aggarwal, C.C., Yu, P.S.: Finding generalized projected clusters in high dimensional spaces. SIGMOD Rec. 29(2), 70–81 (2000). http://doi.acm.org/10.1145/335191.335383

  7. Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD Rec. 27(2), 94–105 (1998). http://doi.acm.org/10.1145/276305.276314

  8. Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data. Data Min. Knowl. Discov. 11(1), 5–33 (2005). doi:10.1007/s10618-005-1396-1

  9. Aho, A.V., Hopcroft, J.E., Ullman, J.: The Design and Analysis of Computer Algorithms. Addison-Wesley Longman Publishing Co., Inc., Boston (1974)

  10. Al-Razgan, M., Domeniconi, C.: Weighted clustering ensembles. In: J. Ghosh, D. Lambert, D.B. Skillicorn, J. Srivastava (eds.) SDM. SIAM (2006)

  11. Banfield, J.D., Raftery, A.E.: Model-based Gaussian and non-Gaussian clustering. Biometrics 49(3), 803–821 (1993)

  12. Böhm, C., Faloutsos, C., Pan, J.Y., Plant, C.: Robust information-theoretic clustering. In: KDD, pp. 65–75. USA (2006). http://doi.acm.org/10.1145/1150402.1150414

  13. Böhm, C., Faloutsos, C., Plant, C.: Outlier-robust clustering using independent components. In: SIGMOD, pp. 185–198. USA (2008). http://doi.acm.org/10.1145/1376616.1376638

  14. Böhm, C., Kailing, K., Kriegel, H.P., Kröger, P.: Density connected clustering with local subspace preferences. In: ICDM ’04: Proceedings of the Fourth IEEE International Conference on Data Mining, pp. 27–34. IEEE Computer Society, Washington (2004)

  15. Böhm, C., Kailing, K., Kröger, P., Zimek, A.: Computing clusters of correlation connected objects. In: SIGMOD, pp. 455–466. USA (2004). http://doi.acm.org/10.1145/1007568.1007620

  16. Cheng, C.H., Fu, A.W., Zhang, Y.: Entropy-based subspace clustering for mining numerical data. In: KDD, pp. 84–93. USA (1999). http://doi.acm.org/10.1145/312129.312199

  17. Cheng, H., Hua, K.A., Vu, K.: Constrained locally weighted clustering. Proc. VLDB Endow. 1(1), 90–101 (2008). http://doi.acm.org/10.1145/1453856.1453871

  18. Domeniconi, C., Gunopulos, D., Ma, S., Yan, B., Al-Razgan, M., Papadopoulos, D.: Locally adaptive metrics for clustering high dimensional data. Data Min. Knowl. Discov. 14(1), 63–97 (2007). doi:10.1007/s10618-006-0060-8

  19. Domeniconi, C., Papadopoulos, D., Gunopulos, D., Ma, S.: Subspace clustering of high dimensional data. In: M.W. Berry, U. Dayal, C. Kamath, D.B. Skillicorn (eds.) SDM (2004)

  20. Friedman, J.H., Meulman, J.J.: Clustering objects on subsets of attributes (with discussion). J. R. Stat. Soc. Ser. B 66(4), 815–849 (2004). http://ideas.repec.org/a/bla/jorssb/v66y2004i4p815-849.html

  21. Grünwald, P.D., Myung, I.J., Pitt, M.A.: Advances in Minimum Description Length: Theory and Applications (Neural Information Processing). The MIT Press, Cambridge (2005)

  22. Kriegel, H.P., Kröger, P., Renz, M., Wurst, S.: A generic framework for efficient subspace clustering of high-dimensional data. In: ICDM, pp. 250–257. Washington (2005). http://dx.doi.org/10.1109/ICDM.2005.5

  23. Kriegel, H.P., Kröger, P., Zimek, A.: Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM TKDD 3(1), 1–58 (2009). doi:10.1145/1497577.1497578

  24. Kröger, P., Kriegel, H.P., Kailing, K.: Density-connected subspace clustering for high-dimensional data. In: SDM, USA (2004)

  25. Moise, G., Sander, J.: Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering. In: KDD, pp. 533–541 (2008)

  26. Moise, G., Sander, J., Ester, M.: P3C: A robust projected clustering algorithm. In: ICDM, pp. 414–425. IEEE Computer Society (2006)

  27. Moise, G., Sander, J., Ester, M.: Robust projected clustering. Knowl. Inf. Syst. 14(3), 273–298 (2008). doi:10.1007/s10115-007-0090-6

  28. Ng, E.K.K., Fu, A.W.C., Wong, R.C.W.: Projective clustering by histograms. TKDE 17(3), 369–383 (2005). doi:10.1109/TKDE.2005.47

  29. Procopiuc, C.M., Jones, M., Agarwal, P.K., Murali, T.M.: A Monte Carlo algorithm for fast projective clustering. In: SIGMOD, pp. 418–427. USA (2002). http://doi.acm.org/10.1145/564691.564739

  30. Rissanen, J.: Stochastic Complexity in Statistical Inquiry. World Scientific Publishing Co., Inc., River Edge (1989)

  31. Tung, A.K.H., Xu, X., Ooi, B.C.: CURLER: finding and visualizing nonlinear correlation clusters. In: SIGMOD, pp. 467–478 (2005). http://doi.acm.org/10.1145/1066157.1066211

  32. Yip, K., Cheung, D., Ng, M.: HARP: a practical projected clustering algorithm. TKDE 16(11), 1387–1397 (2004)

  33. Yiu, M.L., Mamoulis, N.: Iterative projected clustering by subspace mining. TKDE 17(2), 176–189 (2005)

Author information

Corresponding author

Correspondence to Robson L. F. Cordeiro.

Copyright information

© 2013 The Author(s)

About this chapter

Cite this chapter

Cordeiro, R.L.F., Faloutsos, C., Traina Júnior, C. (2013). Clustering Methods for Moderate-to-High Dimensionality Data. In: Data Mining in Large Sets of Complex Data. SpringerBriefs in Computer Science. Springer, London. https://doi.org/10.1007/978-1-4471-4890-6_3

  • DOI: https://doi.org/10.1007/978-1-4471-4890-6_3

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-4889-0

  • Online ISBN: 978-1-4471-4890-6

  • eBook Packages: Computer Science (R0)
