Abstract
Traditional clustering methods are usually inefficient and ineffective for data with more than five or so dimensions. Section 2.3 of the previous chapter discusses the main reasons behind this fact. It also notes that dimensionality reduction alone does not solve the problem, since it captures only the global correlations in the data. Correlations local to subsets of the data cannot be identified without first identifying the clusters in which they occur. Thus, algorithms have been developed that combine dimensionality reduction and clustering into a single task, looking for clusters together with the subspaces of the original space in which they exist. This chapter briefly describes some of these algorithms. Specifically, we first present a concise survey of the existing algorithms, and then discuss three of the most relevant ones in detail. Finally, to help the reader evaluate and compare the algorithms, we conclude the chapter with a table linking some of the most relevant techniques to the main desirable properties that any clustering technique for moderate-to-high dimensionality data should have. The overall goal is to identify the main strategies already used to address the problem, as well as the key limitations of the existing techniques.
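The distinction between global and local correlations can be made concrete with a small numeric sketch (not from the chapter itself; the data, dimension choices, and helper function below are illustrative assumptions). Two well-separated clusters each carry a strong correlation in a different pair of dimensions; analyzed globally, the shift between cluster means makes nearly every pair of dimensions appear correlated, hiding the local structure that subspace clustering aims to recover:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two clusters in a 5-d space, each with a correlation that is local to
# the cluster: cluster A correlates dims 0 and 1, cluster B correlates
# dims 2 and 3. The remaining dimensions are pure noise.
n = 500
a = rng.normal(0.0, 1.0, (n, 5))
a[:, 1] = a[:, 0] + rng.normal(0.0, 0.05, n)   # correlation local to A
b = rng.normal(0.0, 1.0, (n, 5)) + 10.0        # B is shifted away from A
b[:, 3] = b[:, 2] + rng.normal(0.0, 0.05, n)   # correlation local to B
data = np.vstack([a, b])

def top_correlation(x):
    """Return the dimension pair with the strongest absolute correlation."""
    c = np.corrcoef(x, rowvar=False)
    np.fill_diagonal(c, 0.0)
    i, j = np.unravel_index(np.abs(c).argmax(), c.shape)
    return (int(i), int(j)), abs(c[i, j])

# Globally, the mean shift between clusters dominates the covariance,
# so even unrelated dimensions (e.g., 0 and 2) look strongly correlated.
g = np.corrcoef(data, rowvar=False)
print(abs(g[0, 2]))            # spuriously high, roughly 0.95

# Per cluster, the true local correlations stand out cleanly.
print(top_correlation(a))      # dims (0, 1), correlation near 1
print(top_correlation(b))      # dims (2, 3), correlation near 1
```

A global technique such as PCA applied to `data` would model the cluster separation, not these local correlations, which is exactly why the algorithms surveyed in this chapter search for clusters and their subspaces simultaneously.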
Notes
1. Table 3.1 summarizes a table found in [23]; i.e., it includes a selection of the most relevant desirable properties and most closely related works from the original table. Table 3.1 also includes two desirable properties not found in [23]: linear or quasi-linear complexity and terabyte-scale data analysis.
References
Achtert, E., Böhm, C., David, J., Kröger, P., Zimek, A.: Global correlation clustering based on the Hough transform. Stat. Anal. Data Min. 1, 111–127 (2008)
Achtert, E., Böhm, C., Kriegel, H.P., Kröger, P., Zimek, A.: Robust, complete, and efficient correlation clustering. In: SDM, USA (2007)
Agarwal, P.K., Mustafa, N.H.: k-means projective clustering. In: PODS, pp. 155–165. ACM, Paris, France (2004). http://doi.acm.org/10.1145/1055558.1055581
Aggarwal, C., Yu, P.: Redefining clustering for high-dimensional applications. IEEE TKDE 14(2), 210–225 (2002). http://doi.ieeecomputersociety.org/10.1109/69.991713
Aggarwal, C.C., Wolf, J.L., Yu, P.S., Procopiuc, C., Park, J.S.: Fast algorithms for projected clustering. SIGMOD Rec. 28(2), 61–72 (1999). http://doi.acm.org/10.1145/304181.304188
Aggarwal, C.C., Yu, P.S.: Finding generalized projected clusters in high dimensional spaces. SIGMOD Rec. 29(2), 70–81 (2000). http://doi.acm.org/10.1145/335191.335383
Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD Rec. 27(2), 94–105 (1998). http://doi.acm.org/10.1145/276305.276314
Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data. Data Min. Knowl. Discov. 11(1), 5–33 (2005). doi:10.1007/s10618-005-1396-1
Aho, A.V., Hopcroft, J.E., Ullman, J.: The Design and Analysis of Computer Algorithms. Addison-Wesley Longman Publishing Co., Inc., Boston (1974)
Al-Razgan, M., Domeniconi, C.: Weighted clustering ensembles. In: J. Ghosh, D. Lambert, D.B. Skillicorn, J. Srivastava (eds.) SDM. SIAM (2006)
Banfield, J.D., Raftery, A.E.: Model-based Gaussian and non-Gaussian clustering. Biometrics 49(3), 803–821 (1993)
Böhm, C., Faloutsos, C., Pan, J.Y., Plant, C.: Robust information-theoretic clustering. In: KDD, pp. 65–75. USA (2006). http://doi.acm.org/10.1145/1150402.1150414
Böhm, C., Faloutsos, C., Plant, C.: Outlier-robust clustering using independent components. In: SIGMOD, pp. 185–198. USA (2008). http://doi.acm.org/10.1145/1376616.1376638
Böhm, C., Kailing, K., Kriegel, H.P., Kröger, P.: Density connected clustering with local subspace preferences. In: ICDM ’04: Proceedings of the Fourth IEEE International Conference on Data Mining, pp. 27–34. IEEE Computer Society, Washington (2004)
Böhm, C., Kailing, K., Kröger, P., Zimek, A.: Computing clusters of correlation connected objects. In: SIGMOD, pp. 455–466. USA (2004). http://doi.acm.org/10.1145/1007568.1007620
Cheng, C.H., Fu, A.W., Zhang, Y.: Entropy-based subspace clustering for mining numerical data. In: KDD, pp. 84–93. USA (1999). http://doi.acm.org/10.1145/312129.312199
Cheng, H., Hua, K.A., Vu, K.: Constrained locally weighted clustering. In: Proceedings of the VLDB 1(1), 90–101 (2008). http://doi.acm.org/10.1145/1453856.1453871
Domeniconi, C., Gunopulos, D., Ma, S., Yan, B., Al-Razgan, M., Papadopoulos, D.: Locally adaptive metrics for clustering high dimensional data. Data Min. Knowl. Discov. 14(1), 63–97 (2007). doi:10.1007/s10618-006-0060-8
Domeniconi, C., Papadopoulos, D., Gunopulos, D., Ma, S.: Subspace clustering of high dimensional data. In: M.W. Berry, U. Dayal, C. Kamath, D.B. Skillicorn (eds.) SDM (2004)
Friedman, J.H., Meulman, J.J.: Clustering objects on subsets of attributes (with discussion). J. R. Stat. Soc. Ser. B 66(4), 815–849 (2004). http://ideas.repec.org/a/bla/jorssb/v66y2004i4p815-849.html
Grünwald, P.D., Myung, I.J., Pitt, M.A.: Advances in Minimum Description Length: Theory and Applications (Neural Information Processing). The MIT Press, Cambridge (2005)
Kriegel, H.P., Kröger, P., Renz, M., Wurst, S.: A generic framework for efficient subspace clustering of high-dimensional data. In: ICDM, pp. 250–257. Washington (2005). http://dx.doi.org/10.1109/ICDM.2005.5
Kriegel, H.P., Kröger, P., Zimek, A.: Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM TKDD 3(1), 1–58 (2009). doi:10.1145/1497577.1497578
Kröger, P., Kriegel, H.P., Kailing, K.: Density-connected subspace clustering for high-dimensional data. In: SDM, USA (2004)
Moise, G., Sander, J.: Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering. In: KDD, pp. 533–541 (2008)
Moise, G., Sander, J., Ester, M.: P3C: A robust projected clustering algorithm. In: ICDM, pp. 414–425. IEEE Computer Society (2006)
Moise, G., Sander, J., Ester, M.: Robust projected clustering. Knowl. Inf. Syst. 14(3), 273–298 (2008). doi:10.1007/s10115-007-0090-6
Ng, E.K.K., Fu, A.W.C., Wong, R.C.W.: Projective clustering by histograms. TKDE 17(3), 369–383 (2005). doi:10.1109/TKDE.2005.47
Procopiuc, C.M., Jones, M., Agarwal, P.K., Murali, T.M.: A Monte Carlo algorithm for fast projective clustering. In: SIGMOD, pp. 418–427. USA (2002). http://doi.acm.org/10.1145/564691.564739
Rissanen, J.: Stochastic Complexity in Statistical Inquiry. World Scientific Publishing Co., Inc., River Edge (1989)
Tung, A.K.H., Xu, X., Ooi, B.C.: Curler: finding and visualizing nonlinear correlation clusters. In: SIGMOD, pp. 467–478 (2005). http://doi.acm.org/10.1145/1066157.1066211
Yip, K., Cheung, D., Ng, M.: Harp: a practical projected clustering algorithm. TKDE 16(11), 1387–1397 (2004)
Yiu, M.L., Mamoulis, N.: Iterative projected clustering by subspace mining. TKDE 17(2), 176–189 (2005)
Copyright information
© 2013 The Author(s)
Cite this chapter
Cordeiro, R., Faloutsos, C., Traina Júnior, C. (2013). Clustering Methods for Moderate-to-High Dimensionality Data. In: Data Mining in Large Sets of Complex Data. SpringerBriefs in Computer Science. Springer, London. https://doi.org/10.1007/978-1-4471-4890-6_3
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4889-0
Online ISBN: 978-1-4471-4890-6