
Clustering in High-dimensional Data Spaces

  • Fionn Murtagh
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

By high-dimensional we mean dimensionality of the same order as the number of objects or observations to cluster, with the latter in the range of thousands upwards. Bellman’s “curse of dimensionality” applies to many widely used data analysis methods in high-dimensional spaces. One way to address this problem is through array permuting methods, which reorder the rows and columns of the data array. Such methods are closely related to dimensionality reduction methods such as principal components analysis. Imposing an order on an array benefits not only visualization but also opens the array to a vast range of image processing methods. In this context, for example, clustering becomes image feature detection.
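To illustrate how row/column reordering relates to principal components analysis, the following is a minimal sketch, assuming a dense NumPy array. It orders rows and columns by their coordinates on the first principal axis (the leading singular vectors of the column-centred array). This is a simple spectral seriation chosen for illustration, not necessarily the method developed in the paper, and it differs from reordering schemes such as the bond energy algorithm of McCormick et al. The functions `pca_seriate` and `n_label_changes` and the synthetic two-block data are hypothetical.

```python
import numpy as np


def pca_seriate(x):
    """Reorder rows and columns of a 2-D array by their coordinates on the
    first principal axis (leading singular vectors of the column-centred array).

    Illustrative sketch only: returns the permuted array plus the row and
    column permutations used.
    """
    x = np.asarray(x, dtype=float)
    xc = x - x.mean(axis=0)                          # column-centre, as PCA does
    u, _, vt = np.linalg.svd(xc, full_matrices=False)
    row_order = np.argsort(u[:, 0], kind="stable")   # rows by first-axis score
    col_order = np.argsort(vt[0], kind="stable")     # columns by first-axis loading
    return x[np.ix_(row_order, col_order)], row_order, col_order


def n_label_changes(labels):
    """Count how often consecutive labels differ (1 means two contiguous blocks)."""
    return int(np.count_nonzero(labels[1:] != labels[:-1]))


# Hypothetical synthetic 10 x 6 array with two row/column blocks, plus mild noise.
rng = np.random.default_rng(0)
row_block = np.repeat([0, 1], 5)
col_block = np.repeat([0, 1], 3)
x = (row_block[:, None] == col_block[None, :]).astype(float)
x += 0.1 * rng.standard_normal(x.shape)

# Hide the block structure by shuffling rows and columns.
rp = rng.permutation(10)
cp = rng.permutation(6)
x_shuffled = x[np.ix_(rp, cp)]

# Recover an order; block labels of the reordered rows/columns should be contiguous.
y, rows, cols = pca_seriate(x_shuffled)
print(n_label_changes(row_block[rp][rows]), n_label_changes(col_block[cp][cols]))
```

After reordering, the two blocks appear as contiguous rectangles in `y`, which is what makes image processing operations (smoothing, thresholding, edge or feature detection) applicable to the permuted array. A single principal axis captures only one linear ordering; methods that optimise similarity between adjacent rows and columns directly, such as bond energy or travelling-salesperson formulations, can handle structure that one axis does not separate.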

Keywords

Dimensionality Reduction Method, Ultrametric Space, Traveling Salesperson Problem, Sparse Array, Progressive Transmission


References

  1. BELLMAN, R. (1961) Adaptive Control Processes: A Guided Tour. Princeton University Press, Princeton.
  2. MURTAGH, F. and STARCK, J.L. (1998) “Pattern clustering based on noise modeling in wavelet space”, Pattern Recognition, 31, 847–855.
  3. STARCK, J.L., MURTAGH, F. and BIJAOUI, A. (1998) Image and Data Analysis: The Multiscale Approach. Cambridge University Press, Cambridge.
  4. CHEREUL, E., CREZE, M. and BIENAYME, O. (1997) “3D wavelet transform analysis of Hipparcos data”, in Maccarone, M.C., Murtagh, F., Kurtz, M. and Bijaoui, A. (eds.), Advanced Techniques and Methods for Astronomical Information Handling, Observatoire de la Côte d’Azur, Nice, France, 41–48.
  5. BYERS, S. and RAFTERY, A.E. (1996) “Nearest neighbor clutter removal for estimating features in spatial point processes”, Technical Report 305, Department of Statistics, University of Washington.
  6. BANFIELD, J.D. and RAFTERY, A.E. (1993) “Model-based Gaussian and non-Gaussian clustering”, Biometrics, 49, 803–821.
  7. MCCORMICK, W.T., SCHWEITZER, P.J. and WHITE, T.J. (1972) “Problem decomposition and data reorganization by a clustering technique”, Operations Research, 20, 993–1009.
  8. LENSTRA, J.K. (1974) “Clustering a data array and the traveling-salesman problem”, Operations Research, 22, 413–414.
  9. DOYLE, J. (1988) “Classification by ordering a (sparse) matrix: a simulated annealing approach”, Applied Mathematical Modelling, 12, 86–94.
  10. DEUTSCH, S.B. and MARTIN, J.J. (1971) “An ordering algorithm for analysis of data arrays”, Operations Research, 19, 1350–1362.
  11. STRENG, R. (1991) “Classification and seriation by iterative reordering of a data matrix”, in Bock, H.-H. and Ihm, P. (eds.), Classification, Data Analysis and Knowledge Organization: Models and Methods with Applications, Springer-Verlag, Berlin, pp. 121–130.
  12. PACKER, C.V. (1989) “Applying row-column permutation to matrix representations of large citation networks”, Information Processing and Management, 25, 307–314.
  13. MURTAGH, F. (1985) Multidimensional Clustering Algorithms. Physica-Verlag, Würzburg.
  14. MARCH, S.T. (1983) “Techniques for structuring database records”, Computing Surveys, 15, 45–79.
  15. ARABIE, P., SCHLEUTERMANN, S., DAWES, J. and HUBERT, L. (1988) “Marketing applications of sequencing and partitioning of nonsymmetric and/or two-mode matrices”, in Gaul, W. and Schader, M. (eds.), Data, Expert Knowledge and Decisions. Springer-Verlag, Berlin, pp. 215–224.
  16. GALE, N., HALPERIN, W.C. and COSTANZO, C.M. (1984) “Unclassed matrix shading and optimal ordering in hierarchical cluster analysis”, Journal of Classification, 1, 75–92.
  17. ZHA, H., DING, C., GU, M., HE, X. and SIMON, H. (2001) “Bipartite graph partitioning and data clustering”, preprint.
  18. LERMAN, I.C. (1981) Classification et Analyse Ordinale des Données. Dunod, Paris.
  19. FISHER, R.A. (1936) “The use of multiple measurements in taxonomic problems”, Annals of Eugenics, 7, 179–188.
  20. BERRY, M.W., HENDRICKSON, B. and RAGHAVAN, P. (1996) “Sparse matrix reordering schemes for browsing hypertext”, in Renegar, J., Shub, M. and Smale, S. (eds.), Lectures in Applied Mathematics (LAM) Vol. 32: The Mathematics of Numerical Analysis. American Mathematical Society, pp. 99–123.
  21. POINÇOT, P., LESTEVEN, S. and MURTAGH, F. (1998) “A spatial user interface to the astronomical literature”, Astronomy and Astrophysics Supplement Series, 130, 183–191.
  22. POINÇOT, P., LESTEVEN, S. and MURTAGH, F. (2000) “Maps of information spaces: assessments from astronomy”, Journal of the American Society for Information Science, 51, 1081–1089.
  23. MURTAGH, F., STARCK, J.L. and BERRY, M. (2000) “Overcoming the curse of dimensionality in clustering by means of the wavelet transform”, The Computer Journal, 43, 107–120.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Fionn Murtagh, School of Computer Science, Queen’s University Belfast, Belfast, Northern Ireland
