
Clustering in High-dimensional Data Spaces

  • Conference paper
Classification, Clustering, and Data Analysis

Abstract

By high-dimensional we mean that the dimensionality is of the same order as the number of objects or observations to be clustered, with the latter numbering in the thousands or more. Bellman’s “curse of dimensionality” applies to many widely used data analysis methods in high-dimensional spaces. One way to address this problem is through array permutation methods, which reorder the rows and columns of the data array. Such methods are closely related to dimensionality reduction methods such as principal components analysis. Imposing an order on an array benefits not only visualization but also makes a vast range of image processing methods applicable. In this context, for example, clustering becomes image feature detection.
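
As a rough illustration of how array permutation relates to principal components analysis, the sketch below reorders the rows and columns of a data array by their scores on the first principal axis. It is an assumption-laden example, not the method of this paper: the helper names (`leading_axis`, `pca_seriation`), the use of power iteration, the iteration count and the toy two-block data are all hypothetical choices made for illustration, and only the Python standard library is used.

```python
import random


def _matvec(a, v):
    """Return the matrix-vector product a @ v for a list-of-lists matrix."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def leading_axis(xc, iters=100, seed=0):
    """Hypothetical helper: power iteration on xc^T xc, returning the
    unit-length leading right singular vector of xc, i.e. the first
    principal axis when xc is column-centred."""
    n, m = len(xc), len(xc[0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(m)]  # arbitrary positive start vector
    for _ in range(iters):
        xv = _matvec(xc, v)  # n-vector xc @ v
        # m-vector xc^T (xc @ v)
        w = [sum(xc[i][j] * xv[i] for i in range(n)) for j in range(m)]
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0.0:  # degenerate case: stop and return the current vector
            break
        v = [c / norm for c in w]
    return v


def pca_seriation(x):
    """Hypothetical helper: permute rows and columns of x by their scores on
    the first principal axis. Returns (row_order, col_order, permuted)."""
    n, m = len(x), len(x[0])
    means = [sum(x[i][j] for i in range(n)) / n for j in range(m)]
    xc = [[x[i][j] - means[j] for j in range(m)] for i in range(n)]
    v = leading_axis(xc)  # column loadings on the first axis
    u = _matvec(xc, v)    # row scores on the same axis
    row_order = sorted(range(n), key=lambda i: u[i])
    col_order = sorted(range(m), key=lambda j: v[j])
    permuted = [[x[i][j] for j in col_order] for i in row_order]
    return row_order, col_order, permuted


if __name__ == "__main__":
    # Toy data: two 0/1 blocks (rows 0-3 x cols 0-2, rows 4-7 x cols 3-5),
    # with rows and columns then shuffled by a fixed permutation.
    blocks = [[1 if (i < 4) == (j < 3) else 0 for j in range(6)] for i in range(8)]
    rows, cols = [5, 0, 7, 2, 4, 1, 6, 3], [4, 1, 3, 0, 5, 2]
    shuffled = [[blocks[i][j] for j in cols] for i in rows]
    for row in pca_seriation(shuffled)[2]:
        print(row)
```

On the shuffled toy array, the permuted output shows the two blocks as contiguous rectangles again, which is the kind of structure an image feature detector could then pick out. For arrays of realistic size, `numpy.linalg.svd` would be the usual replacement for the hand-written power iteration.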

References

  • BELLMAN, R. (1961) Adaptive Control Processes: A Guided Tour. Princeton University Press, Princeton.
  • MURTAGH, F. and STARCK, J.L. (1998) “Pattern clustering based on noise modeling in wavelet space”, Pattern Recognition, 31, 847–855.
  • STARCK, J.L., MURTAGH, F. and BIJAOUI, A. (1998) Image and Data Analysis: The Multiscale Approach. Cambridge University Press, Cambridge.
  • CHEREUL, E., CREZE, M. and BIENAYME, O. (1997) “3D wavelet transform analysis of Hipparcos data”, in Maccarone, M.C., Murtagh, F., Kurtz, M. and Bijaoui, A. (eds.). Advanced Techniques and Methods for Astronomical Information Handling, Observatoire de la Côte d’Azur, Nice, France, 41–48.
  • BYERS, S. and RAFTERY, A.E. (1996) “Nearest neighbor clutter removal for estimating features in spatial point processes”, Technical Report 305, Department of Statistics, University of Washington.
  • BANFIELD, J.D. and RAFTERY, A.E. (1993) “Model-based Gaussian and non-Gaussian clustering”, Biometrics, 49, 803–821.
  • MCCORMICK, W.T., SCHWEITZER, P.J. and WHITE, T.J. (1972) “Problem decomposition and data reorganization by a clustering technique”, Operations Research, 20, 993–1009.
  • LENSTRA, J.K. (1974) “Clustering a data array and the traveling-salesman problem”, Operations Research, 22, 413–414.
  • DOYLE, J. (1988) “Classification by ordering a (sparse) matrix: a simulated annealing approach”, Applied Mathematical Modelling, 12, 86–94.
  • DEUTSCH, S.B. and MARTIN, J.J. (1971) “An ordering algorithm for analysis of data arrays”, Operations Research, 19, 1350–1362.
  • STRENG, R. (1991) “Classification and seriation by iterative reordering of a data matrix”, in Bock, H.-H. and Ihm, P. (eds.). Classification, Data Analysis and Knowledge Organization: Models and Methods with Applications, Springer-Verlag, Berlin, 121–130.
  • PACKER, C.V. (1989) “Applying row-column permutation to matrix representations of large citation networks”, Information Processing and Management, 25, 307–314.
  • MURTAGH, F. (1985) Multidimensional Clustering Algorithms. Physica-Verlag, Würzburg.
  • MARCH, S.T. (1983) “Techniques for structuring database records”, Computing Surveys, 15, 45–79.
  • ARABIE, P., SCHLEUTERMANN, S., DAWES, J. and HUBERT, L. (1988) “Marketing applications of sequencing and partitioning of nonsymmetric and/or two-mode matrices”, in Gaul, W. and Schader, M. (eds.). Data, Expert Knowledge and Decisions, Springer-Verlag, Berlin, 215–224.
  • GALE, N., HALPERIN, W.C. and COSTANZO, C.M. (1984) “Unclassed matrix shading and optimal ordering in hierarchical cluster analysis”, Journal of Classification, 1, 75–92.
  • ZHA, H., DING, C., GU, M., HE, X. and SIMON, H. (2001) “Bipartite graph partitioning and data clustering”, preprint.
  • LERMAN, I.C. (1981) Classification et Analyse Ordinale des Données. Dunod, Paris.
  • FISHER, R.A. (1936) “The use of multiple measurements in taxonomic problems”, Annals of Eugenics, 7, 179–188.
  • BERRY, M.W., HENDRICKSON, B. and RAGHAVAN, P. (1996) “Sparse matrix reordering schemes for browsing hypertext”, in Renegar, J., Shub, M. and Smale, S. (eds.). Lectures in Applied Mathematics (LAM) Vol. 32: The Mathematics of Numerical Analysis, American Mathematical Society, 99–123.
  • POINÇOT, P., LESTEVEN, S. and MURTAGH, F. (1998) “A spatial user interface to the astronomical literature”, Astronomy and Astrophysics Supplement Series, 130, 183–191.
  • POINÇOT, P., LESTEVEN, S. and MURTAGH, F. (2000) “Maps of information spaces: assessments from astronomy”, Journal of the American Society for Information Science, 51, 1081–1089.
  • MURTAGH, F., STARCK, J.L. and BERRY, M. (2000) “Overcoming the curse of dimensionality in clustering by means of the wavelet transform”, The Computer Journal, 43, 107–120.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Murtagh, F. (2002). Clustering in High-dimensional Data Spaces. In: Jajuga, K., Sokołowski, A., Bock, HH. (eds) Classification, Clustering, and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-56181-8_9

  • DOI: https://doi.org/10.1007/978-3-642-56181-8_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43691-1

  • Online ISBN: 978-3-642-56181-8

  • eBook Packages: Springer Book Archive