Identification of Noisy Variables for Nonmetric and Symbolic Data in Cluster Analysis

  • Conference paper
Data Analysis, Machine Learning and Applications

Abstract

This paper proposes an extended version of the HINoV method for identifying noisy variables (Carmone et al. (1999)) that handles nonmetric, mixed, and symbolic interval data. The proposed modifications are evaluated on simulated data generated from a variety of models. Each model contains a known cluster structure, together with a varying number of noisy (irrelevant) variables added to obscure the underlying structure that is to be recovered.
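
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea behind HINoV-style screening, under stated assumptions: each variable induces its own partition of the objects, partitions are compared pairwise with the adjusted Rand index (Hubert and Arabie (1985)), and each variable is scored by the sum of its indices against all other variables. It is assumed here that for a nominal variable the categories themselves can serve as the partition. The function names `adjusted_rand` and `topri_scores` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand(labels_a, labels_b):
    """Adjusted Rand index (Hubert and Arabie, 1985) for two partitions."""
    n = len(labels_a)
    # Pairs of objects placed together in both partitions, in A, and in B.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (pairs_joint - expected) / (max_index - expected)


def topri_scores(columns):
    """Score each variable by its summed agreement with all other variables."""
    m = len(columns)
    scores = [0.0] * m
    for j in range(m):
        for k in range(j + 1, m):
            r = adjusted_rand(columns[j], columns[k])
            scores[j] += r
            scores[k] += r
    return scores


# Toy nominal data (assumed for illustration): 12 objects in 3 known clusters.
v1 = ["a"] * 4 + ["b"] * 4 + ["c"] * 4   # follows the cluster structure
v2 = ["x"] * 4 + ["y"] * 4 + ["z"] * 4   # follows the cluster structure
v3 = ["p", "q", "r", "s"] * 3            # noisy: unrelated to the clusters

scores = topri_scores([v1, v2, v3])
# Variables with the lowest scores are candidates for removal as noisy.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

In this toy setting the noisy variable `v3` receives the lowest score, because its partition disagrees with both structure-bearing variables. How the paper treats ordinal, mixed, or symbolic interval variables is not covered by this sketch.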

References

  • BILLARD, L., DIDAY, E. (2006): Symbolic data analysis. Conceptual statistics and data mining, Wiley, Chichester.

  • CARMONE, F.J., KARA, A. and MAXWELL, S. (1999): HINoV: a new method to improve market segment definition by identifying noisy variables, Journal of Marketing Research, vol. 36, November, 501-509.

  • GNANADESIKAN, R., KETTENRING, J.R., and TSAO, S.L. (1995): Weighting and selection of variables for cluster analysis, Journal of Classification, vol. 12, no. 1, 113-136.

  • HUBERT, L.J., ARABIE, P. (1985): Comparing partitions, Journal of Classification, vol. 2, no. 1, 193-218.

  • JAJUGA, K., WALESIAK, M., BAK, A. (2003): On the General Distance Measure, In: M. Schwaiger and O. Opitz (Eds.), Exploratory data analysis in empirical research, Springer-Verlag, Berlin, Heidelberg, 104-109.

  • MILLIGAN, G.W. (1996): Clustering validation: results and implications for applied analyses, In: P. Arabie, L.J. Hubert, G. de Soete (Eds.), Clustering and classification, World Scientific, Singapore, 341-375.

  • TIBSHIRANI, R., WALTHER, G., HASTIE, T. (2001): Estimating the number of clusters in a data set via the gap statistic, Journal of the Royal Statistical Society, ser. B, vol. 63, part 2, 411-423.

  • WALESIAK, M. (2005): Variable selection for cluster analysis - approaches, problems, methods, Plenary Session of the Committee on Statistics and Econometrics of the Polish Academy of Sciences, 15, March, Wroclaw.


© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Walesiak, M., Dudek, A. (2008). Identification of Noisy Variables for Nonmetric and Symbolic Data in Cluster Analysis. In: Preisach, C., Burkhardt, H., Schmidt-Thieme, L., Decker, R. (eds) Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78246-9_11
