Fuzzy c-Means Clustering of Incomplete Data Using Dimension-Wise Fuzzy Variances of Clusters

  • Conference paper
Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2016)

Abstract

Clustering is an important technique for identifying groups of similar data objects within a data set. Because problems during data collection and preprocessing often lead to missing values, there is a need for clustering methods that can deal with such imperfect data. Approaches proposed in the literature for adapting the fuzzy c-means algorithm to incomplete data work well on data sets with equally sized and shaped clusters. In this paper, we present an approach for adapting the fuzzy c-means algorithm to incomplete data that uses the dimension-wise fuzzy variances of clusters to impute missing values. In experiments on incomplete real and synthetic data sets with differently sized and shaped clusters, we demonstrate the benefit of this approach over the basic approach in terms of both the assignment of data objects to clusters and the computation of cluster prototypes.
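
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It runs plain fuzzy c-means on data in which missing entries are marked with `None`, re-imputes each missing value in every iteration as a membership-weighted average of the cluster prototypes in that dimension, and then computes a dimension-wise fuzzy variance per cluster from the observed values. The function names `fcm_incomplete` and `fuzzy_variances`, the fuzzifier `m = 2`, the mean-based initial imputation, and the exact variance formula are all illustrative assumptions. How the paper actually uses the variances during imputation is not stated here and is not reproduced.

```python
import random


def fcm_incomplete(data, c, m=2.0, iters=100, seed=0):
    """Fuzzy c-means with iterative imputation (illustrative sketch).

    data: list of rows of floats; a missing entry is None.
    Assumes every dimension has at least one observed value.
    Returns (memberships u, prototypes v, completed data x).
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])

    # Initial imputation: mean of the observed values in each dimension.
    col_mean = []
    for j in range(d):
        obs = [row[j] for row in data if row[j] is not None]
        col_mean.append(sum(obs) / len(obs))
    x = [[row[j] if row[j] is not None else col_mean[j] for j in range(d)]
         for row in data]

    # Initial prototypes: c distinct rows of the completed data.
    v = [list(x[k]) for k in rng.sample(range(n), c)]
    u = []

    for _ in range(iters):
        # Membership update from squared Euclidean distances.
        u = []
        for k in range(n):
            dist = [max(sum((x[k][j] - v[i][j]) ** 2 for j in range(d)), 1e-12)
                    for i in range(c)]
            inv = [dk ** (-1.0 / (m - 1.0)) for dk in dist]
            total = sum(inv)
            u.append([w / total for w in inv])

        # Prototype update: membership-weighted mean of the completed data.
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            total = sum(w)
            v[i] = [sum(w[k] * x[k][j] for k in range(n)) / total
                    for j in range(d)]

        # Imputation update (assumed rule): weighted average of prototypes.
        for k in range(n):
            w = [u[k][i] ** m for i in range(c)]
            total = sum(w)
            for j in range(d):
                if data[k][j] is None:
                    x[k][j] = sum(w[i] * v[i][j] for i in range(c)) / total

    return u, v, x


def fuzzy_variances(data, u, v, m=2.0):
    """Dimension-wise fuzzy variance of each cluster, observed values only."""
    c, d = len(v), len(v[0])
    var = [[0.0] * d for _ in range(c)]
    for i in range(c):
        for j in range(d):
            num = den = 0.0
            for k, row in enumerate(data):
                if row[j] is not None:
                    w = u[k][i] ** m
                    num += w * (row[j] - v[i][j]) ** 2
                    den += w
            var[i][j] = num / den
    return var


# Tiny made-up data set with two groups and two missing entries.
points = [[0.0, 0.1], [0.2, None], [0.1, 0.0],
          [5.0, 5.1], [None, 4.9], [5.2, 5.0]]
u, v, completed = fcm_incomplete(points, c=2)
print(completed)
print(fuzzy_variances(points, u, v))
```

Because the imputed value is a convex combination of the prototypes, it always lies between the smallest and largest prototype coordinate in that dimension. A variance-aware variant would adjust these weights per dimension, which is where a method like the one described in the abstract would differ from this sketch.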

Author information

Corresponding author

Correspondence to Ludmila Himmelspach.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Himmelspach, L., Conrad, S. (2016). Fuzzy c-Means Clustering of Incomplete Data Using Dimension-Wise Fuzzy Variances of Clusters. In: Carvalho, J., Lesot, MJ., Kaymak, U., Vieira, S., Bouchon-Meunier, B., Yager, R. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. IPMU 2016. Communications in Computer and Information Science, vol 610. Springer, Cham. https://doi.org/10.1007/978-3-319-40596-4_58

  • DOI: https://doi.org/10.1007/978-3-319-40596-4_58

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-40595-7

  • Online ISBN: 978-3-319-40596-4

  • eBook Packages: Computer Science, Computer Science (R0)
