Mean Shift Clustering Algorithm for Data with Missing Values

AbdAllah, Loai; Shimshoni, Ilan

doi:10.1007/978-3-319-10160-6_38

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8646)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

Missing values in data are common in real-world applications, and several methods exist to deal with this problem. In this research we developed a new version of the mean shift clustering algorithm for datasets with missing values. We use a weighted distance function for data with missing values that was defined in our previous work. To compute the distance between two points that may have missing attribute values, only the mean and the variance of each attribute's distribution are required. Thus, once these have been computed, the distance can be computed in O(1) time per attribute. Furthermore, we use this distance to derive a formula for the mean shift vector of each data point, showing that the runtime complexity of this mean shift is the same as that of Euclidean mean shift. We experimented on six standard numerical datasets from different fields. On these datasets we simulated missing values and compared the performance of mean shift clustering using our distance and the suggested mean shift vector with three other basic methods. Our experiments show that mean shift using our distance function outperforms mean shift combined with the other methods for dealing with missing values.
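
The abstract states that the distance needs only each attribute's mean and variance, and that the resulting mean shift vector costs the same as Euclidean mean shift. The sketch below illustrates how that can work; it is not the authors' code, and the exact distance formula is an assumption here rather than something taken from the paper. We assume the distance is the expected squared Euclidean difference, with a missing value modeled as an independent draw from its attribute's distribution: (x_j - y_j)^2 when both values are known, (x_j - mu_j)^2 + sigma_j^2 when one is missing, and 2 sigma_j^2 when both are. Under that assumption, setting the gradient of a Gaussian kernel density to zero shows that a missing coordinate enters the mean shift weighted average as mu_j, while sigma_j^2 only affects the kernel weight. The names attribute_stats, expected_sq_distance, mean_shift_mode and cluster, the Gaussian kernel, and the mode-merging rule are illustrative choices.

```python
import math
from typing import List, Optional, Sequence, Tuple

# A point is a sequence of attribute values; None marks a missing value.
Point = Sequence[Optional[float]]


def attribute_stats(data: List[Point]) -> Tuple[List[float], List[float]]:
    """Per-attribute mean and population variance over the observed values.

    Assumes every attribute has at least one observed value.
    """
    means: List[float] = []
    variances: List[float] = []
    for j in range(len(data[0])):
        observed = [p[j] for p in data if p[j] is not None]
        mu = sum(observed) / len(observed)
        means.append(mu)
        variances.append(sum((v - mu) ** 2 for v in observed) / len(observed))
    return means, variances


def expected_sq_distance(x: Point, y: Point,
                         means: List[float], variances: List[float]) -> float:
    """ASSUMED distance: expected squared Euclidean difference, where a missing
    value is modeled as an independent draw from its attribute's distribution."""
    total = 0.0
    for j, (a, b) in enumerate(zip(x, y)):
        if a is not None and b is not None:
            total += (a - b) ** 2
        elif a is None and b is None:
            total += 2.0 * variances[j]  # E[(A - B)^2] for two independent draws
        else:
            known = a if a is not None else b
            total += (known - means[j]) ** 2 + variances[j]  # E[(known - B)^2]
    return total


def mean_shift_mode(start: Point, data: List[Point], bandwidth: float,
                    means: List[float], variances: List[float],
                    tol: float = 1e-6, max_iter: int = 500) -> List[float]:
    """Gaussian-kernel mean shift from `start` until the step falls below `tol`."""
    # Keep the mode estimate fully observed: fill missing start values with the mean.
    y = [v if v is not None else means[j] for j, v in enumerate(start)]
    for _ in range(max_iter):
        num = [0.0] * len(y)
        den = 0.0
        for p in data:
            w = math.exp(-expected_sq_distance(y, p, means, variances)
                         / (2.0 * bandwidth ** 2))
            den += w
            for j, v in enumerate(p):
                # The gradient of the assumed distance w.r.t. y_j is 2(y_j - mu_j)
                # when v is missing, so the missing value enters as mu_j.
                num[j] += w * (v if v is not None else means[j])
        if den == 0.0:  # every weight underflowed; stop where we are
            break
        new_y = [n / den for n in num]
        step = math.dist(new_y, y)
        y = new_y
        if step < tol:
            break
    return y


def cluster(data: List[Point], bandwidth: float,
            merge_tol: Optional[float] = None) -> Tuple[List[int], List[List[float]]]:
    """Run mean shift from every point and merge modes closer than `merge_tol`."""
    means, variances = attribute_stats(data)
    merge_tol = bandwidth / 2.0 if merge_tol is None else merge_tol
    modes: List[List[float]] = []
    labels: List[int] = []
    for p in data:
        m = mean_shift_mode(p, data, bandwidth, means, variances)
        for k, c in enumerate(modes):
            if math.dist(m, c) < merge_tol:
                labels.append(k)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return labels, modes
```

For example, cluster(data, bandwidth=1.0) on a list of two-attribute points containing None entries returns one cluster label per point together with the merged modes. Each iteration visits every point once and does O(1) work per attribute, which mirrors the abstract's claim that the runtime matches that of Euclidean mean shift.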

References

AbdAllah, L., Shimshoni, I.: A distance function for data with missing values and its applications on knn and kmeans algorithms. Submitted to Int. J. Advances in Data Analysis and Classification
Batista, G., Monard, M.C.: An analysis of four missing data treatment methods for supervised learning. Applied Artificial Intelligence 17(5-6), 519–533 (2003)
Cheng, Y.: Mean shift, mode seeking, and clustering. IEEE Trans. PAMI 17(8), 790–799 (1995)
Comaniciu, D., Meer, P.: Mean shift: A robust approach toward feature space analysis. IEEE Trans. PAMI 24(5), 603–619 (2002)
Comaniciu, D., Ramesh, V., Meer, P.: Kernel-based Object Tracking. IEEE Trans. PAMI 25(5), 564–577 (2003)
DeMenthon, D., Megret, R.: Spatio-temporal segmentation of video by hierarchical mean shift analysis. Computer Vision Laboratory, Center for Automation Research, University of Maryland (2002)
Fukunaga, K., Hostetler, L.: The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Transactions on Information Theory 21(1), 32–40 (1975)
Georgescu, B., Shimshoni, I., Meer, P.: Mean shift based clustering in high dimensions: A texture classification example. In: Proceedings of the 9th International Conference on Computer Vision, pp. 456–463 (2003)
Grzymała-Busse, J.W., Hu, M.: A comparison of several approaches to missing attribute values in data mining. In: Ziarko, W.P., Yao, Y. (eds.) RSCTC 2000. LNCS (LNAI), vol. 2005, pp. 378–385. Springer, Heidelberg (2001)
Magnani, M.: Techniques for dealing with missing data in knowledge discovery tasks. Obtido 15(01), 2007 (2004), http://magnanim.web.cs.unibo.it/index.html
Rand, W.M.: Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association 66(336), 846–850 (1971)
Suguna, N., Thanushkodi, K.G.: Predicting missing attribute values using k-means clustering. Journal of Computer Science 7(2), 216–224 (2011)
Tao, W., Jin, H., Zhang, Y.: Color image segmentation based on mean shift and normalized cuts. IEEE Trans. on Systems, Man, and Cybernetics, Part B 37(5), 1382–1389 (2007)
Speech and Image Processing Unit, University of Eastern Finland: Clustering dataset, http://cs.joensuu.fi/sipu/datasets/
Zhang, S., Qin, Z., Ling, C.X., Sheng, S.: “Missing is useful”: missing values in cost-sensitive decision trees. IEEE Trans. on Knowledge and Data Engineering 17(12), 1689–1693 (2005)
Zhang, S.: Shell-neighbor method and its application in missing data imputation. Applied Intelligence 35(1), 123–133 (2011)

Author information

Authors and Affiliations

Department of Mathematics, University of Haifa, Israel
Loai AbdAllah
Department of Mathematics and Computer Science, The College of Sakhnin for Teacher Education, Israel
Loai AbdAllah
Department of Information Systems, University of Haifa, Israel
Ilan Shimshoni

Editor information

Editors and Affiliations

LIAS/ISAE-ENSMA, Téléport 2, 1 avenue Clément Ader, BP 40109, 86961, Futuroscope Chasseneuil Cedex, France
Ladjel Bellatreche
IBM Research - India, 4, Block-C, Institutional Area, 110070, Vasant Kunj, New Delhi, India
Mukesh K. Mohania

About this paper

Cite this paper

AbdAllah, L., Shimshoni, I. (2014). Mean Shift Clustering Algorithm for Data with Missing Values. In: Bellatreche, L., Mohania, M.K. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2014. Lecture Notes in Computer Science, vol 8646. Springer, Cham. https://doi.org/10.1007/978-3-319-10160-6_38

DOI: https://doi.org/10.1007/978-3-319-10160-6_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10159-0
Online ISBN: 978-3-319-10160-6
eBook Packages: Computer Science, Computer Science (R0)