Abstract
Clustering is a widely used data mining method that partitions a given dataset into homogeneous groups according to a predefined similarity criterion. The k-modes algorithm is a well-known categorical clustering method that uses the notion of a mode to represent the centroid of each partition during the clustering process. The mode is a vector containing the most frequent modality of each attribute. However, in the original version the mode is selected randomly in each iteration, even though many other candidate modes may exist. In this paper, a new approach is developed that generates potential candidate modes for each cluster in each iteration using their relative density. The obtained modes are then arranged into the upper and lower approximations of Rough Set Theory in order to identify the most pertinent ones. The proposed method was tested on two real-world datasets and compared with the standard k-modes; experimentally, it achieved higher accuracy.
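To make the notion of a mode concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the most frequent modality of each attribute in a small toy cluster and enumerates every vector that can be built when several modalities tie for the highest frequency. The function names (attribute_modes, candidate_modes, matching_dissimilarity) and the toy data are assumptions made for illustration. The relative-density ranking and the rough-set arrangement described in the abstract are not reproduced here.

```python
from collections import Counter
from itertools import product


def attribute_modes(cluster):
    """For each attribute (column), return the most frequent values, keeping ties."""
    modes = []
    for column in zip(*cluster):
        counts = Counter(column)
        top = max(counts.values())
        modes.append(sorted(v for v, c in counts.items() if c == top))
    return modes


def candidate_modes(cluster):
    """Enumerate every vector made of one most-frequent value per attribute."""
    return [tuple(combo) for combo in product(*attribute_modes(cluster))]


def matching_dissimilarity(x, y):
    """Simple matching distance: number of attributes on which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def total_cost(mode, cluster):
    """Sum of matching distances between a mode and all objects of the cluster."""
    return sum(matching_dissimilarity(mode, obj) for obj in cluster)


# Hypothetical toy cluster with three categorical attributes.
cluster = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "square"),
    ("blue", "large", "round"),
]

for mode in candidate_modes(cluster):
    print(mode, total_cost(mode, cluster))
```

In this toy cluster the first two attributes each have a two-way tie, so four candidate modes exist, and all of them have the same total matching distance to the cluster. This illustrates why choosing one of them at random is arbitrary and why an additional criterion for ranking candidates can be useful.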
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ben Salem, S., Naouali, S., Chtourou, Z. (2019). Rough Mode: A Generalized Centroid Proposal for Clustering Categorical Data Using the Rough Set Theory. In: Farhaoui, Y., Moussaid, L. (eds) Big Data and Smart Digital Environment. ICBDSDE 2018. Studies in Big Data, vol 53. Springer, Cham. https://doi.org/10.1007/978-3-030-12048-1_24
DOI: https://doi.org/10.1007/978-3-030-12048-1_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-12047-4
Online ISBN: 978-3-030-12048-1
eBook Packages: Intelligent Technologies and Robotics (R0)