Rough Mode: A Generalized Centroid Proposal for Clustering Categorical Data Using the Rough Set Theory

Ben Salem, Semeh; Naouali, Sami; Chtourou, Zied

doi:10.1007/978-3-030-12048-1_24

Part of the book series: Studies in Big Data (SBD, volume 53)

Included in the following conference series:

International Conference on Big Data and Smart Digital Environment

Abstract

Clustering is a widely used data mining method that partitions a dataset into homogeneous groups according to a predefined similarity criterion. k-modes is a well-known categorical clustering method that uses the notion of a mode to represent the centroid of each partition during the clustering process. The mode is a vector containing the most frequent modality of each attribute. In the original version, however, the mode is selected randomly in each iteration, although many other candidate modes could be proposed. This paper develops a new approach that generates potential candidate modes for each cluster in each iteration using their relative density. The resulting modes are then arranged into the upper and lower approximations of Rough Set Theory in order to identify the most pertinent ones. The proposed method was tested on two real-world datasets and compared with standard k-modes; experimentally, it achieved higher accuracy.
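To make the notion of multiple candidate modes concrete, the following Python sketch enumerates every mode vector a cluster admits when several modalities tie for the highest frequency on an attribute. It is only an illustration under stated assumptions: the function names `candidate_modes` and `simple_matching` and the toy cluster are hypothetical, ties are assumed to be the source of the competing candidates, and the density-based generation and rough set arrangement described in the abstract are not reproduced here.

```python
from collections import Counter
from itertools import product


def candidate_modes(cluster):
    """Return every vector built from the most frequent value(s) of each attribute.

    Hypothetical helper for illustration; not the chapter's algorithm.
    """
    per_attribute = []
    for column in zip(*cluster):  # iterate attribute by attribute
        counts = Counter(column)
        top = max(counts.values())
        # Keep all values tied for the highest frequency, in sorted order.
        per_attribute.append(sorted(v for v, c in counts.items() if c == top))
    # The Cartesian product of the tied values gives all candidate modes.
    return [tuple(m) for m in product(*per_attribute)]


def simple_matching(a, b):
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


# Toy cluster (made-up data): colour and size each have a two-way tie.
cluster = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "square"),
    ("blue", "large", "round"),
]

modes = candidate_modes(cluster)
for mode in modes:
    cost = sum(simple_matching(mode, obj) for obj in cluster)
    print(mode, cost)
```

In this toy cluster all four candidates have the same total simple matching dissimilarity to the cluster's objects, so the usual cost alone cannot separate them. This is the kind of situation in which an additional criterion for ranking candidate modes becomes useful.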

Author information

Authors and Affiliations

Polytechnic School of Tunisia, B.P. 743, Rue El khawarizmi, 2078, Al Marsá, Tunis, Tunisia
Semeh Ben Salem
Virtual Reality and Information Technologies, Military Academy of Fondouk Jedid, Tunis, Tunisia
Semeh Ben Salem & Sami Naouali
Digital Research Center of Sfax, B.P. 275, 3021, Sakiet Ezzit, Sfax, Tunisia
Zied Chtourou

Corresponding author

Correspondence to Semeh Ben Salem.

Editor information

Editors and Affiliations

Faculty of Sciences and Techniques, Department of Computer Science, Moulay Ismail University, Errachidia, Morocco
Yousef Farhaoui
Department of Computer Science, Hassan II University, Casablanca, Morocco
Laila Moussaid

About this paper

Cite this paper

Ben Salem, S., Naouali, S., Chtourou, Z. (2019). Rough Mode: A Generalized Centroid Proposal for Clustering Categorical Data Using the Rough Set Theory. In: Farhaoui, Y., Moussaid, L. (eds) Big Data and Smart Digital Environment. ICBDSDE 2018. Studies in Big Data, vol 53. Springer, Cham. https://doi.org/10.1007/978-3-030-12048-1_24

DOI: https://doi.org/10.1007/978-3-030-12048-1_24
Published: 22 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-12047-4
Online ISBN: 978-3-030-12048-1
eBook Packages: Intelligent Technologies and Robotics (R0)
