Rough Mode: A Generalized Centroid Proposal for Clustering Categorical Data Using the Rough Set Theory

  • Conference paper
Big Data and Smart Digital Environment (ICBDSDE 2018)

Part of the book series: Studies in Big Data (SBD, volume 53)

Abstract

Clustering is a widely used data mining method that partitions a dataset into homogeneous groups according to a predefined similarity criterion. The k-modes algorithm is a well-known categorical clustering method that uses the notion of a mode to represent the centroid of each cluster during the clustering process. The mode is a vector containing the most frequent modality of each attribute. However, in the original version, the mode is selected randomly in each iteration, even though many other candidate modes may exist. In this paper, a new approach is developed to generate potential candidate modes for each cluster in each iteration using their relative density. The obtained modes are then arranged into the upper and lower approximations of Rough Set Theory in order to identify the most pertinent ones. The proposed method was tested on two real-world datasets and compared with the standard k-modes; the experiments showed that it achieved higher accuracy.
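The ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, candidate modes are assumed to arise from ties between equally frequent attribute values (the paper's relative-density criterion is not described in the abstract and is not reproduced here), and the approximations follow the standard Pawlak definitions, with two objects treated as indiscernible when all their attribute values are equal.

```python
from collections import Counter
from itertools import product


def candidate_modes(cluster):
    """Return every vector built from the most frequent value(s) of each attribute.

    Assumption: several candidates exist only when attribute values tie
    for the highest frequency inside the cluster.
    """
    top_values = []
    for column in zip(*cluster):
        counts = Counter(column)
        best = max(counts.values())
        # Keep all values that reach the highest frequency, in a stable order.
        top_values.append(sorted(v for v, c in counts.items() if c == best))
    return [tuple(mode) for mode in product(*top_values)]


def matching_dissimilarity(a, b):
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def approximations(objects, target):
    """Pawlak lower and upper approximations of `target` (a set of object indices).

    Assumption: objects are indiscernible when all attribute values are equal.
    """
    classes = {}
    for index, obj in enumerate(objects):
        classes.setdefault(tuple(obj), set()).add(index)
    lower, upper = set(), set()
    for eq_class in classes.values():
        if eq_class <= target:   # class lies entirely inside the target
            lower |= eq_class
        if eq_class & target:    # class overlaps the target
            upper |= eq_class
    return lower, upper


if __name__ == "__main__":
    cluster = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "x")]
    for mode in candidate_modes(cluster):
        cost = sum(matching_dissimilarity(mode, obj) for obj in cluster)
        print(mode, "total dissimilarity:", cost)
    print(approximations([("a", "x"), ("a", "x"), ("b", "y")], {0, 2}))
```

In this toy cluster the first attribute has two equally frequent values, so two candidate modes exist and a random pick between them is arbitrary; this is the kind of ambiguity the abstract says the proposed method addresses by ranking candidates.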

Author information

Corresponding author

Correspondence to Semeh Ben Salem.

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this paper

Ben Salem, S., Naouali, S., Chtourou, Z. (2019). Rough Mode: A Generalized Centroid Proposal for Clustering Categorical Data Using the Rough Set Theory. In: Farhaoui, Y., Moussaid, L. (eds) Big Data and Smart Digital Environment. ICBDSDE 2018. Studies in Big Data, vol 53. Springer, Cham. https://doi.org/10.1007/978-3-030-12048-1_24
