n-means: Adaptive Clustering Microaggregation of Categorical Medical Data

  • Conference paper
Intelligent Computing (CompCom 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 998)

Abstract

A huge amount of information is managed and shared publicly by individuals and data controllers. Publicly shared data contains information that can reveal the identity of users, thus affecting the privacy of individuals. To mitigate these disclosure risks, Statistical Disclosure Control (SDC) methods are applied to the data before it is released. Microaggregation is an SDC method that aggregates similar records into clusters and then transforms them into m indistinguishable records. K-means is a well-known clustering algorithm for continuous data that iteratively assigns similar elements to k clusters until the assignment converges. However, adapting the k-means algorithm to categorical multivariate data is a challenging task due to the high dimensionality of the attributes. In this paper, we extend the k-means clustering algorithm to microaggregate structured data. Moreover, to preserve data utility, we relax the fixed-size clustering of this algorithm to allow clusters of adaptive size. For this purpose, we introduce the n-means clustering approach, which constructs clusters based on the semantics of the dataset. In experiments, we demonstrate the significance of the proposed system by measuring the cohesion of clusters and the information loss as an indicator of utility.
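To make the idea of microaggregation concrete, the following is a minimal sketch of fixed-size microaggregation on categorical records. It is not the n-means method of the paper: it assumes a plain Hamming distance in place of a semantic (ontology-based) distance, uses the attribute-wise mode as the cluster representative, and forms groups of a fixed minimum size k rather than adaptive sizes. The function names and the toy records are hypothetical and serve only as illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_record(records):
    """Attribute-wise most frequent value (ties go to the value seen first)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def microaggregate(records, k):
    """Greedy fixed-size grouping; returns the anonymized records in input order.

    Assumption: Hamming distance stands in for a semantic distance.
    If fewer than k records are supplied, the single group is smaller than k.
    """
    remaining = list(range(len(records)))
    groups = []
    while len(remaining) >= 2 * k:
        centre = mode_record([records[i] for i in remaining])
        # Seed: the record farthest from the mode of the remaining data.
        seed = max(remaining, key=lambda i: hamming(records[i], centre))
        # Group: the k remaining records closest to the seed.
        group = sorted(remaining, key=lambda i: hamming(records[i], records[seed]))[:k]
        groups.append(group)
        remaining = [i for i in remaining if i not in group]
    if remaining:
        groups.append(remaining)  # leftover records (between k and 2k-1 of them)

    anonymized = [None] * len(records)
    for group in groups:
        representative = mode_record([records[i] for i in group])
        for i in group:
            anonymized[i] = representative
    return anonymized


def information_loss(original, anonymized):
    """Fraction of attribute values changed by the anonymization."""
    changed = sum(hamming(o, a) for o, a in zip(original, anonymized))
    return changed / (len(original) * len(original[0]))


# Hypothetical toy data: (condition, symptom).
records = [
    ("flu", "fever"), ("flu", "cough"), ("flu", "fever"),
    ("asthma", "wheeze"), ("asthma", "cough"), ("asthma", "wheeze"),
]
anonymized = microaggregate(records, k=3)
loss = information_loss(records, anonymized)
print(anonymized, loss)
```

Each released record is then shared by at least k originals, and the fraction of changed values gives a crude measure of the utility lost. A semantic approach would replace `hamming` with a distance derived from a knowledge base such as WordNet.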

Notes

  1. https://wordnet.princeton.edu/

  2. https://www.patientslikeme.com/

Acknowledgment

We acknowledge the Higher Education Commission of Pakistan and Foundation University Islamabad for their support in publishing this research work.

Corresponding author

Correspondence to Malik Imran-Daud.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Imran-Daud, M. (2019). n-means: Adaptive Clustering Microaggregation of Categorical Medical Data. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Intelligent Computing. CompCom 2019. Advances in Intelligent Systems and Computing, vol 998. Springer, Cham. https://doi.org/10.1007/978-3-030-22868-2_2
