An efficient clustering-based anonymization scheme for privacy-preserving data collection in IoT based healthcare services


The healthcare services industry has been transformed by the rise of the Internet of Things (IoT). IoT in healthcare services comprises a large number of interconnected sensors and medical devices that generate and exchange sensitive information. The resulting volume of data transmitted through the network raises serious concerns for the privacy of patient information. Privacy-preserving data collection (PPDC) is therefore needed to protect patient data. Several PPDC schemes have been proposed recently. However, existing schemes fall short of privacy requirements and remain prone to various privacy attacks. In this paper, we propose a novel privacy-preserving data collection scheme for IoT-based healthcare services systems. A clustering-based anonymity model is used to develop an efficient privacy-preserving scheme that meets privacy requirements and protects healthcare IoT against various privacy attacks. We formulate the threat model as client-server-to-user to ensure privacy on both ends. On the client side, a modified clustering-based k-anonymity model with α-deassociation is used to anonymize the data generated by the IoT nodes. Base-level privacy is then ensured through a bottom-up clustering method that generates clusters of records according to the privacy requirements. On the server side, the UPGMA cluster-combination method is used to reduce communication costs and to achieve a better level of privacy. The proposed scheme is designed to resist attribute disclosure, identity disclosure, membership disclosure, sensitivity attacks, similarity attacks, and skewness attacks. The effectiveness and efficiency of the proposed scheme are demonstrated through theoretical and experimental analyses.
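To make the two building blocks named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard reading of the (α, k)-anonymity model: a cluster must hold at least k records, and no sensitive value may account for more than a fraction α of them. It also shows the average pairwise distance that UPGMA uses when choosing which clusters to combine. The record layout, the function names, the sample values, and the choice of Manhattan distance over raw numeric quasi-identifiers are all illustrative assumptions.

```python
from collections import Counter
from itertools import product


def satisfies_alpha_k(cluster, k, alpha):
    """Check one cluster of (quasi_identifiers, sensitive_value) records.

    Assumed reading of the model: at least k records (k-anonymity) and no
    sensitive value with relative frequency above alpha (alpha-deassociation).
    """
    if not cluster or len(cluster) < k:
        return False
    counts = Counter(sensitive for _, sensitive in cluster)
    return max(counts.values()) / len(cluster) <= alpha


def average_linkage(cluster_a, cluster_b):
    """UPGMA distance: mean of all pairwise distances between two clusters.

    Manhattan distance on numeric quasi-identifiers is an assumption made
    for this sketch only.
    """
    total = sum(
        sum(abs(x - y) for x, y in zip(qa, qb))
        for (qa, _), (qb, _) in product(cluster_a, cluster_b)
    )
    return total / (len(cluster_a) * len(cluster_b))


# Hypothetical records: ((age, heart_rate), diagnosis)
c1 = [((34, 72), "flu"), ((36, 75), "asthma")]
c2 = [((35, 70), "flu"), ((37, 74), "diabetes")]
c3 = [((70, 90), "flu"), ((72, 95), "flu")]

print(satisfies_alpha_k(c1, k=2, alpha=0.5))  # two records, two diagnoses
print(satisfies_alpha_k(c3, k=2, alpha=0.5))  # both records share one diagnosis
print(average_linkage(c1, c2), average_linkage(c1, c3))
```

In this toy data, `c3` has enough records for k = 2 but fails the α bound because every record carries the same diagnosis, which is the kind of attribute disclosure that plain k-anonymity does not prevent. The average-linkage values indicate that `c1` would be combined with `c2` before `c3`.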





Author information



Corresponding author

Correspondence to J. Andrew Onesimu.

Additional information


This article is part of the Topical Collection: Special Issue on Privacy-Preserving Computing

Guest Editors: Kaiping Xue, Zhe Liu, Haojin Zhu, Miao Pan and David S.L. Wei


About this article


Cite this article

Onesimu, J.A., Karthikeyan, J. & Sei, Y. An efficient clustering-based anonymization scheme for privacy-preserving data collection in IoT based healthcare services. Peer-to-Peer Netw. Appl. (2021).



  • Privacy-preserving
  • IoT
  • Healthcare data
  • Anonymization
  • K-anonymity
  • Data collection