Anonymized noise addition in subspaces for privacy preserved data mining in high dimensional continuous data

Virupaksha, Shashidhar; Dondeti, Venkatesulu

doi:10.1007/s12083-021-01080-y

Published: 29 January 2021

Volume 14, pages 1608–1628, (2021)

Peer-to-Peer Networking and Applications

Abstract

Data privacy is a major concern in data mining, and privacy-preserving data mining algorithms are widely used to address it. However, applying these algorithms to high-dimensional continuous data leads to high data loss and information loss, and makes clusters very difficult to identify. In this paper, a novel technique, Anonymized Noise Addition in Subspaces (ANAS), is proposed, which reduces data loss and information loss while enhancing cluster identification and privacy. Anonymization using aggregation is performed in dense and non-dense subspaces, based on Euclidean distances, to reduce data loss and enhance privacy. Random noise within the subspace limits is then applied to the anonymized subspaces to enhance cluster identification and reduce data loss. ANAS is run on benchmark datasets, and the results show that it identifies 80% of the original dataset's clusters on sparse datasets, whereas the existing techniques do not identify any clusters. ANAS reduces data loss by 50% and information loss by 20%, and enhances privacy by 40%.
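The two stages named in the abstract (distance-based aggregation inside a subspace, then random noise kept within the subspace limits) can be illustrated with a minimal sketch. This is not the ANAS algorithm itself, whose details are not given in this preview. Everything here is an assumption made for illustration: the function names, the greedy grouping heuristic (a generic microaggregation scheme), the group size `k`, the uniform noise, the `scale` parameter, and the reading of "subspace limits" as the per-column minimum and maximum of the original data. How dense and non-dense subspaces are found is also left out; the subspace is simply passed in as a list of column indices.

```python
import math
import random


def aggregate_subspace(rows, dims, k):
    """Replace the `dims` columns of each row by the mean of its group.

    Groups are built greedily: the first unassigned row is the seed and is
    grouped with its k-1 nearest unassigned rows (Euclidean distance
    measured only on `dims`). Generic heuristic, assumed for illustration.
    """
    def point(i):
        return [rows[i][d] for d in dims]

    out = [list(r) for r in rows]
    remaining = list(range(len(rows)))
    while remaining:
        if len(remaining) < 2 * k:
            # Too few rows left for two full groups: merge them into one,
            # so no group is smaller than k (given at least k rows overall).
            group = list(remaining)
        else:
            seed = point(remaining[0])
            group = sorted(remaining, key=lambda i: math.dist(seed, point(i)))[:k]
        for d in dims:
            mean = sum(rows[i][d] for i in group) / len(group)
            for i in group:
                out[i][d] = mean
        members = set(group)
        remaining = [i for i in remaining if i not in members]
    return out


def add_bounded_noise(rows, limits_from, dims, scale, rng):
    """Add uniform noise to the `dims` columns, clamped to [min, max].

    The limits are the per-column minimum and maximum of `limits_from`
    (assumed here to be the original data); `scale` is the noise
    half-width as a fraction of that range.
    """
    out = [list(r) for r in rows]
    for d in dims:
        lo = min(r[d] for r in limits_from)
        hi = max(r[d] for r in limits_from)
        half_width = scale * (hi - lo)
        for r in out:
            noisy = r[d] + rng.uniform(-half_width, half_width)
            r[d] = min(hi, max(lo, noisy))
    return out


if __name__ == "__main__":
    # Hypothetical toy data: columns 0 and 1 form the subspace, column 2 is untouched.
    data = [
        [1.0, 2.0, 10.0], [1.2, 1.9, 20.0], [0.9, 2.1, 30.0],
        [8.0, 9.0, 40.0], [8.1, 9.2, 50.0], [7.9, 8.8, 60.0],
    ]
    subspace = [0, 1]
    anonymized = aggregate_subspace(data, subspace, k=3)
    released = add_bounded_noise(anonymized, data, subspace, scale=0.05,
                                 rng=random.Random(0))
    for row in released:
        print(row)
```

In this sketch, aggregation makes at least `k` rows share the same subspace values before noise is added, and clamping keeps every released value inside the range observed in the original subspace. Columns outside the subspace are copied unchanged.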

Author information

Authors and Affiliations

Department of CSE, VFSTR Deemed to be University, Guntur, India
Shashidhar Virupaksha & Venkatesulu Dondeti
Department of CSE, Presidency University, Bengaluru, India
Shashidhar Virupaksha

Authors

Shashidhar Virupaksha
Venkatesulu Dondeti

Corresponding author

Correspondence to Shashidhar Virupaksha.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection: Special Issue on Privacy-Preserving Computing

Guest Editors: Kaiping Xue, Zhe Liu, Haojin Zhu, Miao Pan and David S.L. Wei

About this article

Cite this article

Virupaksha, S., Dondeti, V. Anonymized noise addition in subspaces for privacy preserved data mining in high dimensional continuous data. Peer-to-Peer Netw. Appl. 14, 1608–1628 (2021). https://doi.org/10.1007/s12083-021-01080-y

Received: 29 July 2020
Accepted: 13 January 2021
Published: 29 January 2021
Issue Date: May 2021
DOI: https://doi.org/10.1007/s12083-021-01080-y
