A generalization model for multi-record privacy preservation
Privacy preservation is an increasingly serious problem in data publication and has drawn considerable attention in research and development. Several privacy preservation models and algorithms have recently been proposed for publishing data. However, most previous methods suffer from more than one of the following drawbacks: (i) they cannot be applied to multi-record datasets; (ii) they guarantee only one-way generalization; (iii) they ignore user privacy preferences. To satisfy higher privacy requirements and to support the publication of multi-record datasets, this paper proposes a bidirectional personalized generalization (BP-generalization) model. The rationale is to anonymize both relational and set-valued information. First, we merge tuples with the same attribute values in multi-record datasets to ensure the validity of quasi-identifier anonymity. Second, by enforcing l-diversity on equivalence groups and k-anonymity on fingerprint buckets, the model can resist bidirectional chain attacks. Finally, a new hierarchical generalization strategy is proposed for personalized privacy preservation of sensitive attributes, so that different generalization rules can be adopted for different levels of sensitive values. Extensive experimental results on two datasets show that our method outperforms state-of-the-art techniques in terms of efficiency and information loss.
Keywords: Privacy preservation, Data publication, Multi-record microdata, Generalization
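The first two steps described in the abstract (merging multi-record tuples, then checking l-diversity on equivalence groups and k-anonymity on fingerprint buckets) can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes that rows with identical quasi-identifier values belong to one individual, that a "fingerprint" is the set of that individual's sensitive values, and that a "fingerprint bucket" groups individuals with identical fingerprints. The sample data, the `generalize` rule, and all function names are hypothetical.

```python
from collections import defaultdict

def merge_records(rows, qi_keys, sensitive_key):
    """Merge rows sharing the same quasi-identifier (QI) values into one
    tuple whose sensitive part is the set of all sensitive values seen.
    Assumption: identical QI values identify one individual."""
    merged = defaultdict(set)
    for row in rows:
        qi = tuple(row[k] for k in qi_keys)
        merged[qi].add(row[sensitive_key])
    return [(qi, frozenset(values)) for qi, values in merged.items()]

def generalize(qi):
    """Hypothetical generalization rule: age to its decade,
    ZIP code to its first three digits."""
    age, zip_code = qi
    return (age // 10 * 10, zip_code[:3] + "**")

def is_l_diverse(tuples, l):
    """Every equivalence group (same generalized QI) must contain
    at least l distinct sensitive values."""
    groups = defaultdict(set)
    for gqi, fingerprint in tuples:
        groups[gqi] |= fingerprint
    return all(len(values) >= l for values in groups.values())

def is_k_anonymous_on_fingerprints(tuples, k):
    """Every fingerprint bucket (same set of sensitive values) must
    contain at least k merged tuples."""
    buckets = defaultdict(int)
    for _gqi, fingerprint in tuples:
        buckets[fingerprint] += 1
    return all(count >= k for count in buckets.values())

# Hypothetical multi-record table: one individual may own several rows.
rows = [
    {"age": 23, "zip": "13053", "disease": "flu"},
    {"age": 23, "zip": "13053", "disease": "asthma"},
    {"age": 27, "zip": "13068", "disease": "flu"},
    {"age": 27, "zip": "13068", "disease": "asthma"},
    {"age": 41, "zip": "14850", "disease": "diabetes"},
    {"age": 45, "zip": "14853", "disease": "diabetes"},
]

merged = merge_records(rows, ["age", "zip"], "disease")
generalized = [(generalize(qi), fp) for qi, fp in merged]

print(is_l_diverse(generalized, 2))                    # False
print(is_k_anonymous_on_fingerprints(generalized, 2))  # True
```

In this toy table the fingerprint buckets each hold two individuals, so the k = 2 check passes, but the group for ages 40 to 49 contains only one sensitive value, so the l = 2 check fails. Checking both directions is what the abstract refers to as resisting a bidirectional chain attack; the hierarchical generalization of sensitive values is not covered by this sketch.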
Compliance with ethical standards
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.