A Multi-phase k-anonymity Algorithm Based on Clustering Techniques
We propose a new k-anonymity algorithm for publishing datasets with privacy protection. We improve clustering techniques to lower data distortion and to enhance the diversity of sensitive attribute values. Our algorithm has four phases. In phase one, tuples are distributed into several groups so that all tuples in a group share the same sensitive value. In phase two, groups smaller than a threshold are merged, and the resulting groups are then partitioned into several clusters according to their quasi-identifier attributes; each cluster becomes an equivalence class. In phase three, the remaining tuples are distributed evenly among the clusters to satisfy L-diversity. Finally, in phase four, the quasi-identifier attribute values in each cluster are generalized to satisfy k-anonymity. We used the OCC dataset to compare our algorithm with a classic clustering-based method. Empirical results show that our algorithm can publish datasets with high security and limited information loss.
Keywords: privacy protection, k-anonymity, cluster, L-diversity
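The four phases of the abstract can be illustrated with a small Python sketch. This is one possible reading and not the authors' implementation. The following choices are assumptions: quasi-identifiers are numeric tuples, distance is Euclidean (`math.dist`), and `anonymize` and `is_k_anonymous_l_diverse` are hypothetical names. Phase two is simplified to seeding `n // k` clusters with `l` tuples of distinct sensitive values that lie close together in quasi-identifier space. The authors' merging of groups smaller than a threshold is not modeled. Generalization replaces each attribute with its [min, max] range inside a cluster, and L-diversity is taken to mean distinct l-diversity.

```python
import math
from collections import defaultdict


def anonymize(records, k, l):
    """Illustrative cluster-based k-anonymity with distinct l-diversity.

    records: list of (qi, s) pairs; qi is a tuple of numbers (the
    quasi-identifiers) and s is the sensitive value. Returns a list of
    equivalence classes, each a list of (box, s) pairs, where box holds a
    (min, max) range per quasi-identifier attribute.
    """
    n = len(records)
    if not 1 <= l <= k <= n:
        raise ValueError("need 1 <= l <= k <= len(records)")

    # Phase 1: distribute tuples into groups that share one sensitive value.
    groups = defaultdict(list)
    for qi, s in records:
        groups[s].append(qi)

    # Phase 2 (simplified assumption): create n // k clusters and seed each
    # one with l tuples of distinct sensitive values that are close in QI space.
    clusters = []
    for _ in range(n // k):
        # Draw from the largest groups first so no value is left stranded.
        live = sorted((s for s in groups if groups[s]),
                      key=lambda s: len(groups[s]), reverse=True)
        if len(live) < l:
            raise ValueError("too few distinct sensitive values left")
        seed = groups[live[0]].pop()
        members = [(seed, live[0])]
        for s in live[1:l]:
            # Take the tuple with sensitive value s that is nearest the seed.
            j = min(range(len(groups[s])),
                    key=lambda i: math.dist(groups[s][i], seed))
            members.append((groups[s].pop(j), s))
        clusters.append(members)

    def centroid(members):
        dims = len(members[0][0])
        return tuple(sum(qi[d] for qi, _ in members) / len(members)
                     for d in range(dims))

    # Phase 3: hand out the remaining tuples evenly; each goes to the nearest
    # centroid among the clusters that are currently smallest.
    rest = [(qi, s) for s, qis in groups.items() for qi in qis]
    for qi, s in rest:
        smallest = min(len(c) for c in clusters)
        candidates = [c for c in clusters if len(c) == smallest]
        target = min(candidates, key=lambda c: math.dist(centroid(c), qi))
        target.append((qi, s))

    # Phase 4: generalize each QI attribute to its [min, max] range per cluster.
    classes = []
    for members in clusters:
        dims = len(members[0][0])
        box = tuple((min(qi[d] for qi, _ in members),
                     max(qi[d] for qi, _ in members)) for d in range(dims))
        classes.append([(box, s) for _, s in members])
    return classes


def is_k_anonymous_l_diverse(classes, k, l):
    """Every class has >= k tuples and >= l distinct sensitive values."""
    return all(len(c) >= k and len({s for _, s in c}) >= l for c in classes)


if __name__ == "__main__":
    data = [((25, 50000), "flu"), ((27, 52000), "cold"), ((26, 51000), "hiv"),
            ((45, 90000), "flu"), ((47, 91000), "cold"),
            ((46, 88000), "cancer"), ((30, 60000), "flu")]
    classes = anonymize(data, k=3, l=2)
    for eq_class in classes:
        print(eq_class)
    print(is_k_anonymous_l_diverse(classes, k=3, l=2))
```

The sketch balances cluster sizes strictly in phase three. With m = n // k clusters, every class ends with either ⌊n/m⌋ or ⌈n/m⌉ tuples, so each has at least k. The cost is that a tuple can occasionally land in a cluster that is not its nearest. The authors' method may weigh balance against information loss differently.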