Clustering Heuristics for Efficient t-closeness Anonymisation

Kayem, Anne V. D. M.; Meinel, Christoph

doi:10.1007/978-3-319-64471-4_3

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10439)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Anonymisation based on t-closeness is a privacy-preserving method of publishing micro-data that resists skewness and similarity attacks. The t-closeness privacy requirement states that the distance between the distribution of a sensitive attribute in an equivalence class and the distribution of that attribute in the whole micro-data set must be no greater than a threshold value t. An equivalence class is a set of records that are similar with respect to certain identifying attributes (quasi-identifiers), and a micro-data set is said to be t-close when all of its equivalence classes satisfy t-closeness. However, the t-closeness anonymisation problem is NP-hard. As a performance-efficient alternative, we propose a t-clustering algorithm with an average time complexity of \(O(m^{2} \log n)\), where n and m are the number of tuples and attributes, respectively. We mitigate privacy disclosure with heuristics that add noise to distort the anonymised datasets while minimising information loss. Our experiments indicate that the proposed algorithm is time-efficient and scales in practice.
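
To make the t-closeness requirement concrete, the following is a minimal sketch of the check itself, not of the authors' clustering algorithm. The abstract does not name a distance measure, so this sketch assumes a categorical sensitive attribute and uses total variation distance, which coincides with the Earth Mover's Distance when all distinct values are equally far apart. The function names (`distribution`, `total_variation`, `is_t_close`) and the sample data are hypothetical.

```python
from collections import Counter


def distribution(values):
    """Return the relative frequency of each value in a non-empty list."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    Assumption: this stands in for the distance measure, which the
    abstract leaves unspecified.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


def is_t_close(classes, t):
    """Check that every equivalence class is within distance t of the whole table.

    `classes` is a list of non-empty lists; each inner list holds the
    sensitive-attribute values of the records in one equivalence class.
    """
    all_values = [value for cls in classes for value in cls]
    overall = distribution(all_values)
    return all(total_variation(distribution(cls), overall) <= t for cls in classes)


# Hypothetical data: two equivalence classes over a "diagnosis" attribute.
classes = [
    ["flu", "flu", "cancer", "asthma"],
    ["flu", "cancer", "cancer", "asthma"],
]

print(is_t_close(classes, 0.2))  # each class is at distance 0.125, so True
print(is_t_close(classes, 0.1))  # 0.125 exceeds 0.1, so False
```

In this example both classes sit at distance 0.125 from the overall distribution, so the table is t-close for t = 0.2 but not for t = 0.1. A tighter threshold forces each class to mirror the global distribution more closely, which is what limits skewness and similarity attacks.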

Author information

Authors and Affiliations

Faculty of Digital Engineering, Hasso-Plattner-Institute for Digital Engineering GmbH, University of Potsdam, Prof.-Dr.-Helmert Str. 2-3, 14440, Potsdam, Germany
Anne V. D. M. Kayem & Christoph Meinel

Corresponding author

Correspondence to Anne V. D. M. Kayem.

Editor information

Editors and Affiliations

University of Lyon, Villeurbanne, France
Djamal Benslimane
University of Milan, Milan, Italy
Ernesto Damiani
University of Michigan, Dearborn, Michigan, USA
William I. Grosky
Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
Wright State University, Dayton, Ohio, USA
Amit Sheth
Johannes Kepler University, Linz, Austria
Roland R. Wagner

About this paper

Cite this paper

Kayem, A.V.D.M., Meinel, C. (2017). Clustering Heuristics for Efficient t-closeness Anonymisation. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10439. Springer, Cham. https://doi.org/10.1007/978-3-319-64471-4_3

DOI: https://doi.org/10.1007/978-3-319-64471-4_3
Published: 02 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64470-7
Online ISBN: 978-3-319-64471-4
eBook Packages: Computer Science (R0)
