An Overview of the Use of Clustering for Data Privacy

Torra, Vicenç; Navarro-Arribas, Guillermo; Stokes, Klara

doi:10.1007/978-3-319-24211-8_10

Vicenç Torra³,
Guillermo Navarro-Arribas⁴ &
Klara Stokes³

6136 Accesses
2 Citations

Abstract

In this chapter we review some of our results related to the use of clustering in the area of data privacy. The paper gives a brief overview of data privacy and, more specifically, on data driven methods for data privacy and discusses where clustering can be applied in this setting. We discuss the role of clustering in the definition of masking methods, and on the calculation of information loss and data utility.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 89.00; Price excludes VAT (USA)

Softcover Book: USD 119.99; Price excludes VAT (USA)

Hardcover Book: USD 119.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Abril, D., Navarro-Arribas, G., Torra, V.: Towards semantic microaggregation of categorical data for confidential documents. Modeling Decisions for Artificial Intelligence. Lecture Notes in Computer Science, vol. 6408, pp. 266–276. Springer, Heidelberg (2010)
Google Scholar
Abril, D., Navarro-Arribas, G., Torra, V.: Spherical microaggregation: Anonymizing sparse vector spaces. Comput. Secur. 49, 28–44 (2015)
Article Google Scholar
Batet, M., Erola, A., Sánchez, D., Castellà-Roca, J.: Semantic anonymisation of set-valued data. In: Proceedings of the 6th International Conference on Agents and Artificial Intelligence (ICAART) vol. 1, pp. 102–112 (2014)
Google Scholar
Batet, M., Erola, A., Sánchez D., Castellà-Roca, J.: Utility preserving query log anonymization via semantic microaggregation. Inf. Sci. 242, 49–63 (2013)
Article Google Scholar
Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, New York (1981)
Book MATH Google Scholar
Byun, J.-W., Sohn, Y., Bertino, E., Li, N.: Secure anonymization for incremental datasets. In: Secure Data Management. Lecture Notes in Computer Science, pp. 48–63. Springer, Heidelberg (2006)
Google Scholar
Campello, R.J.G.B.: A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment. Pattern Recogn. Lett. 28(7), 833–841 (2007)
Article Google Scholar
Cao, J., Carminati, B., Ferrari, E., Tan, K.-L.: CASTLE: continuously anonymizing data streams. IEEE Trans. Dependable Secure Comput. 8, 337–352 (2011)
Article Google Scholar
De Capitani di Vimercati, S., Foresti, S., Livraga, G., Samarati, P.: Data privacy: definitions and techniques. Int. J. Uncertainty Fuzziness Knowledge Based Syst. 20(6), 793–817 (2012)
Article Google Scholar
Defays, D., Nanopoulos, P.: Panels of enterprises and confidentiality: the small aggregates method. In: Proceeding of the 1992 Symposium on Design and Analysis of Longitudinal Surveys, pp. 195–204. Statistics Canada (1993)
Google Scholar
DMOZ: The Open Directory Project. www.dmoz.org (2015)
Domingo-Ferrer, J., González-Nicolás, U.: Hybrid microdata using microaggregation. Inf. Sci. 180, 2834–2844 (2010)
Article Google Scholar
Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans. Knowl. Data Eng. 14(1), 189–201 (2002)
Article Google Scholar
Domingo-Ferrer, J., Mateo-Sanz, J.M., Torra, V.: Comparing SDC methods for microdata on the basis of information loss and disclosure risk. In: Pre-proceedings of ETK-NTTS’2001 (Eurostat, ISBN 92-894-1176-5), vol. 2, pp. 807–826. Creta, Greece (2001)
Google Scholar
Domingo-Ferrer, J., Torra, V.: Disclosure control methods and information loss for microdata. In: Doyle, P., Lane, J.I., Theeuwes, J.J.M., Zayatz, L. (eds.) Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, pp. 91–110. Elsevier (2001)
Google Scholar
Domingo-Ferrer, J., Torra, V.: A quantitative comparison of disclosure control methods for microdata. In: Doyle, P., Lane, J.I., Theeuwes, J.J.M., Zayatz, L. (eds.) Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, pp. 111–134. North-Holland, Amsterdam, The Netherlands (2001)
Google Scholar
Domingo-Ferrer, J., Torra, V.: Towards fuzzy c-means based microaggregation. In: Grzegorzewski, P., Hryniewicz, O., Gil, M.A. (eds.) Soft Methods in Probability and Statistics, pp. 289–294. Physica, Heidelberg (2002)
Google Scholar
Domingo-Ferrer, J., Torra, V.: Fuzzy microaggregation for microdata protection. J. Adv. Comput. Intell. Intell. Inform. 7(2), 153–159 (2003)
MATH Google Scholar
Domingo-Ferrer, J., Torra, V.: Ordinal, continuous and heterogeneous k-anonymity through microaggregation. Data Min. Knowl. Disc. 11(2), 195–212 (2005)
Article MathSciNet Google Scholar
Erola, A., Castellà-Roca, J., Navarro-Arribas, G., Torra, V.: Semantic microaggregation for the anonymization of query logs using the open directory project. SORT Stat. Oper. Res. 35, Trans. 41–58 (2011)
Google Scholar
Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT, Cambridge 1998
MATH Google Scholar
Feder, T., Nabar, S.U., Terzi, E.: Anonymizing graphs. CoRR abs/0810.5578 (2008)
Google Scholar
Ghinita, G., Karras, P., Kalnis, P., Mamoulis, N.: Fast data anonymization with low information loss. In: Proceedings of the 33rd International Conference Very Large Data Bases, pp. 758–769 (2007)
Google Scholar
Hansen, S.L., Mukherjee, S.: A polynomial algorithm for optimal univariate microaggregation. IEEE Trans. Knowl. Data Eng. 15(4), 1043–1044 (2003)
Article Google Scholar
Hay, M., Miklau, G., Jensen, D.: Anonymizing social networks. In: Proceedings of the VLDB Endowment (2008)
Google Scholar
Hore, B., Jammalamadaka, R.C., Mehrotra, S.: Flexible anonymization for privacy preserving data publishing: a systematic search based approach. In: Proceedings of the 7th SIAM International Conference on Data Mining (2007)
Google Scholar
Hüllermeier, E., Rifqi, M.: A fuzzy variant of the rand index for comparing clustering structures. In: Proceedings of IFSA-EUSFLAT (2009)
Google Scholar
Ladra, S., Torra, V.: On the comparison of generic information loss measures and cluster-specific ones. Int. J. Uncertainty Fuzziness Knowledge Based Syst. 16(1) 107–120 (2008)
Article Google Scholar
Laszlo, M., Mukherjee, S.: Optimal univariate microaggregation with data suppression. J. Syst. Softw. 86, 677–682 (2013)
Article Google Scholar
Laszlo, M., Mukherjee, S.: Iterated local search for microaggregation. J. Syst. Softw. 100, 15–26 (2015)
Article Google Scholar
LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Mondrian multidimensional k-anonymity. In: Proceedings of International Conference on Data Engineering (2006)
Book Google Scholar
Li, N., Li, T., Venkatasubramanian, S.: T-closeness: privacy beyond k-anonymity and l-diversity. In: Proceedings of the IEEE ICDE (2007)
Google Scholar
Liu, J., Wang, K.: Anonymizing bag-valued sparse data by semantic similarity-based clustering. Knowl. Inf. Syst. 35, 435–461 (2013)
Article Google Scholar
Liu, K., Terzi, E.: Towards identity anonymization on graphs. In: Proceeding of the SIGMOD (2008)
Book Google Scholar
Martínez, S., Sánchez, D., Valls, A., Batet, M.: Privacy protection of textual attributes through a semantic-based masking method. Inf. Fusion 13(4), 304–314 (2012)
Article Google Scholar
Martínez, S., Sánchez, D., Valls, A.: Semantic adaptive microaggregation of categorical microdata. Comput. Secur. 31(5), 653–672 (2012)
Article Google Scholar
Miyamoto, S.: Introduction to Fuzzy Clustering (in Japanese). Morikita, Tokyo (1999)
Google Scholar
Miyamoto, S., Ichihashi, H., Honda, K.: Algorithms for Fuzzy Clustering. Springer, Berlin (2008)
MATH Google Scholar
Navarro-Arribas, G., Abril, D., Torra, V.: Dynamic anonymous index for confidential data. Data Privacy Management and Autonomous Spontaneous Security. Lecture Notes in Computer Science, vol. 8247, pp. 362–368. Springer Berlin Heidelberg, Germany (2014)
Google Scholar
Nin, J., Herranz, J., Torra, V.: On the disclosure risk of multivariate microaggregation. Data Knowl. Eng. 67, 399–412 (2008)
Article Google Scholar
Nin, J., Herranz, J., Torra, V.: How to Group Attributes in Multivariate Microaggregation. Int. J. Uncertainty Fuzziness Knowledge Based Syst. 16(1), 121–138 (2008)
Article Google Scholar
Nin, J., Torra, V.: Analysis of the univariate microaggregation disclosure risk. N. Gener. Comput. 27, 177–194 (2009)
Article MATH Google Scholar
Oganian, A., Domingo-Ferrer, J.: On the complexity of optimal microaggregation for statistical disclosure control. Stat. J. U. N. Econ. Comm. Eur. 18(4), 345–353 (2001)
Google Scholar
Pei, J., Xu, J., Wang, Z., Wang, W., Wang, K.: Maintaining K-anonymity against incremental updates. In: Proceedings of the 19th International Conference on Scientific and Statistical Database Management, 2007 (SSBDM, 2007), pp. 5–5 (2007)
Google Scholar
Samarati, P.: Protecting respondents identities in microdata release. IEEE Trans. Knowl. Data Eng. 13, 1010–1027 (2001)
Article Google Scholar
Solanas, A., Martínez-Balleste, A., Domingo-Ferrer, J., Mateo-Sanz, J.M.: A 2d-tree-based blocking method for microaggregating very large data sets. In: The First International Conference on Availability, Reliability and Security (ARES) (2006)
Google Scholar
Solanas, A., Pietro, R.D.: A linear-time multivariate micro-aggregation for privacy protection in uniform very large data sets. Modeling Decisions for Artificial Intelligence. Lecture Notes in Computer Science, pp. 203–214. Springer, Heidelberg (2008)
Google Scholar
Solé, M., Muntés-Mulero, V., Nin, J.: Efficient microaggregation techniques for large numerical data volumes. Int. J. Inf. Secur. 11, 253–267 (2012)
Article Google Scholar
Stokes, K.: Graph k-anonymity through k-means and as modular decomposition. In: Proceedings of the NordSec 2013. Lecture Notes in Computer Science, vol. 8208, pp. 263–278. (2013)
Article Google Scholar
Stokes, K., Torra, V.: n-Confusion: a generalization of k-anonymity. In: Proceedings of the 5th International Workshop on Privacy and Anonymity in the Information Society (PAIS). Berlin, Germany (2012)
Google Scholar
Stokes,K., Torra, V.: Multiple releases of k-anonymous data sets and k-anonymous relational databases. Int. J. Uncertainty Fuzziness Knowledge Based Syst. 20(06), 839–853 (2012)
Article MathSciNet Google Scholar
Stokes, K., Torra, V.: On some clustering approaches for graphs. In: Proceeding of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2011) (ISBN 978-1-4244-7315-1), pp. 409–415. Taipei, Taiwan (2011)
Google Scholar
Stokes, K., Torra, V.: Reidentification and k-anonymity: a model for disclosure risk in graphs. Soft. Comput. 16(10), 1657–1670 (2012)
Article MATH Google Scholar
Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertainty Fuzziness Knowledge Based Syst. 10, 557–570 (2002)
Google Scholar
Torra, V.: Microaggregation for categorical variables: a median based approach. In: Proceeding of the Privacy in Statistical Databases (PSD 2004). Lecture Notes in Computer Science, vol. 3050, pp. 162–174 (2004)
Article Google Scholar
Torra, V. (2015) A fuzzy microaggregation algorithm using fuzzy c-means, Proc. CCIA 2015, Volume 277: Artificial Intelligence Research and Development, IOS Press, 214–223 DOI: 10.3233/978-1-61499-578-4-214
Torra, V., Miyamoto, S.: Evaluating fuzzy clustering algorithms for microdata protection. Privacy in Statistical Databases. Lecture Notes in Computer Science, vol. 3050, pp. 175–186 (2004)
Article Google Scholar
Torra, V., Narukawa, Y.: Modeling Decisions: Information Fusion and Aggregation Operators. Springer, Heidelberg (2007)
MATH Google Scholar
Truta, T.M., Campan, A.: K-anonymization incremental maintenance and optimization techniques. In: Proceeding of the 2007 ACM Symposium on Applied Computing, pp. 380–387 (2007)
Google Scholar
Vaidya, J., Clifton, C., Zhu, M.: Privacy Preserving Data Mining. Springer, New York (2006)
MATH Google Scholar
Xiao, X., Tao, Y.: M-invariance: towards privacy preserving re-publication of dynamic datasets. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, SIGMOD 2007, pp. 689–700. ACM (2007)
Google Scholar
Zhou, B., Pei. J.: Preserving privacy in social networks against neighborhood attacks. In: Proceeding of the ICDE 2008 (2008)
Google Scholar

Download references

Acknowledgements

Partial support by the Spanish MEC (projects TIN2011-27076-C03-03 and TIN2014-55243-P) is acknowledged.

Author information

Authors and Affiliations

University of Skövde, Skövde, Sweden
Vicenç Torra & Klara Stokes
Universitat Autònoma de Barcelona, Barcelona, Spain
Guillermo Navarro-Arribas

Authors

Vicenç Torra
View author publications
You can also search for this author in PubMed Google Scholar
Guillermo Navarro-Arribas
View author publications
You can also search for this author in PubMed Google Scholar
Klara Stokes
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Vicenç Torra .

Editor information

Editors and Affiliations

Computer Science, Louisiana State University in Shreveport, Shreveport, Louisiana, USA
M. Emre Celebi
North American University, Houston, Texas, USA
Kemal Aydin

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Torra, V., Navarro-Arribas, G., Stokes, K. (2016). An Overview of the Use of Clustering for Data Privacy. In: Celebi, M., Aydin, K. (eds) Unsupervised Learning Algorithms. Springer, Cham. https://doi.org/10.1007/978-3-319-24211-8_10

Download citation

DOI: https://doi.org/10.1007/978-3-319-24211-8_10
Published: 30 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24209-5
Online ISBN: 978-3-319-24211-8
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics