Abstract
In this chapter we review some of our results related to the use of clustering in the area of data privacy. The paper gives a brief overview of data privacy and, more specifically, on data driven methods for data privacy and discusses where clustering can be applied in this setting. We discuss the role of clustering in the definition of masking methods, and on the calculation of information loss and data utility.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Abril, D., Navarro-Arribas, G., Torra, V.: Towards semantic microaggregation of categorical data for confidential documents. Modeling Decisions for Artificial Intelligence. Lecture Notes in Computer Science, vol. 6408, pp. 266–276. Springer, Heidelberg (2010)
Abril, D., Navarro-Arribas, G., Torra, V.: Spherical microaggregation: Anonymizing sparse vector spaces. Comput. Secur. 49, 28–44 (2015)
Batet, M., Erola, A., Sánchez, D., Castellà-Roca, J.: Semantic anonymisation of set-valued data. In: Proceedings of the 6th International Conference on Agents and Artificial Intelligence (ICAART) vol. 1, pp. 102–112 (2014)
Batet, M., Erola, A., Sánchez D., Castellà-Roca, J.: Utility preserving query log anonymization via semantic microaggregation. Inf. Sci. 242, 49–63 (2013)
Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, New York (1981)
Byun, J.-W., Sohn, Y., Bertino, E., Li, N.: Secure anonymization for incremental datasets. In: Secure Data Management. Lecture Notes in Computer Science, pp. 48–63. Springer, Heidelberg (2006)
Campello, R.J.G.B.: A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment. Pattern Recogn. Lett. 28(7), 833–841 (2007)
Cao, J., Carminati, B., Ferrari, E., Tan, K.-L.: CASTLE: continuously anonymizing data streams. IEEE Trans. Dependable Secure Comput. 8, 337–352 (2011)
De Capitani di Vimercati, S., Foresti, S., Livraga, G., Samarati, P.: Data privacy: definitions and techniques. Int. J. Uncertainty Fuzziness Knowledge Based Syst. 20(6), 793–817 (2012)
Defays, D., Nanopoulos, P.: Panels of enterprises and confidentiality: the small aggregates method. In: Proceeding of the 1992 Symposium on Design and Analysis of Longitudinal Surveys, pp. 195–204. Statistics Canada (1993)
DMOZ: The Open Directory Project. www.dmoz.org (2015)
Domingo-Ferrer, J., González-Nicolás, U.: Hybrid microdata using microaggregation. Inf. Sci. 180, 2834–2844 (2010)
Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans. Knowl. Data Eng. 14(1), 189–201 (2002)
Domingo-Ferrer, J., Mateo-Sanz, J.M., Torra, V.: Comparing SDC methods for microdata on the basis of information loss and disclosure risk. In: Pre-proceedings of ETK-NTTS’2001 (Eurostat, ISBN 92-894-1176-5), vol. 2, pp. 807–826. Creta, Greece (2001)
Domingo-Ferrer, J., Torra, V.: Disclosure control methods and information loss for microdata. In: Doyle, P., Lane, J.I., Theeuwes, J.J.M., Zayatz, L. (eds.) Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, pp. 91–110. Elsevier (2001)
Domingo-Ferrer, J., Torra, V.: A quantitative comparison of disclosure control methods for microdata. In: Doyle, P., Lane, J.I., Theeuwes, J.J.M., Zayatz, L. (eds.) Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, pp. 111–134. North-Holland, Amsterdam, The Netherlands (2001)
Domingo-Ferrer, J., Torra, V.: Towards fuzzy c-means based microaggregation. In: Grzegorzewski, P., Hryniewicz, O., Gil, M.A. (eds.) Soft Methods in Probability and Statistics, pp. 289–294. Physica, Heidelberg (2002)
Domingo-Ferrer, J., Torra, V.: Fuzzy microaggregation for microdata protection. J. Adv. Comput. Intell. Intell. Inform. 7(2), 153–159 (2003)
Domingo-Ferrer, J., Torra, V.: Ordinal, continuous and heterogeneous k-anonymity through microaggregation. Data Min. Knowl. Disc. 11(2), 195–212 (2005)
Erola, A., Castellà-Roca, J., Navarro-Arribas, G., Torra, V.: Semantic microaggregation for the anonymization of query logs using the open directory project. SORT Stat. Oper. Res. 35, Trans. 41–58 (2011)
Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT, Cambridge 1998
Feder, T., Nabar, S.U., Terzi, E.: Anonymizing graphs. CoRR abs/0810.5578 (2008)
Ghinita, G., Karras, P., Kalnis, P., Mamoulis, N.: Fast data anonymization with low information loss. In: Proceedings of the 33rd International Conference Very Large Data Bases, pp. 758–769 (2007)
Hansen, S.L., Mukherjee, S.: A polynomial algorithm for optimal univariate microaggregation. IEEE Trans. Knowl. Data Eng. 15(4), 1043–1044 (2003)
Hay, M., Miklau, G., Jensen, D.: Anonymizing social networks. In: Proceedings of the VLDB Endowment (2008)
Hore, B., Jammalamadaka, R.C., Mehrotra, S.: Flexible anonymization for privacy preserving data publishing: a systematic search based approach. In: Proceedings of the 7th SIAM International Conference on Data Mining (2007)
Hüllermeier, E., Rifqi, M.: A fuzzy variant of the rand index for comparing clustering structures. In: Proceedings of IFSA-EUSFLAT (2009)
Ladra, S., Torra, V.: On the comparison of generic information loss measures and cluster-specific ones. Int. J. Uncertainty Fuzziness Knowledge Based Syst. 16(1) 107–120 (2008)
Laszlo, M., Mukherjee, S.: Optimal univariate microaggregation with data suppression. J. Syst. Softw. 86, 677–682 (2013)
Laszlo, M., Mukherjee, S.: Iterated local search for microaggregation. J. Syst. Softw. 100, 15–26 (2015)
LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Mondrian multidimensional k-anonymity. In: Proceedings of International Conference on Data Engineering (2006)
Li, N., Li, T., Venkatasubramanian, S.: T-closeness: privacy beyond k-anonymity and l-diversity. In: Proceedings of the IEEE ICDE (2007)
Liu, J., Wang, K.: Anonymizing bag-valued sparse data by semantic similarity-based clustering. Knowl. Inf. Syst. 35, 435–461 (2013)
Liu, K., Terzi, E.: Towards identity anonymization on graphs. In: Proceeding of the SIGMOD (2008)
Martínez, S., Sánchez, D., Valls, A., Batet, M.: Privacy protection of textual attributes through a semantic-based masking method. Inf. Fusion 13(4), 304–314 (2012)
Martínez, S., Sánchez, D., Valls, A.: Semantic adaptive microaggregation of categorical microdata. Comput. Secur. 31(5), 653–672 (2012)
Miyamoto, S.: Introduction to Fuzzy Clustering (in Japanese). Morikita, Tokyo (1999)
Miyamoto, S., Ichihashi, H., Honda, K.: Algorithms for Fuzzy Clustering. Springer, Berlin (2008)
Navarro-Arribas, G., Abril, D., Torra, V.: Dynamic anonymous index for confidential data. Data Privacy Management and Autonomous Spontaneous Security. Lecture Notes in Computer Science, vol. 8247, pp. 362–368. Springer Berlin Heidelberg, Germany (2014)
Nin, J., Herranz, J., Torra, V.: On the disclosure risk of multivariate microaggregation. Data Knowl. Eng. 67, 399–412 (2008)
Nin, J., Herranz, J., Torra, V.: How to Group Attributes in Multivariate Microaggregation. Int. J. Uncertainty Fuzziness Knowledge Based Syst. 16(1), 121–138 (2008)
Nin, J., Torra, V.: Analysis of the univariate microaggregation disclosure risk. N. Gener. Comput. 27, 177–194 (2009)
Oganian, A., Domingo-Ferrer, J.: On the complexity of optimal microaggregation for statistical disclosure control. Stat. J. U. N. Econ. Comm. Eur. 18(4), 345–353 (2001)
Pei, J., Xu, J., Wang, Z., Wang, W., Wang, K.: Maintaining K-anonymity against incremental updates. In: Proceedings of the 19th International Conference on Scientific and Statistical Database Management, 2007 (SSBDM, 2007), pp. 5–5 (2007)
Samarati, P.: Protecting respondents identities in microdata release. IEEE Trans. Knowl. Data Eng. 13, 1010–1027 (2001)
Solanas, A., Martínez-Balleste, A., Domingo-Ferrer, J., Mateo-Sanz, J.M.: A 2d-tree-based blocking method for microaggregating very large data sets. In: The First International Conference on Availability, Reliability and Security (ARES) (2006)
Solanas, A., Pietro, R.D.: A linear-time multivariate micro-aggregation for privacy protection in uniform very large data sets. Modeling Decisions for Artificial Intelligence. Lecture Notes in Computer Science, pp. 203–214. Springer, Heidelberg (2008)
Solé, M., Muntés-Mulero, V., Nin, J.: Efficient microaggregation techniques for large numerical data volumes. Int. J. Inf. Secur. 11, 253–267 (2012)
Stokes, K.: Graph k-anonymity through k-means and as modular decomposition. In: Proceedings of the NordSec 2013. Lecture Notes in Computer Science, vol. 8208, pp. 263–278. (2013)
Stokes, K., Torra, V.: n-Confusion: a generalization of k-anonymity. In: Proceedings of the 5th International Workshop on Privacy and Anonymity in the Information Society (PAIS). Berlin, Germany (2012)
Stokes,K., Torra, V.: Multiple releases of k-anonymous data sets and k-anonymous relational databases. Int. J. Uncertainty Fuzziness Knowledge Based Syst. 20(06), 839–853 (2012)
Stokes, K., Torra, V.: On some clustering approaches for graphs. In: Proceeding of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2011) (ISBN 978-1-4244-7315-1), pp. 409–415. Taipei, Taiwan (2011)
Stokes, K., Torra, V.: Reidentification and k-anonymity: a model for disclosure risk in graphs. Soft. Comput. 16(10), 1657–1670 (2012)
Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertainty Fuzziness Knowledge Based Syst. 10, 557–570 (2002)
Torra, V.: Microaggregation for categorical variables: a median based approach. In: Proceeding of the Privacy in Statistical Databases (PSD 2004). Lecture Notes in Computer Science, vol. 3050, pp. 162–174 (2004)
Torra, V. (2015) A fuzzy microaggregation algorithm using fuzzy c-means, Proc. CCIA 2015, Volume 277: Artificial Intelligence Research and Development, IOS Press, 214–223 DOI: 10.3233/978-1-61499-578-4-214
Torra, V., Miyamoto, S.: Evaluating fuzzy clustering algorithms for microdata protection. Privacy in Statistical Databases. Lecture Notes in Computer Science, vol. 3050, pp. 175–186 (2004)
Torra, V., Narukawa, Y.: Modeling Decisions: Information Fusion and Aggregation Operators. Springer, Heidelberg (2007)
Truta, T.M., Campan, A.: K-anonymization incremental maintenance and optimization techniques. In: Proceeding of the 2007 ACM Symposium on Applied Computing, pp. 380–387 (2007)
Vaidya, J., Clifton, C., Zhu, M.: Privacy Preserving Data Mining. Springer, New York (2006)
Xiao, X., Tao, Y.: M-invariance: towards privacy preserving re-publication of dynamic datasets. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, SIGMOD 2007, pp. 689–700. ACM (2007)
Zhou, B., Pei. J.: Preserving privacy in social networks against neighborhood attacks. In: Proceeding of the ICDE 2008 (2008)
Acknowledgements
Partial support by the Spanish MEC (projects TIN2011-27076-C03-03 and TIN2014-55243-P) is acknowledged.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Torra, V., Navarro-Arribas, G., Stokes, K. (2016). An Overview of the Use of Clustering for Data Privacy. In: Celebi, M., Aydin, K. (eds) Unsupervised Learning Algorithms. Springer, Cham. https://doi.org/10.1007/978-3-319-24211-8_10
Download citation
DOI: https://doi.org/10.1007/978-3-319-24211-8_10
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24209-5
Online ISBN: 978-3-319-24211-8
eBook Packages: EngineeringEngineering (R0)