Abstract
Privacy-preserving data mining and statistical disclosure control offer several perturbative methods to protect the privacy of respondents. Such perturbation can introduce inconsistencies into the sensitive data. For this reason, data editing techniques are used to ensure the correctness of the collected data both before and after anonymization.
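As an illustration of the kind of inconsistency described above, the following sketch encodes two hypothetical edit rules (non-negative income, and expenses not exceeding income) as predicates. It shows how a record that passes them can fail once additive noise is applied. The attribute names, the rules, and the fixed noise values are assumptions chosen for the example, not taken from the paper.

```python
# Hypothetical edit constraints: each edit is a (name, predicate) pair on a record.
EDITS = [
    ("non-negative income", lambda r: r["income"] >= 0),
    ("expenses do not exceed income", lambda r: r["expenses"] <= r["income"]),
]


def failed_edits(record):
    """Return the names of the edit constraints that the record violates."""
    return [name for name, check in EDITS if not check(record)]


def add_noise(record, noise):
    """Additive noise: perturbed value = original value + noise, per attribute."""
    return {k: v + noise.get(k, 0.0) for k, v in record.items()}


original = {"income": 1000.0, "expenses": 950.0}
# Fixed noise values (assumed) so the example is reproducible.
perturbed = add_noise(original, {"income": -80.0, "expenses": 20.0})

print(failed_edits(original))   # []
print(failed_edits(perturbed))  # ['expenses do not exceed income']
```

The original record is consistent, but the perturbed one (income 920, expenses 970) violates the second edit. This is exactly the kind of inconsistency that data editing is meant to catch.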
In this paper, we propose a methodology for protecting microdata, based on noise addition, that takes data edits into account. Informally, when the added noise causes an edit constraint to fail, we apply a noise-swapping process so that the constraint is preserved. We assess the suitability of this approach by comparing it with constrained microaggregation, a microaggregation method that avoids introducing such inconsistencies.
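The following Python sketch shows one possible reading of noise swapping. This reading is assumed here, not taken from the paper. Each record receives uncorrelated Gaussian noise with variance proportional to the attribute's variance, a common additive-noise scheme. When a perturbed record fails an edit, its noise vector is exchanged with another record's, but only if both records then pass. The edit rules, the `alpha` parameter, and the helper names (`satisfies`, `additive_noise`, `noise_swap`) are hypothetical.

```python
import random
import statistics


def satisfies(record):
    """Hypothetical edit constraints: income >= 0 and expenses <= income."""
    income, expenses = record
    return income >= 0 and expenses <= income


def perturb(record, noise):
    """Add a noise vector to a record, attribute by attribute."""
    return tuple(x + e for x, e in zip(record, noise))


def additive_noise(data, alpha, rng):
    """Draw one noise vector per record: e_j ~ N(0, alpha * var_j) for attribute j."""
    sds = [statistics.pstdev(col) * alpha ** 0.5 for col in zip(*data)]
    return [tuple(rng.gauss(0.0, sd) for sd in sds) for _ in data]


def noise_swap(data, noise):
    """Exchange noise vectors between records so perturbed records pass the edits.

    Returns the permuted noise list and the indices of records that still fail.
    """
    noise = list(noise)
    for i, record in enumerate(data):
        if satisfies(perturb(record, noise[i])):
            continue
        for j in range(len(data)):
            if j == i:
                continue
            # Accept the swap only if both records satisfy the edits afterwards.
            if satisfies(perturb(record, noise[j])) and satisfies(perturb(data[j], noise[i])):
                noise[i], noise[j] = noise[j], noise[i]
                break
    failing = [i for i, r in enumerate(data) if not satisfies(perturb(r, noise[i]))]
    return noise, failing


# Usage on a toy file of (income, expenses) records that all satisfy the edits.
rng = random.Random(0)
data = [(1000.0, 950.0), (1200.0, 400.0), (800.0, 790.0), (1500.0, 900.0), (600.0, 100.0)]
noise = additive_noise(data, alpha=0.5, rng=rng)
swapped, failing = noise_swap(data, noise)
protected = [perturb(r, e) for r, e in zip(data, swapped)]
```

Swaps only permute the noise vectors among records, so the set of noise values added to the file is unchanged. Records for which no valid swap partner exists are returned in `failing` and would need some other treatment.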
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cano, I., Torra, V. (2011). Edit Constraints on Microaggregation and Additive Noise. In: Dimitrakakis, C., Gkoulalas-Divanis, A., Mitrokotsa, A., Verykios, V.S., Saygin, Y. (eds) Privacy and Security Issues in Data Mining and Machine Learning. PSDML 2010. Lecture Notes in Computer Science, vol 6549. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19896-0_1
DOI: https://doi.org/10.1007/978-3-642-19896-0_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19895-3
Online ISBN: 978-3-642-19896-0
eBook Packages: Computer Science, Computer Science (R0)