Abstract
This chapter shows how statistical disclosure control methods can be applied to data using the R packages sdcMicro and sdcTable.
Readers will learn how popular methods for microdata protection and tabular data protection can be applied to real-world data with these packages.
sdcMicro supports an exploratory approach to anonymizing both categorical key variables and numerical variables, in which global recoding, local suppression, and risk estimation can be applied interactively. Various popular methods for microdata protection are briefly described, and some new methods for microdata protection and disclosure risk estimation that address real-life data problems are introduced.
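To make global recoding, local suppression, and a basic risk check concrete, here is a minimal Python sketch on hypothetical toy data. It is not sdcMicro's API (sdcMicro is an R package and its functions are not reproduced here). The record layout, the 20-year age bands, the greedy suppression order, and the convention that a suppressed value matches any category are all assumptions chosen for illustration. Risk is proxied by the sample frequency f_k of each record's key combination (k-anonymity) rather than by a model-based risk estimator.

```python
"""Toy sketch: k-anonymity check, global recoding and local suppression.

Illustration only; this is not the sdcMicro API.
"""

KEYS = ("age", "sex", "region")

# Hypothetical toy microdata with three categorical key variables.
RECORDS = [
    {"age": 23, "sex": "F", "region": "North"},
    {"age": 27, "sex": "F", "region": "North"},
    {"age": 45, "sex": "M", "region": "South"},
    {"age": 48, "sex": "M", "region": "South"},
    {"age": 81, "sex": "F", "region": "South"},
]


def matches(a, b, keys):
    """True if two records agree on every key; None (suppressed) matches anything (assumed convention)."""
    return all(a[k] is None or b[k] is None or a[k] == b[k] for k in keys)


def freq_counts(data, keys):
    """Sample frequency f_k of each record's key combination (O(n^2), fine for toy data)."""
    return [sum(matches(r, s, keys) for s in data) for r in data]


def global_recode_age(data, width=20):
    """Global recoding: replace exact age by an age band of the given width in every record."""
    out = []
    for r in data:
        lo = (r["age"] // width) * width
        out.append({**r, "age": f"{lo}-{lo + width - 1}"})
    return out


def local_suppression(data, keys, k, order):
    """Greedy local suppression: blank key values, in the given order, of records with f_k < k."""
    data = [dict(r) for r in data]  # work on a copy
    for i in range(len(data)):
        for var in order:
            if freq_counts(data, keys)[i] >= k:
                break
            data[i][var] = None
    return data


if __name__ == "__main__":
    k = 2
    print("raw f_k:", freq_counts(RECORDS, KEYS))
    recoded = global_recode_age(RECORDS)
    print("after recoding:", freq_counts(recoded, KEYS))
    safe = local_suppression(recoded, KEYS, k, order=("age", "region", "sex"))
    print("after suppression:", freq_counts(safe, KEYS))
    # Naive per-record risk proxy 1/f_k, not a model-based estimate.
    print("max risk 1/f_k:", max(1 / f for f in freq_counts(safe, KEYS)))
```

On this toy data every record is initially unique. Recoding age into bands leaves one unique record, and suppressing two of its values brings every f_k to at least 2. Recoding changes all records alike, whereas suppression touches only the risky record, which is the usual trade-off between the two methods.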
In addition, the chapter describes how tabular protection can be applied using the R package sdcTable. From the user's point of view, the most challenging part is the data preparation required before tabular protection can be applied: the user must provide meta-information about the hierarchical variables that define the table.
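Why the hierarchy meta-information matters can be illustrated with a small Python sketch, again on hypothetical data and not using sdcTable's input format or functions. The hierarchy of one variable is written as a parent-to-children mapping (an assumed representation). From it, the sketch derives every cell of the one-dimensional table including subtotals, checks that each parent equals the sum of its children, and applies a simple minimum-frequency rule (threshold 3, an assumed value) to find primary suppressions.

```python
"""Toy sketch: hierarchy meta-information needed before tabular protection.

Illustration only; this is not the sdcTable API or its input format.
"""

# Hypothetical hierarchy for a 'region' variable: parent -> list of children.
HIERARCHY = {
    "Total": ["East", "West"],
    "East": ["E1", "E2"],
    "West": ["W1"],
}

# Hypothetical microdata: only the lowest-level codes are observed.
MICRODATA = ["E1", "E1", "E2", "W1", "W1", "W1", "E1"]


def leaves(node, hierarchy):
    """All lowest-level codes at or below a node."""
    children = hierarchy.get(node)
    if not children:
        return [node]
    return [leaf for c in children for leaf in leaves(c, hierarchy)]


def tabulate(data, hierarchy, root="Total"):
    """Frequency count for every node of the hierarchy, subtotals included."""
    table = {}
    stack = [root]
    while stack:
        node = stack.pop()
        table[node] = sum(data.count(leaf) for leaf in leaves(node, hierarchy))
        stack.extend(hierarchy.get(node, []))
    return table


def is_additive(table, hierarchy):
    """Check that each parent cell equals the sum of its children."""
    return all(table[p] == sum(table[c] for c in cs) for p, cs in hierarchy.items())


def primary_suppressed(table, min_count=3):
    """Simple frequency rule: flag nonzero cells with fewer than min_count contributors."""
    return sorted(cell for cell, n in table.items() if 0 < n < min_count)


if __name__ == "__main__":
    t = tabulate(MICRODATA, HIERARCHY)
    print(t, is_additive(t, HIERARCHY), primary_suppressed(t))
```

Here E2 (one contributor) is flagged. Suppressing it alone would not protect it, because E2 can be recalculated as East minus E1 from the published cells. Choosing secondary suppressions that block such recalculation is what requires the hierarchy, and it is not implemented in this sketch.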
Copyright information
© 2010 Springer London
About this chapter
Cite this chapter
Templ, M., Meindl, B. (2010). Practical Applications in Statistical Disclosure Control Using R. In: Nin, J., Herranz, J. (eds) Privacy and Anonymity in Information Management Systems. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-84996-238-4_3
DOI: https://doi.org/10.1007/978-1-84996-238-4_3
Publisher Name: Springer, London
Print ISBN: 978-1-84996-237-7
Online ISBN: 978-1-84996-238-4
eBook Packages: Computer Science, Computer Science (R0)