
Practical Applications in Statistical Disclosure Control Using R


Part of the book series: Advanced Information and Knowledge Processing (AI&KP)

Abstract

This chapter shows how statistical disclosure control methods can be applied to data using the R-packages sdcMicro and sdcTable.

It explains how popular methods for microdata protection and tabular protection can be applied to real-world data with these packages.

sdcMicro supports an exploratory approach to the anonymization of both categorical key variables and numerical variables: global recoding, local suppression, and risk estimation can be applied interactively. Various popular methods for microdata protection are briefly described, and some new methods for microdata protection and disclosure risk estimation that address real-life data problems are introduced.
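To make the three terms concrete, the following is a minimal sketch in plain Python. It does not use sdcMicro or reproduce its API; the records, the choice of key variables, the 20-year age bands, and all function names are assumptions made for illustration. It counts how often each combination of key-variable values occurs (a simple risk indicator), coarsens age into bands (global recoding), and blanks the age of records that are still too rare (local suppression).

```python
from collections import Counter

# Toy microdata; all values are invented for illustration.
records = [
    {"age": 23, "sex": "f", "region": "north"},
    {"age": 27, "sex": "f", "region": "north"},
    {"age": 24, "sex": "f", "region": "north"},
    {"age": 41, "sex": "m", "region": "south"},
    {"age": 45, "sex": "m", "region": "south"},
    {"age": 68, "sex": "m", "region": "south"},
]
KEYS = ("age", "sex", "region")  # assumed categorical key variables


def key_counts(rows, keys=KEYS):
    """Sample frequency of each combination of key-variable values."""
    return Counter(tuple(r[k] for k in keys) for r in rows)


def violations(rows, k, keys=KEYS):
    """Number of records whose key combination occurs fewer than k times."""
    counts = key_counts(rows, keys)
    return sum(1 for r in rows if counts[tuple(r[key] for key in keys)] < k)


def recode_age(rows, width=20):
    """Global recoding: replace the exact age by a band such as '20-39'."""
    out = []
    for r in rows:
        lo = (r["age"] // width) * width
        out.append({**r, "age": f"{lo}-{lo + width - 1}"})
    return out


def suppress(rows, k, var="age", keys=KEYS):
    """Local suppression: blank `var` in records whose key combination is rarer than k."""
    counts = key_counts(rows, keys)
    return [
        {**r, var: None} if counts[tuple(r[key] for key in keys)] < k else r
        for r in rows
    ]


before = violations(records, k=2)        # every record is unique on the raw keys
recoded = recode_age(records)
after_recode = violations(recoded, k=2)  # only the single 60-79 record remains rare
protected = suppress(recoded, k=2)
```

The sketch applies each step once and does not recount frequencies after suppression; an interactive workflow of the kind the abstract describes would re-estimate risk after every step before deciding on the next one.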

In addition, the chapter describes how tabular protection can be applied using the R-package sdcTable. From the user's point of view, the most challenging part is the data preparation required before tabular protection can be applied: the user must provide meta information about the hierarchical variables that define the table.
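As an illustration of what such meta information can look like, here is a minimal sketch in plain Python. It is not sdcTable's interface; the hierarchy, the counts, the threshold of 3, and the function names are all hypothetical. One dimension of a table is described as a parent-to-children mapping, counts are aggregated up the hierarchy, and cells below a frequency threshold are flagged as primary unsafe.

```python
# Hypothetical hierarchy for one table dimension: each parent code maps to its children.
hierarchy = {
    "Total": ["North", "South"],
    "North": ["N1", "N2"],
    "South": ["S1", "S2"],
}
# Invented counts for the lowest-level codes.
leaf_counts = {"N1": 12, "N2": 2, "S1": 30, "S2": 9}


def cell_count(code):
    """Count for a code: its own value if it is a leaf, else the sum over its children."""
    if code not in hierarchy:
        return leaf_counts[code]
    return sum(cell_count(child) for child in hierarchy[code])


def all_codes(root="Total"):
    """Every code in the hierarchy, each parent listed before its children."""
    codes = [root]
    for child in hierarchy.get(root, []):
        codes.extend(all_codes(child))
    return codes


def primary_unsafe(threshold=3):
    """Codes whose count is positive but below the threshold (a simple frequency rule)."""
    return [c for c in all_codes() if 0 < cell_count(c) < threshold]


table = {code: cell_count(code) for code in all_codes()}
unsafe = primary_unsafe()
```

In this toy table only `N2` is flagged, but hiding it alone would not protect it, because its value can be recovered as `North` minus `N1`. This is why the hierarchy has to be declared up front: the additional suppressions needed to block such differencing depend on it. That second step is not shown here.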




Corresponding author

Correspondence to Mathias Templ.


Copyright information

© 2010 Springer London

About this chapter

Cite this chapter

Templ, M., Meindl, B. (2010). Practical Applications in Statistical Disclosure Control Using R. In: Nin, J., Herranz, J. (eds) Privacy and Anonymity in Information Management Systems. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-84996-238-4_3


  • DOI: https://doi.org/10.1007/978-1-84996-238-4_3


  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84996-237-7

  • Online ISBN: 978-1-84996-238-4

  • eBook Packages: Computer Science, Computer Science (R0)
