Data Privacy with \(R\)

  • Chapter in: Advanced Research in Data Privacy

Part of the book series: Studies in Computational Intelligence (SCI, volume 567)

Abstract

Privacy Preserving Data Mining (PPDM) is an increasingly relevant application field. Its goal is to study mechanisms that allow confidential data to be disseminated for data mining tasks while preserving individuals' private information. Given the prominence of the \(R\) language in the statistics and data mining communities, \(R\) is a natural environment in which to research, develop, and test privacy techniques aimed at data mining. In this chapter we present several PPDM protection techniques, together with the process for evaluating their information loss and disclosure risk, and outline some tools in \(R\) that help introduce practitioners to this field.

Acknowledgments

Partial support by the Spanish MICINN (projects COPRIVACY (TIN2011-27076-C03-03), N-KHRONOUS (TIN2010-15764), and ARES (CONSOLIDER INGENIO 2010 CSD2007-00004)) and by the EC (FP7/2007-2013) Data without Boundaries (grant agreement number 262608) is acknowledged. The work contributed by the first author was carried out as part of the Computer Science Ph.D. program of the Universitat Autònoma de Barcelona (UAB).

Corresponding author

Correspondence to Daniel Abril.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Abril, D., Navarro-Arribas, G., Torra, V. (2015). Data Privacy with \(R\). In: Navarro-Arribas, G., Torra, V. (eds) Advanced Research in Data Privacy. Studies in Computational Intelligence, vol 567. Springer, Cham. https://doi.org/10.1007/978-3-319-09885-2_5

  • DOI: https://doi.org/10.1007/978-3-319-09885-2_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09884-5

  • Online ISBN: 978-3-319-09885-2

  • eBook Packages: Engineering, Engineering (R0)
