Ordered Data Set Vectorization for Linear Regression on Data Privacy

Medrano-Gracia, Pau; Pont-Tuset, Jordi; Nin, Jordi; Muntés-Mulero, Victor

doi:10.1007/978-3-540-73729-2_34

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 4617))

Included in the following conference series:

International Conference on Modeling Decisions for Artificial Intelligence

Abstract

Many situations demand publishing data without revealing the confidential information it contains. Among the data protection methods proposed in the literature, those based on linear regression are widely used for numerical data. The main objective of these methods is to minimize both the disclosure risk (DR) and the information loss (IL). However, most of these techniques try to protect the non-confidential attributes based on the values of the confidential attributes in the data set. In this situation, when these two sets of attributes are strongly correlated, the likelihood that an intruder can reveal confidential data increases, making these methods unsuitable for many typical scenarios. In this paper we propose a new family of methods, called LiROP-k methods, that are based on linear regression and avoid the problems derived from the correlation between attributes in the data set. We propose the vectorization, sorting and partitioning of all values in the attributes to be protected, which breaks the semantics of these attributes inside the record. We present two protection methods: a synthetic method called LiROP_s-k and a perturbative method called LiROP_p-k. We show that, when the attributes in the data set are highly correlated, our methods present lower DR than other protection methods based on linear regression.
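
The abstract names three steps (vectorization, sorting, partitioning) followed by linear regression, but does not give the details. The following is a minimal sketch of how such a pipeline could look, not the authors' LiROP-k implementation. It assumes that all values are flattened into one vector, that the sorted vector is split into consecutive blocks of k values, and that each block is replaced by its least-squares line fitted over the in-block rank. The function name, the use of the rank as regressor, and the handling of a trailing single-value block are all hypothetical choices made for illustration.

```python
import numpy as np


def vectorize_sort_partition_fit(data, k):
    """Hypothetical sketch (not the paper's method): flatten, sort, split into
    blocks of k values, and replace each block by its least-squares line."""
    data = np.asarray(data, dtype=float)
    flat = data.ravel()                       # vectorization: one long vector
    order = np.argsort(flat, kind="stable")   # sorting: indices in ascending order
    sorted_vals = flat[order]

    fitted = np.empty_like(sorted_vals)
    for start in range(0, sorted_vals.size, k):   # partitioning: blocks of k values
        block = sorted_vals[start:start + k]
        x = np.arange(block.size, dtype=float)    # assumed regressor: in-block rank
        if block.size < 2:
            # Assumption: a single value cannot define a line, so it is kept as-is.
            fitted[start:start + k] = block
            continue
        slope, intercept = np.polyfit(x, block, deg=1)
        fitted[start:start + k] = slope * x + intercept

    protected = np.empty_like(flat)
    protected[order] = fitted                 # undo the sort
    return protected.reshape(data.shape)      # back to the record/attribute layout


if __name__ == "__main__":
    toy = [[1.0, 10.0], [2.0, 20.0]]          # made-up values for illustration
    print(vectorize_sort_partition_fit(toy, k=4))
```

Because values from different attributes end up in the same sorted vector, each fitted value depends on its rank among all values rather than on the other attributes of its own record, which is one way to read "breaking the semantics of these attributes inside the record". A least-squares line with an intercept preserves the mean of each block, so the overall sum of the data is unchanged in this sketch.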

Author information

Authors and Affiliations

DAMA-UPC, Computer Architecture Dept., Universitat Politècnica de Catalunya, Campus Nord UPC, C/Jordi Girona 1-3, 08034 Barcelona (Catalonia, Spain)
Pau Medrano-Gracia, Jordi Pont-Tuset & Victor Muntés-Mulero
IIIA, Artificial Intelligence Research Institute, CSIC, Spanish National Research Council, Campus UAB s/n, 08193 Bellaterra (Catalonia, Spain)
Jordi Nin

Editor information

Vicenç Torra, Yasuo Narukawa, Yuji Yoshida

About this paper

Cite this paper

Medrano-Gracia, P., Pont-Tuset, J., Nin, J., Muntés-Mulero, V. (2007). Ordered Data Set Vectorization for Linear Regression on Data Privacy. In: Torra, V., Narukawa, Y., Yoshida, Y. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2007. Lecture Notes in Computer Science, vol 4617. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73729-2_34

DOI: https://doi.org/10.1007/978-3-540-73729-2_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73728-5
Online ISBN: 978-3-540-73729-2
eBook Packages: Computer Science, Computer Science (R0)
