Co-utile Collaborative Anonymization of Microdata

  • Conference paper
  • In: Modeling Decisions for Artificial Intelligence (MDAI 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9321)

Abstract

In surveys collecting individual data (microdata), each respondent is usually required to report values for a set of attributes. If some of these attributes contain sensitive information, the respondent must trust the collector not to make any inappropriate use of the data and, in case any data are to be publicly released, to properly anonymize them to avoid disclosing sensitive information. If the respondent does not trust the data collector, she may report inaccurately or report nothing at all. To reduce the need for trust, local anonymization is an alternative whereby each respondent anonymizes her data prior to sending them to the data collector. However, local anonymization by each respondent, without seeing the other respondents’ data, makes it hard to find a good trade-off between information loss and disclosure risk. We propose a distributed anonymization approach where users collaborate to attain an appropriate level of disclosure protection (and, thus, of information loss). Under our scheme, the final anonymized data are only as accurate as the information released by each respondent; hence, no trust needs to be placed in the data collector or in any other respondent. Further, if respondents are interested in forming an accurate data set, the proposed collaborative anonymization protocols are self-enforcing and co-utile.


References

  1. Agrawal, S., Haritsa, J.R.: A framework for high-accuracy privacy-preserving mining. In: Proceedings of the 21st International Conference on Data Engineering (ICDE 2005), pp. 193–204. IEEE (2005)
  2. Dingledine, R., Mathewson, N., Syverson, P.: Tor: the second-generation onion router. Naval Research Lab, Washington DC (2004)
  3. Domingo-Ferrer, J., Muralidhar, K.: New directions in anonymization: permutation paradigm, verifiability by subjects and intruders, transparency to users. CoRR, abs/1501.04186 (2015)
  4. Domingo-Ferrer, J., Soria-Comas, J., Ciobotaru, O.: Co-utility: self-enforcing protocols without coordination mechanisms. In: Proceedings of the 5th International Conference on Industrial Engineering and Operations Management (IEOM 2015), pp. 1–7. IEEE (2015)
  5. Domingo-Ferrer, J., Torra, V.: Ordinal, continuous and heterogeneous k-anonymity through microaggregation. Data Min. Knowl. Discov. 11(2), 195–212 (2005)
  6. Goldreich, O.: Foundations of Cryptography. Basic Tools, vol. 1. Cambridge University Press, Cambridge (2001)
  7. Hundepool, A., Domingo-Ferrer, J., Franconi, L., Giessing, S., Nordholt, E.S., Spicer, K., de Wolf, P.-P.: Statistical Disclosure Control. Wiley, Chichester (2012)
  8. Jiang, W., Clifton, C.: Privacy-preserving distributed k-anonymity. In: Jajodia, S., Wijesekera, D. (eds.) Data and Applications Security 2005. LNCS, vol. 3654, pp. 166–177. Springer, Heidelberg (2005)
  9. Jiang, W., Clifton, C.: A secure distributed framework for achieving k-anonymity. VLDB J. 15(4), 316–333 (2006)
  10. Jurczyk, P., Xiong, L.: Distributed anonymization: achieving privacy for both data subjects and data providers. In: Gudes, E., Vaidya, J. (eds.) Data and Applications Security XXIII. LNCS, vol. 5645, pp. 191–207. Springer, Heidelberg (2009)
  11. LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Incognito: efficient full-domain k-anonymity. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data (SIGMOD 2005), pp. 49–60. ACM, New York (2005)
  12. LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Mondrian multidimensional k-anonymity. In: Proceedings of the 22nd International Conference on Data Engineering (ICDE 2006). IEEE Computer Society, Washington, DC (2006)
  13. Muralidhar, K., Sarathy, R., Domingo-Ferrer, J.: Reverse mapping to preserve the marginal distributions of attributes in masked microdata. In: Domingo-Ferrer, J. (ed.) PSD 2014. LNCS, vol. 8744, pp. 105–116. Springer, Heidelberg (2014)
  14. Samarati, P., Sweeney, L.: Generalizing data to provide anonymity when disclosing information. In: Proceedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 1998), p. 188. ACM (1998)
  15. Song, C., Ge, T.: Aroma: a new data protection method with differential privacy and accurate query answering. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM 2014), pp. 1569–1578. ACM, New York (2014)
  16. Soria-Comas, J., Domingo-Ferrer, J.: Probabilistic k-anonymity through microaggregation and data swapping. In: Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2012), pp. 1–8. IEEE (2012)
  17. Wang, K., Fung, B.C.M., Dong, G.: Integrating private databases for data analysis. In: Kantor, P., Muresan, G., Roberts, F., Zeng, D.D., Wang, F.-Y., Chen, H., Merkle, R.C. (eds.) ISI 2005. LNCS, vol. 3495, pp. 171–182. Springer, Heidelberg (2005)
  18. Warner, S.L.: Randomized response: a survey technique for eliminating evasive answer bias. J. Am. Stat. Assoc. 60(309), 63–69 (1965)
  19. Xiao, X., Tao, Y.: Anatomy: simple and effective privacy preservation. In: Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB 2006), pp. 139–150 (2006)
  20. Xiao, X., Tao, Y.: Personalized privacy preservation. In: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD 2006), pp. 229–240. ACM, New York (2006)


Acknowledgments and Disclaimer

The following funding sources are gratefully acknowledged: Templeton World Charity Foundation (grant TWCF0095/AB60 “CO-UTILITY”), Government of Catalonia (ICREA Acadèmia Prize to the second author and grant 2014 SGR 537), Spanish Government (project TIN2011-27076-C03-01 “CO-PRIVACY”), European Commission (projects FP7 “DwB”, FP7 “Inter-Trust” and H2020 “CLARUS”). The second author leads the UNESCO Chair in Data Privacy. The views in this paper are the authors’ own and do not necessarily reflect the views of the Templeton World Charity Foundation or UNESCO.

Author information

Corresponding author

Correspondence to Jordi Soria-Comas.

Appendices

A k-Anonymity

k-Anonymity [14] is a privacy model that seeks to thwart re-identification of anonymized records. Central to k-anonymity is the notion of quasi-identifier attributes, also known as key attributes. Quasi-identifiers are attributes that, considered separately, do not identify the respondent behind a record but which, used in combination, may allow an attacker to uniquely link that record to an external database containing identifiers (this database being the attacker’s background knowledge). Such a unique linkage is called re-identification.

With the above setting in mind, k-anonymity can be defined as follows.

Definition 1

(k-Anonymity). A protected data set is said to satisfy k-anonymity for \(k > 1\) if, for each combination of values of the quasi-identifier attributes, at least k records exist in the data set sharing that combination.

If the quasi-identifiers considered by the data protector to enforce k-anonymity coincide with the quasi-identifiers that an attacker can use to link with his background knowledge, then k-anonymity reduces the probability of successful re-identification to \(1/k\).
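To make Definition 1 concrete, here is a minimal Python sketch (illustrative, not from the paper; the table layout and the function name are our assumptions) that checks whether a data set satisfies k-anonymity by counting the records sharing each quasi-identifier combination:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Check Definition 1: every combination of quasi-identifier
    values must be shared by at least k records."""
    counts = Counter(
        tuple(r[a] for a in quasi_identifiers) for r in records
    )
    return all(c >= k for c in counts.values())

# Toy data set: zip and age are the quasi-identifiers (values already
# generalized); disease is the confidential attribute.
data = [
    {"zip": "430**", "age": "30-39", "disease": "flu"},
    {"zip": "430**", "age": "30-39", "disease": "cold"},
    {"zip": "430**", "age": "30-39", "disease": "asthma"},
]
print(is_k_anonymous(data, ["zip", "age"], k=3))  # True
```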

Of course, which attributes should be labeled as quasi-identifiers is debatable. At the very least, attributes that can be found in public non-de-identified data sets (e.g., electoral rolls or phonebooks) must be taken as quasi-identifiers. However, this is not enough to prevent re-identification by attackers with additional knowledge.

B Reverse Mapping

Reverse mapping [3, 13] is a post-masking technique that can be applied to any anonymized data set. The result is a reverse-mapped data set, constructed by taking one attribute of the anonymized data set at a time and replacing the value of each record by the value in the original data set with equal rank.

Thus, reverse mapping requires knowing the marginal distribution of each of the attributes in the original data set. Hence, if the data collector wants to allow reverse mapping by parties other than himself, he must release those marginal distributions. And, for those distributions to be releasable, they must be assumed to be non-disclosive. The good news is that this is quite a reasonable assumption, as the distribution of an attribute essentially conveys statistical information (it is, in principle, unrelated to any specific individual). For the extreme cases in which a single value can be associated with a specific individual (e.g., the turnover of the largest company in a specific sector), prior masking of the marginal distribution would be needed (e.g., by top-coding it).

The interesting point about the reverse mapping transformation is that it allows viewing any microdata anonymization method as being functionally equivalent to permutation (mapping the original data set to the reverse-mapped data set) plus a small amount of noise (mapping the reverse-mapped data set to the anonymized data set). The noise is necessarily small because it does not modify the ranks of the values: by construction, ranks in the reverse-mapped and the anonymized data set are the same. Therefore, the essential anonymization principle turns out to be permutation.
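The rank-replacement step can be sketched in a few lines of Python (a simplified illustration under our own assumptions: a single numerical attribute, columns of equal length, ties broken by stable sort order; the function name is hypothetical):

```python
def reverse_map(original_col, anonymized_col):
    """Replace each anonymized value by the original value of equal
    rank, one attribute at a time, as described above."""
    sorted_original = sorted(original_col)
    # Positions of the anonymized values in rank order
    # (Python's sort is stable, so ties keep their record order).
    order = sorted(range(len(anonymized_col)),
                   key=lambda i: anonymized_col[i])
    reverse_mapped = [None] * len(anonymized_col)
    for rank, idx in enumerate(order):
        reverse_mapped[idx] = sorted_original[rank]
    return reverse_mapped

original = [21, 35, 48, 52, 67]
anonymized = [30, 30, 50, 50, 60]  # e.g., microaggregated averages
print(reverse_map(original, anonymized))  # [21, 35, 48, 52, 67]
```

Since the reverse-mapped column takes its values from the original data set but its ranks from the anonymized one, comparing it with the anonymized column isolates exactly the small residual noise mentioned above.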

C Co-utility

Consider a set of self-interested peers (each having a utility function, that is, a specific goal or a defined preference relation over a set of possible outcomes) that act strategically (each peer seeks an outcome that maximizes her utility, according to her knowledge of the environment).

Co-utility [4] models a kind of interaction between the peers in which it is in the best interest of each of them to help another peer in reaching her goal. The primary advantage of a co-utile system is that it does not require any external mechanism to enforce a particular outcome or coordinate the actions of the peers.

Co-utility can be formalized using game theory. To guarantee a specific interaction outcome without external enforcement, the outcome must be self-enforcing; in game-theoretic terms, it must be an equilibrium. An outcome is an equilibrium if no agent (peer) has incentives to change her strategy in that outcome; in other words, provided that all other agents keep their strategies unchanged, no agent can increase her utility by modifying her strategy.
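As a toy illustration of the equilibrium condition (the game, payoffs, and function names below are ours, not the paper's), the following Python sketch tests whether a strategy profile is self-enforcing by checking all unilateral deviations:

```python
def is_equilibrium(profile, strategy_sets, utilities):
    """True if no single agent can raise her utility by deviating
    unilaterally, all other strategies kept fixed."""
    for i, strategies in enumerate(strategy_sets):
        current = utilities[i](profile)
        for s_alt in strategies:
            deviated = list(profile)
            deviated[i] = s_alt
            if utilities[i](tuple(deviated)) > current:
                return False
    return True

# Prisoner's dilemma payoffs, purely illustrative: (row, column) utilities.
payoff = {("c", "c"): (3, 3), ("c", "d"): (0, 5),
          ("d", "c"): (5, 0), ("d", "d"): (1, 1)}
utilities = [lambda p: payoff[p][0], lambda p: payoff[p][1]]
print(is_equilibrium(("d", "d"), [["c", "d"], ["c", "d"]], utilities))  # True
print(is_equilibrium(("c", "c"), [["c", "d"], ["c", "d"]], utilities))  # False
```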

If the utility of an outcome for an agent depended on the preferences of another agent, attaining an equilibrium would require each agent to report her preferences. On the one hand, this would increase the complexity of the system, because an agent may report untruthful preferences if she believes that doing so will yield a better outcome for her. On the other hand, the gathering of all agents’ preferences by a specific party or agent should be avoided if we want a truly distributed interaction. Following this rationale, we define games amenable to co-utility as those in which the utility of each agent is independent of the preferences of the other agents.

Definition 2

(Co-utility amenable game). Let G be a sequential Bayesian game for n agents. We say that G is a co-utility-amenable game if the utility of any agent is independent of the types of the other agents, i.e., \(\forall \, i,j\), with \(i \ne j\) and \(\forall \, t_j, t'_j \in T_j\), we have that \(u_i(s_1, \ldots , s_j, \ldots , s_n, t_1, \ldots , t_j, \ldots , t_n) = u_i(s_1, \ldots , s_j, \ldots , s_n, t_1, \ldots , t'_j, \ldots , t_n)\).

Having defined a co-utility amenable game, we are ready to define when a protocol P that produces as output a strategy profile of the game is co-utile. An agent may be reluctant to play a strategy that is beneficial to herself if the strategy provides a much larger benefit to another agent. Because of this, different levels of co-utility can be distinguished, depending on whether agents maximize or merely increase their utility by following the protocol. Under strict co-utility, each agent maximizes her utility and, thus, no agent has any reason not to follow the protocol.

Definition 3

(Strict co-utility). Let G be a co-utility amenable game for n agents. Let P be a self-enforcing protocol for G. We say P is a strictly co-utile protocol if \(\forall \, i \in \{1, \ldots , n\}\), and \(\forall \, s'_1 \in S_1, \ldots , s'_n \in S_n\) and \(\forall \, t_1 \in T_1, \ldots , t_n \in T_n\), we have that \(u_i(s_1,\ldots , s_n,t_1,\ldots ,t_n) \ge u_i(s'_1,\ldots , s'_n,t_1,\ldots , t_n)\), where the outcome of P is \((s_1,\ldots , s_n)\).
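On small finite games, Definitions 2 and 3 can be checked exhaustively. The sketch below uses a toy simultaneous two-agent game of our own construction (a simplification of the sequential Bayesian setting; all names, payoffs, and functions are assumptions) to verify type-independence and then that a candidate protocol outcome weakly dominates every strategy profile for every agent:

```python
import itertools

# Two agents; strategies and types are labels. Utility (illustrative):
# 1 for mutual cooperation plus a bonus that depends only on the
# agent's own type, so Definition 2 holds by construction.
S = [("cooperate", "defect"), ("cooperate", "defect")]
T = [("privacy_high", "privacy_low"), ("privacy_high", "privacy_low")]

def u(i, s1, s2, t1, t2):
    base = 1.0 if (s1, s2) == ("cooperate", "cooperate") else 0.0
    own_type = (t1, t2)[i]
    return base + (0.5 if own_type == "privacy_high" else 0.0)

def is_coutility_amenable():
    """Definition 2: u_i unchanged when any other agent's type changes."""
    for i in range(2):
        j = 1 - i
        for (s1, s2), t in itertools.product(itertools.product(*S),
                                             itertools.product(*T)):
            for tj_alt in T[j]:
                t_alt = list(t)
                t_alt[j] = tj_alt
                if u(i, s1, s2, *t) != u(i, s1, s2, *t_alt):
                    return False
    return True

def is_strictly_coutile(profile):
    """Definition 3: the protocol's outcome weakly dominates all
    strategy profiles, for every agent and every type vector."""
    for i in range(2):
        for t in itertools.product(*T):
            best = max(u(i, s1, s2, *t)
                       for s1, s2 in itertools.product(*S))
            if u(i, *profile, *t) < best:
                return False
    return True

print(is_coutility_amenable())                          # True
print(is_strictly_coutile(("cooperate", "cooperate")))  # True
```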

Designing co-utile protocols is usually a matter of finding a group of peers with a sufficiently aligned set of preferences. As the focus of this paper is on data anonymization, we consider a set of privacy-conscious peers who are required to report some data, for instance, to answer a certain survey. Now, a rational privacy-conscious peer will report false data or no data at all unless she has some interest in the pooled responses of all peers being as accurate as possible. Hence, we make the assumption that all peers are interested in obtaining an accurate data set.

Under the above assumption, one possible approach to designing a co-utile protocol can be based on each peer hiding within a group of peers when reporting her record. Note that hiding one’s identity when reporting one’s record helps other peers in the group to hide their own identities. Conversely, it is hard to hide in a group where none of the other members is anonymous.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Soria-Comas, J., Domingo-Ferrer, J. (2015). Co-utile Collaborative Anonymization of Microdata. In: Torra, V., Narukawa, T. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2015. Lecture Notes in Computer Science, vol 9321. Springer, Cham. https://doi.org/10.1007/978-3-319-23240-9_16

  • DOI: https://doi.org/10.1007/978-3-319-23240-9_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23239-3

  • Online ISBN: 978-3-319-23240-9

  • eBook Packages: Computer Science (R0)
