Abstract
There is a significant body of empirical work on statistical de-anonymization attacks against databases containing micro-data about individuals, e.g., their preferences, movie ratings, or transaction data. Our goal is to analytically explain why such attacks work. Specifically, we analyze a variant of the Narayanan-Shmatikov algorithm that was used to effectively de-anonymize the Netflix database of movie ratings. We prove theorems characterizing mathematical properties of the database and the auxiliary information available to the adversary that enable two classes of privacy attacks. In the first attack, the adversary successfully identifies the individual about whom she possesses auxiliary information (an isolation attack). In the second attack, the adversary learns additional information about the individual, although she may not be able to uniquely identify him (an information amplification attack). We demonstrate the applicability of the analytical results by empirically verifying that the mathematical properties assumed of the database are actually true for a significant fraction of the records in the Netflix movie ratings database, which contains ratings from about 500,000 users.
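The de-anonymization algorithm analyzed here can be illustrated with a minimal Python sketch of weighted-similarity scoring in the style of Narayanan and Shmatikov: rare attributes (e.g., obscure movies) carry more identifying weight, and a candidate match is accepted only if its score stands out from the rest (an eccentricity test). The exact-match similarity, the toy data, and the threshold value are illustrative assumptions, not the paper's actual parameters; the real algorithm also tolerates approximate matches on ratings and dates.

```python
import math
import statistics

def score(aux, record, support):
    # Weighted similarity score: each auxiliary attribute the record matches
    # contributes 1/log(support), so rarer attributes weigh more.
    s = 0.0
    for attr, val in aux.items():
        if attr in record:
            weight = 1.0 / math.log(support[attr])  # assumes support > 1
            sim = 1.0 if record[attr] == val else 0.0  # exact-match similarity (simplification)
            s += weight * sim
    return s

def deanonymize(aux, dataset, support, threshold=1.5):
    # Score every record, then apply an eccentricity test: output the
    # best-scoring record only if it is clearly separated from the runner-up,
    # measured in standard deviations of the score distribution.
    scores = [score(aux, r, support) for r in dataset]
    best = max(range(len(dataset)), key=lambda i: scores[i])
    runner_up = max(s for i, s in enumerate(scores) if i != best)
    sigma = statistics.pstdev(scores)
    if sigma > 0 and (scores[best] - runner_up) / sigma > threshold:
        return best  # isolation: a unique, confident match
    return None      # no confident match

# Toy example: auxiliary knowledge of two rare movies isolates record 1,
# while knowledge of a single popular movie matches multiple records and fails.
dataset = [
    {"m1": 5, "m2": 3},
    {"m1": 4, "m3": 2, "m4": 1},
    {"m2": 3, "m5": 5},
]
support = {"m1": 100, "m2": 200, "m3": 5, "m4": 3, "m5": 50}
print(deanonymize({"m3": 2, "m4": 1}, dataset, support))  # rare movies -> 1
print(deanonymize({"m2": 3}, dataset, support))           # popular movie -> None
```

The eccentricity check is what distinguishes an isolation attack from mere information amplification: even when no record passes the threshold, the score distribution still concentrates probability on a few candidates, leaking partial information.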
References
PACER- Public Access to Court Electronic Records, http://www.pacer.gov (last accessed December 16, 2011)
Barbaro, M., Zeller, T.: A Face Is Exposed for AOL Searcher No. 4417749. New York Times (August 09, 2006), http://www.nytimes.com/2006/08/09/technology/09aol.html?pagewanted=all
Boreale, M., Pampaloni, F., Paolini, M.: Quantitative Information Flow, with a View. In: Atluri, V., Diaz, C. (eds.) ESORICS 2011. LNCS, vol. 6879, pp. 588–606. Springer, Heidelberg (2011)
Dalenius, T.: Towards a methodology for statistical disclosure control. Statistisk Tidskrift 15, 429–444 (1977)
Dwork, C.: Differential Privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4052, pp. 1–12. Springer, Heidelberg (2006)
Dwork, C.: Differential Privacy: A Survey of Results. In: Agrawal, M., Du, D.-Z., Duan, Z., Li, A. (eds.) TAMC 2008. LNCS, vol. 4978, pp. 1–19. Springer, Heidelberg (2008), http://dl.acm.org/citation.cfm?id=1791834.1791836
Frankowski, D., Cosley, D., Sen, S., Terveen, L., Riedl, J.: You are What You Say: Privacy Risks of Public Mentions. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2006, pp. 565–572. ACM, New York (2006), http://doi.acm.org/10.1145/1148170.1148267
Hafner, K.: And if You Liked the Movie, a Netflix Contest May Reward You Handsomely. New York Times (October 02, 2006), http://www.nytimes.com/2006/10/02/technology/02netflix.html
Li, N., Li, T., Venkatasubramanian, S.: t-closeness: Privacy beyond k-anonymity and l-diversity. In: IEEE 23rd International Conference on Data Engineering, ICDE 2007, pp. 106–115 (April 2007)
Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M.: L-diversity: Privacy beyond k-anonymity. ACM Trans. Knowl. Discov. Data 1 (March 2007), http://doi.acm.org/10.1145/1217299.1217302
Narayanan, A., Shmatikov, V.: Robust De-anonymization of Large Sparse Datasets. In: Proceedings of the 2008 IEEE Symposium on Security and Privacy, pp. 111–125. IEEE Computer Society, Washington, DC (2008), http://dl.acm.org/citation.cfm?id=1397759.1398064
Narayanan, A., Shmatikov, V.: Myths and fallacies of personally identifiable information. Communications of the ACM 53, 24–26 (2010)
Samarati, P.: Protecting respondents’ identities in microdata release. IEEE Trans. on Knowl. and Data Eng. 13, 1010–1027 (2001), http://dl.acm.org/citation.cfm?id=627337.628183
Schwarz, H.A.: Über ein die Flächen kleinsten Flächeninhalts betreffendes Problem der Variationsrechnung. Acta Societatis Scientiarum Fennicae XV, 318 (1888)
Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertainty, Fuzziness and Knowledge-Based System 10, 571–588 (2002), http://dl.acm.org/citation.cfm?id=774544.774553
Sweeney, L.: k-anonymity: a Model for Protecting Privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10, 557–570 (2002), http://dl.acm.org/citation.cfm?id=774544.774552
Xiao, X., Tao, Y.: M-invariance: towards privacy preserving re-publication of dynamic datasets. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, SIGMOD 2007, pp. 689–700. ACM, New York (2007), http://doi.acm.org/10.1145/1247480.1247556
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Datta, A., Sharma, D., Sinha, A. (2012). Provable De-anonymization of Large Datasets with Sparse Dimensions. In: Degano, P., Guttman, J.D. (eds) Principles of Security and Trust. POST 2012. Lecture Notes in Computer Science, vol 7215. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28641-4_13
Print ISBN: 978-3-642-28640-7
Online ISBN: 978-3-642-28641-4