Evaluating the Utility of Differential Privacy: A Use Case Study of a Behavioral Science Dataset

Chapter in Medical Data Privacy Handbook

Abstract

Healthcare datasets may contain information that participants and data collectors have a vested interest in keeping private. Social scientists who collect large amounts of medical data also value the privacy of their survey participants: as they follow participants through longitudinal studies, they build up unique profiles of these individuals. A growing challenge for these researchers is to preserve the privacy of their study participants while sharing their data to facilitate research. This chapter evaluates the utility of a differentially private dataset. There has been extensive work on, and heightened public and governmental attention to, the privacy of medical datasets. However, further effort is needed to help researchers and practitioners better understand the fundamental notion of privacy with regard to more recent techniques, such as differential privacy. The results of the study align with the theory of differential privacy: dimensionality is a challenge, and when the number of records in the database is sufficiently larger than the number of cells covered by a database query, the number of statistical tests whose results are close to those obtained on the original data increases.
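The abstract's last claim can be illustrated with a small simulation. The specific release mechanism used in the chapter is not described in this excerpt, so the sketch below assumes the standard Laplace mechanism applied to every cell of a full contingency table, with hypothetical synthetic records spread uniformly over the cells. Each count receives noise of scale 1/ε (a histogram of disjoint cells has sensitivity 1 when neighboring databases differ by adding or removing one record), so the absolute noise per cell does not depend on the data size while the true counts grow with it. The function names are illustrative, not part of any library.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two Exp(rate=1/scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def contingency_counts(n_records: int, domain_sizes: list[int], rng: random.Random) -> list[int]:
    """Assumption: hypothetical synthetic data, each record placed in a uniformly
    random cell. Not the chapter's dataset.

    The full table has prod(domain_sizes) cells, so the number of cells grows
    multiplicatively with every attribute added.
    """
    n_cells = math.prod(domain_sizes)
    counts = [0] * n_cells
    for _ in range(n_records):
        counts[rng.randrange(n_cells)] += 1
    return counts


def mean_relative_error(n_records: int, domain_sizes: list[int],
                        epsilon: float = 1.0, seed: int = 0) -> float:
    """Average |noisy - true| / max(true, 1) over all cells of the table."""
    rng = random.Random(seed)
    true_counts = contingency_counts(n_records, domain_sizes, rng)
    # Disjoint cells: adding/removing one record changes one cell by 1,
    # so the L1 sensitivity is 1 and the Laplace scale is 1/epsilon.
    scale = 1.0 / epsilon
    noisy = [c + laplace_noise(scale, rng) for c in true_counts]
    # max(c, 1) avoids division by zero for empty cells
    errors = [abs(n - c) / max(c, 1) for c, n in zip(true_counts, noisy)]
    return sum(errors) / len(errors)


for n, dims in [(100, [5, 10]), (100_000, [5, 10]),
                (1_000, [5, 10]), (1_000, [5, 10, 10, 10])]:
    print(n, dims, math.prod(dims), round(mean_relative_error(n, dims), 4))
```

With the number of cells fixed, raising the record count shrinks the mean relative error; with the record count fixed, adding attributes multiplies the number of cells, spreads the records more thinly, and raises the error, which is one simple way to see why dimensionality is a challenge.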


Notes

  1. A quasi-identifier is a feature, or set of features, that is sufficiently correlated with an entity that, when combined with other such features, it forms a unique identifier.
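The effect described in the note, where individually common features become identifying in combination, can be measured by counting how many records have a combination of values that occurs only once. The sketch below is a generic uniqueness count, not the chapter's procedure; the attribute names `zip`, `birth_year`, and `sex` and the records themselves are hypothetical.

```python
from collections import Counter


def uniqueness_rate(records, quasi_identifiers):
    """Fraction of records whose combination of quasi-identifier values
    appears exactly once in the dataset (i.e., singles the record out)."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    freq = Counter(keys)
    return sum(1 for k in keys if freq[k] == 1) / len(keys)


# Hypothetical records; attribute names are illustrative only.
records = [
    {"zip": "47401", "birth_year": 1980, "sex": "F"},
    {"zip": "47401", "birth_year": 1980, "sex": "M"},
    {"zip": "47401", "birth_year": 1975, "sex": "F"},
    {"zip": "47405", "birth_year": 1980, "sex": "F"},
]

for qi in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "sex"]):
    print(qi, uniqueness_rate(records, qi))
```

On this toy data the rate rises from 0.25 to 0.5 to 1.0 as features are combined: no single attribute identifies everyone, but all three together make every record unique.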


Acknowledgements

This work is funded by NSF grant CNS-1012081.

Author information

Correspondence to Raquel Hill.


Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Hill, R. (2015). Evaluating the Utility of Differential Privacy: A Use Case Study of a Behavioral Science Dataset. In: Gkoulalas-Divanis, A., Loukides, G. (eds) Medical Data Privacy Handbook. Springer, Cham. https://doi.org/10.1007/978-3-319-23633-9_4


  • DOI: https://doi.org/10.1007/978-3-319-23633-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23632-2

  • Online ISBN: 978-3-319-23633-9

  • eBook Packages: Computer Science, Computer Science (R0)
