Abstract
Healthcare datasets may contain information that participants and data collectors have a vested interest in keeping private. Additionally, social scientists who collect large amounts of medical data value the privacy of their survey participants. As they follow participants through longitudinal studies, they develop unique profiles of these individuals. A growing challenge for these researchers is to maintain the privacy of their study participants while sharing their data to facilitate research. This chapter evaluates the utility of a differentially private dataset. There has been extensive work on, and heightened public and governmental focus on, the privacy of medical datasets. However, additional efforts are needed to help researchers and practitioners better understand the fundamental notion of privacy with regard to more recent techniques, such as differential privacy. The results of the study align with the theory of differential privacy: dimensionality is a challenge, and when the number of records in the database is sufficiently larger than the number of cells covered by a database query, the number of statistical tests whose results are close to those performed on the original data increases.
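The relationship between record count and cell count can be illustrated with a minimal sketch. It is not the chapter's method or dataset: it assumes a plain Laplace mechanism applied to histogram cell counts, a sensitivity of 1 (neighboring databases differ by adding or removing one record), records spread evenly over cells, and an arbitrary privacy budget `epsilon = 0.5`. The function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(counts, epsilon, rng):
    # Assumption: adding or removing one record changes exactly one
    # cell count by 1, so the L1 sensitivity of the histogram is 1.
    scale = 1.0 / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]


def mean_relative_error(n_records, n_cells, epsilon, rng, trials=50):
    # Assumption: records are spread as evenly as possible over the cells.
    base, extra = divmod(n_records, n_cells)
    counts = [base + (1 if i < extra else 0) for i in range(n_cells)]
    total = 0.0
    for _ in range(trials):
        noisy = noisy_histogram(counts, epsilon, rng)
        # Total absolute noise across cells, relative to the record count.
        total += sum(abs(a - b) for a, b in zip(noisy, counts)) / n_records
    return total / trials


rng = random.Random(0)
# Same number of cells, very different numbers of records per cell.
sparse = mean_relative_error(n_records=1_000, n_cells=1_000, epsilon=0.5, rng=rng)
dense = mean_relative_error(n_records=100_000, n_cells=1_000, epsilon=0.5, rng=rng)
print(f"relative error, 1 record per cell:    {sparse:.3f}")
print(f"relative error, 100 records per cell: {dense:.3f}")
```

Because the noise scale depends only on `epsilon` and not on the data, the total noise grows with the number of cells while the signal grows with the number of records. Under these assumptions, a query covering many cells of a small database is dominated by noise, which is one way to read the dimensionality challenge noted above.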
Notes
- 1. A quasi-identifier is a feature or set of features that is sufficiently correlated with an entity and that, when combined with other such features, creates a unique identifier.
Acknowledgements
This work is funded by NSF grant CNS-1012081.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Hill, R. (2015). Evaluating the Utility of Differential Privacy: A Use Case Study of a Behavioral Science Dataset. In: Gkoulalas-Divanis, A., Loukides, G. (eds) Medical Data Privacy Handbook. Springer, Cham. https://doi.org/10.1007/978-3-319-23633-9_4
DOI: https://doi.org/10.1007/978-3-319-23633-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23632-2
Online ISBN: 978-3-319-23633-9
eBook Packages: Computer Science, Computer Science (R0)