A Privacy-Preserving Framework for Integrating Person-Specific Databases

  • Conference paper
Privacy in Statistical Databases (PSD 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5262)

Abstract

Many organizations capture personal information, but the quantity of records needed to detect statistically significant patterns is often beyond the grasp of a single data collector. In the biomedical realm, this problem has pressed regulatory agencies to require funded investigators to share research-derived data with public repositories. The challenge, however, is that shared records must not reveal the identity of the subjects. In this paper, we extend a secure framework in which data holders contribute and query encrypted person-specific data stored on a third party's server. Specifically, we develop protocols that enable data holders to merge personal records, thus creating larger profiles and diminishing duplication. The repository administrator can merge records via encrypted identifiers without decrypting or inferring the contents of the joined records. Our model is more practical than prior secure join methods because each data holder needs only a single interaction with the central repository. We further present an extension to the protocol that permits the revelation of k-anonymous demographics, such that the administrator can perform joins more efficiently with the guarantee that each record can be linked to no fewer than k individuals in the population. We prove the privacy-preserving features of our protocols and experimentally evaluate their efficiency on a real-world Census dataset.
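As a rough illustration of the k-anonymity condition mentioned in the abstract, the sketch below checks whether every combination of revealed demographic values is shared by at least k records. It is not the paper's protocol: it applies the standard table-level definition of k-anonymity to the released records rather than to the population, and the function name `is_k_anonymous`, the field names, and the sample records are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records (an empty table passes vacuously)."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical released demographics; "token" stands in for an
# encrypted identifier and is not part of the quasi-identifier.
released = [
    {"age_range": "30-39", "zip_prefix": "372", "token": "a1"},
    {"age_range": "30-39", "zip_prefix": "372", "token": "b2"},
    {"age_range": "40-49", "zip_prefix": "372", "token": "c3"},
    {"age_range": "40-49", "zip_prefix": "372", "token": "d4"},
]

two_anonymous = is_k_anonymous(released, ["age_range", "zip_prefix"], 2)
three_anonymous = is_k_anonymous(released, ["age_range", "zip_prefix"], 3)
```

Under these assumptions, an administrator could use the revealed demographic groups to narrow the set of candidate record pairs before comparing encrypted identifiers, which is one plausible reading of how the extension makes joins more efficient.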

Editor information

Josep Domingo-Ferrer Yücel Saygın

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kantarcioglu, M., Jiang, W., Malin, B. (2008). A Privacy-Preserving Framework for Integrating Person-Specific Databases. In: Domingo-Ferrer, J., Saygın, Y. (eds) Privacy in Statistical Databases. PSD 2008. Lecture Notes in Computer Science, vol 5262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87471-3_25

  • DOI: https://doi.org/10.1007/978-3-540-87471-3_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87470-6

  • Online ISBN: 978-3-540-87471-3

  • eBook Packages: Computer Science, Computer Science (R0)
