Preserving Genome Privacy in Research Studies

Medical Data Privacy Handbook

Abstract

As the cost of genome sequencing continues to fall, whole genome sequencing data have become a viable option for improving diagnostic accuracy and supporting personalized medicine. Although they have the potential to advance public health and accelerate scientific discovery, massive collections of genomic data also raise significant concerns about individual privacy. Like traditional clinical information, human genomes may reveal information about individuals (e.g., identity, ethnic group, disease association, and predisposition to diseases such as diabetes or cancer). Even more concerning, this information is shared with ancestors and descendants, so a loss of privacy for one individual may put the privacy of the entire family at risk. Genome privacy is a major challenge for the entire biomedical community, particularly because scientific discoveries depend on data sharing and obfuscating the data is not a good option for protecting privacy. Multiple factors are involved in genomic privacy research. The components that can be used to better protect genome privacy include, but are not limited to, legal, ethical and technical aspects, e.g., federal laws, policies and regulations, informed consent policies, data use agreements, secure data repositories, and privacy-preserving data analysis methods. However, genome privacy challenges cannot be addressed by any single component alone. We envision that better privacy protection can be achieved by combining multiple components. The goal of this chapter is to introduce the state of the art in genome privacy research. The chapter begins with an introduction to genome privacy, followed by an overview of its legal, ethical and technical aspects. After formalizing the genome privacy problem, we review existing attack models on genomic data and then discuss techniques for mitigating these attacks. The chapter concludes with a discussion of the challenges and future directions in genome privacy research.

Notes

  1. Shuang Wang and Xiaoqian Jiang share the first authorship.

Acknowledgements

This work was funded by NHGRI (K99HG008175), NLM (R00LM011392, R21LM012060), and NHLBI (U54HL108460).

Author information

Corresponding author

Correspondence to Shuang Wang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Wang, S., Jiang, X., Fox, D., Ohno-Machado, L. (2015). Preserving Genome Privacy in Research Studies. In: Gkoulalas-Divanis, A., Loukides, G. (eds) Medical Data Privacy Handbook. Springer, Cham. https://doi.org/10.1007/978-3-319-23633-9_16

  • DOI: https://doi.org/10.1007/978-3-319-23633-9_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23632-2

  • Online ISBN: 978-3-319-23633-9

  • eBook Packages: Computer Science, Computer Science (R0)
