Privacy Challenges of Genomic Big Data

  • Hong Shen
  • Jian Ma
Part of the Advances in Experimental Medicine and Biology book series (AEMB, volume 1028)


With the rapid advancement of high-throughput DNA sequencing technologies, genomics has become a big data discipline in which large-scale genetic information about human individuals can be obtained efficiently and at low cost. However, such a massive amount of personal genomic data creates tremendous challenges for privacy, especially given the emergence of the direct-to-consumer (DTC) industry that provides genetic testing services. Here we review recent developments in genomic big data and their implications for privacy. We also discuss the current dilemmas and future challenges of genomic privacy.


Keywords: Personal genomic data; Genomic privacy; Computational methods



Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  1. Department of Engineering and Public Policy, Carnegie Mellon University, Pittsburgh, USA
  2. Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
