Repositories for Sharing Human Data in Stem Cell Research

Abstract

High-throughput biology is data intensive, and stem cell research is no exception to this trend. Funders often require scientists to share the data generated by high-throughput methods, because sharing speeds scientific discovery and increases the benefit of public investments in science. When human data are involved, the benefits of sharing must be balanced against the risks of inappropriately disclosing sensitive, personal information. Historically, scientists anonymized data to protect the interests of people whose data were shared. However, recent development of computational methods for re-identifying people from anonymous data and empirical demonstrations of re-identification have led many commentators to question whether anonymization still provides adequate protection for people whose data are included in shared databases. Because of disclosure concerns, data sharing repositories control who can access sensitive human data and what the approved users can do with those data. Stem cell researchers who create such repositories must develop governance mechanisms that prevent harm to individuals whose data they share.

Notes

  1. As of July 2012, the Nucleic Acids Research online database catalog could be found at http://www.oxfordjournals.org/nar/database/a/.

  2. For a comprehensive review of new repositories and databases for the biological sciences, see the January 2012 database supplement of Nucleic Acids Research (volume 40, issue D1), and for a discussion of the difficulties of creating, maintaining, and analyzing large data sets, see the February 2011 issue of Science.

  3. One of the earliest and best-known standards for information sharing is MIAME, minimum information about a microarray experiment (Brazma et al. 2001). As microarray gene expression studies became a widely used source of genome-scale data in the life sciences, scientists discovered that much of the microarray data being generated were inaccessible or unusable by most of the interested scientific community. Microarray data accompanying publications were typically reported on the authors’ website, using a variety of formats. There was no consensus on what annotation was necessary, and the data often were not annotated. Data generally did not include indicators of reliability and quality. As the scientific community moved to create repositories for microarray data sharing (such as GEO and ArrayExpress), a consensus group was formed to suggest what types of information should be included in the repositories. The MIAME standards were the consensus group’s attempt to specify minimum information that must be included in a repository to make the data interpretable and usable to the broader scientific community. Since MIAME, other consensus groups have attempted to specify minimum information standards for other types of data.

  4. Associations between a genetic or biochemical marker and a health outcome have been patentable in the United States for several decades. However, in 2012, the US Supreme Court held that a method of calibrating a drug’s dose by assessing levels of the drug’s metabolites in a person’s blood was not patentable because it claimed a law of nature (Mayo Collaborative Services v. Prometheus Laboratories, Inc. 2012). This case and others currently before the courts may substantially limit researchers’ and firms’ opportunities to patent correlations between biomarkers and health states or treatment outcomes. See also Noonan (this volume).

  5. When the project involves an international collaboration, differences in national laws concerning data security and transfer, national security, and privacy can make research governance extremely difficult (Zink and Silman 2008).

  6. Shortly after Dr. Watson’s genome sequence was published, Nyholt et al. published an article describing how one could infer Dr. Watson’s genetic risk for Alzheimer’s using linkage disequilibrium between genetic markers in the published sequence and the redacted portion of Dr. Watson’s genome (Nyholt et al. 2009). The Nyholt authors demonstrated that their method worked by using it to infer the Alzheimer’s risk alleles in Craig Venter’s published sequence. As a consequence of this work, Dr. Watson and the scientists who sequenced his genome redacted an additional two megabases of his sequence around genes associated with Alzheimer’s disease risk. Nyholt et al. point out that as the scientific community’s knowledge of genetic risk and linkage disequilibrium increases, it will become more difficult to protect a person from unwanted information disclosure by withholding a portion of her otherwise public genome.

  7. In the United States, both the regulations for the protection of human participants in research (the “Common Rule,” codified at 45 CFR Part 46) and the Health Insurance Portability and Accountability Act’s Privacy Rule (codified at 45 CFR Parts 160 and 164) would place oversight and consent requirements on a repository that included explicit identifiers.

  8. Sometimes, molecular analysis is conducted using leftover clinical specimens for which no consent for research was obtained or for which the purported consent constituted a one-line authorization to use excess tissue in research. Unconsented research on specimens originally collected for clinical treatment or diagnoses is allowed under the Common Rule. Institutions differ as to whether data from such studies may be deposited in a repository for broad data sharing. Some institutions require researchers to contact or recontact individuals and obtain consent for data sharing (Ludman et al. 2010).

References

  • Alsheikh-Ali, A. A., Qureshi, W., Al-Mallah, M. H., & Ioannidis, J. P. A. (2011). Public availability of published research data in high impact journals. PLoS One, 6, e24357. doi:10.1371/journal.pone.0024357.

  • Amid, C., Birney, E., Bower, L., Cerdeno-Tarraga, A., Cheng, Y., et al. (2012). Major submissions tool developments at the European nucleotide archive. Nucleic Acids Research, 40, D43–D47.

  • Anonymous. (2011). Challenges and opportunities. Science, 331, 692–693.

  • Benitez, K., & Malin, B. (2010). Evaluating re-identification risks with respect to the HIPAA Privacy Rule. Journal of the American Medical Informatics Association, 17, 169–177.

  • Benson, D. A., Karsch-Mizrachi, I., Clark, K., Lipman, D., Ostell, J., et al. (2012). GenBank. Nucleic Acids Research, 40, D48–D53.

  • Berman, J. J. (2002). Confidentiality issues for medical data miners. Artificial Intelligence in Medicine, 26, 25–36.

  • Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., et al. (2001). Minimum information about a microarray experiment (MIAME)—Toward standards for microarray data. Nature Genetics, 29, 365–371.

  • Campbell, E. G., Clarridge, B. R., Gokhale, M., Birenbaum, L., Hilgartner, S., et al. (2002). Data withholding in academic genetics: Evidence from a national survey. Journal of the American Medical Association, 287, 473–480.

  • Collins, F. S., Morgan, M., & Patrinos, A. (2003). The human genome project: Lessons from large-scale biology. Science, 300, 286–290.

  • Contreras, J. (2011). Bermuda’s legacy: Policy, patents, and the design of the genome commons. Minnesota Journal of Law, Science and Technology, 12, 61–125.

  • Department of Health and Human Services. (2005). Protection of Human Subjects, 45 Code of Federal Regulations, Part 46.102(f).

  • Flicek, P., Amode, M. R., Barrell, D., Beal, K., Brent, S., et al. (2012). Ensembl 2012. Nucleic Acids Research, 40, D84–D90.

  • Foster, M. W. (1998). Model agreement for genetic research. American Journal of Human Genetics, 63, 696–702.

  • Foster, M. W., Eisenbraun, A. J., & Carter, T. H. (1997). Communal discourse as a supplement to informed consent for genetic research. Nature Genetics, 17, 277–279.

  • Foster, M. W., & Sharp, R. R. (2007). Share and share alike: Deciding how to distribute the scientific and social benefits of genomic data. Nature Reviews Genetics. doi:10.1038/nrg2124.

  • Galperin, M. Y., & Fernandez-Suarez, X. M. (2012). The 2012 nucleic acids research issue and the online molecular biology database collection. Nucleic Acids Research, 40, D1–D8.

  • Genome Canada. (2008). Data release and resource sharing. Retrieved June 25, 2012, from http://www.genomecanada.ca/medias/PDF/EN/DataReleaseandResourceSharingPolicy.pdf.

  • Genetic Information Nondiscrimination Act of 2008, Pub. Law 110-233, 122 Stat. 881. (2008).

  • Heeney, C., Hawkins, N., de Vries, J., Boddington, P., & Kaye, J. (2011). Assessing the privacy risks of data sharing in genomics. Public Health Genomics, 14, 17–25.

  • Hemphill, E. E., Dharia, A. P., Lee, C., Jakuba, C. M., Gibson, J. D., et al. (2011). SCLD: A stem cell lineage database for the annotation of cell types and developmental lineages. Nucleic Acids Research, 39, D525–D533.

  • Homer, N., Szelinger, S., Redman, M., Duggan, D., Tembe, W., et al. (2008). Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genetics, 4, e1000167.

  • Hudson, K. L., Holohan, M. K., & Collins, F. S. (2008). Keeping pace with the times—The genetic information nondiscrimination act of 2008. New England Journal of Medicine, 358, 2661–2663.

  • Juengst, E. T. (1998). Groups as gatekeepers to genomic research: Conceptually confusing, morally hazardous, and practically useless. Kennedy Institute of Ethics Journal, 8, 183–200.

  • Kaye, J., Heeney, C., Hawkins, N., de Vries, J., & Boddington, P. (2009). Data sharing in genomics—Re-shaping scientific practice. Nature Reviews Genetics, 10, 331–335.

  • Kodama, Y., Shumway, M., Leinonen, R., & International Nucleotide Sequence Database Collaboration. (2012). The sequence read archive: Explosive growth of sequencing data. Nucleic Acids Research, 40, D54–D56.

  • Leinonen, R., Akhtar, R., Birney, E., Bower, L., Cerdeno-Tarraga, A., et al. (2011). The European nucleotide archive. Nucleic Acids Research, 39, D28–D31.

  • Lin, Z., Owen, A. B., & Altman, R. B. (2005). Genomic research and human subject privacy. Science, 305, 183.

  • Lowrance, W. W. (2002). Learning from experience: Privacy and the secondary use of data in health research. London: The Nuffield Trust.

  • Lowrance, W. W. (2006). Privacy, confidentiality and identifiability in genomic research. Workshop on Privacy, Confidentiality and Identifiability in Genomic Research. Oct. 3–4. Bethesda: National Human Genome Research Institute.

  • Lowrance, W. W., & Collins, F. S. (2007). Identifiability in genomic research. Science, 317, 600.

  • Ludman, E. J., Fullerton, S. M., Spangler, L., Trinidad, S. B., Fujii, M. M., et al. (2010). Glad you asked: Participants’ opinions of re-consent for dbGaP data submission. Journal of Empirical Research on Human Research Ethics, 5, 9–16.

  • Mailman, M. D., Feolo, M., Jin, Y., Kimura, M., Tryka, K., et al. (2007). The NCBI dbGaP database of genotypes and phenotypes. Nature Genetics, 39, 1181–1186.

  • Malin, B. (2005). Betrayed by my shadow: Learning data identity via trail matching. Journal of Privacy Technology, 20050609001.

  • Malin, B. (2005b). An evaluation of the current state of genomic data privacy protection technology and a roadmap for the future. Journal of the American Medical Informatics Association, 12, 28–34.

  • Malin, B. (2006). Re-identification of familial database records. In AMIA 2006 symposium proceedings (pp. 525–528).

  • Malin, B., Karp, D., & Scheuermann, R. H. (2010). Technical and policy approaches to balancing patient privacy and data sharing in clinical and translational research. Journal of Investigative Medicine, 58, 11–18.

  • Mathews, D. J. H., Graff, G. D., Saha, K., & Winickoff, D. E. (2011). Access to stem cells and data: Persons, property rights, and scientific progress. Science, 331, 725–727.

  • Mayo Collaborative Services v. Prometheus Laboratories, Inc., 132 S. Ct. 1289. (2012).

  • McGuire, A. L., & Gibbs, R. A. (2006). No longer de-identified. Science, 312, 370–371.

  • Narayanan, A., & Shmatikov, V. (2006). Robust de-anonymization of large sparse datasets. Retrieved July 12, 2012, from http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf.

  • National Bioethics Advisory Commission. (1999). Research involving human biological materials: Ethical issues and policy guidance, Volume I. Rockville: National Bioethics Advisory Commission.

  • National Center for Biotechnology Information. (2012). dbGaP. Washington, DC: National Library of Medicine.

  • National Institutes of Health. (2007). Policy for sharing of data obtained in NIH supported or conducted genome-wide association studies (GWAS). Federal Register, 72, 49290–49297.

  • Nyholt, D., Yu, C.-E., & Visscher, P. (2009). On Jim Watson’s APOE status: Genetic information is hard to hide. European Journal of Human Genetics, 17, 147–150.

  • Office of Extramural Research. (2003). NIH data sharing policy and implementation guidance. Retrieved February 12, 2012, from http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm.

  • Ohm, P. (2010). Broken promises of privacy: Responding to the surprising failure of anonymization. UCLA Law Review, 57, 1701–1777.

  • Ossorio, P. (2011). Bodies of data: Genomic data and bioscience data sharing. Social Research, 78, 907–932.

  • Phanstiel, D. H., Brumbaugh, J., Wenger, C. D., Tian, S., Probasco, M. D., et al. (2011). Proteomic and phosphoproteomic comparison of human ES and iPS cells. Nature Methods, 8, 821–827.

  • Rai, A. K. (2005). “Open and collaborative” research: A new model for biomedicine. In R. W. Hahn (Ed.), Intellectual property rights in frontier industries. Washington, DC: AEI Press.

  • Rodriguez, H., Snyder, M., Uhlen, M., Andrews, P., Beavis, R., et al. (2009). Recommendations from the 2008 international summit on proteomics data release and sharing policy: The Amsterdam principles. Journal of Proteome Research, 8, 3689–3692.

  • Sankararaman, S., Obozinski, G., Jordan, M. I., & Halperin, E. (2009). Genome privacy and limits of individual detection in a pool. Nature Genetics, 41, 966–967.

  • Sayers, E. W., Barrett, T., Benson, D. A., Bolton, E., Bryant, S. H., et al. (2012). Database resources of the National Center for Biotechnology Information. Nucleic Acids Research, 40, D13–D25.

  • Schadt, E. E., Woo, S., & Hao, K. (2012). Bayesian method to predict individual SNP genotypes from gene expression data. Nature Genetics, 44, 603–609.

  • Sharp, R. R., & Foster, M. W. (2000). Involving study populations in the review of genetic research. Journal of Law, Medicine & Ethics, 28, 41–51.

  • Sherry, S. T., Ward, M., Kholodov, M., Baker, J., Phan, L., et al. (2001). dbSNP: The NCBI database of genetic variation. Nucleic Acids Research, 29, 308–311.

  • Stein, L. D. (2010). The case for cloud computing in genome informatics. Genome Biology, 11, 207.

  • Sui, S. J. H., Begley, K., Reilly, D., Chapman, B., McGovern, R., et al. (2012). The stem cell discovery engine: An integrated repository and analysis system for cancer stem cell comparisons. Nucleic Acids Research, 40, D984–D991.

  • Sweeney, L. (1996). Uniqueness of simple demographics in the U.S. population. Working Paper LIDAP-WP4. Data Privacy Lab, Carnegie Mellon University, Pittsburgh, PA.

  • Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A. U., Wu, L., et al. (2011). Data sharing by scientists: Practices and perceptions. PLoS One, 6, e21101. doi:10.1371/journal.pone.0021101.

  • The 1000 Genomes Project Consortium. (2010). A map of human genome variation from population-scale sequencing. Nature, 467, 1061–1073.

  • The Hinxton Group. (2010). Statement on policies and practices governing data and materials sharing and intellectual property in stem cell science. Retrieved February 12, 2012, from http://www.hinxtongroup.org/consensus_hg10_final.pdf.

  • The International HapMap Consortium. (2007). A second generation human haplotype map of over 3.1 million SNPs. Nature, 449, 851–861.

  • The UniProt Consortium. (2012). Reorganizing the protein space at the universal protein resource (UniProt). Nucleic Acids Research, 40, D71–D75.

  • Toronto International Data Release Workshop. (2009). Prepublication data sharing. Nature, 461, 168–170.

  • Turnbaugh, P. J., Ley, R. E., Hamady, M., Fraser-Liggett, C. M., Knight, R., et al. (2007). The human microbiome project. Nature, 449, 804–810.

  • U.S. Agency for Healthcare Research and Quality, Bill and Melinda Gates Foundation (U.S.), U.S. Centers for Disease Control, Doris Duke Charitable Foundation (U.S.), U.S. Health Resources and Services Administration, Hewlett Foundation (U.S.), U.S. National Institutes of Health, U.S. Substance Abuse and Mental Health Services Administration, Canadian Institutes of Health Research, Deutsche Forschungsgemeinschaft, Economic and Social Research Council (UK), Medical Research Council (UK), Wellcome Trust (UK), Health Research Council of New Zealand, INSERM (FR), National Health and Medical Research Council (Australia), The World Bank. The list of signatories can be found at: http://www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Data-sharing/Public-health-and-epidemiology/Signatories-to-the-joint-statement/index.htm

  • Weijer, C., Goldsand, G., & Emanuel, E. J. (1999). Protecting communities in research: Current guidelines and limits of extrapolation. Nature Genetics, 23, 275–280.

  • Wellcome Trust. (2010). Data management and sharing. Retrieved June 15, 2012, from http://www.wellcome.ac.uk/about-us/policy/spotlight-issues/data-sharing/data-management-and-sharing/index.htm.

  • Wellcome Trust. (2011). Sharing research data to improve public health: Full joint statement by funders of health research. Retrieved July 8, 2012, from http://www.wellcome.ac.uk/about-us/policy/spotlight-issues/data-sharing/public-health-and-epidemiology/wtdv030690.htm.

  • Wheeler, D. A., Srinivasan, M., Egholm, M., Shen, Y., Cen, L., et al. (2008). The complete genome of an individual by massively parallel DNA sequencing. Nature, 452, 872–877.

  • Wolf, S. M., Crock, B. N., Van Ness, B., Lawrenz, F., Kahn, J. P., et al. (2012). Managing incidental findings and research results in genomic research involving biobanks and archived datasets. Genetics in Medicine, 14, 361–384.

  • Wolf, S. M., Paradise, J., Nelson, C. A., Kahn, J. P., & Lawrenz, F. (2008). Managing incidental findings in human subjects research: Analysis and recommendations. Journal of Law, Medicine & Ethics, 36, 219–248.

  • Woodman, R. (1999). Wellcome Trust and drug giants fund gene marker database. British Medical Journal, 318, 1093.

  • Yeniterzi, R., Aberdeen, J., Bayer, S., Wellner, B., Hirschman, L., et al. (2010). Effects of personal identifier resynthesis on clinical text de-identification. Journal of the American Medical Informatics Association, 17, 159–168.

  • Zink, A., & Silman, A. (2008). Ethical and legal constraints on data sharing between countries in multinational epidemiological studies in Europe: Report from a joint workshop of the European League Against Rheumatism standing committee on epidemiology with the “AutoCure” Project. Annals of the Rheumatic Diseases, 67, 1041–1043.

Author information

Correspondence to Pilar N. Ossorio, Ph.D., J.D.

Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Ossorio, P.N. (2014). Repositories for Sharing Human Data in Stem Cell Research. In: Hogle, L. (eds) Regenerative Medicine Ethics. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-9062-3_5
