The Personal Data Is Political
The success of personalized medicine relies not only on methodological advances but also on the availability of data to learn from. While generating and sharing large data sets is becoming increasingly easy, there is a remarkable lack of diversity within shared datasets, rendering any novel scientific findings directly applicable only to a small portion of the human population. Here, we investigate two fields that have been strongly shaped by data sharing initiatives: neuroscience and genetics. Exploring the limitations that result from a lack of participant diversity, we propose that data sharing in itself is not enough to enable a global personalized medicine.
Keywords: Genetics, Personalized medicine, Neuroscience, Data sharing, Diversity, Open data, Machine learning
Personalized or stratified medicine has been one of the hot topics in health care, reaching well beyond the launch of the Precision Medicine Initiative in the United States (Collins and Varmus 2010). The promise of personalized medicine is to identify individuals at risk and find optimally tailored health care solutions based on their genetic and environmental makeup (Lu et al. 2014). Although personalized medicine spans a variety of medical and biological disciplines, two subfields are particularly promising due to their growing adoption: genetics and neuroscience. Indeed, many current examples of precision medicine come from pharmacogenomics in general, and specifically from oncology, where cancer treatments are picked to match the mutations found in tumours (Kummar et al. 2015; Smith 2012; Tan and Du 2012).
While this use of genetic data in health care is projected to become more central in the coming years, its success will depend on multiple factors. As with most things in health care, cost plays a huge role. But while the costs of performing a high-precision medical examination, like a brain scan, or of sequencing a human genome continue to drop (Wetterstrand 2018), the usefulness of such examinations is limited both by our ability to quickly process these large amounts of data and by the lack of medically relevant scientific knowledge about individual genetic variants (Dewey et al. 2014) or complex neurobiological processes. As such, it is key that science be able to generate genetic knowledge more quickly (Kohane 2015).
Two recent trends in science, big data and artificial intelligence, appear promising not only for accelerating our genomic and neurobiological understanding but also for diagnosing in a precision medicine framework (Moon et al. 2007; Dilsizian and Siegel 2014). The idea is that artificial intelligence can be used to mine large data sets to find the smallest associations between genetic variants or neuromarkers and disease phenotypes, and to track disease progression or predict optimal treatments. To effectively create such large data collections, it thus becomes central to link and share individual data sets (Kohane 2015). But while the total number of base pairs sequenced per unit of time and the total number of participants included in neuroscientific studies have increased exponentially over the last years, sharing practices for such data have not kept pace (Kovalevskaya et al. 2016), despite individual efforts to enable open sharing of genetic (Mao et al. 2016; Greshake et al. 2014) or neuroscientific (Poline et al. 2012) data.
8.2 Sharing Genomic Data
To alleviate these shortcomings, individual academic consortia have been founded to pool data sets across institutions and individual researchers. National efforts include the UK10K (“UK10K” 2018), which aimed to sequence 10,000 participants in the United Kingdom, and the similarly structured 100,000 Genomes Project by Genomics England (“Genomics England” 2018). In the United States, the Exome Aggregation Consortium (ExAC) (“ExAC” 2018), which has collected over 60,000 exomes, and more recently the All of Us initiative (“All of Us” 2018) are collecting and aggregating more patient data for research purposes. And it is not only academic research that is starting to collect large data sets for personalized medicine: commercial companies are starting to explore the field too.
Since deCODE Genetics and 23andMe released the first Direct-To-Consumer genetic tests back in 2007 (Vorhaus 2010), the market for commercial genetic testing has grown significantly: Not only in terms of companies like MyHeritage, FamilyTreeDNA, AncestryDNA or Veritas that have entered the market, but also in terms of the number of people who have gotten genetic tests through these services. Today, AncestryDNA has over five million customers and industry veteran 23andMe has genetic data for over two million people (McAllister 2017). These sizable commercial databases are of interest to academic and commercial researchers. 23andMe has collaborated with academic researchers on numerous research papers (“23andMe Research” 2018) and has done commercial for-profit collaborations with pharmaceutical companies like Pfizer and Genentech.
Who profits from such large-scale research remains open. As an example, in psychology the need to look into how representative study participants are has been acknowledged. After all, around 80% of all participants in psychology studies are from WEIRD (Western, Educated, Industrialized, Rich, Democratic) countries and thus do not represent human diversity (Henrich et al. 2010). As such, only WEIRD participants can fully profit from much of psychological research. To avoid the overrepresentation of WEIRD individuals found in psychology, it is key that our genetic research data resources reflect human diversity across populations. Indeed, this issue of representativeness becomes even more central in the genetic framework of Genome Wide Association Studies (GWAS). These studies are commonly used to inform personalized medicine by identifying genetic risk factors, e.g. for cancer (Agyeman and Ofori-Asenso 2015). Unfortunately, most of these identified risk factors are mere correlations, not genes directly causing a disease. As these correlations depend on the ancestry context in which they were found, findings of a GWAS are not necessarily applicable outside the human population in which an association was initially found (Bush et al. 2012) and cannot be replicated in many cases (Marigorta et al. 2013).
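The dependence of such correlations on ancestry context can be illustrated with a toy simulation. The following is a minimal sketch under stated assumptions, not an actual GWAS: it assumes a single causal variant, a non-causal tag marker that is measured instead, and made-up allele frequencies and disease risks. The function name `simulate_tag_effect` and the parameter `ld_strength` (how often the tag marker is inherited together with the causal variant) are hypothetical and chosen only for illustration.

```python
import random


def simulate_tag_effect(ld_strength, n=20000, seed=0):
    """Difference in disease rate between carriers and non-carriers
    of a tag marker that is not itself causal (toy model)."""
    rng = random.Random(seed)
    cases = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for _ in range(n):
        # Assumed frequency of the causal variant: 30%.
        causal = 1 if rng.random() < 0.3 else 0
        # With probability ld_strength the tag marker copies the causal
        # variant; otherwise it is drawn independently of it.
        if rng.random() < ld_strength:
            tag = causal
        else:
            tag = 1 if rng.random() < 0.3 else 0
        # Only the causal variant changes disease risk (assumed 40% vs 10%).
        risk = 0.4 if causal else 0.1
        disease = 1 if rng.random() < risk else 0
        totals[tag] += 1
        cases[tag] += disease
    return cases[1] / totals[1] - cases[0] / totals[0]


# Population A: tag marker strongly linked to the causal variant.
effect_a = simulate_tag_effect(ld_strength=0.9)
# Population B: same causal biology, but the tag marker is unlinked.
effect_b = simulate_tag_effect(ld_strength=0.0)
print(f"apparent effect in A: {effect_a:.3f}, in B: {effect_b:.3f}")
```

In this sketch the marker looks like a clear risk factor in population A and like noise in population B, even though the underlying causal variant acts identically in both. Real populations differ in many more ways, so this only illustrates the mechanism, not its magnitude.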
Indeed, many data sharing efforts show such a lack of population diversity: More than 50% of the over 60,000 samples in the ExAC consortium come from a European population (“ExAC” 2018). Similarly, commercial databases like the ones of 23andMe suffer from ancestry and race biases (“Problems with 23andMe Ancestry Composition” 2015; Euny Hong 2016). Open genomic databases, like the Personal Genome Projects and openSNP, are not faring much better: 75% of participants in one of Harvard’s Personal Genome Project studies identified as white (Mao et al. 2016), and in a survey of over 500 openSNP participants, over 70% came from the US, UK and Canada. Additionally, over 75% of openSNP participants had at least a Bachelor’s degree, hinting at a highly skewed demographic (Haeusermann et al. 2017).
8.3 Sharing Neurobiological Data
Similar to genetics, neuroscience has come a long way when it comes to data sharing: While initial attempts to share data mainly focused on post-processed data, like coordinate-based results or statistical maps of magnetic resonance imaging (MRI) (Fox and Lancaster 2002), more recent initiatives enable sharing of entire functional or structural MRI datasets (Gorgolewski et al. 2015; Poldrack et al. 2013) and magneto- or electroencephalography (M/EEG) data (Niso et al. 2016).
As in the case of psychology and genomics, neuroscience research is largely based on data of individuals from WEIRD societies (Falk et al. 2013), despite a plethora of studies showing that brain development is affected by socioeconomic status, early life stress, or cultural differences (Hackman et al. 2010; Marshall et al. 2018; Chan et al. 2018; Duval et al. 2017; Liddell and Jobson 2016). Indeed, socio-economic variables within or across households during childhood, such as family income, parental education (Ellwood-Lowe et al. 2018; Weissman et al. 2018) or neighbourhood poverty levels (Marshall et al. 2018), can be traced in trajectories of brain development and result in differences in brain structure (Ellwood-Lowe et al. 2018), cognitive functions (Hackman and Farah 2018), or gene expression (Parker et al. 2017). Differences in brain networks according to socio-economic status are also evident during adolescence (Weissman et al. 2018) and adulthood (Chan et al. 2018).
Furthermore, culture has been shown to influence neural functions (Liddell and Jobson 2016). Cultural and ethnic differences have an impact on emotion perception and expression, and on brain responses to emotional or social cues (Derntl et al. 2012). Moreover, ethnic differences have been found in physiological responses to fear or novelty (Martínez et al. 2014; Kredlow et al. 2017), which are commonly used to assess anxiety or post-traumatic stress disorders (Bach et al. 2017). This situation is aggravated by the fact that ethnicity can influence skin conductance responses (Kredlow et al. 2017), which are commonly used as laboratory measurements of fear mechanisms (Tzovara et al. 2018), potentially leading to the exclusion of ethnic groups from studies even though they are at higher risk, e.g. for post-traumatic stress disorders (Roberts et al. 2011).
How much existing data sharing efforts for neuroscience are affected by these biases is hard to estimate at this point: Although these initiatives generally tend to support standardized data formats for data sharing (Niso et al. 2018; Gorgolewski et al. 2016), they only rarely include concrete guidelines for reporting of socio-demographic variables (Madan 2017).
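As one illustration of what a concrete reporting check might look like, the following minimal sketch reads the header of a tab-separated participants table (in the style of the `participants.tsv` file used by the Brain Imaging Data Structure) and lists which socio-demographic columns are absent. The set of desired fields in `DESIRED_FIELDS`, the function name `missing_demographics`, and the example table are assumptions made for this sketch; they are not part of any standard or guideline cited here.

```python
import csv
import io

# Hypothetical list of socio-demographic fields a project might ask for.
DESIRED_FIELDS = ("age", "sex", "ethnicity", "education", "household_income")


def missing_demographics(tsv_text, desired=DESIRED_FIELDS):
    """Return the desired fields that do not appear as column headers."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # fieldnames is None for an empty table, so fall back to an empty list.
    present = {name.strip().lower() for name in (reader.fieldnames or [])}
    return [field for field in desired if field not in present]


# Made-up example table with only age and sex recorded.
example = "participant_id\tage\tsex\nsub-01\t34\tF\nsub-02\t29\tM\n"
print(missing_demographics(example))
```

A check like this only shows whether a column exists, not whether it was filled in consistently, so it could at most be a first step toward the kind of reporting guidelines discussed above.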
8.4 Data Sharing as a Social Movement
All of this paints a bleak picture: The populations we are using to develop personalized medicine are highly WEIRD (Henrich et al. 2010). Even worse, we might often not even be aware of this, as we are not collecting the demographic data needed to identify our biases. Depending on the field, research studies may furthermore contain only small samples, making it hard to evaluate how ethnicity or social factors influence neurobiological functions and gene expression. Only by sharing diverse datasets and including rich demographic information will it be possible to make our understanding of disease progression and neurobiological functions relevant for all individuals, irrespective of their social or ethnic background.
Back in 2005, Thomas Friedman firmly believed that the next great breakthrough in bioscience could come from a 15-year-old who downloads the human genome in Egypt (Pink 2005). Today, we have to acknowledge that there is a good chance that this 15-year-old would not be able to profit from their own breakthrough. Because of this, we are still far away from a truly personalized medicine, making our personal data political. It is up to us, the generators of data and the people sharing data, to work on changing this, ensuring that the promise of personalized medicine is equitable. Or to say it with Carol Hanisch’s words: There are no personal solutions at this time. There is only collective action for a collective solution (Hanisch 1969).
- “23andMe Research”. 2018. https://research.23andme.com/publications/.
- “All of Us”. 2018. https://allofus.nih.gov.
- Chan, Micaela Y., Jinkyung Na, Phillip F. Agres, Neil K. Savalia, Denise C. Park, and Gagan S. Wig. 2018. Socioeconomic status moderates age-related differences in the brain’s functional network organization and anatomy across the adult lifespan. Proceedings of the National Academy of Sciences of the United States of America 115: E5144–E5153. https://doi.org/10.1073/pnas.1714021115.
- Derntl, Birgit, Ute Habel, Simon Robinson, Christian Windischberger, Ilse Kryspin-Exner, Ruben C. Gur, and Ewald Moser. 2012. Culture but not gender modulates amygdala activation during explicit emotion recognition. BMC Neuroscience 13 (1): 54. https://doi.org/10.1186/1471-2202-13-54.
- Dewey, Frederick E., Megan E. Grove, Cuiping Pan, Benjamin A. Goldstein, Jonathan A. Bernstein, Hassan Chaib, Jason D. Merker, et al. 2014. Clinical interpretation and implications of whole-genome sequencing. JAMA – Journal of the American Medical Association 311 (10): 1035–1044. https://doi.org/10.1001/jama.2014.1717.
- Dilsizian, Steven E., and Eliot L. Siegel. 2014. Artificial intelligence in medicine and cardiac imaging: Harnessing big data and advanced computing to provide personalized medical diagnosis and treatment. Current Cardiology Reports 16 (1). https://doi.org/10.1007/s11886-013-0441-8.
- Duval, Elizabeth R., Sarah N. Garfinkel, James E. Swain, Gary W. Evans, Erika K. Blackburn, Mike Angstadt, Chandra S. Sripada, and Israel Liberzon. 2017. Childhood poverty is associated with altered hippocampal function and visuospatial memory in adulthood. Developmental Cognitive Neuroscience 23: 39–44. https://doi.org/10.1016/j.dcn.2016.11.006.
- Ellwood-Lowe, Monica E., Kathryn L. Humphreys, Sarah J. Ordaz, M. Catalina Camacho, Matthew D. Sacchet, and Ian H. Gotlib. 2018. Time-varying effects of income on hippocampal volume trajectories in adolescent girls. Developmental Cognitive Neuroscience 30: 41–50. https://doi.org/10.1016/j.dcn.2017.12.005.
- Euny Hong. 2016. 23andMe has a problem when it comes to ancestry reports for people of color.
- “ExAC”. 2018. http://exac.broadinstitute.org/faq.
- “Genomics England”. 2018. https://www.genomicsengland.co.uk/.
- Gorgolewski, Krzysztof J., Gael Varoquaux, Gabriel Rivera, Yannick Schwarz, Satrajit S. Ghosh, Camille Maumet, Vanessa V. Sochat, et al. 2015. NeuroVault.org: A web-based repository for collecting and sharing unthresholded statistical maps of the human brain. Frontiers in Neuroinformatics 9. https://doi.org/10.3389/fninf.2015.00008.
- Gorgolewski, Krzysztof J., Tibor Auer, Vince D. Calhoun, R. Cameron Craddock, Samir Das, Eugene P. Duff, Guillaume Flandin, et al. 2016. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data 3. https://doi.org/10.1038/sdata.2016.44.
- Hanisch, Carol. 1969. The personal is political. http://www.carolhanisch.org/CHwritings/PIP.html.
- Kovalevskaya, Nadezda V., Charlotte Whicher, Timothy D. Richardson, Craig Smith, Jana Grajciarova, Xocas Cardama, José Moreira, Adrian Alexa, Amanda A. McMurray, and Fiona G.G. Nielsen. 2016. DNAdigest and Repositive: Connecting the world of genomic data. PLoS Biology 14 (3). https://doi.org/10.1371/journal.pbio.1002418.
- Kredlow, Alexandra M., Suzanne L. Pineles, Sabra S. Inslicht, Marie France Marin, Mohammed R. Milad, Michael W. Otto, and Scott P. Orr. 2017. Assessment of skin conductance in African American and non-African American participants in studies of conditioned fear. Psychophysiology 54 (11): 1741–1754. https://doi.org/10.1111/psyp.12909.
- Kummar, Shivaani, P. Mickey Williams, Chih-Jian Lih, Eric C. Polley, Alice P. Chen, Larry V. Rubinstein, Yingdong Zhao, Richard M. Simon, Barbara A. Conley, and James H. Doroshow. 2015. Application of molecular profiling in clinical trials for advanced metastatic cancers. JNCI-Journal of the National Cancer Institute 107 (4): djv003. https://doi.org/10.1093/jnci/djv003.
- Madan, Christopher R. 2017. Advances in studying brain morphology: The benefits of open-access data. Frontiers in Human Neuroscience 11. https://doi.org/10.3389/fnhum.2017.00405.
- Mao, Qing, Serban Ciotlos, Rebecca Yu Zhang, Madeleine P. Ball, Robert Chin, Paolo Carnevali, Nina Barua, et al. 2016. The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes. GigaScience 5 (1). https://doi.org/10.1186/s13742-016-0148-z.
- Marigorta, Urko M., Arcadi Navarro, P.M. Visscher, M.A. Brown, M.I. McCarthy, J. Yang, L.A. Hindorff, et al. 2013. High trans-ethnic replicability of GWAS results implies common causal variants. PLoS Genetics 9 (6): e1003566. https://doi.org/10.1371/journal.pgen.1003566.
- Marshall, Narcis A., Hilary A. Marusak, Kelsey J. Sala-Hamrick, Laura M. Crespo, Christine A. Rabinak, and Moriah E. Thomason. 2018. Socioeconomic disadvantage and altered corticostriatal circuitry in urban youth. Human Brain Mapping 39 (5): 1982–1994. https://doi.org/10.1002/hbm.23978.
- McAllister, Bryant F. 2017. Exponential growth of the AncestryDNA database.
- Niso, Guiomar, Krzysztof J. Gorgolewski, Elizabeth Bock, Teon L. Brooks, Guillaume Flandin, Alexandre Gramfort, Richard N. Henson, et al. 2018. MEG-BIDS, the brain imaging data structure extended to magnetoencephalography. Scientific Data 5: 180110. https://doi.org/10.1038/sdata.2018.110.
- Parker, Nadine, Angelita Pui-Yee Wong, Gabriel Leonard, Michel Perron, Bruce Pike, Louis Richer, Suzanne Veillette, Zdenka Pausova, and Tomas Paus. 2017. Income inequality, gene expression, and brain maturation during adolescence. Scientific Reports 7 (1): 7397. https://doi.org/10.1038/s41598-017-07735-2.
- Pink, Daniel H. 2005. Why the world is flat. WIRED. https://www.wired.com/2005/05/friedman-2/.
- Poldrack, Russell A., Deanna M. Barch, Jason P. Mitchell, Tor D. Wager, Anthony D. Wagner, Joseph T. Devlin, Chad Cumba, Oluwasanmi Koyejo, and Michael P. Milham. 2013. Toward open sharing of task-based fMRI data: The OpenfMRI project. Frontiers in Neuroinformatics 7. https://doi.org/10.3389/fninf.2013.00012.
- Poline, Jean-Baptiste, Janis L. Breeze, Satrajit Ghosh, Krzysztof Gorgolewski, Yaroslav Halchenko, Michael Hanke, Christian Haselgrove, et al. 2012. Data sharing in neuroimaging research. Frontiers in Neuroinformatics 6 (9). https://doi.org/10.3389/fninf.2012.00009.
- “Problems with 23andMe ancestry composition”. 2015. http://koreanhistoricaldramas.com/23andme-ancestry-composition/.
- Roberts, A.L., S.E. Gilman, J. Breslau, N. Breslau, and K.C. Koenen. 2011. Race/ethnic differences in exposure to traumatic events, development of post-traumatic stress disorder, and treatment-seeking for post-traumatic stress disorder in the United States. Psychological Medicine 41 (1): 71–83. https://doi.org/10.1017/S0033291710000401.
- Smith, Richard. 2012. Stratified, personalised, or precision medicine. The BMJ Opinion.
- “UK10K”. 2018. https://www.uk10k.org/.
- Vorhaus, Don. 2010. The past, present and future of DTC genetic testing regulation. Genomics Law Report.
- Weissman, David G., Rand D. Conger, Richard W. Robins, Paul D. Hastings, and Amanda E. Guyer. 2018. Income change alters default mode network connectivity for adolescents in poverty. Developmental Cognitive Neuroscience 30: 93–99. https://doi.org/10.1016/j.dcn.2018.01.008.
- Wetterstrand, L.A. 2018. DNA sequencing costs, data from the NHGRI genome sequencing program. https://www.genome.gov/sequencingcostsdata/.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.