Introduction

One of the main challenges to understanding the onset and progression of human disease is to develop effective model systems that combine known genetic elements with disease-associated phenotypic readouts. The identification of genes linked to familial forms of diseases such as cystic fibrosis, sickle cell anemia or monogenetic forms of neurodegenerative disorders has fundamentally changed our understanding of many diseases and provided vital clues into the underlying pathogenesis (Botstein and Risch 2003; Altshuler et al. 2008; McClellan and King 2010). Detailed knowledge of disease-causing mutations and genes allows the establishment of reliable and disease-relevant cellular and animal models and facilitates the systematic analysis of molecular and cellular disease mechanisms and the development and validation of novel and effective therapeutic approaches.

In contrast to such predominantly rare and monogenic disorders, the majority of the most common medical conditions, such as obesity, heart disease, diabetes, autoimmune disease or sporadic neurodegenerative disease, have no well-defined genetic etiology and do not follow Mendelian inheritance patterns. Population genetics suggests that such sporadic or polygenic diseases result from a complex interaction between multiple genetic and non-genetic (lifestyle and environmental) risk factors (Botstein and Risch 2003; Altshuler et al. 2008). The complexity and our limited knowledge of the underlying genetic component have largely prevented the generation of genetically defined disease models. The paucity of disease-relevant experimental systems represents one of the major reasons for our limited biological understanding of complex diseases and an almost complete lack of effective disease-modifying therapeutics.

In the following, we will summarize recent progress in genetics and developmental and molecular biology, which may provide a solution for generating disease-relevant in vitro models for complex disease. By combining human pluripotent stem cell (hPSC)-technology with genome editing and genome-scale epigenetic and genome-wide association studies (GWAS) data to identify disease-associated risk variants, we will provide a blueprint to create genetically defined experimental model systems that allow the functional analysis of disease-associated risk variants. As a proof of principle, we describe how we applied this approach to sporadic Parkinson’s disease and identified a common risk variant in a non-coding distal enhancer element that regulates the expression of SNCA, a key gene implicated in the pathogenesis of Parkinson’s disease (Soldner et al. 2016).

Induced Pluripotent Stem Cells to Model Complex Diseases

The ability to reprogram somatic cells into human induced pluripotent stem cells (hiPSCs) has opened the intriguing possibility of studying complex human disease in a cell culture dish (Takahashi and Yamanaka 2006; Takahashi et al. 2007; Yu et al. 2007). Following in vitro differentiation, patient-derived hiPSCs provide access to large amounts of human disease-relevant cells that carry all the genetic alterations involved in disease development (Saha and Jaenisch 2009; Soldner and Jaenisch 2012; Takahashi and Yamanaka 2013; Yu et al. 2013). Even without precise knowledge of the underlying genetics, such patient-derived cells therefore allow the generation of relevant cellular model systems based on disease-associated genetic elements. This approach has already been used to model a range of primarily monogenic diseases, including neurodegenerative diseases such as Alzheimer’s disease, Parkinson’s disease and amyotrophic lateral sclerosis (ALS; Cooper et al. 2012; Israel et al. 2012; Reinhardt et al. 2013; Alami et al. 2014; Wainger et al. 2014; Young et al. 2015). Despite the unprecedented potential and excitement of this approach, it became apparent that individual hiPSC lines, independent of disease status or genotype, displayed highly variable biological properties in vitro, such as the propensity to differentiate into functional cell types (Bock et al. 2011; Boulting et al. 2011; Soldner and Jaenisch 2012; Nishizawa et al. 2016). This observation significantly limits their value for identifying robust disease-associated phenotypes by simply comparing patient-derived cells with unrelated controls. This system-immanent variability has proven to be particularly challenging in the context of age-related diseases, including neurodegenerative diseases such as Alzheimer’s and Parkinson’s disease: disease-associated phenotypes typically progress slowly over many years in patients, which suggests that expected in vitro phenotypes would be rather mild and subtle. The reasons for the observed line-to-line differences include genetic background variations, genetic and epigenetic changes resulting from reprogramming and extended maintenance of hiPSCs and the lack of robust in vitro differentiation protocols (Soldner and Jaenisch 2012; Liang and Zhang 2013).

Some of the above-described limitations have been overcome by improved reprogramming and culture conditions (Warren et al. 2010; Hou et al. 2013), directed differentiation approaches including transcription factor-induced reprogramming (Zhang et al. 2013), insertion of cell type-specific fluorescent marker proteins to monitor differentiation (Di Giorgio et al. 2008; Hockemeyer et al. 2009, 2011; Chambers et al. 2012; Mica et al. 2013) or by consortium-size experiments to significantly increase the number of independent experimental samples (The HD iPSC Consortium 2012). However, variable genetic backgrounds between patient-derived and control cells remain an unresolved major limitation of the current hiPSC approach, due to the well-established influence of uncharacterized genetic modifiers on disease development and progression in patients and, accordingly, on disease-associated phenotypes in vitro.

Gene Editing to Generate Genetically Controlled Disease Models

The recent progress in gene editing technologies using engineered nucleases such as meganucleases, zinc finger nucleases (ZFNs), transcription activator-like effector-based nucleases (TALENs) and the CRISPR/Cas9 system is thought to provide an elegant solution to control for differences in genetic background (Soldner et al. 2011; Soldner and Jaenisch 2012; Hockemeyer and Jaenisch 2016). In particular, the simplicity and efficiency with which the CRISPR/Cas9 system modifies the genome in human cells, even at multiple loci simultaneously, allow us to engineer genetically controlled hPSC lines that differ only at known genetic disease-causing variants (Jinek et al. 2012, 2013; Cong et al. 2013; Mali et al. 2013).

As a proof of principle, we recently used ZFNs to either seamlessly correct Parkinson’s disease-associated mutations in the SNCA gene in patient-derived hiPSCs or to insert similar variants into wild-type human embryonic stem cells (hESCs; Soldner et al. 2011). Such isogenic pairs of hPSC lines provided an experimental system with a controlled genetic background in which the engineered disease-associated risk variants were the only experimental variables. Analyzing disease-associated phenotypes in this genetically controlled system allowed identification of nitrosative stress, accumulation of endoplasmic reticulum (ER)-associated degradation substrates, and ER stress as early Parkinson’s disease-associated pathological phenotypes (Chung et al. 2013). A further study revealed that nitrosative and oxidative stress result in S-nitrosylation of the transcription factor MEF2C and inhibition of the MEF2C-PGC1α transcriptional network, contributing to mitochondrial dysfunction and apoptotic neuronal cell death (Ryan et al. 2013). By combining this monogenic disease model with disease-associated environmental stressors, the experiments further provide new mechanistic insight into gene-environment (GxE) interaction in the pathogenesis of Parkinson’s disease (Ryan et al. 2013). Notably, both studies, relying on a genetically controlled in vitro model, identified novel therapeutic targets and small molecules that reversed the observed pathological phenotypes in neurons and that are currently being pursued as novel therapeutics for Parkinson’s disease (Chung et al. 2013; Ryan et al. 2013). The above-described approach clearly overcomes many of the limitations of the current hiPSC technology. Due to the simplicity and efficiency of CRISPR/Cas9-mediated genome editing in hiPSCs, the use of isogenic cell lines is becoming the gold standard for analyzing disease-associated phenotypes in vitro (Reinhardt et al. 2013; Kiskinis et al. 2014; Paquet et al. 2016). However, such an approach currently seems limited to monogenic diseases in which the disease-causing genetic alterations are well established and the expected disease-associated phenotypes display robust and highly penetrant effects.

Functional Role of GWAS-Identified Risk Variants in Complex Disease

Translating the concept of engineering genetically controlled model systems to complex disease seems daunting and will require a detailed understanding of the underlying genetic component. GWAS and genome-scale next generation sequencing (NGS) approaches have significantly advanced our understanding of the genetic basis of complex disease. GWAS in particular have identified numerous common single-nucleotide polymorphisms (SNPs) associated with human traits and diseases, pinpointing the genomic loci and genes thought to play important roles in the pathophysiology of the respective diseases (Botstein and Risch 2003; Altshuler et al. 2008; McClellan and King 2010).

However, the interpretation of this steadily increasing amount of data is limited by the fact that disease-associated SNPs only statistically correlate with the underlying disease and the vast majority of risk variants have no established biological relevance to disease or clinical utility for prognosis or treatment (Altshuler et al. 2008; McClellan and King 2010). Any SNP in linkage disequilibrium (LD) with a GWAS-identified risk variant is equally likely to be causative for the risk to develop a specific disease. It has therefore been difficult to distinguish variants that are functional and disease-relevant from those that are in LD and thus only mark the underlying haplotype containing the functional variant. Advancing from genetic association to causal biological processes has been challenging for two additional reasons. First, the majority of disease-associated genetic variants fall into the non-coding part of the genome, which impedes any functional analysis through simple transgenic overexpression or disruption in established cell lines or any analysis in non-human model systems due to the limited conservation of non-coding elements between species. Second, the prevailing hypothesis about the heritability of complex diseases suggests that multiple common or potentially rare SNPs cooperatively contribute to the risk of developing a specific disease; however, each individual risk variant will have only a small or at most medium-size additive or multiplicative effect on disease phenotypes (Gibson 2012). Indeed, disease-associated genetic variants are also prevalent in the healthy population, although with lower frequency, and the majority of carriers of risk SNPs do not develop a disease, implying that individual risk variants are not sufficient to cause disease-associated phenotypes. Consequently, only very few risk variants have been functionally linked to specific diseases, such as a common polymorphism at the 1p13 locus, which alters the expression of the SORT1 gene and is correlated with both plasma low-density lipoprotein cholesterol (LDL-C) and myocardial infarction (Musunuru et al. 2010).
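To make the LD problem concrete, the following minimal sketch computes the squared correlation (r²) between two biallelic SNPs from phased haplotypes. The haplotype counts are invented toy data, and the function name ld_r2 is ours; the sketch only illustrates why a GWAS signal cannot by itself separate a functional variant from a tightly linked marker.

```python
from collections import Counter

def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) tuples, one per phased chromosome,
    where a and b are 0/1 allele codes at SNP A and SNP B.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP B
    p_ab = counts[(1, 1)] / n                 # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                      # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom

# Toy data (hypothetical): two SNPs that always travel together ...
linked = [(1, 1)] * 30 + [(0, 0)] * 70
# ... and two SNPs whose alleles combine at random.
unlinked = [(1, 1)] * 25 + [(1, 0)] * 25 + [(0, 1)] * 25 + [(0, 0)] * 25

r2_linked = ld_r2(linked)
r2_unlinked = ld_r2(unlinked)
```

With r² close to 1, the two SNPs carry essentially the same association signal, so only a functional assay can tell which of them, if either, is causal.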

Under the assumption that specific risk haplotypes contribute to disease risk through dysregulation of the same molecular pathways, one current approach is to stratify patient-derived hiPSCs according to specific genetic risk variants rather than according to disease status. This approach may be sufficient in some cases to reduce the genetic heterogeneity based on known disease haplotypes and to reveal previously masked disease-associated phenotypes. Indeed, this approach was successfully used to dissect the function of a common Alzheimer’s disease-associated non-coding genetic variant in the 5′ region of the SORL1 (sortilin related receptor 1) gene (Young et al. 2015). However, the main limitation of this approach remains the uncontrolled effect of additional genetic modifiers and the inability to identify the specific causative sequence variant that is required for further functional analysis.

Epigenomic Signatures to Prioritize GWAS-Identified Risk Variants

Cis-acting effects of genetic variants on gene expression have been proposed to be a major factor for phenotypic variation of complex traits and disease susceptibility (Schadt et al. 2003; Morley et al. 2004; Cheung et al. 2005, 2010; Lee and Young 2013; GTEx Consortium 2015). The widespread availability of cell- and tissue-specific transcriptome-wide expression data along with the corresponding genotyping data has greatly facilitated the identification of expression quantitative trait loci (eQTLs; GTEx Consortium 2015). Although able to detect statistical correlation between specific risk variants and gene expression, this approach entails limitations that are comparable to traditional GWAS in identifying the functional risk variants. Recent genome-scale epigenetic studies such as the ENCODE (ENCODE Project Consortium 2012) and Roadmap Epigenomics project (Roadmap Epigenomics Consortium 2015) have allowed us to reliably identify and catalogue regulatory elements in a cell type-, tissue- and in some cases disease-specific manner. These studies specifically have highlighted the enrichment of GWAS-identified risk variants in regulatory DNA elements specific to tissues and cell types (Ernst et al. 2011; Degner et al. 2012; Maurano et al. 2012; Hnisz et al. 2013; Trynka et al. 2013; Farh et al. 2014; Pasquali et al. 2014; Ripke et al. 2014) affected by the respective diseases. These results suggest that disease-associated risk variants may affect gene regulation by modifying the function of tissue-specific regulatory elements. In particular, distal enhancer elements that are bound by key transcription factors (TFs) and known to precisely control spatial and temporal gene expression during embryonic development and tissue homeostasis in a cell type-specific manner (Ward and Kellis 2012; Lee and Young 2013; Farh et al. 2014; Ripke et al. 2014; Wamstad et al. 2014) are found to be enriched for GWAS variants in many complex diseases.

A number of recent studies have correlated changes in TF binding in enhancer regions with sequence-specific, heritable changes in chromatin state and gene regulation (Kasowski et al. 2013; Kilpinen et al. 2013; McVicker et al. 2013), thus providing a molecular mechanism for how individual sequence variants contribute to the development of complex diseases. Recent progress in defining TF binding specificities using high throughput SELEX and chromatin immunoprecipitation sequencing (ChIP-seq) approaches has largely increased our understanding of sequence-specific TF binding in the genome and significantly improved our ability to analyze or predict TF binding on a genome-wide scale (Jolma et al. 2013, 2015). Based on the rapidly increasing availability of epigenetic data, mapping of GWAS-identified variants to TF binding sites within tissue-specific enhancer elements has been proposed as a valuable approach to prioritize and identify functional and disease-relevant risk variants (Ward and Kellis 2012; Rivera and Ren 2013; Claussnitzer et al. 2014; Wamstad et al. 2014). Indeed, such integration of GWAS with epigenetic signatures for heart-specific enhancers allowed for the identification of novel functional risk variants for cardiac phenotypes (Wang et al. 2016). Likewise, a similar approach identified an obesity-associated risk variant in the FTO locus, which alters early adipose differentiation by disrupting a TF binding site at a pre-adipocyte-specific enhancer (Claussnitzer et al. 2015).
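The idea of prioritizing variants by their predicted effect on TF binding can be sketched as follows: score both alleles of a SNP against a position weight matrix (PWM) and compare the best match covering the variant position. This is an illustration only; the motif (a generic homeobox-like TAATTA consensus), the flanking sequence and all function names are hypothetical and are not taken from any of the cited studies or motif databases.

```python
import math

BASES = "ACGT"

def log_odds(pfm, background=0.25, pseudo=0.01):
    """Convert a position frequency matrix (list of dicts) to log2-odds scores."""
    return [
        {b: math.log2((col[b] + pseudo) / (1 + 4 * pseudo) / background) for b in BASES}
        for col in pfm
    ]

def best_score(seq, pwm, snp_index):
    """Best PWM score over all windows of seq that cover snp_index."""
    w = len(pwm)
    first = max(0, snp_index - w + 1)
    last = min(snp_index, len(seq) - w)
    best = -math.inf
    for start in range(first, last + 1):
        score = sum(pwm[i][seq[start + i]] for i in range(w))
        best = max(best, score)
    return best

# Hypothetical motif: consensus TAATTA, 85% consensus base per column.
consensus = "TAATTA"
pfm = [{b: (0.85 if b == c else 0.05) for b in BASES} for c in consensus]
pwm = log_odds(pfm)

# Hypothetical locus: the SNP sits at index 8 and is either A or G.
snp_index = 8
seq_a = "GCCTAATT" + "A" + "GGC"
seq_g = "GCCTAATT" + "G" + "GGC"

score_a = best_score(seq_a, pwm, snp_index)
score_g = best_score(seq_g, pwm, snp_index)
delta = score_a - score_g  # positive: motif predicted to prefer the A allele
```

In a real analysis the matrices would come from experimentally derived binding models, and a score difference is only a prediction that still has to be confirmed by binding and expression assays.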

The 3-dimensional (3D) organization of the genome is thought to contribute to the regulation of gene expression (Bickmore 2013; de Graaf and van Steensel 2013; de Laat and Duboule 2013). The recent development of chromosome conformation capture techniques (“3C” and genome-wide 3C-based methods; Dekker et al. 2002, 2013) and of cohesin chromatin interaction analysis by paired-end tag sequencing (ChIA-PET; Dowen et al. 2014) allows us to determine long-range chromatin interactions such as cell type-specific promoter-enhancer interactions. These analyses suggest that active enhancer elements are bound by transcription factors and loop over long distances to contact target genes to regulate transcription. An emerging model suggests that promoter-enhancer interactions typically occur only within megabase-sized topologically associating domains (TADs; Dixon et al. 2012; Nora et al. 2012), as defined by high DNA interaction frequency based on genome-wide chromosome conformation capture data, or, within such TADs, in insulated neighborhoods restricted by cohesin-associated CTCF-CTCF loops (Handoko et al. 2011; DeMare et al. 2013; Dowen et al. 2014; Rao et al. 2014; Ji et al. 2016). Notably, there is mounting evidence that changes in 3D structure, potentially through sequence-specific disruption of CTCF interaction, might contribute to disease development (Ji et al. 2016). Integrating datasets of cell type-specific changes in enhancer-promoter interactions and information about the 3D structure of the genome will further help us to assign disease-associated risk variants in enhancer sequences to target genes and provide supporting evidence to identify functional disease-associated risk variants and deregulated target genes.

Functional Analysis of Parkinson’s Disease-Associated Risk Variants

As a proof of principle, we describe below how we recently applied the above-elucidated approach to sporadic Parkinson’s disease as a prototypical complex disorder, to identify common risk variants in non-coding distal enhancer elements that functionally modulate the risk to develop the disease (Soldner et al. 2016). Parkinson’s disease is the second most common chronic progressive neurodegenerative disease, with a prevalence of more than 1% in the population over the age of 60. Although the discovery of genes linked to rare Mendelian forms of PD such as SNCA, LRRK2, PARKIN, PINK1 and DJ1 has provided insight into the molecular and cellular pathogenesis of the disease (Gasser et al. 2011; Singleton et al. 2013), the etiology leading to neuronal cell loss is largely unknown. Importantly, over 90% of Parkinson’s cases do not show Mendelian inheritance patterns; however, substantial clustering of cases within families suggests that sporadic, late age of onset Parkinson’s disease results from a complex interaction between genetic risk alleles and environmental factors. A recent GWAS meta-analysis has identified 26 genomic loci containing risk variants for sporadic Parkinson’s disease (Nalls et al. 2014); however, as for the majority of neurodegenerative disorders, little mechanistic insight is available on how specific sequence variations contribute to disease development and progression.

Identification of Parkinson’s Disease-Associated Risk Variants in Brain-Specific Enhancer Elements

A recent analysis of histone H3 acetylated at lysine 27 (H3K27ac)-marked regions in the post-mortem adult brain suggests a significant enrichment of Parkinson’s disease-associated risk SNPs within distal enhancer elements (Vermunt et al. 2014). This finding supports the hypothesis that sequence-specific changes in enhancer function and deregulated transcription of linked genes mediate the risk to develop the disease. A number of chromatin features, such as binding of the co-activator p300, mono-methylation of histone H3 at lysine 4 (H3K4me1), H3K27ac and DNase I hypersensitive sites (DHSs), have been established as surrogate marks to reliably identify candidate enhancer sequences (Visel et al. 2009, 2013; Creyghton et al. 2010; Rada-Iglesias et al. 2011; Maurano et al. 2012). Thus, to identify specific candidate risk variants in distal enhancers, we intersected Parkinson’s disease-associated risk SNPs (Nalls et al. 2014) with publicly available epigenetic data (Roadmap Epigenomics Consortium 2015). This analysis allowed us to compile a list of risk variants ranked by their overlap with active enhancer elements. Interestingly, many of the top-ranked risk variants were located in the SNCA locus. Because changes in TF binding are thought to be the major mediator of SNP-specific changes in gene expression (Kasowski et al. 2013; Kilpinen et al. 2013; McVicker et al. 2013), we further prioritized the risk variants in enhancers by comparing predicted TF binding, based on known TF binding specificities, between the two alternative genotypes of each Parkinson’s disease-associated SNP. This analysis highlighted the Parkinson’s disease-associated SNP rs356168, in an enhancer in intron 4 of SNCA, as the risk variant with the highest number of genotype-dependent differences in predicted TF binding in the SNCA locus. The functional relevance of this enhancer was further supported by chromosome conformation capture data, which indicate a physical interaction (looping) between the enhancer and the promoter region of SNCA that is thought to be necessary for the cis-acting effects on gene expression (Vermunt et al. 2014).
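The intersection and ranking step described above can be sketched in a few lines. The sketch assumes that each epigenomic sample contributes a list of enhancer intervals and ranks SNPs by the number of samples in which they fall inside an enhancer; the SNP identifiers, coordinates and sample names below are placeholders, not data from the cited resources, and the actual ranking criteria used in the study may differ.

```python
def rank_snps_by_enhancer_overlap(snps, enhancers):
    """Rank SNPs by the number of samples with an overlapping enhancer.

    snps:      {snp_id: (chrom, pos)}
    enhancers: {sample: [(chrom, start, end), ...]} with half-open intervals
    Returns a list of (snp_id, n_samples), highest count first.
    """
    counts = {}
    for snp_id, (chrom, pos) in snps.items():
        counts[snp_id] = sum(
            any(c == chrom and start <= pos < end for c, start, end in regions)
            for regions in enhancers.values()
        )
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Placeholder inputs (all identifiers and coordinates are hypothetical).
snps = {
    "snp_a": ("chr4", 1500),
    "snp_b": ("chr4", 5200),
    "snp_c": ("chr12", 800),
}
enhancers = {
    "brain_sample_1": [("chr4", 1000, 2000), ("chr4", 5000, 5500)],
    "brain_sample_2": [("chr4", 1400, 1600)],
    "liver_sample_1": [("chr12", 100, 200)],
}

ranked = rank_snps_by_enhancer_overlap(snps, enhancers)
```

For genome-scale inputs an interval tree or a dedicated interval-intersection tool would replace the nested loops; the logic of the ranking stays the same.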

It is well established that SNCA plays a central role in the pathogenesis of Parkinson’s disease. Point mutations in SNCA were the first genetic variants linked to familial forms of Parkinson’s disease, and the SNCA protein is the major component of Lewy bodies and Lewy neurites, which are considered the pathological hallmark of familial and sporadic Parkinson’s disease (Gasser et al. 2011; Singleton et al. 2013). In addition, the SNCA locus represents one of the strongest Parkinson’s disease-associated GWAS hits (Nalls et al. 2014). Notably, multiplication of the entire SNCA locus was identified as causal for a rare autosomal-dominant form of Parkinson’s disease, indicating that a moderate increase of wild-type SNCA expression (1.5-fold in the case of genomic duplications) is sufficient to cause disease (Singleton et al. 2003; Miller et al. 2004; Devine et al. 2011; Kim et al. 2012). This observation is highly suggestive of a molecular mechanism by which risk variants in the SNCA locus modify the risk to develop Parkinson’s disease by slightly modulating the expression of SNCA. This clear link between SNCA expression and the development of Parkinson’s disease in the context of genomic amplification therefore provides a good rationale for gene expression as a disease-relevant phenotypic readout to connect genetic variation to disease risk (Devine et al. 2011). Indeed, the first indication that the SNCA locus may contain risk alleles that modulate SNCA expression came from the identification of SNCA-Rep1, a complex polymorphic microsatellite repeat region approximately 10 kb upstream of the transcription start site. Multiple candidate gene association studies suggested that individuals who are homozygous for a shorter, “protective” repeat region (Rep1-257 or Rep1-259) have a significantly lower risk of developing Parkinson’s disease compared to individuals carrying the longer forms (Rep1-261 or Rep1-263; Kruger et al. 1999; Maraganore et al. 2006). Several functional studies, including the analysis of transgenic mice carrying different human SNCA-Rep1 alleles (Chiba-Falek et al. 2005; Cronin et al. 2009), suggested an “enhancer-like” function of the microsatellite repeat element based on the cis-regulatory correlation between the SNCA-Rep1 repeat length and SNCA expression.

Allele-Specific Gene Expression as a Robust Read-Out to Analyze Cis-Regulatory Effects

As explained in detail above, one of the major limitations of using hPSC-derived somatic cells to model disease in vitro is the considerable variability of the biological properties between individual cell lines. SNCA expression is known to vary between neural cell types such as astrocytes, oligodendrocytes and neurons and to be regulated during development and terminal differentiation. Cellular heterogeneity and incomplete maturation therefore significantly interfere with the detection of subtle differences in gene expression between distinct risk genotypes or between patient and control cells. Indeed, individual in vitro differentiation experiments from genetically identical sub-clones resulted in up to fourfold differences in SNCA expression (Soldner et al. 2016). To address this problem, we recently described an experimental approach that is based on determining the effect of individual regulatory elements on the transcription of the cis-regulated gene by analyzing allele-specific gene expression (Soldner et al. 2016). The heterozygous deletion of a candidate regulatory element (that is, of just a single copy), or its exchange with an alternative disease-associated element, affects only the expression of the cis-regulated gene on the same allele while leaving the expression of the other, homologous allele unaltered. Consequently, allele-specific gene expression would be biased towards lower or higher expression of the cis-regulated allele depending on the introduced genetic modification. Because expression is measured as the ratio between two individual alleles in every cell, this analysis is expected to be largely independent of cell homogeneity and can be applied to heterogeneous cell populations. In this respect, the non-targeted SNCA allele allows for a simple normalization and serves as an internal control across isogenic samples.
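The robustness of the allelic ratio to culture heterogeneity can be illustrated with a deliberately simple toy model. It assumes that only a variable fraction of cells in each culture expresses SNCA, that both alleles are expressed equally in the absence of editing, and that the edit multiplies the output of the cis-linked allele by a fixed factor; the function name and all numbers (including the 12% effect) are illustrative choices, not measurements.

```python
def allele_expression(expressing_fraction, per_allele_level=100.0, cis_effect=1.0):
    """Return (edited_allele, reference_allele) transcript levels for a mixed culture.

    expressing_fraction: share of cells that express the gene (toy assumption)
    cis_effect:          fold change of the edited allele relative to the other allele
    """
    reference = expressing_fraction * per_allele_level
    edited = reference * cis_effect
    return edited, reference

# Three differentiation experiments with very different culture composition.
results = []
for fraction in (0.2, 0.4, 0.8):
    edited, reference = allele_expression(fraction, cis_effect=1.12)
    results.append({
        "fraction": fraction,
        "total": edited + reference,   # what a conventional expression assay measures
        "ratio": edited / reference,   # what the allele-specific assay measures
    })

total_spread = max(r["total"] for r in results) / min(r["total"] for r in results)
ratios = [r["ratio"] for r in results]
```

In this toy model the total signal differs fourfold between cultures, which would completely mask a 12% effect, whereas the allelic ratio is the same in all three cultures.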

Functional Analysis of Parkinson’s-Associated Risk Variants

To analyze allele-specific expression, we developed a robust, sensitive and highly quantitative reverse transcription polymerase chain reaction (qRT-PCR) assay based on the detection of a heterozygous SNP in the 3′UTR of SNCA. Using CRISPR/Cas9 genome editing, we generated an allelic series of isogenic cell lines by either heterozygous deletion of the entire microsatellite repeat region (thought to have the most pronounced effect on SNCA expression) or insertion of SNCA-Rep1 elements with all of the repeat length alleles (Rep1-257, Rep1-259, Rep1-261 and Rep1-263) that are present in the normal population. Using allele-specific expression as readout, we showed that neither the deletion of the microsatellite repeat SNCA-Rep1 element nor its exchange for the shorter or longer repeat length alleles affected the cis-regulated expression of the linked SNCA allele, suggesting that this element has no clear role in SNCA regulation. This result conflicts with previous studies that supported an “enhancer-like” cis-regulatory effect of SNCA-Rep1 on the expression of SNCA. It is possible that the validity of those conclusions was affected by difficulties in controlling the experimental variables of the transgenic mouse (Cronin et al. 2009) or neuroblastoma cell system (Chiba-Falek et al. 2005) used in the functional analyses, by species-specific differences of non-coding regulatory elements or by the variability in analyzing human post-mortem brain tissue (Fuchs et al. 2008; Dumitriu et al. 2012). However, because in vitro differentiated cells allow only for the analysis of early events, due to the limited time in culture, we cannot completely exclude an effect of the SNCA-Rep1 element at later time points or only in combination with additional environmental factors.

In contrast to SNCA-Rep1, the CRISPR/Cas9-mediated exchange of Parkinson’s disease-associated alleles spanning an enhancer element in the fourth intron that carries two risk SNPs (rs356168 and rs3756054) showed a significant effect on allele-specific expression of SNCA (Fig. 1; Soldner et al. 2016). When the protective A-allele at SNP rs356168 was exchanged for the risk-associated G-allele, the expression of the cis-regulated SNCA allele was increased by 6–18%. In contrast, the exchange of the adjacent risk SNP rs3756054 showed no effect on allele-specific SNCA expression, suggesting that this variant only reaches genome-wide significance in GWAS because this variant is in LD with the functional risk-modifying SNP (Fig. 1). Given that a 1.5-fold increase in SNCA expression is sufficient to cause a familial autosomal-dominant form of Parkinson’s disease, these data support the notion that a modest life-long increase of SNCA expression may represent the molecular cause of increased risk to develop Parkinson’s disease of individuals carrying the G-allele at this risk variant. Moreover, an expression quantitative trait loci (eQTL) analysis of SNCA expression in post-mortem adult brain samples suggested that a similar sequence-specific modest increase in SNCA expression occurs within the human population, further substantiating a functional role of the risk variant rs356168 in Parkinson’s disease (Soldner et al. 2016). This subtle effect on the expression of a disease-relevant gene is consistent with the hypothesis that small effect size of common genetic risk variants contributes to the heritability of sporadic diseases.
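To put the reported 6–18% allele-specific increase next to the 1.5-fold increase caused by locus duplication, the following back-of-the-envelope sketch converts a per-allele effect into a change in total SNCA expression. It assumes that both alleles contribute equally at baseline and that effects are additive across alleles; both are simplifying assumptions of ours, and the resulting numbers are arithmetic illustrations, not measurements.

```python
def total_expression_change(cis_increase, edited_alleles, total_alleles=2):
    """Fractional change in total expression when `edited_alleles` of
    `total_alleles` equally expressed alleles each rise by `cis_increase`."""
    return cis_increase * edited_alleles / total_alleles

# One G-allele (heterozygous carrier), lower and upper bound of the reported range.
het_low = total_expression_change(0.06, edited_alleles=1)
het_high = total_expression_change(0.18, edited_alleles=1)

# Two G-alleles (homozygous carrier), under the additivity assumption.
hom_low = total_expression_change(0.06, edited_alleles=2)
hom_high = total_expression_change(0.18, edited_alleles=2)

# Locus duplication: three gene copies instead of two.
duplication = 3 / 2 - 1
```

Under these assumptions a single risk allele would change total expression by roughly 3–9% and two risk alleles by 6–18%, compared with 50% for a duplication, in line with the notion of a modest, life-long increase.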

Fig. 1

Proposed model describing the effect of multiple Parkinson’s disease (PD)-associated risk variants on SNCA expression (modified from Soldner et al. 2016). The schematic illustrates the genomic organization of the SNCA locus, including the PD-associated risk variants SNCA-Rep1 and the risk SNPs rs356168 and rs3756054, both located in a distal enhancer element in the fourth intron of SNCA. The analysis described in Soldner et al. (2016) suggests that the brain-specific transcription factors (TFs) EMX2 and NKX6-1 show sequence-dependent binding at rs356168 with preference for the A-allele. The efficient TF binding in carriers of the protective A-allele results in a suppressed distal enhancer element and, consequently, in reduced expression of SNCA associated with reduced risk to develop PD. In contrast, the reduced TF binding in carriers of the PD risk-associated G-allele at this variant leads to a more active distal enhancer, resulting in increased expression of SNCA associated with an increased risk to develop PD. Notably, neither the repeat length of SNCA-Rep1 nor the PD-risk variant at rs3756054 significantly affects SNCA expression, suggesting that these elements are in linkage disequilibrium (LD) with other functional risk-modifying variants.

To gain insight into the molecular basis of how risk variants affect target gene expression, we analyzed TF binding data and identified two brain-specific TFs, EMX2 and NKX6-1, that bind to the enhancer element at the risk variant. Further analysis of sequence-specific binding indicated that both TFs preferentially bind to the protective A-allele at rs356168, which is associated with lower SNCA expression (Fig. 1). These results suggest a model in which the sequence-dependent binding of these TFs at a distal enhancer element represses enhancer activity and thus modulates SNCA expression. Indeed, ectopic overexpression of both TFs in neurons reduced SNCA expression (Soldner et al. 2016), consistent with previous data in mouse models demonstrating their role as repressors of enhancer function (Ligon 2003; Schisler et al. 2005; Schaffer et al. 2010; Mariani et al. 2012). Thus, our data provide a molecular link between GWAS-identified risk SNP-dependent changes in TF binding at a distal enhancer element, altered expression of SNCA and the risk to develop sporadic Parkinson’s disease (Fig. 1). EMX2 and NKX6-1 may physically interact and function in a complex to suppress enhancer activity. However, expression analysis indicated that the two TFs are expressed in only a subset of neurons and are mostly not co-expressed in the same cell, suggesting that they may function at the same enhancer element in different cell types. TF-specific usage of identical regulatory elements in distinct cell populations might be a possible explanation for the selective vulnerability of distinct neuronal populations, as observed in Parkinson’s disease.

Mechanistic Study of Sporadic Diseases: Conclusions

As outlined in this review, a major challenge of modeling sporadic diseases in the culture dish is the system-immanent variability in differentiating hESCs or hiPSCs to functional cells. The variability is caused by genetic background differences between patient-derived hiPSCs and cells derived from control individuals as well as by the inability of most protocols to consistently generate homogeneous cultures of differentiated cells. These issues complicate, if not preclude, the use of gene expression level as a valid functional readout to define the molecular mechanisms of candidate disease risk variants, which are expected to alter the transcription of the downstream gene only subtly. As our analysis of the SNCA-associated risk variants demonstrates, two experimental strategies allow us to overcome these limitations: (1) the use of CRISPR/Cas9-mediated gene editing for generating disease-relevant and control lines that differ exclusively at the risk variant and (2) the development of an allele-specific assay that allows the robust detection of small differences in disease risk-associated gene expression, independent of cell heterogeneity and extent of differentiation.