Identification of a RAI1-associated disease network through integration of exome sequencing, transcriptomics, and 3D genomics
Smith-Magenis syndrome (SMS) is a developmental disability/multiple congenital anomaly disorder resulting from haploinsufficiency of RAI1. It is characterized by distinctive facial features, brachydactyly, sleep disturbances, and stereotypic behaviors.
We investigated a cohort of 15 individuals with a clinical suspicion of SMS who showed neither deletion in the SMS critical region nor damaging variants in RAI1 using whole exome sequencing. A combination of network analysis (co-expression and biomedical text mining), transcriptomics, and circularized chromatin conformation capture (4C-seq) was applied to verify whether modified genes are part of the same disease network as known SMS-causing genes.
Potentially deleterious variants were identified in nine of these individuals using whole-exome sequencing. Eight of these changes affect KMT2D, ZEB2, MAP2K2, GLDC, CASK, MECP2, KDM5C, and POGZ, known to be associated with Kabuki syndrome 1, Mowat-Wilson syndrome, cardiofaciocutaneous syndrome, glycine encephalopathy, mental retardation and microcephaly with pontine and cerebellar hypoplasia, X-linked mental retardation 13, X-linked mental retardation Claes-Jensen type, and White-Sutton syndrome, respectively. The ninth individual carries a de novo variant in JAKMIP1, a regulator of neuronal translation that was recently found deleted in a patient with autism spectrum disorder. Analyses of co-expression and biomedical text mining suggest that these pathologies and SMS are part of the same disease network. Further support for this hypothesis was obtained from transcriptome profiling that showed that the expression levels of both Zeb2 and Map2k2 are perturbed in Rai1 –/– mice. As an orthogonal approach to potentially contributory disease gene variants, we used chromatin conformation capture to reveal chromatin contacts between RAI1 and the loci flanking ZEB2 and GLDC, as well as between RAI1 and human orthologs of the genes that show perturbed expression in our Rai1 –/– mouse model.
These holistic studies of RAI1 and its interactions allow insights into SMS and other disorders associated with intellectual disability and behavioral abnormalities. Our findings support a pan-genomic approach to the molecular diagnosis of a distinctive disorder.
Keywords: Diagnostic, Intellectual disability, Chromatin conformation, Text mining, Disease network
Smith-Magenis syndrome (SMS; MIM #182290) is a rare genomic disorder with a prevalence of 1 in 15,000. It is associated with specific craniofacial dysmorphology, developmental delay (DD), moderate to profound intellectual disability (ID), and self-injurious and stereotypic behaviors [1, 2]. SMS individuals show sleep disturbance with frequent daytime napping and night-time awakenings. They display restricted interest, obsessive thinking, and social responsiveness scale scores consistent with autism spectrum disorder (ASD). They repetitively mouth objects, rock, spin, or twirl their body, and grind their teeth. This distinctive profile is complemented by specific “lick and flip” and “self-hug” behaviors, as well as attachment to people [5, 6, 7]. Challenging behaviors such as self-injuries, physical aggression, and destructive behavior are significantly more prevalent in SMS than in ID with mixed etiologies. Self-injuries are present in 70–97 % of individuals and include polyembolokoilamania (insertion of foreign objects into bodily orifices) and onychotillomania (pulling out finger and toe nails). Unusual behaviors can comprise poking others’ eyes, forceful hugging, and punching fists through walls and windows.
Whereas SMS is classically associated with a deletion within cytogenetic G-band 17p11.2 that includes the RAI1 gene (about 90 % of individuals) or a nucleotide variant in that gene (about 5 %) [1, 9, 10, 11, 12], some reports suggested genetic heterogeneity as SMS-like individuals were found to recurrently harbor deletions of the 2q37.3 or 2q23.1 cytobands encompassing HDAC4 and MBD5, respectively [13, 14, 15]. Similarly, PITX3 was proposed to be responsible for the SMS-like neurobehavioral abnormalities observed in an individual .
Here we use recent advances in genome sequencing technologies to further assess the genetic heterogeneity of SMS and the possible clinical overlap of this syndrome with other intellectual disability and cognitive dysfunction disorders, as some of the seemingly characteristic phenotypic features are non-discriminating among ID syndromes. We also evaluate the pertinence of network interactions and provide experimental data in support of potential molecular diagnoses.
Each of the 149 patients was clinically assessed by their respective physicians. Patients were diagnosed as potentially affected by SMS through clinical assessment. Briefly, all individuals presented intellectual disability and/or developmental delay, and the majority (>75 %) also had sleep disturbances, stereotypies, or other endophenotypes common to SMS (e.g. distinctive facial features, tantrums, self-injurious behaviors, onychotillomania). The clinical presentation of SMS is heterogeneous; therefore, the indication of SMS by a clinician can be either premature in the case of a young infant or possibly a misdiagnosis in an individual with behavioral issues and ID.
Detailed SMS patients’ phenotypes
Detailed phenotypes of 13 of the 15 patients without RAI1 genetic alteration are provided in Additional file 1: Supplementary text and Additional file 2: Table S1. No clinical data were available for the remaining two individuals.
Array comparative genomic hybridization
Targeted chromosome 17p array comparative genomic hybridization (aCGH) analyses were carried out on each proband as previously reported. Additional genome-wide aCGH was conducted on each person using Baylor Miraca Genetics Laboratory design version 10.1, an Agilent 180 K oligo array. All array data were analyzed as previously described.
To uncover genetic variants associated with the abnormalities shown by the 15 patients without RAI1 genetic alteration, we performed whole-exome sequencing of DNA extracted from blood of the proband and both their parents whenever possible (eight trios) at the Baylor College of Medicine (BCM) Human Genome Sequencing Center (HGSC) via the Baylor-Hopkins Center for Mendelian Genetics. Exomes were captured and sequenced on an Illumina HiSeq platform using previously described methods . Sequence analysis was performed using the HGSC Mercury analysis pipeline (https://www.hgsc.bcm.edu/software/mercury) . Variants were filtered based on inheritance patterns including autosomal recessive, X-linked, and de novo/autosomal dominant. Variants with MAF < 0.05 in control cohorts (Atherosclerosis Risk in Communities (ARIC, https://www2.cscc.unc.edu/aric/), 1000 Genomes project (http://www.1000genomes.org/), the NHLBI Exome Sequencing Project (http://evs.gs.washington.edu/EVS/), and our internal BCM control database of > 5000 exomes generated as a member of the Centers for Mendelian Genomics)  and predicted to be deleterious by SIFT10 and/or PolyPhen were prioritized . Sanger sequencing confirmed putatively causative variants and their familial segregation.
The sequencing variants identified in this manuscript were deposited in ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/).
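The variant prioritization described above can be illustrated with a minimal sketch. It assumes each variant has already been annotated with its highest control-cohort allele frequency, its SIFT and PolyPhen calls, and an inheritance model. The record fields, the `prioritize` function, and the example entries are hypothetical and are not part of the Mercury pipeline; only the MAF threshold and the list of inheritance patterns come from the text.

```python
from dataclasses import dataclass

# Threshold stated in the Methods; everything else here is illustrative.
MAF_CUTOFF = 0.05
ACCEPTED_MODELS = {"de_novo", "autosomal_recessive", "x_linked"}


@dataclass
class Variant:
    gene: str                   # placeholder gene label
    max_control_maf: float      # highest minor allele frequency across control cohorts
    sift_deleterious: bool      # SIFT prediction
    polyphen_deleterious: bool  # PolyPhen prediction
    inheritance: str            # inheritance model assigned from the trio data


def prioritize(variants):
    """Keep rare variants predicted deleterious by SIFT and/or PolyPhen
    that fit one of the accepted inheritance models."""
    return [
        v for v in variants
        if v.max_control_maf < MAF_CUTOFF
        and (v.sift_deleterious or v.polyphen_deleterious)
        and v.inheritance in ACCEPTED_MODELS
    ]


# Hypothetical records, for illustration only.
EXAMPLE_VARIANTS = [
    Variant("GENE_A", 0.0001, True, False, "de_novo"),
    Variant("GENE_B", 0.20, True, True, "de_novo"),               # too common
    Variant("GENE_C", 0.001, False, False, "x_linked"),           # not predicted deleterious
    Variant("GENE_D", 0.01, False, True, "autosomal_recessive"),
]
PRIORITIZED = prioritize(EXAMPLE_VARIANTS)
```

In this sketch the three filters are applied jointly; a real pipeline would likely apply them in stages and keep the intermediate lists for review.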
The primary sequence of each candidate protein was loaded in Swiss-PdbViewer, aligned onto suitable modeling templates retrieved from SWISS-MODEL, and superposed in three-dimensional (3D) space using Swiss-PdbViewer [22, 23]. Each variant was modeled in the context of the overall 3D structure to evaluate its potential impact on protein folding and its position relative to known disease-associated variants. We also assessed whether missense variants perturbing protein function clustered in 3D around key regions of the protein.
The ZEB2 zinc finger residues 995–1078 were modeled using the pdb entry 1mey as template. MAP2K2 was modeled using both MAP2K2 (pdb entry 1s9i, 3.2 Å resolution) and MAP2K1 (pdb entry 3eqi, 1.9 Å resolution) structures. The GLDC residues were aligned on the Synechocystis sp. PCC 6803 glycine decarboxylase model (pdb entry 4LHD). To model the CASK variants, two partial CASK crystal structures (pdb entries 1kwa, chain A, and 1kgd, chain A (http://www.ncbi.nlm.nih.gov/pubmed/11729206?dopt=Abstract)) covering residues 487–572 and 739–914, respectively, were superposed on the crystal structure of PALS1/Crb (pdb entry 4wsi), which presents 35 % identity with CASK.
Because literature resources do not use entity names consistently, we first checked each gene identifier in UniProtKB (http://www.uniprot.org) or the HUGO Gene Nomenclature Committee (HGNC) database (http://www.genenames.org) to retrieve the recommended/approved name, short name(s), and alternative and synonymous name(s), if any, for each targeted gene, as well as the name(s) of the encoded protein. These were used as singleton and/or pairwise strings to extract information from various literature resources: PubMed (http://www.ncbi.nlm.nih.gov/pubmed), Google Scholar, iHOP (http://www.ihop-net.org/UniPub/iHOP/), and EVEX (http://evexdb.org/), to cite here the original source of reference for this project. The obtained results were curated and the reported relationships were visualized using Cytoscape (3.2.1; http://www.cytoscape.org/). The connectivity was assessed using the Knet-function, which is based on the adaptation of spatial statistics concepts to network analysis proposed in . The statistical significance of the obtained Knet-function value was calculated with respect to a population of permuted networks (n = 10^6) derived from the original prior knowledge network. Note that the connectivity is based not only on direct connections but also on indirect connections through shortest paths.
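The idea of testing connectivity against a permuted null can be sketched as follows. This is not the Knet-function: as a simpler stand-in, the sketch uses the mean shortest-path length between the genes of interest as the connectivity statistic, and it permutes gene labels rather than the network itself. All function names and the parameter defaults are assumptions made for illustration.

```python
import random
from collections import deque
from itertools import combinations


def shortest_path_length(graph, source, target):
    """Breadth-first search on an undirected adjacency dict; None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def mean_distance(graph, genes):
    """Mean shortest-path length over all reachable pairs of the given genes.
    Indirect connections count, as in the text, because paths may pass
    through intermediate nodes."""
    lengths = [shortest_path_length(graph, a, b)
               for a, b in combinations(sorted(genes), 2)]
    reachable = [d for d in lengths if d is not None]
    return sum(reachable) / len(reachable) if reachable else float("inf")


def permutation_p_value(graph, genes, n_permutations=1000, seed=0):
    """Empirical P value: fraction of random gene sets at least as tightly
    connected as the observed set (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = mean_distance(graph, genes)
    nodes = sorted(graph)
    hits = sum(
        mean_distance(graph, rng.sample(nodes, len(genes))) <= observed
        for _ in range(n_permutations)
    )
    return (hits + 1) / (n_permutations + 1)
```

With a large number of permutations, as used in the study, the smallest attainable empirical P value is roughly the reciprocal of that number.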
Identification of RAI1 interacting proteins
We identified ZBTB17/MIZ1 and BRD2 as likely interactors for RAI1 with a yeast two-hybrid assay. The yeast two-hybrid assays were performed in collaboration with the company Proteinlinks. Briefly, two fragments of the carboxyl-terminus of mouse Rai1 (a.a. 1246–1841 and a.a. 1246–1890) were cloned into pCWX200 as baits. Around 10 million independent complementary DNA (cDNA) library clones (10× library coverage) were screened for protein–protein interactions with both baits. We cultured the Y304 yeast strain on galactose selective medium without leucine, histidine, tryptophan, and uracil. Positive clones were replicated onto the four selective plates and examined with URA3 (or LEU2) and LacZ reporters. From this analysis, we identified ZBTB17/MIZ1, BRD2, and SOGA3 as reasonable candidates for RAI1 interaction (at least two clones, supported by both baits). These interactions were further assessed using co-immunoprecipitation (co-IP) analysis in HEK293 cells. Full-length Rai1 was cloned in pCMV-3xFLAG vector while the three candidates were cloned into pCMV-HA vectors to confirm the yeast two-hybrid results. Lysate from the co-transfected HEK293 cells (RAI1 and one of the candidates) was purified with EZview FLAG-M2 beads (Sigma) and analyzed with rat anti-HA (Abcam) on western blot. The interaction between RAI1 and ZBTB17/MIZ1 was confirmed by co-IP; however, BRD2 was not expressed well enough to be detected on western blot, and SOGA3 could not be assessed by co-IP because it bound to the beads in the absence of FLAG-RAI1 (Additional file 3: Figure S1).
Embryo collection and RNA extraction
Mice were housed in standard specific pathogen-free conditions. All animal studies were conducted under protocols approved by the Baylor Institutional Animal Care and Use Committee and followed NIH guidelines. Timed matings between Rai1 heterozygous females and males in F2 generation in the C57BL/6 Tyr c-Brd and 129SvEv mixed genetic background were implemented to generate Rai1 –/– embryos. To harvest embryos, pregnant females were sacrificed by cervical dislocation and the embryos were dissected from the uterus in ice-cold phosphate buffered saline (PBS) solution. Similar sized embryos at 10.5 days post conception (dpc) were collected in 1.5 mL Eppendorf tubes, frozen immediately in liquid nitrogen, and stored at –80 °C. Portions of the yolk sac were saved for genotyping as described previously. For RNA extraction, the whole embryos were homogenized in Trizol and RNA was extracted according to the manufacturer’s instructions (Invitrogen) followed by purification on columns using an RNeasy mini kit (Qiagen Sciences, Germantown, MD, USA). The RNA integrity, concentration, and overall quality were tested with an Agilent Bioanalyzer 2100 and a NanoDrop ND-1000 spectrophotometer.
Microarray processing and analysis
A total of 5–10 μg of total RNA from each individual embryo (three Rai1 –/– embryos at 10.5 dpc and three wild-type controls) was used to produce complementary RNA (cRNA) target for microarray transcriptome analyses. Embryos at 10.5 dpc were chosen because Rai1 functions during this stage as indicated by its strong expression and embryonic lethality of Rai1 –/– embryos from 7.5 to 18.5 dpc. In addition, the size of the Rai1 –/– embryos at 10.5 dpc is comparable to that of their wild-type littermates whereas the few surviving Rai1 –/– mice at birth are significantly smaller than the wild-type. The integrity and quality of the extracted RNAs were assessed on a 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA). The target was generated using a reverse transcription reaction to produce cDNA (SuperScript Choice System, Gibco), which was subsequently subjected to in vitro transcription with biotinylated cytidine-5′-triphosphate and uridine-5′-triphosphate using Enzo BioArray High Yield RNA Transcript Labeling kit to produce biotinylated cRNA. The target was then fragmented and hybridized to Affymetrix Mouse Genome 430 2.0 Array GeneChips (Affymetrix, Santa Clara, CA, USA) in duplicates using an Affymetrix GeneChip Fluidics Station 400. The arrays were stained with phycoerythrin-coupled avidin and scanned using a GeneArray Scanner 3000. The resultant output was analyzed using Affymetrix Microarray Suite software and examined for excessive background or evidence for RNA degradation. The chips were assessed by scaling factor, average background, percent of probe sets that are present, number of probes present, and the 3′-end to 5′-end probe intensity ratio for housekeeping probe sets (β-actin and GAPDH), as well as the number of probes present for the “spiked in” probe sets (BioB, BioC, BioD, and Crex).
All the chips were of good quality, which is further supported by the observations that they have similar RNA degradation patterns and the chips were well replicated within the same genotype group as shown by scatter plot analyses. The criteria for genes differentially expressed are that the log ratio of the normalized expression values in the Rai1 deficient embryos versus the controls is > 0.5 and the P value < 0.05, which empirically gives a very low false discovery rate (FDR). The probe sets with very low expression values were filtered out. We analyzed the chromosomal position of all the regulated genes using the chromosomal coordinates within recent genome assemblies of the mouse. The array data were analyzed using the GC-RMA program to estimate the expression measures from the probe level data. The program corrects the background, normalizes the raw perfect match data using the quantile normalization method, and summarizes the probe values to probe set values (expression values, one per probe set per chip), in log2 scale. The fold change for each probe is the log ratio of average expression value in the mutant samples divided by that in the wild-type controls. The fold change is considered to be significant if P ≤ 0.05.
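The differential-expression criteria above can be sketched in a few lines. The sketch assumes that expression values are already on the log2 scale (as produced by GC-RMA), that the log ratio is computed as the difference of group means of those log2 values, that the 0.5 cutoff applies to the absolute log ratio (so that both increases and decreases are retained), and that a P value per probe set is supplied by a separate test. The function names and example values are hypothetical.

```python
# Cutoffs stated in the Methods.
LOG_RATIO_CUTOFF = 0.5
P_CUTOFF = 0.05


def mean(values):
    return sum(values) / len(values)


def log_ratio(mutant_log2, wildtype_log2):
    """Difference of mean log2 expression (mutant minus wild-type),
    i.e. the log2 ratio of the two group averages on the log scale."""
    return mean(mutant_log2) - mean(wildtype_log2)


def is_differential(mutant_log2, wildtype_log2, p_value):
    """Apply both criteria: absolute log ratio above the cutoff and P below 0.05."""
    return (abs(log_ratio(mutant_log2, wildtype_log2)) > LOG_RATIO_CUTOFF
            and p_value < P_CUTOFF)


# Hypothetical probe set: three mutant and three wild-type embryos.
EXAMPLE_RATIO = log_ratio([8.0, 8.2, 8.1], [9.0, 9.1, 9.2])
```

A log2 ratio of 1 in absolute value corresponds to a twofold change, which is the scale on which the Results report increases and decreases.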
Reverse transcription polymerase chain reaction (RT-PCR) validation
For RT-PCR validation of relevant expression targets, 1 μg of total RNA (intact by gel and measured by NanoDrop) was used for RT reactions using the Quanta qScript cDNA synthesis kit. Three separate RT reactions were performed using RNA from both a Rai1 –/– embryo and a wild-type control littermate. The RT reactions and a non-RT reaction using wild-type RNA as well as a water-only control were then run on a gel and all reactions containing both RNA and RT had similar patterns and intensity. A total of 1 μL of each RT reaction was used for subsequent PCR reactions. Primers for PCR were designed to transcript regions of Zeb2, Map2k2, and Rai1 using the UCSC browser version of GRCm38/mm10. The primers (from 5′ to 3′) are as follows:
Zeb2-7 F: CTTCAAGTACAAGCACCACCTGAA
Map2k2-2 F: TGAGAGGATCTCAGAGCTGGGT
Rai1-4 F: ATGTATCCACACCTACCACTACCCAT
4C-seq and 3C-PCR validation assays
Circularized chromosome conformation capture (4C) libraries were prepared from lymphoblastoid cell lines (LCLs) of two age-matched female control individuals. Briefly, LCLs were grown at 37 °C. 5 × 10^7 exponentially growing cells were harvested and crosslinked with 1 % formaldehyde, lysed, and cut with DpnII, a 4-cutter restriction enzyme that allows higher resolution [34, 35]. After ligation and reversal of the crosslinks, the DNA was purified to obtain the 3C library. This 3C library was further digested with NlaIII and circularized to obtain a 4C library. The inverse PCR primers to amplify 4C-seq (4C combined with multiplexed high-throughput sequencing) templates were designed to contain Illumina adaptor tails, sample barcodes, and viewpoint-specific sequences. The selected viewpoint maps within the 5′ portion of the first intron of the RAI1 gene (700 bp from the donor site of exon 1), a region enriched in DNaseI hypersensitive and transcription factor binding sites. It corresponds to the closest suitable DpnII fragment relative to the transcriptional start sites of the targeted gene. The sequence of the 4C-seq primers is reported in Additional file 2: Table S2. We amplified at least 1.6 μg of 4C template (using about 100 ng of 4C template per inverse PCR reaction, for a total number of 16 PCRs). We multiplexed the two 4C-seq templates in equimolar ratios and analyzed them on a 100-bp single-end Illumina HiSeq flow cell. The numbers of raw, excluded, and mapped reads for each LCL sample are detailed in Additional file 2: Table S3.
To validate selected physical interactions and loop formations between non-neighboring chromatin fragments, 5 × 10^7 exponentially growing cells were used in conjunction with our 3C protocol as described. We tested primers positioned on the chromosome 17p11.2 sense strand 5′ to 3′ for the cis-interactions and primers designed at 9p24.1 compared to control 16p11.2 region for the trans-interactions (Additional file 3: Figure S2 with primers tables). The presence of physical interactions was determined by PCR amplimer production. Control PCRs included no input (“water”) as well as DNA from chromatin digested with DpnII but without the subsequent religation step (“- Ligase”) (Additional file 3: Figure S2).
4C-seq data analysis
4C-seq data were analyzed as previously described [34, 35, 37] through the 4C-seq pipeline (available at http://htsstation.epfl.ch/) and visualized with gFeatBrowser (http://www.gfeatbrowser.com). Briefly, the multiplexed samples were separated, and undigested and self-ligated reads were removed. Remaining reads were aligned and translated to a virtual library of DpnII fragments. Read counts were then normalized to the total number of reads and replicates combined by averaging the resulting signal densities (Additional file 3: Figures S3 and S4). The local correlation between the profiles of the two samples per viewpoint was calculated (Spearman correlation: 0.83). The combined profiles were then smoothed with a window size of 29 fragments. The region directly surrounding the viewpoint is usually highly enriched and can show considerable experimental variation, thereby influencing overall fragment count. To minimize these effects, the viewpoint itself and the directly neighboring “undigested” fragment were excluded during the procedure. In addition to this filtering, we modeled the data to apply a profile correction similar to the one described in  using a fit with a slope -1 in a log-log scale. Significantly interacting regions were detected by applying a domainogram analysis as described. We selected BRICKs (Blocks of Regulators In Chromosomal Kontext) with a p value threshold < 0.01 for both “cis” and “trans” interactions, and annotated the BRICKs overlapping genes as well as the closest upstream and the closest downstream genes, in a window of +/– 500 kb. The 4C libraries used to perform the circular PCR with RAI1 viewpoint’s primers had been previously tested in , with seven additional viewpoints’ primer pairs. The BRICKs genes GTDC1 and KDM4C (and the flanking genes ZEB2 and GLDC) were not called as significantly interacting regions for any of these viewpoints (see ; Additional file 2: Tables S6–S12).
The raw sequencing files are available at GEO under accession number GSE83420.
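The normalization, replicate averaging, and smoothing steps of the 4C-seq analysis can be illustrated with a minimal sketch. It is not the HTSstation pipeline: the per-million scaling, the truncated windows at the profile ends, and all function names are assumptions. Only the window size of 29 fragments comes from the text.

```python
def normalize(counts, scale=1_000_000):
    """Scale per-fragment read counts to the total number of reads
    (reads per million by default; the scale is an assumption)."""
    total = sum(counts)
    return [c * scale / total for c in counts]


def average_replicates(profile_a, profile_b):
    """Combine two replicate profiles by averaging fragment by fragment."""
    return [(a + b) / 2 for a, b in zip(profile_a, profile_b)]


def smooth(signal, window=29):
    """Centered running mean over `window` fragments; windows are
    truncated at the two ends of the profile."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed
```

In practice the viewpoint fragment and its undigested neighbor would be dropped from `counts` before `normalize` is called, in line with the exclusion step described above.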
Gene annotation was obtained through BioScript (http://gdv.epfl.ch/bs). Protein interaction networks for BRICKs genes were determined using STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) v9.1 (http://string-db.org/). We exploited GO with Enrichr (http://amp.pharm.mssm.edu/Enrichr/) to assess if the chromatin-contacted genes were enriched in specific pathways and genes associated with Mendelian diseases and GIANT (http://giant.princeton.edu/) and Genemania (http://www.genemania.org/) to test tissue-specific functional interactions and produce association networks, respectively [43, 44, 45, 46]. The significance of the connectivity of the GIANT co-expression networks was assessed as described for the literature-mining network (see above). We used Enrichr Chromosome Location tool and BRICKs count in different window sizes (5 Mb, 1 Mb, and 500 kb) to determine whether any cytogenetic band other than 17p11.2 was enriched for BRICKs. Other than 17p11.2, we identified significant enrichments at cytobands 17p12, 17p13, and 2q22, where the gene ZEB2 is located.
Hi-C matrices from Rao et al.  were prepared by first applying a KR normalization to the 5 kb and 100 kb resolution observed matrices and then by dividing each normalized score by the expected one extracted from the KR expected file (as described previously in section II.c of the Extended Experimental Procedures of reference ). KR expected values less than 1 were set to 1 to avoid long-distance interaction biases. HiC matrices from Dixon et al.  were generated from the normalized datasets at a 40 kb resolution and transformed to a 400 kb resolution by summing the contacts observed in 10 × 10 sub-matrices. Expected vectors represent the mean number of contacts observed at a given distance and were used to calculate the observed/expected matrices.
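The two matrix operations described above, coarsening a contact matrix by summing square sub-matrices and dividing observed contacts by a distance-dependent expectation, can be sketched as follows. The sketch assumes a square, symmetric matrix stored as nested lists; the function names are hypothetical, and the optional `floor` argument mirrors the rule of setting expected values below 1 to 1.

```python
def coarsen(matrix, factor=10):
    """Sum contacts in factor x factor blocks (e.g. 40 kb bins to 400 kb bins).
    Trailing bins that do not fill a complete block are dropped."""
    size = len(matrix) // factor
    return [
        [
            sum(matrix[i * factor + a][j * factor + b]
                for a in range(factor) for b in range(factor))
            for j in range(size)
        ]
        for i in range(size)
    ]


def expected_by_distance(matrix):
    """Mean number of contacts observed at each bin distance d = |i - j|,
    computed along the d-th upper diagonal."""
    n = len(matrix)
    return [sum(matrix[i][i + d] for i in range(n - d)) / (n - d)
            for d in range(n)]


def observed_over_expected(matrix, expected, floor=None):
    """Divide each entry by the expected value for its distance.
    If `floor` is given, expected values below it are raised to it."""
    n = len(matrix)
    result = []
    for i in range(n):
        row = []
        for j in range(n):
            e = expected[abs(i - j)]
            if floor is not None:
                e = max(e, floor)
            row.append(matrix[i][j] / e if e else 0.0)
        result.append(row)
    return result
```

Flooring the expected values keeps sparse long-distance bins from producing inflated ratios, which is the bias the text refers to.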
Clinical and molecular findings
Through physicians from a large network of medical genetics centers, we enrolled a cohort of 149 individuals presenting with a constellation of SMS features. High-density 17p11.2 aCGH and Sanger sequencing of RAI1 showed that 134 out of 149 individuals presented a genetic or genomic alteration of the RAI1 gene [9, 11, 17, 49, 50, 51, 52]. Of these, 96/134 (72 %) individuals carried the classic recurrent 3.7 Mb SMS deletion, ten (7.5 %) contained an uncommon recurrent 1 (UR1) or UR2 rearrangement, 24 (18 %) a non-recurrent RAI1 deletion, and four (3 %) had a de novo variant in RAI1 [9, 11, 49, 52, 53] (Additional file 2: Table S1). Whereas these proportions are similar to published results [12, 54], it is likely that some clinicians did not refer individuals with SMS features who were negative for SMS molecular diagnosis (via aCGH or fluorescence in situ hybridization, FISH) or who were positive for another potentially causative CNV, for example 1p36 deletion syndrome [15, 55, 56] that shares multiple similarities with SMS. Indeed, many individuals were molecularly diagnosed prior to sample submission. Consistent with this hypothesis, a separate study identified mutations affecting RAI1 in only 30 % of participants with a suspected diagnosis of SMS.
The remaining 15 individuals (10 %) showed no discernible perturbation of the RAI1 gene. The 13 with available clinical data presented the following classical SMS features: ID (12/12), DD (13/13), sleep disturbances (8/10), and/or self-injurious behavior (10/11), in particular onychotillomania (6/7) (Additional file 1: Supplementary Text, Additional file 2: Table S1). To identify the underlying cause of the phenotypes of these 15 individuals, the probands and their parents when available (eight cases) were subjected to high-resolution genome-wide aCGH and whole-exome sequencing. We identified potentially causative variants in ten individuals (Additional file 2: Table S1). These were grouped into five categories: (1) a 47,XYY karyotype (subject BAB2492); (2) de novo variants in ZEB2 (BAB2386), CASK (BAB2540), KMT2D (BAB2319), and JAKMIP1 (BAB2451); (3) compound heterozygous variants in GLDC (BAB4947); (4) a MECP2 variant in a woman with random X-inactivation (BAB2552), inherited from her mother, who presented with a skewed X-inactivation pattern (away from this allele) in her blood (Additional file 3: Figure S5); and (5) variants in POGZ (BAB2330, variant not maternally inherited), MAP2K2 (BAB2474), and the X-linked KDM5C (BAB2293), the origins of which could not be assessed. We confirmed the segregation of sequence variants in available family members by Sanger sequencing.
Summary of clinical phenotypes of SMS patients without RAI1 alteration, variants assessment, and evidence of association with RAI1 presented in this paper
Variant analysis and modeling
The variants identified in KMT2D (p.E3418X) and MECP2 (p.P389fsX) are predicted to be loss-of-function alleles, which are likely pathogenic alleles as KMT2D and MECP2 are “extremely intolerant” and “intolerant” to loss-of-function variation according to the Exome Aggregation Consortium database version 0.3 (http://exac.broadinstitute.org) [10.2015] (pLI = 1.0 and 0.7, respectively) and as analogous loss-of-function variants in KMT2D and MECP2 were identified in KABUK1  and MRXS13  individuals, respectively (Additional file 2: Table S5 and S6). Additionally, the de novo variant in the candidate gene JAKMIP1 (p.D586H) occurs in a highly conserved residue and is predicted to be deleterious to the protein structure. JAKMIP1 is “extremely intolerant” to loss-of-function variation according to ExAC (pLI = 0.99). When possible, we used X-ray structures and/or cryo-EM modeling to obtain a 3D representation of the remaining encoded proteins and compared the variants we identified with those previously reported in MOWS, MICPCH, GCE, CFC4, and MRXSC individuals (Additional file 2: Table S7–S11). 
By and large, these models suggest that the variants identified in the current study are detrimental to the encoded proteins: (1) the ZEB2 p.H1049P variant substitutes a residue that participates in the coordination of the Zn++ atom of one of the Zinc fingers, similar to the variant p.H1045R identified in a MOWS individual (Additional file 3: Figure S6A; Additional file 2: Table S11); (2) the MAP2K2 p.D69del variant removes one of the two aspartic acid residues involved in the binding of a Ca++ ion in the conserved GELKDD loop (Additional file 3: Figure S6B); (3) the GLDC p.L726Q and p.P647L variants likely affect the packing of the encoded protein in the neighborhood of the catalytic lysine K754 residue similar to the 61 missense variants identified in GCE individuals (Additional file 3: Figure S6C, Additional file 2: Table S8); and (4) the CASK p.R489W variant places a bulky tryptophan sidechain that cannot be accommodated in the structure without changing the molecular surface (Additional file 3: Figure S6D). The possible impact of the KDM5C p.K1023R variant on this conserved position (Additional file 3: Figure S7) could not be evaluated as no template is available for this region.
The identified rare variants affect Rai1-associated genes
To gain further insight into the genes regulated by Rai1 during mouse embryonic development, we performed microarray analysis on total RNA prepared from three 10.5 dpc Rai1 –/– embryos and from three of their wild-type littermates. The two Rai1 transcripts present on the array are significantly downregulated in Rai1 –/– embryos compared to wild-type littermates (e.g. the AK013909 transcript with a fold change of 6.2 shows the largest downregulation among the 45,037 assessed probe sets). In fact, the expression values for both transcripts are within background levels in the Rai1 –/– embryos, indicating that neither transcript is expressed in the Rai1 –/– mutants and further corroborating the contention that the engineered Rai1 mutant allele is a complete null allele. In total, 142 and 157 probe sets showed an over twofold increase or decrease, respectively (Additional file 2: Table S12; see “Methods”) in the mutant mice when compared to wild-type littermates. Consistent with the hypothesis that genes potentially causative of the SMS-like phenotypes are functionally associated with or transcriptionally regulated by RAI1, the expression levels of both Zeb2 (ENSMUSG00000026872) and Map2k2 (ENSMUSG00000035027) were perturbed in Rai1 –/– mice (Additional file 2: Table S12). These expression array results were subsequently confirmed by RT-PCR (Additional file 3: Figure S9). We then assessed the chromosomal position of the dysregulated genes. The enrichment score using a Pearson Chi-square goodness of fit statistic indicated that they showed a biased chromosome distribution with 22 % of the genes downregulated and 26 % of the genes upregulated in the Rai1 mutants mapping to mouse chromosome 11 (MMU11) where the Rai1 gene resides. Each of the other chromosomes carries less than 5 % of the differentially expressed genes.
This enrichment on MMU11 for downregulated and upregulated genes in Rai1 –/– embryos is reminiscent of our previous finding that the engineered MMU11 deletion and reciprocal duplication that mimic SMS and Potocki-Lupski syndrome rearrangements were associated with a MMU11-wide transcriptome perturbation in the five assessed adult male tissues .
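The Pearson Chi-square goodness-of-fit statistic used to assess the chromosome distribution can be sketched as follows. The sketch assumes the expected share of genes per chromosome is taken from a reference distribution such as the fraction of array probe sets on each chromosome. That choice of reference, the function name, and the counts in the example are assumptions made for illustration and are not data from the study.

```python
def chi_square_gof(observed, expected_fraction):
    """Pearson goodness-of-fit statistic: sum over categories of
    (observed - expected)^2 / expected, where expected counts are the
    total number of observations times the reference fraction."""
    total = sum(observed.values())
    statistic = 0.0
    for category, count in observed.items():
        expected = total * expected_fraction[category]
        statistic += (count - expected) ** 2 / expected
    return statistic


# Hypothetical example: 100 dysregulated genes, 30 of them on one
# chromosome that holds 10 % of all assessed genes.
EXAMPLE_STATISTIC = chi_square_gof(
    {"chr11": 30, "other": 70},
    {"chr11": 0.10, "other": 0.90},
)
```

The statistic is then compared with a Chi-square distribution whose degrees of freedom equal the number of categories minus one.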
The genes mapping within the RAI1-chromatin contacted genomic loci (BRICKs genes) are enriched for genes that encode proteins that interact together (82 observed interactions versus 35 expected; P = 6.41e–12). BRICKs genes are also enriched for the GO term “detection of light stimulus involved in sensory perception” in Enrichr (P = 5.45e–3) (see “Methods,” Additional file 2: Table S14). Similarly, Enrichr showed that chromosome contacts were enriched in interchromosomal and intrachromosomal cytobands (17p11, adjusted P < 1e–09; 17p12, adjusted P = 9.7e–09; 17p13, adjusted P = 1.8e–03; and 2q22 adjusted P = 4.95e–02). ZEB2, one of the eight genes found mutated in the SMS individuals, maps to the latter 2q22.3 region and is flanked by BRICKs. To further assess possible functional relationships between RAI1 and chromatin-contacted genes, we retrieved the list of 322 genes flanking the BRICKs (BRICKs flanking genes, i.e. the closest genes to be found upstream and downstream of a BRICK within a 500 kb window). The 4C assays in particular identified interchromosomal contacts with restriction fragments mapping 200 kb away from the ZEB2 and GLDC gene loci. We then compared the lists of BRICKs genes and BRICKs flanking genes with the list of genes whose expression levels were perturbed in Rai1 –/– mouse embryos. Although our analysis is restricted by a small sample size, we found a consistent trend of over-representation (Fisher’s enrichment test, P = 0.22, OR = 1.5, and P = 0.2, OR = 1.4) with 10 and 18 chromatin-contacted BRICKs genes and BRICKs flanking genes, respectively, differentially expressed in the mouse knockout model. Interestingly, 6/10 of these BRICKs genes mapping at cytobands 17p13, 17p11 (2 genes), 17q21 and 17q23 (2 genes) have mouse orthologs that map on mouse chromosome MMU11, thus possibly explaining the enrichment of MMU11-mapping genes within genes differentially expressed in Rai1 –/– mouse embryos (Additional file 3: Figure S12).
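The overlap test reported above can be illustrated with a small sketch of Fisher's exact test on a 2 × 2 table. The text does not state whether the test was one- or two-sided, so the one-sided (over-representation) version shown here is an assumption, as are the function names. The table layout is: a = contacted and differentially expressed, b = contacted only, c = differentially expressed only, d = neither.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio of a 2 x 2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test: probability, under the hypergeometric
    null with fixed margins, of seeing at least `a` in the top-left cell."""
    row1 = a + b            # e.g. number of chromatin-contacted genes
    col1 = a + c            # e.g. number of differentially expressed genes
    n = a + b + c + d       # all genes considered
    denominator = comb(n, col1)
    upper = min(row1, col1)
    numerator = sum(comb(row1, k) * comb(n - row1, col1 - k)
                    for k in range(a, upper + 1))
    return numerator / denominator
```

With modest overlaps such as the 10 and 18 genes reported here, the test has limited power, which is consistent with the authors describing the result as a trend rather than a significant enrichment.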
Within a cohort of 149 individuals presenting clinical features of SMS we identified 90% (134/149) of individuals with either a heterozygous deletion of RAI1 or a predicted deleterious variant of the RAI1 gene. We used recent advances in genome sequencing technologies to possibly identify genetic alteration(s) associated with SMS in the remaining individuals. These strategies were successfully applied to discover loci associated with ID. They revealed a large genic overlap between ID and ASD, schizophrenia, and epileptic encephalopathy, suggesting that some developmental disorders have highly variable clinical presentations. They similarly uncovered limitations to the phenotype-driven strategy and conventional clinical paradigm of identifying individuals with very similar presentations as they revealed an unsuspected phenotypic variance of known disorders [73, 74].
It is important to the medical community to identify phenotypic overlap between diseases, which suggests common causes and alterations of the same pathways, as this knowledge could be exploited therapeutically. In this report, we identify previously unappreciated relationships between SMS and its major driver RAI1 and other diseases that include MOWS, MICPCH, KABUK1, GCE, MRXS13, CFC4, and MRXSC. Literature mining, co-expression data, transcriptome profiling of Rai1 –/– animal models, and chromosomal contacts support the existence of a comprehensive “biological module” or “disease network” underlying these diseases.
Although none of the 15 individuals described in this study have traditional molecular diagnoses involving RAI1 haploinsufficiency and thus should formally be considered misdiagnoses, many have phenotypes with considerable overlap with SMS (Fig. 3, Additional file 2: Table S1, Additional file 1: Supplementary text). BAB4947 presented facial dysmorphisms, brachycephaly, brachydactyly, and SMS-like behavioral disturbances that include sleep problems, polyembolokoilamania, and onychotillomania, as well as features known to be associated with GLDC variants, such as seizures. His clinical diagnosis could possibly be confounded by the likely presence of two molecular diagnoses: compound heterozygous variants in GLDC and an inherited frameshift variant in TCOF1, a gene associated with Treacher Collins syndrome-1 (OMIM #154500) and possibly responsible for the down-slanting eyes, everted lateral eyelids, and malar hypoplasia. The clinical scenario is similar for cases BAB2474 and BAB2540, who did not show CFC4- (e.g. ectodermal anomalies, craniofacial features) and MICPCH-distinctive features (e.g. microcephaly and pontocerebellar hypoplasia). Likewise, individual BAB2492 has a constellation of symptoms (sleep disturbance, DD, cognitive impairment, brachydactyly) compatible with only the most severe 47,XYY sex chromosome aneuploidy cases. Consistent with the hypothesis of expanded phenotypes, the phenotypic variability of White-Sutton syndrome associated with variants in POGZ keeps extending with clinical features including ASD, DD, ID, schizophrenia, and microcephaly [57, 71, 77, 78, 79, 80, 81, 82, 83]. We also cannot formally rule out that the true genetic cause(s) of the phenotypic spectrum of these individuals remain undetermined, or that these phenotypes occur in the presence of more complex, blended phenotypes, as exemplified by BAB4947 above. BAB2451 harbors a “probably pathogenic” variant (PolyPhen-2 HumDiv score = 1; HumVar score = 0.982) in the gene JAKMIP1.
Recent findings have linked the loss of JAKMIP1 to neuronal translation dysregulation during synaptic development; mice knocked out for the JAKMIP1 ortholog display social deficits, stereotyped activity, altered vocal communication, increased impulsivity, and other autistic-like behaviors.
The presented results support the notion that at least some of the identified variants in candidate SMS contributory genes CASK, GLDC, KDM5C, KMT2D, MAP2K2, MECP2, POGZ, and ZEB2 are causative of the observed phenotypes and thus that modification of the function of these genes is associated with a greater phenotypic variability than previously expected (Fig. 3). Conversely, one and two carriers of damaging RAI1 variants were identified within a total of 6381 ASD [79, 85] and 2426 ID [71, 78, 86, 87, 88, 89] individuals, respectively. Whereas the phenotype of one of the ID individuals was retrospectively found to be consistent with SMS, we lack detailed phenotypic information regarding the other two cases. If we assume that these two individuals do not present with typical SMS features that would have excluded them from these cohorts, it suggests that the phenotype of carriers of RAI1 deleterious variants is similarly more variable than anticipated.
Structural variations, especially large rearrangements involving several genes, shape tissue transcriptomes and impact the expression of genes mapping to their flanks [64, 90]. We show that the homozygous deletion of Rai1 in mouse embryos influences the expression of several genes and in particular MMU11 genes. Furthermore, the RAI1 viewpoint contacts the orthologous genes at the chromatin level. As some of these genes contribute to phenotypes associated with RAI1 variation (e.g. KRT17 with “hoarse voice” (HP:0001609), B9D1 with “low-set, posteriorly rotated ears” (HP:0000368), “hypertelorism” (HP:0000316), and “microcornea” (HP:0000482)), they could be involved in RAI1 pathways. The relevance of using 3C-based approaches as unbiased tools to discover clinically related genes is reinforced by their successful application in assessing connected regions involved in similar phenotypes and genes interacting with risk loci identified in genome-wide association studies (GWAS) [91, 92]. The contacted regions encompass candidate genes involved in “detection of light stimulus” and related gene ontology terms. These processes all refer to photodetection, which controls circadian rhythm and melatonin production from the pineal gland. RAI1 is an important player in this mechanism, by controlling the transcriptional levels of CLOCK, a key component of the mammalian circadian oscillator that transcriptionally regulates many critical circadian genes. Another gene mapping within the SMS critical region on chromosome 17p11.2 and linked to these processes is the subunit 3 of the COP9 signal transduction complex (COPS3), essential for the light control of gene expression. It is thus possible that the disruption of the orthologous locus in the Rai1 –/– mice perturbs chromatin loops and affects expression levels of RAI1-contacted/functionally associated genes.
We are well aware of the limitations of using LCLs in this type of study, and particularly to assess chromatin contacts between genes whose expression specificity resides in other cell lineages. These experiments are nevertheless worth pursuing simply because: (1) the primary human target tissues often remain beyond reach; (2) we cannot exclude a broad to ubiquitous expression pattern for the genes involved in these disease processes; and (3) long range chromatin contacts were shown to be stable across cell lines and tissues regardless of expression status. Similar limitations apply to the use of embryonic stem cell-derived material, while animal tissues have a different set of shortcomings.
Our results support a disease network associated with RAI1 and illustrate the utility of a comprehensive multifaceted diagnostic approach even in the presence of a distinctive disorder.
We thank the patients and families for their contribution to this study. We are grateful to the members of the Lausanne Genomic Technologies Facility for technical help.
MNL was awarded an EMBO fellowship (ASTF 153-2015). CRB was an HHMI fellow of the Damon Runyon Cancer Research Foundation (DRG 2155-13) and is supported by a grant from the National Institute of General Medical Sciences (K99GM120453). TH and WLC are supported by the NIH T32 GM07526 Medical Genetics Research Fellowship Program and the CPRIT RP140102 Training Program, respectively. This work was supported by grants from the Swiss National Science Foundation (31003A_160203) and the Simons Foundation (SFARI274424) to AR; the US National Human Genome Research Institute (NHGRI)/National Heart Lung and Blood Institute (NHLBI) grant no. HG006542 to the Baylor-Hopkins Center for Mendelian Genomics; the Smith-Magenis Syndrome Research Foundation (SMSRF); and the National Institute of Neurological Disorders and Stroke (NINDS) NS058529 to JRL. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Availability of data and material
The 4C-seq raw sequencing files and identified variants are available at GEO and ClinVar under accession numbers GSE83420 and SCV000299207-SCV000299218, respectively.
MNL performed the 3C and 4C experiments and analyzed the data with ML and JR. AN, MNL, IC, and IX conducted the network analysis; NG performed the variant analysis and modeling. CRB, JW, TH, ZCA, SNJ, DMM, RAG, and JRL performed the exome sequencing, analysis of the data, and subsequent validation of variants. PF, CRB, and JW conducted and analyzed X-inactivation data. WB, CRB, ESC, SG, and CAS performed and analyzed expression data on mouse embryos. JY, WLC, CRB, and WB performed and analyzed yeast two-hybrid data. MNL, CRB, JRL, and AR wrote the manuscript with contributions from IX, NG, and TH. JRL and AR designed the study and obtained the necessary funding. All authors commented on and approved the manuscript.
JRL has stock ownership in 23andMe, is a paid consultant for Regeneron Pharmaceuticals, has stock options in Lasergen, Inc., is a member of the Scientific Advisory Board of Baylor Miraca Genetics Laboratories, and is a co-inventor on multiple United States and European patents related to molecular diagnostics for inherited neuropathies, eye diseases, and bacterial genomic fingerprinting. Baylor College of Medicine (BCM) and Miraca Holdings Inc. have formed a joint venture with shared ownership and governance of the Baylor Miraca Genetics Laboratories (BMGL), which performs clinical exome sequencing. The Department of Molecular and Human Genetics at Baylor College of Medicine derives revenue from the chromosomal microarray analysis (CMA) and clinical exome sequencing offered in the Baylor Miraca Genetics Laboratory (BMGL; http://www.bmgl.com/BMGL/Default.aspx). The remaining authors declare that they have no competing interests.
Consent for publication
We obtained the authorization to publish participants’ data from their parents or legal guardians.
Ethics approval and consent to participate
The institutional review board of the Baylor College of Medicine approved this study. Participants were enrolled after written informed consent was obtained from parents or legal guardians. This study conforms to the Helsinki Declaration.
- 1. Smith ACM, Boyd KE, Elsea SH, Finucane BM, Haas-Givler B, Gropman A, et al. Smith-Magenis Syndrome. In: Pagon RA, Adam MP, Ardinger HH, Wallace SE, Amemiya A, Bean LJH, et al., editors. GeneReviews(R). Seattle: University of Washington; 1993.
- 14. Williams SR, Aldred MA, Der Kaloustian VM, Halal F, Gowans G, McLeod DR, et al. Haploinsufficiency of HDAC4 causes brachydactyly mental retardation syndrome, with brachydactyly type E, developmental delays, and behavioral problems. Am J Hum Genet. 2010;87(2):219–28.
- 15. Williams SR, Girirajan S, Tegay D, Nowak N, Hatchwell E, Elsea SH. Array comparative genomic hybridisation of 52 subjects with a Smith-Magenis-like phenotype: identification of dosage sensitive loci also associated with schizophrenia, autism, and developmental delay. J Med Genet. 2010;47(4):223–9.
- 16. Derwinska K, Mierzewska H, Goszczanska A, Szczepanik E, Xia Z, Kusmierska K, et al. Clinical improvement of the aggressive neurobehavioral phenotype in a patient with a deletion of PITX3 and the absence of L-DOPA in the cerebrospinal fluid. Am J Med Genet B Neuropsychiatr Genet. 2012;159B(2):236–42.
- 35. Loviglio MN, Leleu M, Mannik K, Passeggeri M, Giannuzzi G, van der Werf I, et al. Chromosomal contacts connect loci associated with autism, BMI and head circumference phenotypes. Mol Psychiatry. 2016. doi: 10.1038/mp.2016.84.
- 43. Alexa A, Rahnenfuhrer J. topGO: Enrichment Analysis for Gene Ontology. R package version 2.24.0. 2016. https://bioconductor.org/packages/release/bioc/html/topGO.html.
- 71. Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders. Nature. 2015;519(7542):223–8.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.