Expression profile and transcription factor binding site exploration of imprinted genes in human and mouse
In mammals, imprinted genes are regulated by an epigenetic mechanism that results in parental origin-specific expression. Though allele-specific regulation of imprinted genes has been studied for several individual genes in detail, little is known about their overall tissue-specific expression patterns and interspecies conservation of expression.
We performed a computational analysis of microarray expression data of imprinted genes in human and mouse placentae and in a variety of adult tissues. For mouse, early embryonic stages were also included. The analysis reveals that imprinted genes are expressed in a broad spectrum of tissues in both species. Overall, the relative tissue-specific expression levels of orthologous imprinted genes in human and mouse are not highly correlated. However, in both species distinctive expression profiles are found in tissues of the endocrine pathways, such as adrenal gland, pituitary and pancreas, as well as in placenta. In mouse, the placental and embryonic expression patterns of imprinted genes are highly similar. Transcription factor binding site (TFBS) prediction reveals a correlation between tissue-specific expression patterns and the presence of distinct TFBS signatures in the upstream regions of human imprinted genes.
Imprinted genes are broadly expressed pre- and postnatally and do not exhibit a distinct overall expression pattern when compared to non-imprinted genes. The relative expression of most orthologous gene pairs varies significantly between human and mouse, suggesting rapid species-specific changes in gene regulation. In contrast to this overall variability, distinct expression profiles of imprinted genes are confined to certain hormone-producing (endocrine) tissues and placentae of human and mouse, where enriched TFBS signatures are also found. This points to an important role of imprinted gene regulation in these tissues.
Keywords: Adrenal Gland; Transcription Factor Binding Site; Imprint Gene; Orthologous Gene Pair; Random Forest Analysis
Most genes in the mammalian genome are expressed from both parental alleles. Imprinted genes are a minority of genes that are transcribed from only one allele. While the molecular mechanisms underlying imprinting control have some features in common, the control and patterns of expression of individual imprinted genes appear to vary in a developmental- and tissue-specific manner. A systematic investigation of their expression profiles may help to better understand the biological function and regulation of imprinted genes.
To date, approximately 100 genes with evidence for imprinting effects in either human or mouse have been described. Based on recent predictions, the number of mammalian imprinted genes may range between 100 and 600 [1, 2, 3, 4], i.e. a substantial number of imprinted genes have already been identified.
Imprinted expression appears to be a rather conserved phenomenon in mammals; i.e., genes that are found to be imprinted in one species are most likely imprinted in the other. This tenet, however, does not always hold, as has been shown for the orthologous human and mouse genes L3MBTL and L3mbtl, each of which encodes a polycomb protein. In human, the gene is frequently absent in patients with myeloid malignancies. Human L3MBTL has been shown to be paternally expressed due to monoallelic methylation [5, 6], whereas mouse L3mbtl is not imprinted, nor are its CpG islands differentially methylated.
Orthologous genes that are imprinted in both human and mouse are most likely expressed from the same parental allele (maternal or paternal) in both organisms. Rarely, genes are oppositely imprinted, such as ZIM2/Zim2: human ZIM2 is paternally expressed while mouse Zim2 is maternally expressed. This might be explained by the fact that human ZIM2 shares 5' exons and a promoter with the likewise paternally expressed PEG3 gene, whereas mouse Zim2 appears not to do so.
Imprinted genes have been hypothesized to play a major role in the regulation of embryonic growth [8, 9, 10], to control placental function and to modulate the transport of nutrients from mother to embryo. Indeed, a number of imprinted genes, such as Ascl2, Phlda2 and Peg10, are indispensable for proper placental morphology and function, while others are involved in regulating nutrient supply [11, 12, 13, 14]. Additionally, there is strong evidence that imprinted genes control neurological development and function, as well as energy homeostasis, in postnatal stages and in the adult [9, 15, 16].
Based on these observations, it seems likely that imprinted genes are tightly regulated in a developmental- and tissue-specific manner. While tissue-specific expression profiles have been examined for some selected genes, no study of the entire class of imprinted genes has been performed so far. Furthermore, little is known about their expression status in adult tissues compared to embryonic stages.
In this study we performed a computational expression analysis of human and mouse imprinted genes in a variety of non-cancerous tissues using an existing systematic transcriptional profiling dataset. In particular, we (1) compared the profiles of individual genes across tissues, (2) analysed the correlations of expression patterns in human and mouse, and (3) explored the relationship between predicted transcription factor binding sites and tissue-specific expression. Our data provide new insights into the range and extent of expression, the tissue-specific function and the regulation of imprinted genes in two mammalian species.
Sets of imprinted genes selected for analysis
The Imprinted Gene Catalogue (IGC) [1, 18] reports imprinted genes in various species, including human and mouse. From the IGC we gathered information on 62 genes for which solid experimental data on allele-specific expression were available [see Additional file 1] (see Methods for the exclusion criteria). Of these 62 genes, only 30 had been analysed in both human and mouse, and 26 of these are imprinted in both species. For one of the remaining four genes imprinting was confirmed only in human, and for three only in mouse. Thus, 87% (26 of 30) of the genes analysed in both species showed conserved imprinting. For a further 23 imprinted genes the imprinting status had been analysed only in mouse, and for another 9 genes only in human.
Tissue-specific expression patterns of imprinted genes
Using a publicly available gene expression dataset derived from microarray hybridisations, we asked whether imprinted genes form a subset of genes with a particular expression pattern in human and mouse.
The raw microarray data were preprocessed and normalized as described in the Methods section. We confined the analysis to genes that were present on the respective expression arrays (GNF1M for mouse and HG-U133A for human) and that had a confirmed imprinting status [see Additional file 1] in at least one species. For human, 29 imprinted genes met these criteria (of 35 genes with a confirmed imprinting status), and for mouse 43 (of 52 with a confirmed imprinting status). This list also includes genes reported to be imprinted in certain tissues only. Because information on tissue-specific imprinting is available for only some imprinted genes, tissue-specific imprinting could not be taken into account in an unsupervised genome-wide approach. The array data did not allow us to distinguish expression of the two parental alleles; our analysis could therefore only address overall expression levels. For human, we included 21 postnatal tissues and placenta. For mouse, data sets for oocyte, fertilised egg and five embryonic stages were also included; such data were not available for human.
For human, placenta forms a clearly separate branch (1st split) in the clustered tree of tissues (Figure 1a). At the 2nd split pancreas branches off (strong influence of insulin), followed at the 3rd split by a branch formed by pituitary, ovary and adrenal gland. In the clustered tree of imprinted genes, GNAS clusters apart at the 1st split, followed by DLK1 at the 2nd split. The expression of both genes differs significantly from that of the other genes in all tissues (p < 10^-12 for all tissues). While GNAS is highly expressed in all tissues, DLK1 is significantly over-expressed in placenta, adrenal gland, ovary and pituitary, and this high expression contributes to the clustering of these tissues (see above). At the 3rd split, the remaining imprinted genes branch into two large clusters (Figure 1a). The first, consisting of INS, KCNQ1, SLC22A18, NDN, NNAT, SNRPN, PEG10, CDKN1C, MEST, ATP10A, GRB10, IGF2, SGCE, PEG3 and ZIM2, comprises genes expressed at roughly median levels. Among these, INS stands out: it is markedly over-expressed only in pancreas, the major insulin-producing organ of the body. IGF2, PEG10 and CDKN1C are strongly over-expressed in placenta. The second cluster consists of PHLDA2, PLAGL1, PPP1R9A, DIRAS3, L3MBTL, WT1, HYMAI, DLX5, MKRN3, TP73, UBE3A and MAGEL2, all of which show relative under-expression with respect to the tissue expression median. PHLDA2 and PLAGL1 are clearly downregulated in almost all tissues but strongly over-expressed in placenta. The genes contributing most to the specific expression patterns of placenta and pancreas were those strongly up- or downregulated compared to their expression in other tissues. The placental expression pattern was dominated by DLK1, PHLDA2, CDKN1C, MEST, PEG10 and IGF2. For pancreas, INS, DLK1, SNRPN, MEST, KCNQ1 and HYMAI were the most prominent genes.
Finally, for the cluster formed at the 3rd split (adrenal gland, ovary and pituitary), we applied random forest analysis to determine which genes contributed most to its formation. These were DLK1 (mean squared error, MSE: 5.36%), PPP1R9A (4.61%), HYMAI (3.45%), PEG3 (2.3%), MEST (2.13%), ATP10A (2.09%) and ZIM2 (1.99%). Applying random forest analysis to the same tissues in mouse identifies Dlk1 (4.84%), Sgce (4.79%), Kcnq1 (2.11%), Phlda2 (1.91%), Gtl2 (1.76%), Inpp5f (1.41%) and Usp29 (1.41%) as major contributors.
The clustering of imprinted genes in mouse shows that a series of genes drive the branching into embryonic and brain-specific clusters at the 1st and 2nd splits. Apart from H19 with its predominant expression profile (1st split; H19 is not represented on the human array), the remaining genes split into two clusters (2nd split). One group, characterised by moderate to high expression, splits into two clusters (3rd split), of which the cluster of Gtl2, Inpp5f, Nap1l5, Ndn, Nnat, Dlk1, Peg3, Grb10 and Plagl1 shows high expression in brain tissues. The second group, characterised by moderate to low expression, falls into two clusters according to the 2nd split. Of these, the cluster of Slc38a4, Peg12 and Zim1 shows mainly low expression throughout the tissues.
Additional clusterings (data not shown) were generated using Euclidean or Manhattan distance, each combined with either complete or average linkage. For mouse, the structures of the resulting trees were very similar to those shown in Figure 1. The human clustering was less stable (particularly with Manhattan distance). However, placenta always separated at the first splits; in most analyses pancreas separated at the 2nd split, whereas adrenal gland and pituitary, as well as amygdala and hypothalamus, separated at the 2nd or 3rd.
Imprinted genes do not show prominent overexpression in distinct tissues
We next analysed whether imprinted genes on average are more strongly expressed in certain tissues than the non-imprinted genes present on the arrays (Figure 2a and 2b). For each tissue separately, we sampled 1000 groups of non-imprinted genes (each of the same size as the examined imprinted gene group) and compared their relative expression levels to the average expression of imprinted genes. For human, the median expression levels of imprinted and non-imprinted genes did not differ significantly (Hochberg-adjusted p values ~ 0.64). In mouse, hypothalamus showed a slightly increased median expression of imprinted genes compared to other tissues (p = 0.05).
We also compared the variability of expression levels across tissues between imprinted and non-imprinted genes, testing against either all non-imprinted genes on the array or randomly sampled gene sets. In both cases the distributions of standard deviations of expression levels across tissues were similar for imprinted and non-imprinted (background) genes [see Additional file 2]; when tested against randomly sampled gene sets, the distributions of human and mouse standard deviations did not differ significantly from the genomic background. Thus, the variability across tissues in the relative expression of individual imprinted genes is not remarkably high, with a few exceptions such as DLK1 and INS in human and H19 and Ins2 in mouse.
In sum, imprinted genes show a median expression across tissues similar to that of non-imprinted genes. Except for a slight tendency in mouse hypothalamus, imprinted genes do not show tissue-specific enrichment compared to the genome-wide average, either in adult tissues or in mouse embryonic tissues. In addition, imprinted genes show neither reduced nor increased variability in tissue-specific expression levels. This suggests that, on average, imprinted genes are neither expressed at almost constant levels in all tissues (like housekeeping genes) nor restricted to very few tissues.
We next tested whether the expression of imprinted genes differs significantly between any two tissues. After adjusting for multiple testing, no pair of human tissues reaches significance. In mouse, 60 of 406 tissue pairs show a p value < 0.01, whereas by chance we would expect about 4 such pairs, i.e. an approximately 15-fold excess. Furthermore, pairwise comparison of embryonic tissues (fertilized egg, embryonic stages 6.5–10.5) with adult tissues yielded 36 pairs (of 126) with p < 0.01. Hypothalamus is the tissue with the highest median expression level and shows significant over-expression compared to 14 tissues (out of 28 pairs). The detailed matrix is given in an additional table [see Additional file 3].
The biclustered expression matrices (Figures 1a and 1b) illustrate that several imprinted genes show conspicuous expression behaviour across tissues. H19 is one such outlier: in mouse it is highly expressed at all embryonic stages and in skeletal muscle. Others, such as GNAS/Gnas, are strongly expressed in many tissues of one species but not the other, pointing to more general expression differences between human and mouse at this locus. Finally, in some tissues individual genes show extensive differences in relative expression between the two species. An example is Cdkn1c, which is highly expressed in adrenal gland in mouse but only moderately in human (Figure 1), although the overall correlation between CDKN1C and Cdkn1c across all tissues is rather high (see below).
Overall, pairwise comparisons show that imprinted genes are more highly expressed in mouse embryos than in adult tissues, especially bone marrow, heart, lung, lymph node, pancreas, prostate, salivary gland, testis, thymus and thyroid. The highest expression levels are observed for genes in the BWS region, namely H19, Cdkn1c, Phlda2 and Igf2. These genes dominate the biclustering of embryonic and placental samples in Figure 1b, whereas other genes behave rather inconspicuously at embryonic stages.
We also calculated pairwise Pearson correlation coefficients for genes within particular imprinted regions, i.e. regions containing at least three verified imprinted genes annotated to the same chromosomal band. Expression profiles of imprinted genes within a common cluster/chromosomal band were no more similar than those of genes residing in different regions (data not shown).
The biclustered expression matrices (Figures 1a and 1b) already indicated that imprinted genes do not cluster according to whether they are maternally or paternally expressed. Pearson correlation analysis supports this notion, showing that the parental origin of expression has no influence on tissue-specific expression profiles (data not shown). Nevertheless, paternally expressed genes tend overall to be more highly expressed than maternally expressed genes (human p = 0.02, mouse p = 0.04, t-test).
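The per-tissue resampling comparison described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R code: the input is assumed to be a dictionary of normalized relative expression values for one tissue, each gene set is summarized by its median, and the hypothetical function `resampling_p_value` returns a two-sided empirical p-value with the common +1 correction.

```python
import random
import statistics


def resampling_p_value(tissue_expr, gene_set, n_iter=1000, seed=0):
    """Two-sided empirical p-value for the median relative expression of
    `gene_set` in one tissue, compared with random gene sets of equal size.

    tissue_expr: dict gene -> normalized relative expression in this tissue
    gene_set: list of imprinted genes (all must be keys of tissue_expr)
    """
    rng = random.Random(seed)
    observed = statistics.median(tissue_expr[g] for g in gene_set)
    members = set(gene_set)
    background = [g for g in tissue_expr if g not in members]

    # Null distribution: medians of random non-imprinted sets of equal size.
    null = [
        statistics.median(tissue_expr[g] for g in rng.sample(background, len(gene_set)))
        for _ in range(n_iter)
    ]
    # +1 correction so that the empirical p-value is never exactly zero.
    upper = (1 + sum(m >= observed for m in null)) / (n_iter + 1)
    lower = (1 + sum(m <= observed for m in null)) / (n_iter + 1)
    return min(1.0, 2 * min(upper, lower))
```

Calling the function once per tissue and passing the resulting p-values to a multiple-testing correction (the text uses Hochberg adjustment) mirrors the overall workflow.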
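The chance expectation quoted above follows from simple counting, reproduced in the sketch below. The mouse data comprise 29 samples (the 22 tissues plus five embryonic stages, fertilized egg and oocyte listed in the Methods), giving 29 × 28 / 2 = 406 tissue pairs, of which about 406 × 0.01 ≈ 4 would fall below p = 0.01 by chance. The function `hochberg_adjust` is a hypothetical Python port of the Hochberg step-up rule that follows the formula used by R's `p.adjust(method = "hochberg")`; the text does not state which adjustment was applied to the pairwise tissue comparisons, so its use here is an assumption.

```python
from math import comb


def hochberg_adjust(pvalues):
    """Hochberg step-up adjusted p-values (same formula as R's
    p.adjust(p, method = "hochberg"))."""
    n = len(pvalues)
    # Visit p-values from largest to smallest; the k-th largest is multiplied
    # by k, and a running minimum (capped at 1) keeps the result monotone.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for k, idx in enumerate(order, start=1):
        running_min = min(running_min, k * pvalues[idx])
        adjusted[idx] = running_min
    return adjusted


n_samples = 29                  # 22 tissues + 5 embryonic stages + egg + oocyte
n_pairs = comb(n_samples, 2)    # number of distinct tissue pairs
expected = n_pairs * 0.01       # pairs expected below p = 0.01 by chance
fold_excess = 60 / expected     # 60 pairs were observed below p = 0.01
```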
Orthologous imprinted genes in man and mouse exhibit relaxed correlation of tissue specific expression
In addition, we analysed whether analogous tissues show similar expression profiles of orthologous imprinted genes in human and mouse. From a set of 8980 available orthologous genes, we randomly sampled, 1000 times, gene sets of the same size as the imprinted gene set. For a given tissue, we computed the Pearson correlation coefficient between the relative expression profiles in human and mouse for each sampled gene set. Thus, for each of the 22 tissues, we obtained 1000 Pearson correlation coefficients for the sampled gene sets and one for the imprinted gene set.
For 18 of 22 tissues, the correlation of expression of imprinted genes lies within the interquartile range (IQR, 25th to 75th percentile) of randomly sampled orthologous genes (Figure 3b and [see Additional file 4]). In trachea, imprinted genes correlated slightly worse (0.142 for imprinted genes vs. a median of 0.388 for the random gene sets). In adrenal gland, pancreas and pituitary, the correlation was stronger than for the random gene sets [see Additional file 4]. Although the correlation coefficient for placenta lay between the 1st and 3rd quartiles (placenta differs from other tissues in its expression patterns of imprinted genes), it shows a clear tendency towards higher correlation for imprinted genes than for randomly sampled sets. In summary, the correlation values of human and mouse orthologous imprinted genes are not very different from those of randomly sampled non-imprinted orthologous genes (Figure 3b). In a few endocrine tissues, we observe a strong expression correlation for orthologous imprinted genes. This finding is in line with previous expression reports on a few individual candidates in human and mouse [19, 20].
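The cross-species comparison described in the two preceding paragraphs can be sketched as follows, assuming per-tissue dictionaries of human and mouse relative expression keyed by a shared ortholog identifier (a hypothetical simplification of the probe-set mapping). The hypothetical function `correlation_vs_random` computes the Pearson correlation across genes for the imprinted set, repeats it for 1000 random ortholog sets of the same size, and reports whether the observed value lies below, within or above the interquartile range of the random correlations. Quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from the quantile definition used in the original R analysis.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlation_vs_random(human, mouse, gene_set, orthologs, n_iter=1000, seed=0):
    """For one tissue, correlate human and mouse relative expression across
    the genes of gene_set and locate the value relative to the IQR of
    correlations for random ortholog sets of the same size.

    human, mouse: dict ortholog id -> relative expression in this tissue
    """
    rng = random.Random(seed)

    def r_for(genes):
        return pearson([human[g] for g in genes], [mouse[g] for g in genes])

    observed = r_for(gene_set)
    null = [r_for(rng.sample(orthologs, len(gene_set))) for _ in range(n_iter)]
    q1, median, q3 = statistics.quantiles(null, n=4)
    if observed < q1:
        position = "below IQR"
    elif observed > q3:
        position = "above IQR"
    else:
        position = "within IQR"
    return observed, median, position
```

With the 19 orthologous imprinted genes analysed here, this would be run once per tissue, giving one position per tissue as summarized in the paragraph above.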
A few genes dominate expression profiles in distinct tissues
Examples of such associations are mouse pituitary and human placenta, which are associated with DLK1/Dlk1 and PEG3/Peg3 in the second quadrant. Further examples are (1) mouse placenta and PLAGL1/Plagl1, (2) mouse pituitary, human placenta and PEG3/Peg3, and (3) human adrenal gland, pituitary and ovary with NNAT/Nnat and NDN/Ndn.
Overall, however, the correspondence analysis (Figure 4) revealed the highest variance between human and mouse tissues, with human placenta as the only exception (separated by the first component, which accounts for 41.89% of the inertia). Thus, almost all human and mouse tissues were clearly separated (the first five components explain 41.89%, 19.59%, 10.74%, 9.55% and 4.02% of the inertia, respectively). A strong association was seen between GNAS and all human tissues except pancreas, adrenal gland, ovary, pituitary and placenta. INS/Ins2 showed a strong association with mouse pancreas but a weaker one with human pancreas, while SLC22A18 showed a stronger association with human pancreas. Dlk1 was associated with mouse pituitary (Figure 4).
Distinct sets of transcription factor binding sites correlate to tissue-specific expression patterns of imprinted genes
The analysis reveals that imprinted genes expressed in human placenta are more frequently associated with binding sites for XPF1, NFKappaB50, ER, MTF1, SF1, GLI, TEF, DR4, MEF3 and CP2. Of the corresponding factors, XPF1, TEF, MTF1, SF1, GLI, MEF3 and CP2 were also represented on the expression array, and XPF1, MTF1, SF1, GLI and CP2 were upregulated (defined as expression higher than the median plus 1 SD) in human placenta (Figure 5b). We further extracted the human placenta-expressed genes from the expression dataset and explored whether binding sites for the same group of transcription factors were enriched. Indeed, all TFBSs except those for XPF1 and MEF3 showed significant enrichment: NFKappaB50 (p = 4.3 × 10^-44), TEF (7.3 × 10^-69), DR4 (4.1 × 10^-7), ER (0.002), MTF1 (9.9 × 10^-57), SF1 (1.2 × 10^-11), GLI (5.9 × 10^-20) and CP2 (4.4 × 10^-41), whereas XPF1 and MEF3 had p = 0.4. After adjustment for the elevated GC content of the upstream regions of imprinted genes, binding sites for NFKappaB50, TEF, MTF1, SF1 and CP2 still showed clear enrichment, while those for DR4, ER and GLI did not. In summary, NFKappaB50, TEF, MTF1, SF1 and CP2 binding sites displayed placenta-specific enrichment. Binding sites for XPF1 were significantly enriched in the upstream regions of imprinted genes but not in those of placenta-expressed genes, with or without GC content adjustment.
In mouse, binding sites for the same factors as in human were significantly enriched, namely NFKappaB50, TEF, MTF1, SF1 and CP2 (even after adjustment for elevated GC content). In addition, DR4 and GLI binding sites showed significant enrichment. Overall, the results were comparable between human and mouse imprinted genes.
Prominent examples of tissue-specific TFBS associations in imprinted genes can also be observed for adrenal gland, pituitary and ovary (2nd quadrant in Figure 5a, and Figure 5c). As in the clustering of human imprinted gene expression (Figure 1a), the multiplied dataset consisting of TFBS enrichment and expression displays a very pronounced cluster of pituitary, adrenal gland and ovary. The strongest associations with these tissues, as can be read directly from Figure 5c, are with HEN1, LXRR4, MRF2, CEBP, RP58, HEB and XVENT1 binding sites. While XVENT1 is not represented on the array, HEN1, MRF2 and CEBP show very pronounced upregulation in ovary and pituitary (at least two-fold compared to the cellular background). In adrenal gland, HEN1 and CEBP are upregulated at least two-fold, while MRF2 is upregulated at least 1.4-fold.
A general observation of our analysis is that imprinted genes are expressed in a broad range of adult tissues and in placenta in human and mouse. For most tissues, the correlation of expression patterns of imprinted genes between human and mouse is not pronounced and does not differ from that of randomly selected orthologous genes. Furthermore, the organization of imprinted genes into genomic clusters does not coincide with coordinated tissue-specific expression patterns within these regions. Besides the overall "inconspicuous" expression behaviour of imprinted genes, we observe particular expression patterns of subsets of imprinted genes in certain human and mouse tissues. Tissues with distinct expression profiles, such as placenta, adrenal gland, ovary and pituitary, show a remarkable correlative association with distinct TFBSs in the promoter regions of imprinted genes. As very little is known about the transcription factors that regulate imprinted genes, the identified factors are good candidates for experimental studies on tissue-specific regulation of imprinted genes. In addition, various imprinted zinc finger protein genes have been identified that may act as transcription factors. Among these are PLAGL1/Plagl1, which is strongly associated with murine placenta and apparently has the potential to regulate Igf2 and H19, and PEG3/Peg3, which is associated with mouse pituitary and human placenta.
In line with the parental conflict hypothesis, embryonic development and placental phenotypes are associated with imprinting mutations and imprinted gene expression [24, 25, 26]. We observe that imprinted genes do not show a generally stronger correlation of tissue-specific expression patterns between human and mouse, and hence are unlikely to form a "uniform class" of genes whose functions are restricted to the same tissues and (embryonic) stages in both species. It will be interesting to extend detailed tissue-specific comparisons to human and mouse embryonic stages to find out whether distinctive tissue-specific patterns arise during prenatal development. So far, our analysis (confined to whole-embryo and placental expression) suggests that the overall expression of imprinted genes in embryo and placenta is distinct from that in adult tissues. However, this distinction results from the exceptional expression of a rather small group of imprinted genes, most of which are located in the BWS region.
One major observation of our analysis is that species-specific subsets of imprinted genes form groups with pronounced expression correlation in adrenal gland, pancreas and pituitary in human and mouse. In the correspondence analysis plots, these three organs, as well as placenta, separate from the vast majority of tissues. All four organs (adrenal gland, pancreas, pituitary, placenta) play prominent roles in mammalian energy metabolism: the placenta is the key organ of nutrient transfer between mother and embryo, and the adrenal gland, pancreas and pituitary secrete factors, such as insulin and other hormones, that play major roles in carbohydrate metabolism. Thus, coordinated imprinted gene expression in these organs may be important for balanced physiological pathways of energy supply. To some extent, this supports the parental conflict hypothesis, which proposes imprinted genes as regulators of growth and maternal nutrient supply. Interestingly, we observe a trend towards higher expression levels of paternally expressed genes. It will be interesting to find out whether these genes are particularly involved in regulating nutrient demand or in other functions of these endocrine tissues, such as stress response and salt and fluid balance.
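The upregulation criterion ("higher-than-median expression plus 1 SD") admits more than one reading. The sketch below assumes that the median and standard deviation are taken over a factor's relative expression values across all tissues, and that a factor counts as upregulated in a tissue when its value there exceeds that threshold; the functions `is_upregulated` and `upregulated_factors` are hypothetical.

```python
import statistics


def is_upregulated(expr_by_tissue, tissue):
    """True if the value in `tissue` exceeds the median plus one standard
    deviation of the same gene's values across all tissues (assumed reading
    of the criterion in the text)."""
    values = list(expr_by_tissue.values())
    threshold = statistics.median(values) + statistics.stdev(values)
    return expr_by_tissue[tissue] > threshold


def upregulated_factors(expression, tissue):
    """expression: dict factor -> dict tissue -> relative expression.
    Returns the sorted names of factors upregulated in `tissue`."""
    return sorted(f for f, profile in expression.items() if is_upregulated(profile, tissue))
```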
Imprinted genes of human and mouse were downloaded from the Imprinted Gene Catalogue (IGC, 11/2007). For some of these genes there were conflicting reports about their imprinting status. A number of genes showed a bias towards the maternal allele only in placenta. Because this organ is composed of maternal and embryonic tissue, it is difficult to distinguish whether maternal expression of such genes is caused by expression in the maternal compartment or by imprinted expression. We therefore excluded from the analyses genes for which conflicting data had been reported, as well as genes for which the only evidence for possible imprinting effects is an expression bias toward the maternal allele in placenta. Furthermore, antisense transcripts were excluded, since their genomic organisation is often insufficiently defined and they are often hard to distinguish from sense transcripts, especially in hybridisation experiments. Small RNAs such as microRNAs and snoRNAs were also removed from the analyses, since they are often part of longer transcripts. The resulting manually curated list of 62 imprinted genes is given in an additional file [see Additional file 1]. Corresponding Ensembl identifiers, potential HG-U133A identifiers, GNF1M identifiers and gene symbols were determined from the gene names given in the IGC list using Ensembl BioMart http://www.ensembl.org. Eleven genes were found in only one species, i.e. we failed to identify an ortholog of sufficient identity in the other species. In total, the list encompasses 36 human and 52 murine imprinted genes.
When examining imprinted gene expression in each species separately, we used only genes that had a confirmed imprinting status (i.e., those marked with a tick in an additional table [see Additional file 1]) and that were present on the respective expression array (GNF1M for mouse, HG-U133A for human): 29 imprinted genes for human and 43 for mouse. For human-mouse comparisons, we used those orthologous genes (see below) that (a) were present on both arrays and (b) had a verified imprinting status in both human and mouse; in total, we analysed 19 genes. For the correlation analyses, we calculated Pearson correlation coefficients.
For gene expression analysis, we used the expression data reported by Su and colleagues. These data have been analysed several times with various foci and have proven valid for expression studies; to our knowledge, however, they have not yet been studied in the context of imprinting. The human samples were hybridised to the commercially available HG-U133A array; for the mouse samples, a custom-designed array was used. Raw data were downloaded and pre-processed as follows: raw probe-set intensities were normalized using the calibration and variance stabilization method (vsn). With this procedure, the variance of normalized probe intensities is approximately independent of their expected absolute expression level.
For each experiment, it was assumed that the majority of genes were not differentially expressed with regard to all other experiments on the same array type. Parameters of the vsn model were estimated from a random subset of 50% of the probes and then used to transform the entire array. After normalization, probe-set intensities were summarized for each probe set using the median polish method, i.e. a robust additive model was fitted for each probe set across the arrays. For further analysis, we considered only those experiments annotated with the following tissue names in both the human and the mouse expression experiments: adrenal gland, amygdala, bone marrow, cerebellum, heart, hypothalamus, kidney, liver, lung, lymph node, ovary, pancreas, pituitary, placenta, prostate, salivary gland, skeletal muscle, testis, thymus, thyroid, trachea and uterus. For mouse, we additionally studied embryonic stages 6.5, 7.5, 8.5, 9.5 and 10.5, fertilized egg and oocyte. These stages were not available for human.
Because absolute expression levels were not suitable for comparison between species, even after normalization, we used expression values relative to the genome-wide profile of the same cell type in each species; i.e., we subtracted the cell type-specific background expression from each normalized (transformed-scale) gene expression value. Repeated measurements were averaged. We refer to the resulting values as 'normalized expression levels' or, more precisely, 'normalized relative expression levels'. We also verified that relative and absolute expression values do not differ strongly (data not shown).
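Median polish, used above for probe-set summarization, is Tukey's robust fit of an additive model (overall effect + probe effect + array effect) to the matrix of log-scale probe intensities of one probe set. The Python sketch below follows the iteration of R's `medpolish`, alternately sweeping out row and column medians until the sum of absolute residuals stabilizes. It is an illustration, not the code used in the study, and the preceding vsn normalization step is not reproduced; `median_polish` and `summarize_probe_set` are hypothetical names.

```python
import statistics


def median_polish(x, max_iter=10, eps=0.01):
    """Tukey median polish of a probes x arrays matrix (list of rows).
    Returns (overall, row_effects, col_effects, residuals) such that
    x[i][j] == overall + row_effects[i] + col_effects[j] + residuals[i][j]."""
    n_rows, n_cols = len(x), len(x[0])
    z = [list(map(float, row)) for row in x]
    overall, old_sum = 0.0, 0.0
    row_eff, col_eff = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep row (probe) medians into the row effects.
        row_med = [statistics.median(r) for r in z]
        z = [[v - m for v in r] for r, m in zip(z, row_med)]
        row_eff = [e + m for e, m in zip(row_eff, row_med)]
        delta = statistics.median(col_eff)
        col_eff = [c - delta for c in col_eff]
        overall += delta
        # Sweep column (array) medians into the column effects.
        col_med = [statistics.median(z[i][j] for i in range(n_rows)) for j in range(n_cols)]
        z = [[v - col_med[j] for j, v in enumerate(r)] for r in z]
        col_eff = [c + m for c, m in zip(col_eff, col_med)]
        delta = statistics.median(row_eff)
        row_eff = [e - delta for e in row_eff]
        overall += delta
        # Stop when the sum of absolute residuals no longer changes much.
        new_sum = sum(abs(v) for r in z for v in r)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, row_eff, col_eff, z


def summarize_probe_set(log_intensities):
    """Per-array expression estimate for one probe set: overall + array effect."""
    overall, _, col_eff, _ = median_polish(log_intensities)
    return [overall + c for c in col_eff]
```

The per-array expression value of a probe set is then the overall effect plus the corresponding array (column) effect, while the probe (row) effects absorb probe-specific affinity differences.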
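The conversion to relative expression can be sketched as below. The text does not specify how the cell type-specific background is computed, so this sketch assumes it is the median of all genes' replicate-averaged, vsn-scale values in the same tissue; the function name `relative_expression` is hypothetical. Because the vsn transformation behaves like a logarithm for large intensities, this subtraction approximates a log-ratio to the background for well-expressed genes.

```python
import statistics


def relative_expression(replicates):
    """replicates: dict gene -> list of normalized (vsn-scale) values for one
    tissue. Returns dict gene -> replicate mean minus the tissue background,
    where the background is ASSUMED to be the median over all genes."""
    # Repeated measurements of the same gene are averaged first.
    means = {g: statistics.fmean(v) for g, v in replicates.items()}
    background = statistics.median(means.values())
    return {g: m - background for g, m in means.items()}
```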
Visualization of expression profiles and statistical analyses
Biclustering and heatmap generation (figure 1) and visualization of the Pearson correlation of human/mouse relative expression (figure 3a) were done using TM4. For biclustering, Euclidean distance and average linkage were chosen. Because the top-level splits remained approximately the same when other distance measures and clustering methods were used, we restricted the analysis shown here to Euclidean distance and average linkage. All further calculations and statistical analyses were performed using the statistical language R and packages from Bioconductor. Correspondence analysis was performed using the package "ca". To identify genes that define subclusters of tissues (e.g. brain and embryonic tissues), we applied random forest classification using the package "randomForest". Further statistical tests were conducted using the base package.
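To make the clustering settings concrete, the following is a naive sketch of agglomerative clustering with Euclidean distance and average linkage in plain NumPy. It is not TM4's implementation, and the function name is hypothetical; in practice, `scipy.cluster.hierarchy.linkage(x, method="average", metric="euclidean")` builds the same kind of tree far more efficiently.

```python
import numpy as np

def average_linkage(x):
    """Naive agglomerative clustering of the rows of x (Euclidean, average linkage).

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a frozenset of row indices.
    """
    x = np.asarray(x, dtype=float)
    # pairwise Euclidean distances between rows
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    clusters = [frozenset([i]) for i in range(len(x))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average linkage: mean distance over all cross-cluster pairs
                dist = np.mean([d[i, j] for i in clusters[a] for j in clusters[b]])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((clusters[a], clusters[b], float(dist)))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Two well-separated pairs of tissue profiles merge within pairs first.
for a, b, dist in average_linkage([[0, 0], [0, 1], [10, 10], [10, 11]]):
    print(sorted(a), sorted(b), round(dist, 3))
```

For a two-way (bi)clustering of a genes x tissues matrix, the same procedure is run once on the rows (genes) and once on the transposed matrix (tissues), and the heatmap is ordered by both trees.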
Definition of orthologous genes
Using Ensembl BioMart, we identified all human-mouse orthologous genes that were (a) annotated with orthology type "one2one" and (b) present on both the HG-U133A Affymetrix chip and the GNF1M chip. We then identified the probe sets annotated to these genes, yielding 8980 genes in total. From these, we randomly sampled 1000 gene sets of the same size as the set of orthologous imprinted genes.
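The random sampling of equally sized gene sets can serve as a null distribution for a set-level statistic. The section does not state which statistic was compared, so the sketch below assumes, hypothetically, the mean per-gene Pearson correlation between the human and mouse relative expression profiles across shared tissues, and reports a one-sided empirical p-value. All names are illustrative.

```python
import numpy as np

def per_gene_pearson(human, mouse):
    """Row-wise Pearson correlation of two genes x tissues matrices.

    Row i of both matrices must belong to the same one2one ortholog pair,
    and the columns must be the same tissues in the same order.
    """
    human = np.asarray(human, dtype=float)
    mouse = np.asarray(mouse, dtype=float)
    h = human - human.mean(axis=1, keepdims=True)
    m = mouse - mouse.mean(axis=1, keepdims=True)
    return (h * m).sum(axis=1) / np.sqrt((h ** 2).sum(axis=1) * (m ** 2).sum(axis=1))

def empirical_p(r_all, imprinted_idx, n_draws=1000, seed=0):
    """Compare the imprinted set with n_draws random sets of equal size.

    ASSUMPTION: the set-level statistic is the mean per-gene correlation.
    Returns (observed statistic, one-sided empirical p-value).
    """
    rng = np.random.default_rng(seed)
    r_all = np.asarray(r_all, dtype=float)
    observed = r_all[imprinted_idx].mean()
    k = len(imprinted_idx)
    null = np.array([r_all[rng.choice(len(r_all), size=k, replace=False)].mean()
                     for _ in range(n_draws)])
    # how often does a random set correlate at least as strongly?
    p = (1 + np.sum(null >= observed)) / (n_draws + 1)
    return observed, p
```

Sampling from the 8980 one2one orthologs rather than from all genes keeps the null sets subject to the same array and orthology filters as the imprinted genes.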
Definition of upstream regions, sequence retrieval and identification of transcription factor binding sites
Transcription factor binding sites (TFBSs) were predicted using the method developed by Rahmann and colleagues. We obtained position-specific scoring matrices (PSSMs) describing the binding preferences of transcription factors from the Transfac database (version 9.4). To reduce the number of false-positive and overlapping predictions, we restricted the full PSSM set to 125 non-redundant, high-quality vertebrate matrices.
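As a generic illustration of how a PSSM is matched against a sequence, the sketch below converts a count matrix into log2 odds scores (with a pseudocount and, as an assumption, a uniform background) and scores every window on the forward strand. It is not the Rahmann et al. method or Transfac's own scoring scheme; the function names and the toy matrix are hypothetical.

```python
import math

ALPHABET = "ACGT"

def log_odds_pssm(counts, background=None, pseudocount=0.5):
    """Turn a count matrix (one {base: count} dict per motif position)
    into a log2-odds PSSM relative to a background distribution."""
    if background is None:
        background = {b: 0.25 for b in ALPHABET}  # ASSUMPTION: uniform background
    pssm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in ALPHABET) + 4 * pseudocount
        pssm.append({b: math.log2((col.get(b, 0) + pseudocount) / total / background[b])
                     for b in ALPHABET})
    return pssm

def scan(sequence, pssm, threshold):
    """Score every window of the forward strand and return (position, score)
    for windows whose summed log-odds score reaches the threshold."""
    w = len(pssm)
    seq = sequence.upper()
    hits = []
    for i in range(len(seq) - w + 1):
        window = seq[i:i + w]
        if any(b not in ALPHABET for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pssm[k][b] for k, b in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits

toy = log_odds_pssm([{"A": 10}, {"C": 10}, {"G": 10}])  # hypothetical 3-bp motif
print(scan("TTACGTT", toy, threshold=5.0))
```

A real scan would also cover the reverse complement; the threshold is the critical choice, since it trades sensitivity against the false-positive rate per scanned nucleotide.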
We defined putative promoters as the sequences extending from 2000 bp upstream to 2000 bp downstream of the Ensembl-predicted transcription start sites (TSSs) of genes (Ensembl database, version 42). We scanned each putative promoter for nucleotide patterns matching the PSSMs. A match was accepted when its similarity score exceeded a threshold defined such that a single false-positive prediction per 500 nt occurs with a probability of 0.01 (see the method of Rahmann and colleagues for details). This criterion yielded 175 TFBSs for the 19 human imprinted genes studied. The p-values of these predicted TFBSs were then used as input for the correspondence analysis described above.
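Score thresholds of this kind are derived from the null distribution of PSSM scores. One common way to compute that distribution exactly for an i.i.d. background is dynamic programming over integer-rounded scores, sketched below for a PSSM given as a list of per-position {base: score} dicts. How the paper's criterion (one false positive per 500 nt with probability 0.01) maps onto a per-position p-value is not reproduced here; the function names, the rounding scale and the example matrix are hypothetical.

```python
from collections import defaultdict

def score_distribution(pssm, background, scale=100):
    """Exact null distribution of the (rounded) PSSM score of one window.

    pssm       : list of {base: score} dicts, one per motif position
    background : {base: probability} for an i.i.d. background model
    Scores are multiplied by `scale` and rounded to integers so the
    distribution has finitely many support points.
    """
    dist = {0: 1.0}
    for col in pssm:
        new = defaultdict(float)
        for s, p in dist.items():
            for b, prob in background.items():
                new[s + round(col[b] * scale)] += p * prob
        dist = dict(new)
    return dist

def threshold_for_pvalue(pssm, background, alpha, scale=100):
    """Smallest score t with P(score >= t) <= alpha under the background,
    or None if even the maximal score is more probable than alpha."""
    dist = score_distribution(pssm, background, scale)
    tail = 0.0
    threshold = None
    for s in sorted(dist, reverse=True):
        if tail + dist[s] > alpha:
            break
        tail += dist[s]
        threshold = s
    return None if threshold is None else threshold / scale

uniform = {b: 0.25 for b in "ACGT"}
toy = [{"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}] * 2
print(threshold_for_pvalue(toy, uniform, alpha=0.1))
```

Rounding trades a small loss of score resolution for a distribution whose size grows only linearly with motif length, which keeps exact p-values cheap even for a few hundred matrices.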
Christine Steinhoff (MPIMG) is supported by EU IP grant EuTRACC and FastTrack (Robert Bosch Stiftung). We thank the reviewers for helpful suggestions and Sean O'Keeffe for carefully reading and editing the manuscript.
- 2.Ruf N, Bahring S, Galetzka D, Pliushch G, Luft FC, Nurnberg P, Haaf T, Kelsey G, Zechner U: Sequence-based bioinformatic prediction and QUASEP identify genomic imprinting of the KCNK9 potassium channel gene in mouse and human. Hum Mol Genet. 2007, 16 (21): 2591-2599. 10.1093/hmg/ddm216.
- 4.Hayashizaki Y, Shibata H, Hirotsune S, Sugino H, Okazaki Y, Sasaki N, Hirose K, Imoto H, Okuizumi H, Muramatsu M, et al: Identification of an imprinted U2af binding protein related sequence on mouse chromosome 11 using the RLGS method. Nat Genet. 1994, 6 (1): 33-40. 10.1038/ng0194-33.
- 5.Li J, Bench AJ, Piltz S, Vassiliou G, Baxter EJ, Ferguson-Smith AC, Green AR: L3mbtl, the mouse orthologue of the imprinted L3MBTL, displays a complex pattern of alternative splicing and escapes genomic imprinting. Genomics. 2005, 86 (4): 489-494. 10.1016/j.ygeno.2005.06.012.
- 6.Li J, Bench AJ, Vassiliou GS, Fourouclas N, Ferguson-Smith AC, Green AR: Imprinting of the human L3MBTL gene, a polycomb family member located in a region of chromosome 20 deleted in human myeloid malignancies. Proc Natl Acad Sci USA. 2004, 101 (19): 7341-7346. 10.1073/pnas.0308195101.
- 14.Ono R, Nakamura K, Inoue K, Naruse M, Usami T, Wakisaka-Saito N, Hino T, Suzuki-Migishima R, Ogonuki N, Miki H, et al: Deletion of Peg10, an imprinted gene acquired from a retrotransposon, causes early embryonic lethality. Nat Genet. 2006, 38 (1): 101-106. 10.1038/ng1699.
- 21.Benzecri J: L'analyse des donnees 1. La Taxinomie 2. L'analyse des correspondances. 1973, Paris: Dunod
- 22.Varrault A, Gueydan C, Delalbre A, Bellmann A, Houssami S, Aknin C, Severac D, Chotard L, Kahli M, Le Digarcher A, et al: Zac1 regulates an imprinted gene network critically involved in the control of embryonic growth. Dev Cell. 2006, 11 (5): 711-722. 10.1016/j.devcel.2006.09.003.
- 29.Tukey JW: Exploratory Data Analysis. 1977, Reading, Massachusetts, USA: Addison-Wesley
- 31.Ihaka R, Gentleman R: R: a language for data analysis and graphics. J Comput Graph Stat. 1996, 5: 299-314. 10.2307/1390807.
- 33.Nenadic O, Greenacre M: Correspondence Analysis in R, with two- and three-dimensional graphics. Journal of Statistical Software. 2007, 20 (3): 1-13.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.