Background

Trisomy for human chromosome 21 (chr21) is the most frequent live-born aneuploidy and is the cause of Down syndrome (DS), whose main symptoms include intellectual disability, cardiovascular defects and craniofacial dysmorphisms [1]. The DS phenotype is thought to be associated with an altered expression of the genes located on chr21 [2]-[7]. Basic research on DS is now rapidly accelerating, and there is the possibility that the results will be beneficial for individuals with DS [8].

Several studies have shown that individuals with DS have a specific cancer risk pattern, or tumor profile: their risk of developing leukemia and testicular cancer is much higher than that of age-matched controls, while women with DS almost never develop breast cancer [9],[10]. In particular, children with DS show an increased prevalence of acute leukemia, both lymphoid (ALL) and myeloid (AML), with a relative risk 10 to 20 times higher than that of the normal population [11],[12]. In nearly half of the cases, these childhood leukemias are classified as acute megakaryoblastic leukemia (AMKL), a relatively rare subtype of AML also known as AML M7 according to the French–American–British (FAB) classification, whose incidence increases 500-fold in children with DS by the age of 4 years as compared to the chromosomally normal population (reviewed in [13]). This observation strongly suggests that trisomy 21 directly contributes to the neoplastic transformation of hematopoietic cells, in particular in the megakaryocyte lineage cells. Interestingly, acute leukemia cells harboring megakaryocyte markers and presenting in subjects without DS may show trisomy 21 [14]. We also described a cell line derived from blast cells of a patient with type M2 AML which has trisomy 21 and megakaryocyte features [15]. More recently, mutations of the gene encoding the transcription factor GATA1 have been shown to cooperate with trisomy 21 in initiating megakaryoblastic proliferation in nearly all DS AMKL cases, while they are absent in non-DS AMKL [13],[16]. GATA1 mutations in DS cells give rise to a short, truncated form of the GATA1 transcription factor (GATA1s) that is not able to establish normal interactions with other gene regulators [17].

Transient myeloproliferative disorder (TMD) is a clonal pre-leukemia condition, occurring in 10% of children with DS during the neonatal period, presenting at a median age of 3-7 days with accumulation of immature megakaryoblasts [13]. TMD cases usually resolve spontaneously, but DS AMKL may develop within 1-4 years in 20-30% of these children. AMKL may also develop in non-DS children, usually at a higher age than in DS subjects (median 8 vs. 1.8 years, respectively) and in the absence of a trisomy 21 background. Cytogenetic abnormalities described in non-DS AMKL cells include trisomy 8 and 1 and monosomy 7 [13].

An open issue is the relevance of trisomy 21 as a specific background for the higher incidence of AMKL in DS. A few previous studies have used gene expression profiling by microarray analysis in order to identify specific transcriptome alterations in DS and/or non-DS AMKL, as well as in TMD [17]-[24]. Due to the rarity of AMKL, these works often analyzed a small number of cases, using a variety of experimental platforms. Consequently, their results are difficult to compare with one another.

One of the first goals of this work was to perform a systematic meta-analysis using any available gene expression profile dataset related to AMKL in pediatric age in order to produce a differential transcriptome map between DS and non-DS related AMKL. This task has been accomplished using a tool recently described by us for the generation and the analysis of quantitative transcriptome maps, TRAM (Transcriptome Mapper) [25], which allows effective integration of data obtained from different experimenters, experimental platforms and data sources. The comparison of 43 DS AMKL samples with 45 non-DS AMKL samples represents the largest study on the subject, highlighting the relevance of trisomy 21 in the development of AMKL in comparison with AMKL originating from non-trisomic cells. Results show significant over- or under-expression of distinct chromosomal segments and of single key genes in the whole genome, as well as on chr21, adding new knowledge compared with that produced by the single works from which the data were originally obtained. In addition, each considered type of leukemia was compared with the expression profile of TMD cells and normal human megakaryoblast/megakaryocyte cells (MK), allowing the building of a model for the disorder in the differentiation process that leads to DS and non-DS AMKL. Comparisons with cord blood-derived MK cells (CB MK) have also been performed, due to the fact that leukemias in infants or young children originate from fetal hematopoietic cells [17],[18],[26],[27] and the progenitor cells (fetal/neonatal MKP) are present in the cord blood (CB) [28],[29].

For each cell type investigated, reference expression data for about 17,000-26,000 mapped sequences have been generated and validated through a sample comparison with known data. The biological and clinical significance of these data is discussed.

Methods

Literature search

A systematic biomedical literature search was performed up to January 2013 in order to identify articles related to global gene expression profile experiments in AMKL patients (DS AMKL, non-DS AMKL and TMD conditions). A general search using the commonly used acronym "AMKL" retrieved 157 articles.

The MeSH term "Leukemia, Megakaryoblastic, Acute" was also used for a PubMed search in the expression: "Leukemia, Megakaryoblastic, Acute"[Mesh] AND ("Gene Expression Profiling"[MeSH] OR "Oligonucleotide Array Sequence Analysis"[Mesh] OR "Microarray Analysis"[Mesh] OR microarray* OR "Expression profile" OR SAGE).

Database search

Gene Expression Omnibus (GEO) [30] functional genomics repository was searched for: (AMKL[All Fields] OR (AML[All Fields] AND M7[All Fields])) AND "Homo sapiens"[Organism]. A more general search using the expression "Down Syndrome"[MeSH] AND "Homo sapiens" [Organism] was also used.

ArrayExpress database [31] of functional genomics experiments was searched for the terms: "AMKL", "Megakaryoblastic", "AML M7".

In order to obtain gene expression profile datasets for normal human MK cells, in addition to the 9 used in the original description of the TRAM software [25], we searched GEO for the expression ("Megakaryocytes" [Mesh] OR Megakaryoblast*) AND "Homo sapiens" [ORGANISM]. The ArrayExpress database was searched for the expressions "Megakaryocyte", "Megakaryocytic", "Megakaryoblast", "MK".

The searches were performed up to January 2013.

Dataset selection

The inclusion criteria of datasets in the analysis were: availability of the raw or pre-processed data; pediatric age of the subject from whom the sample was obtained; diagnosis of DS or non-DS AMKL or TMD.

Exclusion criteria were: exon arrays (hampering data elaboration by TRAM due to exceedingly high number of data rows) or platforms using probes split into several distinct arrays for each sample (hampering intra-sample normalization); lack of identifiers corresponding to those found in the GEO sample records (GSM) or ArrayExpress sample records; platforms assaying an atypical number of genes (i.e. <5,000 or >60,000); cell line derived data; specific subtype of non-DS AMKL, e.g. t(1;22); trisomy 21 in non-DS AMKL samples.

Normal MK samples were considered for the analysis when fulfilling these criteria: late MK colonies (10-14 days) or MK sorted cells, obtained from peripheral blood (PB), bone marrow (BM) or cord blood (CB). MK cultured for less than 10 days or Colony Forming Unit-Megakaryocytic (CFU-MK) were excluded.

In order to obtain a quantitative transcriptome map, values from each dataset were linearized when provided as logarithms. In some cases we used raw files (e.g. CEL files), which were converted into pre-processed data using the software AltAnalyze [32].
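The linearization step can be illustrated with a minimal sketch. The logarithm base is not stated above; base 2 is assumed here because it is common for microarray data, and the function name `linearize` is hypothetical.

```python
def linearize(values, log_base=2.0):
    """Convert log-scale expression values back to linear scale.

    Missing values (None) are passed through unchanged.
    The base (2.0) is an assumption, not taken from the text.
    """
    return [None if v is None else log_base ** v for v in values]


# Toy log2 values for one sample, including one missing measurement.
log2_values = [0.0, 1.0, 3.0, None, 10.0]
linear_values = linearize(log2_values)
```

Datasets already provided on a linear scale would skip this step entirely.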

TRAM (transcriptome mapper) analysis

TRAM (Transcriptome Mapper) software [25] allows the import of gene expression data recorded in the NCBI (National Center for Biotechnology Information) GEO and EBI (European Bioinformatics Institute) ArrayExpress databases in tab-delimited text format. It also allows the integration of all data by decoding probe set identifiers to gene symbols via UniGene data parsing [33], normalizing data from multiple platforms using intra-sample and inter-sample normalization (scaled quantile normalization) [34], creating graphical representations of gene expression profiles in two modes, "Map" and "Cluster", and determining the statistical significance of results. Moreover, TRAM allows the comparison of two biological conditions, identifying critical genomic regions and genes with significant differential expression.

We created a directory (folder) for each condition, containing all the sample datasets related to the same source and selected for the study: DS AMKL (pool 'A'); non-DS AMKL (pool 'B'); TMD (pool 'C'); normal MK (pool 'D'); normal CB MK (pool 'E').

We ran the whole set of analyses permitted by TRAM (in both "Map" and "Cluster" mode, although we focused on the "Map" mode) using default parameters as described [25]. We used an updated version of TRAM including enhanced resolution of gene identifiers and updated UniGene and Entrez Gene databases (TRAM 1.1, June 2013), in comparison with the original 2011 version [25]. When the gene location cytoband was not available in the Gene database [35], it was manually derived from UCSC Genome Browser [36]. TRAM is freely available at http://apollo11.isto.unibo.it/software. Briefly, gene expression values were assigned to individual loci via UniGene, intra-sample normalized as percentage of the mean value and inter-sample normalized by scaled quantile. The value for each locus, in each biological condition, is the mean value of all available values for that locus. The genome wide gene expression median value was used in order to determine percentiles of expression for each gene.
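As a rough illustration of two of the steps summarized above, the following sketch expresses each sample's values as a percentage of that sample's mean and then averages, per locus, all values available in a pool. It deliberately omits the inter-sample scaled quantile normalization, whose details are not given here. The function names and toy loci are hypothetical and are not part of TRAM.

```python
from statistics import mean


def intra_sample_normalize(sample):
    """Express each value as a percentage of the sample's mean value."""
    sample_mean = mean(list(sample.values()))
    return {locus: 100.0 * v / sample_mean for locus, v in sample.items()}


def pool_locus_means(samples):
    """Average, per locus, all values available across the samples of a pool."""
    collected = {}
    for sample in samples:
        for locus, v in sample.items():
            collected.setdefault(locus, []).append(v)
    return {locus: mean(vals) for locus, vals in collected.items()}


# Hypothetical toy samples: locus symbol -> linear expression value.
sample_1 = {"GENE_A": 50.0, "GENE_B": 150.0}
sample_2 = {"GENE_A": 10.0, "GENE_B": 20.0, "GENE_C": 30.0}

normalized = [intra_sample_normalize(s) for s in (sample_1, sample_2)]
pool_values = pool_locus_means(normalized)
```

A locus measured in only some samples (GENE_C here) still receives a pool value, computed from the samples in which it is available.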

Using the "Map" mode graphical representation we searched for over/under-expressed genome segments, using a window size of 500,000 bp and a shift of 250,000 bp. The expression value for each genomic segment is the mean of the expression values of the loci included in that segment. A segment is defined as over/under-expressed if it has an expression value which is significantly different between the two conditions analyzed, and contains at least 3 individually over/under-expressed genes, i.e. genes which have expression values within the highest or the lowest 2.5th percentile. Significance of the over/under-expression for single genes was determined by running TRAM in "Map" mode with a segment window of 12,500 bp. This window size corresponds to about a quarter of the mean length of a gene, so the significant over/under-expression of a segment almost always corresponds with that of a single gene. A segment or a gene was considered to be statistically significantly over- or under-expressed for q < 0.05, where q is the p-value obtained by the method of hypergeometric distribution [25] and corrected for multiple comparisons. When the segment window contains more than one gene, the significance is maintained if the expression value of the over/under-expressed gene prevails over the others.
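The segment logic can be sketched as follows. This is an illustrative reading, not TRAM's actual implementation: the 1-based window coordinates, the handling of the last partial windows, and the exact form of the hypergeometric test (upper tail probability of finding at least a given number of over-expressed genes in a segment) are assumptions, and the multiple-comparison correction is omitted. All function names are hypothetical.

```python
from math import comb


def sliding_windows(chrom_length, window=500_000, shift=250_000):
    """Yield (start, end) 1-based inclusive coordinates of overlapping segments."""
    start = 1
    while start <= chrom_length:
        yield start, min(start + window - 1, chrom_length)
        start += shift


def segment_mean(loci, start, end):
    """Mean expression of the loci whose position falls inside [start, end]."""
    values = [v for pos, v in loci if start <= pos <= end]
    return sum(values) / len(values) if values else None


def hypergeom_tail(total_genes, total_hits, segment_genes, segment_hits):
    """P(X >= segment_hits) when segment_genes genes are drawn without
    replacement from total_genes genes, total_hits of which are over-expressed."""
    upper = min(segment_genes, total_hits)
    numerator = sum(
        comb(total_hits, k) * comb(total_genes - total_hits, segment_genes - k)
        for k in range(segment_hits, upper + 1)
    )
    return numerator / comb(total_genes, segment_genes)


# Toy chromosome of 1,000,000 bp with three loci given as (position, value).
windows = list(sliding_windows(1_000_000))
toy_loci = [(100_000, 2.0), (300_000, 4.0), (600_000, 9.0)]
means = [segment_mean(toy_loci, s, e) for s, e in windows]

# Toy genome: 10 genes, 3 over-expressed; a 3-gene segment holding all 3.
p_value = hypergeom_tail(10, 3, 3, 3)
```

With these defaults, the first window spans 1-500,000 and each subsequent window starts 250,000 bp later, so adjacent segments share half of their length.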

For the creation of the maps, TRAM software does not consider probes for which the expression value is not available, assuming that an expression level has not been measured. Furthermore, it assigns 95% of the minimum positive value present in a sample to those expression values equal to or lower than "0", in order to obtain meaningful numbers when a ratio between values in pool 'A' and pool 'B' is needed. Assuming that in these cases an expression level is too low to be detected under the experimental conditions used, this transformation is useful to highlight differential gene expression. Finally, we considered the most over- or under-expressed genes among the genes associated with at least 5 data points.
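A minimal sketch of this handling of missing and non-positive values is given below. The function name is hypothetical, and raising an error for a sample without any positive value is a design choice of this sketch, not something described in the text.

```python
def replace_non_positive(sample, fraction=0.95):
    """Drop missing values and set values <= 0 to a fraction (95% by default)
    of the smallest positive value found in the same sample."""
    measured = {k: v for k, v in sample.items() if v is not None}
    positives = [v for v in measured.values() if v > 0]
    if not positives:
        # Assumption of this sketch: such a sample cannot be used for ratios.
        raise ValueError("sample contains no positive expression value")
    floor = fraction * min(positives)
    return {k: (v if v > 0 else floor) for k, v in measured.items()}


# Toy sample: one zero, one negative and one missing value.
sample = {"GENE_A": 4.0, "GENE_B": 0.0, "GENE_C": -1.5,
          "GENE_D": None, "GENE_E": 2.0}
cleaned = replace_non_positive(sample)
```

After this step every retained value is strictly positive, so a ratio between two pools is always defined.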

At chromosomal level, we calculated (in the TRAM "chr" table) the median expression ratio for all the genes located in the same chromosome.
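The chromosome-level summary can be sketched as the median of per-locus ratios, grouped by chromosome. The input structures (pool mean values per locus and a locus-to-chromosome mapping) and the function name are hypothetical, and only loci with a value in both pools contribute.

```python
from statistics import median


def chromosome_median_ratios(pool_a, pool_b, chromosome_of):
    """Median pool_a/pool_b expression ratio for the loci of each chromosome."""
    ratios = {}
    for locus, a in pool_a.items():
        b = pool_b.get(locus)
        # Skip loci without a usable value in both pools or without a location.
        if b is None or b == 0 or locus not in chromosome_of:
            continue
        ratios.setdefault(chromosome_of[locus], []).append(a / b)
    return {chrom: median(vals) for chrom, vals in ratios.items()}


# Toy pool means; G4 and G5 are each present in only one pool.
pool_a = {"G1": 2.0, "G2": 6.0, "G3": 9.0, "G4": 5.0}
pool_b = {"G1": 1.0, "G2": 2.0, "G3": 3.0, "G5": 1.0}
chromosome_of = {"G1": "chr21", "G2": "chr21", "G3": "chrX", "G4": "chrX"}

chr_ratios = chromosome_median_ratios(pool_a, pool_b, chromosome_of)
```

Using the median rather than the mean keeps a few extreme single-gene ratios from dominating the value reported for a whole chromosome.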

Other analysis

FuncAssociate analysis [37] was used to obtain Gene Ontology attributes in order to functionally characterize large sets of genes derived from the TRAM analysis.

Results

Literature search

A general search using the acronym "AMKL" retrieved 157 articles, 6 of which describe gene expression profiling experiments [17],[18],[20]-[23].

No additional pertinent item was retrieved using the expression described in the Methods section and including the MeSH Term "Leukemia, Megakaryoblastic, Acute".

Database search

The Gene Expression Omnibus (GEO) [30] search allowed the retrieval of three additional works describing data possibly useful for meta-analysis [19],[24],[38]. These works were missed by the literature search because they do not use the "AMKL" acronym and were not assigned the MeSH Term "Leukemia, Megakaryoblastic, Acute" during the PubMed indexing process (the more general term "Leukemia, Myeloid, Acute" was used).

The more general search using the expression "Down Syndrome"[MeSH] AND "Homo sapiens"[Organism] allowed the addition of one further work [39]. This work analyzed several types of AML samples and did not explicitly mention AMKL or AML M7 in either the PubMed or the GEO database.

No further pertinent works related to AMKL were identified by ArrayExpress database [31] search.

Several datasets for normal MK cells global gene expression profile fulfilling the selection criteria were obtained from the works [40] (GEO, 7 samples) and [41] (ArrayExpress, 1 sample), in addition to the 4 sample series identified in the first report of the TRAM software [25] and obtained from different works [42]-[45], for a total of 19 datasets related to human normal MK cells.

Dataset building

Of the 10 works related to DS or non-DS AMKL retrieved as described above, 7 were considered for the meta-analysis (Table 1). It was not possible to obtain raw data from the Authors of [20], while the only sample of AML M7 described in [38] was related to the "Leukemic Stem Cells" cell type and the two AML M7 samples reported by Tomasson et al. [39] were obtained from elderly patients. Raw data from [23] were kindly provided by Drs. Jeffrey Taub and Yubin Ge.

Table 1 Main features of the samples used in TRAM analyses

At the end, DS AMKL sample pool 'A' included 43 datasets, while non-DS AMKL sample pool 'B' was composed of 45 datasets. A TMD dataset pool 'C' was constructed starting from 20 samples described in some of the DS AMKL related articles [18],[21],[22]. Age and sex data were available for 29 out of 43 DS AMKL patients (mean age: 20 months; 11 males and 18 females), for 26 out of 45 non-DS AMKL patients (mean age: 19 months; 19 males and 7 females) and for 9 out of 20 TMD patients (mean age: 8 days; 7 males and 2 females). GATA1 mutations giving rise to GATA1s were present in all DS AMKL and TMD samples, and not in non-DS AMKL samples, considering all samples for which this information was provided. Sample identifiers and main sample features are listed in Table 1 and Additional file 1 (available at: http://apollo11.isto.unibo.it/suppl).

Two pools were constructed from the normal MK related dataset selected: pool 'D' included all available MK samples, while pool 'E' was a subset including only CB-derived MK cells (Table 1 and Additional file 1).

Transcriptome differential maps

Datasets were loaded into TRAM and analyzed obtaining 8 transcriptome maps: DS AMKL (pool 'A') vs. non-DS AMKL (pool 'B'); DS AMKL (pool 'A') vs. normal MK (pool 'D'); non-DS AMKL (pool 'B') vs. normal MK (pool 'D'); DS AMKL (pool 'A') vs. normal CB MK (pool 'E') cells; non-DS AMKL (pool 'B') vs. normal CB MK (pool 'E'); DS AMKL (pool 'A') vs. TMD (pool 'C'); TMD (pool 'C') vs. normal MK (pool 'D'); TMD (pool 'C') vs. normal CB MK (pool 'E').

For each comparison between two cell types by TRAM, we describe below or in the corresponding Figures or Tables the total of data points analyzed for each cell type, i.e. gene expression values for all human mapped loci following intra- and inter-sample normalization [25]; the number of loci for which the comparison between the two conditions was possible due to the presence of values for those loci in both sample pools considered; the number and the gene content of each genomic segment containing at least three over- or under-expressed genes and found to be statistically significantly over- or under-expressed in the comparison between the two conditions. Each genomic segment was identified among the 12,373 segments generated using the default window of 500,000 bp with a shift of 250,000 bp and following removal of overlapping segments with similar gene content. When the results were reported for the over/under-expressed single genes, we considered only the genes associated with at least 5 data points.

The description of the gene name corresponding to all gene symbols cited here in the text, Figures or Tables is given in the Additional file 2. We performed a PubMed search for the most relevant over- or under-expressed genes using gene symbol or gene description along with MeSH terms related to MK or MK progenitor cells, thrombopoiesis, AMKL, platelets.

Detailed results for each map are provided below, and are also available at: http://apollo11.isto.unibo.it/suppl.

The absolute (not differential) expression values and maps for each cell type (not compared to another cell type) are also available in the complete sets of results at http://apollo11.isto.unibo.it/suppl, but are not discussed here because they include typical housekeeping genes whose over-expression is no longer evident when compensated by the corresponding housekeeping genes in the compared cell type.

Transcriptome map comparison of DS AMKL vs. non-DS AMKL

We first analyzed regional differential expression of pool 'A' (43 DS AMKL samples) versus pool 'B' (45 non-DS AMKL samples) (Table 1). A total of 1,061,761 data points from pool 'A' and 1,084,700 data points from pool 'B' were included in the analysis. An 'A'/'B' ratio value was determinable for 25,954 loci having values both in 'A' and 'B' pools (Additional file 3). The main results are shown in Figure 1. Results obtained by the analysis included 3 significantly over-expressed, non-overlapping segments (Table 2a). The highest expression ratio between DS AMKL and non-DS AMKL (3.27) was observed in a segment on chromosome 15 (15q21.2), including the known gene HDC (encoding histidine decarboxylase, which converts L-histidine to histamine). The segment with the second highest expression ratio was located on chromosome 4 (4q31.1) and contained over-expressed genes such as GYPE, GYPB and GYPA, encoding glycophorin E, B and A (MNS blood group) respectively. The third over-expressed segment spans the cluster of apolipoprotein encoding genes on chromosome 19 (19q13.2).

Figure 1

Main results of DS AMKL vs. MK, DS AMKL vs. non-DS AMKL, non-DS AMKL vs. MK comparisons. For each comparison the number of loci analyzed, the most over- or under-expressed segments and single genes, and the highest and lowest median expression ratios for all the genes located in the same chromosome are indicated.

Table 2 Genomic segments significantly over- or under-expressed

At single gene level, a fold increase higher than 5 was observed in all of the first 20 genes with the greatest expression ratios of DS AMKL vs. non-DS AMKL samples (Table 3a and Additional file 3). In particular, a 24-fold increase was observed for the SLITRK6 gene, encoding a membrane protein to date described as similar to the receptor for BDNF (brain-derived neurotrophic factor) and predominantly expressed in neural tissues. Among the genes with the lowest 'A'/'B' expression ratios, a 169.5-fold decrease was observed for a UniGene EST cluster, Hs.355689.

Table 3 List of the five most over- or under-expressed genes (all significantly, with q < 0.05)

At chromosomal level, we calculated (in the TRAM "Chr" table) the median 'A'/'B' expression ratio for all the genes located in the same chromosome. The highest ratios were near to 1 (0.93 for chr22, 0.92 for chrX and chr21, 0.91 for chr19); other values were in the range from 0.90 (chr17 and chr12) to 0.76 (chrY).

We performed two additional transcriptome maps to investigate specifically sex-biased gene expression patterns (data not shown, results may be regenerated by the user by excluding/including or reimporting samples on the basis of data provided in Additional file 1): in particular, we compared male (pool 'A.1', n=11) vs. female DS AMKL cells (pool 'A.2', n=18). These datasets are derived from the samples for which the knowledge about the sex of the sample donor was available. The results showed a significant statistical correlation of data between male and female gene expression data (r=0.99, p-value<0.0001), showing a large overlap of results between the two transcriptome maps, with the exception of single genes with a well known sex-biased expression pattern. For example, XIST, which is specifically activated in female cells to start the X-inactivation process, turns out to be the most differentially expressed gene between female (value=402.60) and male (value=12.20) DS AMKL cells (ratio=33).

Transcriptome map comparison of DS AMKL or non-DS AMKL vs. normal MK

Regional differential expression of pool 'A' (43 DS AMKL samples) or pool 'B' (45 non-DS AMKL samples) versus pool 'D' (19 normal MK cell samples, 411,381 data points) (Table 1) was investigated. An 'A'/'D' ratio value was determinable for 25,800 loci (Additional file 4). The main results are shown in Figure 1.

As far as DS AMKL samples are concerned, results included 5 significantly differentially expressed segments in DS AMKL cells, 3 over- and 2 under-expressed (Table 2b). The highest expression ratio (20.81) between DS AMKL cells and normal MK was observed in the segment at coordinates 3,500,001-4,000,000 on chromosome 3, including the known gene LRRN1, encoding a type I transmembrane protein. The segment with the second highest expression ratio (18.07) was located on chromosome 1 (1q23) and contained FAM78B (family with sequence similarity 78, member B). The third segment was on chromosome 15 and included NR2F2-AS1, a non-coding RNA. The first significantly under-expressed segment (4q32.1) includes genes encoding subunits of soluble guanylate cyclase (GUCY1A3 and GUCY1B3), while the second spans the cluster of MK specific genes located on chromosome 4 (4q12-q21).

At single gene level, a fold increase higher than 18 was observed in all of the first 20 genes with the greatest expression ratios of DS AMKL vs. MK samples (Table 3b and Additional file 4). In particular, a 59-fold increase was observed for TMEM241, encoding a transmembrane protein of unknown function. Among the genes with the lowest 'A'/'D' expression ratio a 589.7-fold decrease was observed for TSPAN10, encoding for tetraspanin 10.

At chromosomal level, the highest ratio was observed for chr21 (1.75), followed by chrY (1.68); the lowest ratio was observed for chr17 (1.23).

Regarding the non-DS AMKL samples, results obtained by default analysis and derived from 'B'/'D' expression ratio for 25,819 loci included one significantly over- and one significantly under-expressed segment (Table 2c). The highest expression ratio (22.30) between non-DS AMKL and normal MK was observed in the same segment on chr15, significantly over-expressed also in the DS AMKL transcriptome map. This segment was the only significantly over-expressed one in this comparison. Similarly, the only significantly under-expressed segment includes the cluster of MK specific genes on chromosome 4 also found to be under-expressed in DS AMKL samples (Table 2b).

At single gene level, a fold increase higher than 31 was observed in all of the first 20 genes with the greatest expression ratios of non-DS AMKL vs. MK samples (Table 3c and Additional file 5). In particular, a 156-fold increase was observed for TMEM241. Overall, there was a remarkable overlap between the most over- (TMEM241, CMBL, ZNF445, SPRR4) and under-expressed (PF4V1, FLJ22184, FSIP2, PPP1R3B, HIST3H3, PIF1, SPSB4, ILDR1, MAP3K10, DRD4, LBX1, TSPAN10) genes in DS and in non-DS AMKL samples (Additional files 4 and 5).

At chromosomal level, looking at the median 'B'/'D' expression ratio for the genes located in the same chromosome, the highest ratio was observed for chr21 (1.77) and chrY (1.73), followed by chr13 (1.71) and chr10 (1.59); other values were in the range from 1.58 (chr20), to a value of 1.24 (chrX).

Transcriptome map comparison of DS AMKL or non-DS AMKL vs. normal CB MK

Regional differential expression of pool 'A' (43 DS AMKL samples) or pool 'B' (45 non-DS AMKL samples) versus pool 'E' (8 normal cord blood (CB)-derived MK cell samples, 191,798 data points) (Table 1) was investigated. The main results are shown in Figure 2.

Figure 2

Main results of DS AMKL vs. CB MK and non-DS AMKL vs. CB MK comparisons. For each comparison the number of loci analyzed, the most over- or under-expressed segments and single genes, and the highest and lowest median expression ratios for all the genes located in the same chromosome are indicated.

As far as DS AMKL samples are concerned, results derived from the 'A'/'E' expression ratio for 25,540 loci (Additional file 6) included 1 significantly over- and 3 under-expressed segments in DS AMKL cells (Table 4a). A remarkable expression ratio (80.36) between DS AMKL cells and normal CB MK was observed for a segment on chromosome 3 (3q22.1), including the known collagen-encoding loci COL6A5 and COL6A6. The three significantly under-expressed segments included the region on chromosome 4 (4q12-q21) with the PF4, PPBP and CXCL3 loci, implicated in MK differentiation.

Table 4 Genomic segments significantly over- or under-expressed

At single gene level, a fold increase higher than 15.7 was observed in all of the first 20 genes with the greatest expression ratios of DS AMKL cells vs. CB MK (Table 5a and Additional file 6). In particular, a 45-fold increase was observed for the tyrosine phosphatase receptor gene (PTPRO), known to be involved in megakaryocytopoiesis.

Table 5 List of the five most over- or under-expressed genes (all significantly, with q <0.05)

At chromosomal level, the highest ratio was observed for chr21 (2.08), followed by chrY (1.89); other values were in the range from 1.84 (chr22) to 1.37 (chrX).

Regarding the non-DS AMKL samples, results derived from the 'B'/'E' expression ratio (for 25,546 loci) included 2 significantly over- and 1 under-expressed segments in non-DS AMKL (Table 4b). A remarkable expression ratio between non-DS AMKL and normal CB MK (116.13) was observed in the same segment on chromosome 3 (3q22.1), including the known collagen-encoding loci COL6A5 and COL6A6, that was observed in DS AMKL samples. The second segment was specific to non-DS AMKL samples and included the two olfactory receptor genes OR10A5 and OR10A4. The only significantly under-expressed segment included the region on chromosome 4 (4q12-q21) highly enriched in MK-specific loci (PF4, PPBP, CXCL5 and CXCL3), as in the case of DS AMKL samples, and extended to the PF4V1 locus.

At single gene level, a fold increase higher than 14.7 was observed in all of the first 20 genes with the greatest expression ratios of non-DS AMKL vs. CB MK samples (Table 5b and Additional file 7). In particular, a 33.8-fold increase was observed for PTPRO, encoding a tyrosine phosphatase receptor. Overall, there was some overlap between the most over- and under-expressed genes in DS and in non-DS AMKL samples (Additional file 6 and Additional file 7).

At chromosomal level, regarding the median 'B'/'E' expression ratio for the genes located in the same chromosome, the highest ratio was observed for chrY (2.51), followed by chr21 (2.19); other values were in the range from 2.03 (chr20) to 1.46 (chrX).

Transcriptome map comparison of DS AMKL vs. TMD

Regional differential expression of pool 'A' (43 DS AMKL samples) versus pool 'C' (20 TMD samples, 398,162 data points) (Table 1) was investigated. The main results are shown in Figure 3.

Figure 3

Main results of DS AMKL vs. TMD, TMD vs. MK, TMD vs. CB MK comparisons. For each comparison the number of loci analyzed, the most over- or under-expressed segments and single genes, and the highest and lowest median expression ratios for all the genes located in the same chromosome are indicated.

Results obtained by default analysis and derived from 'A'/'C' expression ratio for 25,955 loci (Additional file 8) included 2 significantly over-expressed segments in DS AMKL (Table 6a). The highest expression ratio (2.20) between DS AMKL and TMD was observed in a segment on chromosome 2 (2q31.3), including the known gene ITGA4, encoding an alpha 4 chain of integrin protein and CERKL, a gene responsible for retinitis pigmentosa and involved in the protection of cells from apoptosis induced by oxidative stress [47]. The second segment with the highest expression ratio (1.40) was located on chromosome 8 (8q21.3) and contained the NECAB1 and OTUD6B genes, encoding for neuronal Ca(2+)-binding protein and the deubiquitinating enzyme, respectively.

Table 6 Genomic segments significantly over- or under-expressed

At single gene level, the fold increase ranged from 15.5 to 3.5 for the first 20 genes with the greatest expression ratios of DS AMKL vs. TMD samples (Table 7a and Additional file 8). The highest fold increases were observed for the ZNF587B (15.5) and IFI27 (13.4) genes, encoding a zinc finger protein and the interferon alpha-inducible protein 27, respectively. The lowest 'A'/'C' expression ratios were observed for the KIAA2022 (10-fold decrease) and SLFNL1 (5-fold decrease) genes.

Table 7 List of the five genes most over- or under-expressed (all significantly, with q <0.05)

At chromosomal level, the highest ratio was observed for chr14 (0.87, followed by 0.86 for chr5, chr21, chr8, chr12 and chr16), the lowest for chrY (0.74).

Transcriptome map comparison of TMD vs. normal MK or CB MK cells

Regional differential expression of pool 'C' (20 TMD samples) versus pool 'D' (19 MK samples) or 'E' (8 CB MK samples) (Table 1) was investigated. The main results are shown in Figure 3.

As far as MK samples are concerned, results obtained by default analysis and derived from 'C'/'D' expression ratios for 25,800 loci (Additional file 9) included 2 significantly over-expressed segments in TMD cells (Table 6b).

The highest expression ratio (28.0) between TMD and normal MK was observed in a segment on chromosome 3 including the known gene LRRN1, already observed as over-expressed in comparison of DS AMKL vs. normal MK (Table 2b). The second segment with the highest expression ratio (19.9) was located on chromosome 15, and contained the locus NR2F2-AS1, encoding for an antisense mRNA, already observed as over-expressed in comparison of DS AMKL and non-DS AMKL vs. normal MK (Table 2b and 2c).

At single gene level, the fold increase was higher than 16.5 for the first 20 genes with the greatest expression ratios (Table 7b and Additional file 9), with the highest fold increases for TMEM241 (126.4) and the cysteine hydrolase gene (CMBL) (91.1). The lowest 'C'/'D' expression ratios were observed for a member of the serine/threonine kinase family (MAP3K10) and the homeobox gene (LBX1) (both with 333-fold decrease).

At chromosomal level, the highest ratio was observed for chrY (2.1) and chr21 (1.93), followed by chr13 (1.71), the lowest for chrX (1.33).

As far as the comparison of TMD with CB MK samples is concerned, results derived from the 'C'/'E' expression ratio for 25,540 loci (Additional file 10) included only 1 significantly over-expressed segment in TMD cells (Table 6c). The segment with a significantly high expression ratio (76.0) between TMD and normal CB MK cells was on chromosome 3 (3q22.1), including the known genes COL6A5 and COL6A6, and was already observed as over-expressed in DS as well as in non-DS AMKL samples in comparison with CB MK samples.

At single gene level, the fold increase ranged from 15.1 to 44.9 for the first 20 genes with the greatest expression ratios (Table 7c and Additional file 10). The highest fold increases were observed for CGA (44.9), encoding the alpha chain of the glycoprotein hormones, and PTPRO (34.7), as already observed in the non-DS AMKL vs. CB MK comparison. The lowest 'C'/'E' expression ratio was observed for OLFM4 (50-fold decrease), encoding olfactomedin 4, an antiapoptotic factor that promotes tumor growth.

At chromosomal level, the highest ratio was observed for chr21 (2.40), followed by chrY (2.37) and chr20 (2.13), the lowest for chrX (1.53).

Comparison with previously published data

As a result of the analysis described above, a reference integrated map for the expression of about 26,000 mapped sequences (~75% known genes and ~25% expressed sequence tags, ESTs) was de facto generated for five cell types (DS AMKL cells, non-DS AMKL cells, TMD cells, MK and CB MK). This gave us the opportunity to compare our data with the expression values of specific known genes from previously published works on the cell types considered.

Following analysis of the main literature about AMKL, we selected 38 genes of interest and tabulated their expression values derived from our 8 differential maps, comparing these values to those previously described in different experimental settings (Table 8).

Table 8 Comparison with previously published data

The wide agreement of expression ratio values for specific genes between our data, generated by systematic meta-analysis of hundreds of thousands of gene expression values from any gene expression profile available, and the data obtained by different marker-specific methods in published quantitative studies, is relevant for the validation of our maps, which may therefore be used for exploring any other expression ratio in the biological conditions considered.

Discussion

We have presented here a comprehensive analysis of the transcriptome in human DS AMKL cells. Integration of data from different sources, including data obtained by different authors using a variety of platforms, was made possible by an approach we recently described for the creation and analysis of transcriptome maps [25]. While most approaches aim to separate gene expression profiles related to the same biological source into subclasses, the TRAM tool provides a means to integrate and summarize a pool of samples of the same biological origin, leading to a global picture of gene expression for that condition. Moreover, TRAM identifies critical genomic regions and genes with significant differential expression between two biological conditions.

Several authors have determined gene expression profiles for DS or non-DS AMKL samples or have explicitly compared these two leukemic conditions. However, due to the rarity of the M7 subtype of leukemia and the need to limit the analysis to pediatric age because DS AMKL occurs almost exclusively in children, these studies were typically limited to small groups of samples. In addition, most platforms used in the microarray studies are affected by omissions or errors in mapping a certain percentage of probes to specific loci in the genome. In our analysis, the use of a new version of the TRAM software (TRAM 1.1) allowed us to map thousands of previously uncharacterized microarray probes and to avoid the errors in probe assignment to human loci often present in the data supplied by the manufacturer along with the platforms.

Our data are derived from systematic integration of data from multiple sources at locus level (up-to-date rigorous assignment of each microarray probe to a specific human locus/transcript/EST cluster), map level (up-to-date fine mapping of each transcript on the genome map) and expression value level (assignment of a reference value to each locus in each cell type following an intra- and inter-sample normalization pipeline exploiting both parametric and non-parametric calculations). The combination of many gene expression profile datasets from different sources poses the problem of the batch effect, i.e. the systematic differences between batches (groups) of samples in microarray experiments due to purely technical reasons. However, the intrinsic resistance of the TRAM approach to the batch effect has been discussed previously [25], and it is indirectly confirmed by the clear biological meaning of the differential expression highlighted by the tool when comparison with previous direct key experimental knowledge is possible in several different types of tissues and organs [25],[34],[54].

A systematic comparison of AMKL originating from trisomy 21 cells versus non-trisomy 21 cells should highlight specific mechanisms [55] related to the presence of an extra copy of chr21 in DS children developing AMKL. Moreover, we presented a comparison with normal MK cells that has never been performed in other analyses of AMKL. Our global quantitative models of the transcriptome in AMKL cells could also be useful to test hypotheses for correlations between any parameter associated with the condition (e.g., specific mutations or phenotype aspects) and specific changes in gene expression.

Our results, obtained in an integrated and open setting without any a priori assumption, show several previously unidentified aspects regarding the specificity of AMKL originating from trisomy 21 cells.

First, there are only a few genomic regions significantly over- or under-expressed when comparing DS versus non-DS AMKL samples. This finding suggests that the transcriptome maps of these two conditions are similar, while also allowing us to focus on a small set of regions that appear to be critical for differentiating these disease conditions. Relevant differences regarding genes are reported in Table 2a (genomic segments) and Table 3a (single genes), with potential implications for the identification of diagnostic or therapeutic targets. There are three main regions over-expressed in DS AMKL vs. non-DS AMKL (Table 2a). The first one (15q21.2) contains the HDC gene, whose mRNA is translated into the enzyme converting L-histidine to histamine, which is produced by only a few cell types [56]; an increase in HDC mRNA has been shown to be associated with basophilic rather than MK differentiation of pluripotent hematopoietic cells [57]. These observations led to the discovery of a skewing toward a potential basophilic differentiation for DS AMKL that was not highlighted in the original works from which the data were derived. Supporting this hypothesis, FCER1A mRNA, encoding the alpha subunit of the high-affinity IgE receptor, the initiator of the allergic response and strongly typical of basophilic differentiation, is 6.3 times over-expressed (the 8th most over-expressed known gene) in DS vs. non-DS AMKL, reinforcing the notion that in DS AMKL, but not in non-DS AMKL, leukemic dedifferentiation may involve redirection toward basophilic differentiation. Remarkably, an electron microscopy analysis [58] demonstrated that AMKL blast cells from children with DS may contain basophil-like granules, which were almost totally absent in blasts from children with non-DS AMKL or adults with AMKL; our data thus allow the visualization, at the level of the whole transcriptome, of the molecular correlate of a morphological feature observed more than 20 years ago.
Two other genomic regions are over-expressed in DS vs. non-DS AMKL blasts. The first is the region of the glycophorin genes (GYPE, GYPA, GYPB) on chr4, which encode erythroid surface markers [59]: this reinforces the concept of a disturbance of multilineage myeloid hematopoiesis in DS AMKL, a disturbance that has also been observed by flow cytometry in non-neoplastic hematopoiesis in trisomy 21 [60]. The other is the region of the apolipoprotein genes (APOC1, APOC2, APOE) on chr19, which has been described as a signature of progression from TMD to DS AMKL in a gene expression profile ([20]; it was not possible to include this profile in our analysis), further underlining its specificity for DS vs. non-DS MK blast cells.

When grouping expression values by chromosome, the chromosome with the greatest global RNA output was chr21 in both TMD and DS AMKL vs. normal CB MK; we observed the same result in both DS and non-DS AMKL vs. normal MK comparisons. These data suggest that over-expression of chr21 genes is a key factor in AMKL development. In particular, ADAMTS1, encoding a protease known to inhibit angiogenesis [61], is the most over-expressed chr21 gene in the DS vs. non-DS AMKL comparison. It is interesting to note that this gene has been correlated with pediatric leukemias (ALL and DS AML) [49],[62], but its exact role in leukemogenesis is still to be discovered. Our quantitative approach, summarizing all values for each locus, can clearly highlight a mean expression ratio close to 1.5:1 for several chr21 genes when comparing DS AMKL vs. non-DS AMKL. This observation is consistent with the presence of an additional copy of these genes in trisomic cells. For example, GABPA expression shows a ratio close to the 1.5:1 expected in trisomy of chr21, as do other chr21 genes (DYRK1A, SON, BACH1) (Table 8). Even relatively small but significant differences (around 1.5-fold) in the expression of numerous genes are likely to produce an aggregate effect, as observed in [63], where the same genes appear to be candidates to explain the impact of trisomy 21 on hematopoietic abnormalities. For this reason, it could be interesting to start from significant and robust meta-analysis data when planning functional approaches in the future. Moreover, our data are consistent with the previous observation that the RUNX1, ERG and ETS2 oncogenes, although located on chr21, are not over-expressed in DS vs. non-DS AMKL [18],[22]. It has also been recently demonstrated that they are not located in a minimal duplicated region of chr21 in two cases of AML of M0 subtype (FAB classification) [64].
As regards oncogenes, while the chr21 oncogenes cited above appear not to be over-expressed in DS AMKL, TRIB1 (chr8) may be a novel important oncogene for DS AMKL, and its mutation is an earlier genetic event in leukemogenesis [65]. In particular, it has been shown that a mutation of TRIB1 [65], a myeloid oncogene whose protein product is able to enhance ERK phosphorylation and to promote degradation of C/EBP family transcription factors, is a gain-of-function mutation that persists in leukocytes at the remission stage, when the GATA1 mutation has disappeared. Our results show a mean value of 1.40 for the human TRIB1 expression ratio between DS and non-DS AMKL samples, with a higher over-expression observed when comparing leukemic samples to normal MK (ratio 3.15 for DS AMKL and 2.26 for non-DS AMKL).

Although the expression of chr21 genes as a whole, or of some individual chr21 genes, may be consistent with the 3:2 gene expression ratio model in the comparison between trisomic and euploid cells, we note that the discrepancies from this ideal model seen in our data for some of the comparisons we have made (Figures 1, 2 and 3) may be ascribed to the complexity of gene regulation in the aneuploid state [66],[67], to individual variability [68], and to the general dysregulation typical of the neoplastic state of the specific cell types we analyzed.
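The 3:2 model can be made concrete with a short calculation. The following is only an illustrative sketch, not part of the TRAM software: it assumes that expression scales linearly with gene copy number, and the function names are hypothetical. The two observed values are the chr21 chromosome-level ratios reported above for TMD vs. MK (1.93) and TMD vs. CB MK (2.40).

```python
def expected_dosage_ratio(copies_case: int, copies_control: int = 2) -> float:
    """Expected expression ratio, ASSUMING expression is proportional to copy number."""
    if copies_case <= 0 or copies_control <= 0:
        raise ValueError("copy numbers must be positive")
    return copies_case / copies_control


def relative_deviation(observed_ratio: float, expected_ratio: float) -> float:
    """Fractional deviation of an observed ratio from the expected ratio."""
    return (observed_ratio - expected_ratio) / expected_ratio


# Trisomic (3 copies) vs. euploid (2 copies) cells: the ideal 3:2 model.
expected = expected_dosage_ratio(3)

# chr21 chromosome-level ratios quoted in the Results text above.
observed = {"TMD vs. MK": 1.93, "TMD vs. CB MK": 2.40}

deviations = {
    name: relative_deviation(ratio, expected) for name, ratio in observed.items()
}

for name, deviation in deviations.items():
    print(f"{name}: observed {observed[name]:.2f}, "
          f"expected {expected:.2f}, deviation {deviation:+.0%}")
```

Under this simple linear assumption, both chromosome-level ratios exceed the ideal value of 1.5, which is the kind of discrepancy discussed in the paragraph above.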

Additional biologically relevant findings came from the comparison of each type of megakaryoblastic leukemic condition with normal MK cells. Due to the role of the microenvironment in hemopoiesis, including hemopoiesis in DS [69], it is expected that DS MK would present growth alterations due to trisomy 21 in both hemopoietic and microenvironment cells. From this point of view, DS MK cells would be the ideal control for the progression of a trisomic cell toward TMD and AMKL; however, no gene expression profile dataset was available for this cell type. We propose here a biological model of the transcriptome depicting progressive changes from normal MK to TMD and then to DS AMKL, able to underline both shared and unique transcriptome map patterns for DS and non-DS variants of AMKL (Figure 4).

Figure 4

Biological model of the transcriptome depicting progressive changes from MK to TMD then to DS AMKL. Downward pointing arrow indicates the repression of genes involved in MK differentiation; upward pointing arrow indicates the over-expression of potential molecular markers of progression to AMKL. Value: mean gene expression value normalized across all the pool samples. §Observed both in DS and non-DS AMKL.

Notably, the genomic segment on chr4 known to contain a cluster of genes highly specific for MK differentiation [25],[70] was the most significantly under-expressed segment in both DS and non-DS AMKL in comparison with normal MK. In particular, the most strongly under-expressed region, 4q12-q21, contains a cluster of genes, including PF4 (encoding platelet factor 4, a main component of platelet alpha granules) [71]-[73] and PPBP (encoding beta-thromboglobulin) [70], that are the most up-regulated transcripts in megakaryocytic differentiation from CD34+ hematopoietic progenitors [25]. This finding highlights a common final outcome of the block of MK cell differentiation in both DS and non-DS AMKL. It should be underlined that this result came from a systematic ab initio analysis of more than 12,000 segments of the human genome including about 26,000 mapped loci, thus highlighting that this region, critical for MK differentiation, is actually the most repressed overall when comparing transcriptome maps of AMKL (DS or non-DS) and normal MK cells.

In addition, the most under-expressed gene in TMD blasts when compared to normal MK cells is MAP3K10, encoding a mitogen-activated protein kinase kinase kinase (MAPKKK). It is known that the mitogen-activated protein kinase (MAPK) pathway is involved in, and is sufficient for, megakaryocytic differentiation [74],[75]. MAPK activity is present in several dozen human proteins, and we have identified the member MAP3K10 as the critically repressed gene in the block of MK differentiation during the development of leukemia with MK features, since it appears down-regulated 300-fold in TMD cells and 500-fold in both DS and non-DS AMKL compared with normal MK samples (Tables 3 and 7). Finally, the transcript for MPL, the receptor of thrombopoietin, which is the primary regulator of normal thrombopoiesis (the formation of platelets) [53],[73],[76], is decreased by ~70% in TMD, DS AMKL and non-DS AMKL cells alike vs. normal MK cells.
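Because under-expression is reported here both as a fold decrease (MAP3K10) and as a percent decrease (MPL), it can help to put the two on the same scale. The sketch below is purely arithmetic and uses hypothetical helper names; it converts each figure quoted above into an expression ratio (condition / normal MK).

```python
def ratio_from_fold_decrease(fold: float) -> float:
    """An N-fold decrease corresponds to an expression ratio of 1/N."""
    if fold <= 0:
        raise ValueError("fold decrease must be positive")
    return 1.0 / fold


def ratio_from_percent_decrease(percent: float) -> float:
    """A decrease of P percent corresponds to an expression ratio of 1 - P/100."""
    if not 0 <= percent < 100:
        raise ValueError("percent decrease must be in [0, 100)")
    return 1.0 - percent / 100.0


def fold_decrease_from_ratio(ratio: float) -> float:
    """Inverse conversion: a ratio r (0 < r <= 1) is a (1/r)-fold decrease."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1] to describe a decrease")
    return 1.0 / ratio


# Figures quoted in the paragraph above (condition vs. normal MK).
map3k10_tmd = ratio_from_fold_decrease(300)    # MAP3K10 in TMD
map3k10_amkl = ratio_from_fold_decrease(500)   # MAP3K10 in DS and non-DS AMKL
mpl_ratio = ratio_from_percent_decrease(70)    # MPL, approximate value

print(f"MAP3K10 ratio in TMD:  {map3k10_tmd:.4f}")
print(f"MAP3K10 ratio in AMKL: {map3k10_amkl:.4f}")
print(f"MPL ratio: {mpl_ratio:.2f} "
      f"(about a {fold_decrease_from_ratio(mpl_ratio):.1f}-fold decrease)")
```

On a common scale, the ~70% reduction of MPL corresponds to a decrease of only about 3-fold, two orders of magnitude smaller than the repression reported for MAP3K10.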

An exceedingly high over-expression of the gene located on chromosome 18 and encoding for the uncharacterized membrane protein TMEM241 has been found in both DS (59-fold) and non-DS (156-fold) AMKL cells vs. normal MK. Although this probe was not present in all considered experimental platforms, its extreme differential expression makes it a candidate for further studies as a marker of progression from normal MK to AMKL blasts, also due to its 126-fold over-expression in TMD vs. MK cells.

Moreover, we identified several signatures of progression specifically to DS AMKL. Remarkably, segments and genes up- or down-regulated in TMD in comparison with normal MK cells were highly similar to those specifically found in DS AMKL, underlining striking similarities between TMD and DS AMKL at the level of the whole transcriptome (already noted in [21], in a smaller sample set). On the other hand, a direct comparison between TMD and DS AMKL shows specific potential markers of progression to DS AMKL. As cited above, the apolipoprotein genes (APOC1, APOC2, APOE) have been described as a signature of progression from TMD to DS AMKL [20], and it is interesting to note that APOC2 is among the 20 most over-expressed genes in the comparison between TMD and MK (25.88-fold increase), showing a progressive increase of expression from normal MK to TMD and then to DS AMKL. In our analysis, ZNF587B appears to be the most discriminant marker between TMD and DS AMKL. Again, this observation offers the opportunity to integrate and discuss single genes and pathways previously described as abnormally expressed in DS or non-DS AMKL (Table 8). For example, the PRAME gene, encoding a tumor antigen, was identified in [21] as a specific marker for DS AMKL blasts (n=7), with no expression in TMD (n=9). While our meta-analysis of PRAME expression data points (36 for DS AMKL and 11 for TMD) confirmed a clear over-expression of PRAME in DS AMKL (4.2-fold increase, Table 8), PRAME was not the most discriminant marker, which was instead ZNF587B: PRAME ranked 33rd out of 25,955 transcripts ordered by decreasing DS AMKL vs. TMD expression ratio (Additional file 8).
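The ranking referred to above (loci ordered by decreasing expression ratio, then the position of a gene of interest read off the list) can be sketched as follows. This is a minimal illustration rather than the TRAM implementation; the function names are hypothetical, and the locus names and ratio values are invented placeholders, not data from Additional file 8.

```python
def rank_by_ratio(ratios: dict[str, float]) -> list[tuple[int, str, float]]:
    """Return (rank, locus, ratio) tuples ordered by decreasing expression ratio."""
    ordered = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
    return [(rank, locus, ratio)
            for rank, (locus, ratio) in enumerate(ordered, start=1)]


def rank_of(ratios: dict[str, float], locus: str) -> int:
    """1-based position of a locus in the list ordered by decreasing ratio."""
    for rank, name, _ in rank_by_ratio(ratios):
        if name == locus:
            return rank
    raise KeyError(locus)


# Placeholder values for illustration only (NOT taken from the study data).
toy_ratios = {
    "LOCUS_A": 12.0,
    "LOCUS_B": 4.2,
    "LOCUS_C": 0.8,
    "LOCUS_D": 7.5,
}

for rank, locus, ratio in rank_by_ratio(toy_ratios):
    print(f"{rank}. {locus}: {ratio:.1f}")

print("Rank of LOCUS_B:", rank_of(toy_ratios, "LOCUS_B"))
```

A clearly over-expressed gene can therefore still fall well below the top of the list when many other loci show larger ratios, as is the case for PRAME relative to ZNF587B.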

Finally, since leukemias in infants or young children originate from fetal hematopoietic cells [17],[18],[26],[27] and the progenitors (fetal/neonatal MKP) are present in the cord blood [28],[29], comparisons with CB MK cells were also performed. Data from the DS and non-DS AMKL vs. CB MK comparisons confirmed the repression of the clusters of genes expressed in MK. The over-expression of a region with collagen genes emerged in both DS and non-DS AMKL, as did that of the single gene PTPRO (Table 5a and 5b), encoding a receptor tyrosine phosphatase known to be involved in megakaryocytopoiesis, whose mRNA targeting by antisense oligonucleotides results in inhibited MK progenitor proliferation [77]. On the other hand, a difference between DS and non-DS AMKL vs. CB MK is shown by the over-expression of the two olfactory receptor genes OR10A5 and OR10A4 (Table 4) only in non-DS derived cells.

The analysis of enrichment in specific gene functions using the tool FuncAssociate, for the 100 most over- or under-expressed genes in the comparison of DS vs. non-DS AMKL and of both of them vs. MK cells gave no significant results, other than a significant enrichment in genes involved in sequence-specific DNA-binding in non-DS AMKL vs. MK cells (data not shown), highlighting the relevance of remodeling the transcription factor network in leukemia.

Conclusions

Our results provide a systematic meta-analysis using any available gene expression profile dataset related to AMKL in pediatric age. They allow the identification of more general trends and produce a highly coherent view of the transcriptome depicting progressive changes from MK to TMD and then to DS AMKL. We believe that the originality of our results is due to several concurrent original features of the TRAM 1.1 platform. Advantages and relevant differences are: integration of the largest possible number of samples; integrated analysis of the largest possible number of genes (the integration of different platforms led us to assess the expression ratio for about 26,000 loci, quantitating almost 4,000 genes in addition to those covered by the widely used U133A platform when used alone); absence of a priori filtering (in several works, such filtering led to the actual analysis of fewer than 50% of the genes present on the experimental platform); and characterization at regional/map level in the study of gene expression (usually absent in the works from which data were obtained), which is relevant to the study of an aneuploidy such as trisomy 21.

These results provide a new integrated model of the whole human transcriptome in DS and non-DS AMKL, TMD and normal human MK cells, offering hints about the pathophysiology of AMKL and helping to highlight possible clinical markers.

Additional files