A DNA microarray survey of gene expression in normal human tissues
Numerous studies have used DNA microarrays to survey gene expression in cancer and other disease states. Comparatively little is known about the genes expressed across the gamut of normal human tissues. Systematic studies of global gene-expression patterns, by linking variation in the expression of specific genes to phenotypic variation in the cells or tissues in which they are expressed, provide clues to the molecular organization of diverse cells and to the potential roles of the genes.
Here we describe a systematic survey of gene expression in 115 human tissue samples representing 35 different tissue types, using cDNA microarrays representing approximately 26,000 different human genes. Unsupervised hierarchical cluster analysis of the gene-expression patterns in these tissues identified clusters of genes with related biological functions and grouped the tissue specimens in a pattern that reflected their anatomic locations, cellular compositions or physiologic functions. In unsupervised and supervised analyses, tissue-specific patterns of gene expression were readily discernable. By comparative hybridization to normal genomic DNA, we were also able to estimate transcript abundances for expressed genes.
Our dataset provides a baseline for comparison to diseased tissues, and will aid in the identification of tissue-specific functions. In addition, our analysis identifies potential molecular markers for detection of injury to specific organs and tissues, and provides a foundation for selection of potential targets for selective anticancer therapy.
Keywords: Additional Data File, Normal Human Tissue, Specific Biological Process, Stanford Microarray Database, Normal Tissue Specimen
DNA microarrays [1, 2] have been used to profile gene expression in cancer and other diseases. In cancer, for example, microarray profiling has been applied to classify tumors according to their sites of origin [3, 4, 5], to discover previously unrecognized subtypes of cancer [6, 7, 8, 9, 10, 11], to predict clinical outcome [12, 13, 14] and to suggest targets for therapy [15, 16]. However, the identification of improved markers for diagnosis and molecular targets for therapy will depend not only on knowledge of the genes expressed in the diseased tissues of interest, but also on detailed information about the expression of the corresponding genes across the gamut of normal human tissues.
At present there is relatively little data on gene expression across the diversity of normal human tissues [17, 18, 19, 20]. Here we report a DNA microarray-based survey of gene expression in a diverse collection of normal human tissues and also present an empirical method for estimating transcript abundance from DNA microarray data.
Hierarchical clustering of gene expression in normal tissues
The two-way unsupervised analysis also identified clusters of coexpressed genes (annotated in Figure 1), which represented both tissue-specific structures and systems (discussed further below) and coordinately regulated cellular processes. For example, on the basis of the shared characteristics of well annotated genes in the clusters, we identified clusters representing cell proliferation, mitochondrial ATP production, mRNA processing, protein translation and endoplasmic reticulum-associated protein modification and secretion. Interestingly, proliferation, mitochondrial ATP production and protein translation were each represented by two distinct clusters of genes, suggesting that subsets of these functions might be differentially regulated among different tissues. One gene cluster corresponded to sequences on the mitochondrial chromosome; we interpret this feature to reflect the relative abundance of mitochondria in each tissue sample.
Identifying tissue-specific gene expression
Estimating transcript abundance
DNA microarray experiments are often performed as comparative two-color hybridizations, permitting precise quantification of the ratio of each gene's expression between two samples. In the experiments reported here, each tissue sample was compared by hybridization to the same 'common reference' mRNA (see Materials and methods), a standard experimental design permitting the comparison of expression across all samples. Therefore, the primary measurements give us a precise picture of the variation in relative levels of each gene's expression among the samples. While this information is sufficient for many purposes, a quantitative comparison of the expression levels of transcripts of different genes is also of interest, for example in selecting especially highly expressed genes for potential diagnostic markers or therapeutic targets. Single-channel fluorescence intensities can provide a crude estimate of the relative transcript abundance of different genes, but do not control for the variable quantities of spotted DNA.
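As stated in Materials and methods, transcript abundance was estimated for each gene by multiplying the tissue-versus-reference ratio by the averaged reference-versus-genomic-DNA ratio. The following is a minimal sketch of that arithmetic, not the authors' code. It assumes linear (not log-transformed) fluorescence ratios; the function name and the numeric values are hypothetical and chosen only for illustration.

```python
from statistics import mean

def estimate_abundance(tissue_vs_ref, ref_vs_gdna_replicates):
    """Estimate relative transcript abundance for one gene in one sample.

    tissue_vs_ref: ratio of tissue mRNA to common reference mRNA.
    ref_vs_gdna_replicates: ratios of common reference mRNA to genomic DNA
        from replicate hybridizations; they are averaged before use.
    """
    ref_vs_gdna = mean(ref_vs_gdna_replicates)
    # The reference term cancels, leaving an estimate of tissue mRNA per gene copy.
    return tissue_vs_ref * ref_vs_gdna

# Two hypothetical genes with the same eightfold enrichment over the reference
# but very different reference-versus-genomic-DNA ratios.
gene_a = estimate_abundance(8.0, [4.0, 5.0, 6.0])
gene_b = estimate_abundance(8.0, [0.25, 0.25, 0.25])
print(gene_a, gene_b)
```

In this illustration the two genes are equally enriched relative to the common reference, yet their estimated abundances differ twentyfold. This is the distinction between relative expression across samples and relative transcript levels across genes.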
The utility of this approach is illustrated for the cluster of prostate-specific genes (derived from the hierarchical cluster in Figure 1), and is evident on comparing results depicting the relative level of each gene's expression in different samples (Figure 4b), and the relative levels of transcripts for different genes (Figure 4c). While all genes within the prostate-specific cluster were expressed at relatively increased levels in prostate compared with other tissues, estimates of transcript abundance indicated that only a subset of these genes was highly expressed in the prostate (Figure 4c). For example, RDH11 was highly expressed in prostate and was expressed at lower levels in other tissues, while STEAP2 was expressed at low levels in prostate and displayed very little or no expression in other tissues. For each of the tissue types, transcripts identified as both highly abundant and tissue specific are displayed in Additional data files 5 and 8 (for the transcript levels of all variably expressed genes, see Additional data file 2).
The main objective of our study was to survey variation in gene expression across a diverse set of normal human tissue types. We have reported here a cDNA microarray gene-expression dataset profiling approximately 26,000 human genes across 115 human tissue specimens representing 35 different tissue types. An unsupervised, two-way hierarchical clustering of the genes whose expression varied most across samples showed that at the level of gene expression, the relationship among tissues was in large part based on their anatomic locations, cellular compositions and physiologic functions. Tissue-specific features of gene expression were readily discernable in the hierarchical cluster, as were gene-expression features related to specific cellular processes (as inferred from the named genes within these features). Of particular importance, the function of uncharacterized ESTs might be deduced by virtue of their inclusion in one of these clusters. Supervised analysis also identified genes selectively expressed in each of the tissue types studied, and the analysis of functionally annotated gene sets provided information on the tissue distribution of specific biological processes, cellular components and molecular functions.
We have also reported here the application of mRNA versus genomic DNA hybridizations for estimating transcript abundances for expressed genes. Knowledge of transcript abundance should prove useful in prioritizing candidate genes for use as diagnostic markers or therapeutic targets, for which more highly expressed genes might be more tenable candidates. It is worth pointing out that our approach for estimating absolute transcript levels should be applicable to any cDNA microarray study incorporating a common reference mRNA.
While many investigators have been using DNA microarrays to profile gene expression in cancer and other human diseases, scant data exist on profiles of gene expression across the diversity of normal human tissues. Our cDNA-microarray-based survey of gene expression in normal human tissues provides a publicly accessible dataset which can be used in future analyses aimed at better understanding the physiology of various normal tissues; developing a baseline for comparison to diseased tissues, including cancer; identifying tissue-specific diagnostic markers that signify tissue injury; discovering tissue-specific therapeutic targets (for example, for treatment of prostate cancer); and identifying tumor-specific diagnostic markers and therapeutic targets, for which minimal expression in the collection of normal adult human tissues is desirable.
We have used cDNA microarrays to survey gene expression across a diverse set of normal human tissues. Using unsupervised and supervised analyses, we have identified tissue-specific patterns of gene expression. Furthermore, by comparative hybridization to normal genomic DNA, we were able to estimate transcript abundances and identify the subsets of abundantly expressed tissue-specific genes. Our dataset provides a baseline for comparison to diseased tissues, as well as a basis for identifying molecular markers of injury to specific organs and tissues, and for anticancer therapy.
Materials and methods
Normal human tissue specimens were obtained from surgery (for example, the uninvolved regions of resected tumors) or from autopsy, with institutional review board approval. Specimens were frozen on dry ice within 30 minutes of surgical removal or procurement and stored at -80°C. Histological evaluations were performed by H&E staining of frozen sections, and a pathologist (J.H. and/or M.vdR.) reviewed all slides to confirm the anatomical site of origin and histological normalcy (that is, to rule out inflammation, infection, necrosis, malignancy). In total, we selected for study 115 tissue samples representing 35 different human tissues (Additional data file 1). Total RNA was isolated from tissues using TRIzol Reagent (Invitrogen) according to the manufacturer's instruction, and RNA quality was assessed by the integrity of rRNA bands following gel electrophoresis. The poly(A)+ mRNA fraction was then isolated from total RNA using FastTrack2.0 kit (Invitrogen), and quantified by UV spectrophotometry.
Gene-expression profiling was performed essentially as reported previously, and detailed protocols for array fabrication and hybridization are available online. Briefly, Cy5-labeled cDNA was prepared using 2 μg mRNA from normal tissue samples, and Cy3-labeled cDNA was prepared using 1.5 μg mRNA common reference, pooled from 11 established human cell lines. For each experimental sample, Cy5- and Cy3-labeled samples were co-hybridized to a cDNA microarray containing 39,711 human cDNAs, representing 26,260 different genes (UniGene clusters). For the common reference mRNA (Cy5) versus genomic DNA (Cy3) comparisons, normal female genomic DNA was labeled as described. Following hybridization, microarrays were imaged using an Axon GenePix 4000 scanner (Axon Instruments). Fluorescence ratios for array elements were extracted using GenePix software, and uploaded onto the Stanford Microarray Database (SMD) for subsequent analysis. The complete microarray dataset is accessible from SMD, or from the Gene Expression Omnibus (accession number GSE2193).
Fluorescence ratios were normalized by mean-centering genes for each array (that is, 'global' normalization), and then by mean-centering each gene across all arrays. We included for analysis only well-measured genes whose expression varied, as determined by: signal intensity over background more than twofold in either test or reference channels in at least 75% of samples; and a fourfold or more ratio variation from the mean in at least two samples (unless otherwise indicated). Hierarchical clustering was performed and displayed using Cluster and TreeView software. Tissue-selective genes were identified using the two-class (each tissue versus all other tissues) significance analysis of microarrays (SAM) method, which utilizes a modified t-test statistic and sample-label permutations to evaluate statistical significance. The false-discovery rate (FDR), an estimate of the fraction of falsely called tissue-selective genes, varied by tissue, but in all cases was less than 5% (specific FDRs are listed in Additional data file 3). For tissue-selective genes, only tissue types with two or more samples were considered for analysis, and we only considered genes that were well-measured in more than 50% of the samples for the selected tissue type analyzed. GO annotations were assigned to arrayed genes using the AmiGO browser to select relevant GO annotations, and the 'loc2go' file to identify the corresponding sets of genes. Transcript abundance was estimated by multiplying (for each gene) the ratio of tissue sample mRNA versus common reference mRNA by the ratio (average ratio from triplicate experiments) of common reference mRNA versus normal female genomic DNA. Highly abundant tissue-specific transcripts were defined for each tissue type as the top (capped at 50 genes) tissue-specific transcripts, identified using the SAM method, from the 1,000 most abundantly expressed transcripts in the full dataset.
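The normalization and gene-filtering steps described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the pipeline actually used. It assumes the ratios are log2-transformed before centering (the text does not say), that rows are genes and columns are arrays, and that missing values have already been handled. The function names and the per-sample `well_measured` flags are hypothetical.

```python
import math
from statistics import mean

def center_arrays(log_ratios):
    """'Global' normalization: subtract each array's mean (columns = arrays)."""
    n_arrays = len(log_ratios[0])
    array_means = [mean(row[j] for row in log_ratios) for j in range(n_arrays)]
    return [[row[j] - array_means[j] for j in range(n_arrays)]
            for row in log_ratios]

def center_genes(log_ratios):
    """Subtract each gene's mean across all arrays (rows = genes)."""
    centered = []
    for row in log_ratios:
        gene_mean = mean(row)
        centered.append([value - gene_mean for value in row])
    return centered

def is_variably_expressed(centered_row, well_measured,
                          min_fraction=0.75, min_fold=4.0, min_samples=2):
    """Apply the two inclusion criteria to one gene.

    centered_row: gene-centered log2 ratios, one per sample.
    well_measured: one boolean per sample, True when signal over background
        exceeded twofold in either channel (assumed computed upstream).
    """
    fraction_ok = sum(well_measured) / len(well_measured)
    threshold = math.log2(min_fold)  # fourfold corresponds to 2.0 on a log2 scale
    n_variable = sum(1 for value in centered_row if abs(value) >= threshold)
    return fraction_ok >= min_fraction and n_variable >= min_samples

# Hypothetical 2-gene x 2-array matrix of log2 ratios.
normalized = center_genes(center_arrays([[1.0, 3.0], [3.0, 5.0]]))
print(normalized)
```

Working on a log scale makes the fourfold criterion symmetric for increases and decreases relative to the mean, which is why the sketch compares absolute values against a single threshold.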
Additional data files
The following additional data are available with the online version of this paper. Additional data file 1 is a table listing the normal tissue specimens included in microarray analysis. Additional data file 2 is a table listing the variably expressed genes. Additional data file 3 is a table listing tissue-specific transcripts. Additional data file 4 is a table listing functionally annotated gene sets. Additional data file 5 is a table listing highly abundant tissue-specific transcripts. Additional data file 6 is a figure showing tissue-specific gene expression. Additional data file 7 is a figure showing expression of functionally annotated gene sets. Additional data file 8 is a figure showing highly abundant tissue-specific gene expression.
We thank Ash Alizadeh and the members of the Pollack and Brown labs for helpful suggestions. We also thank Janet Mitchell and the Stanford Tissue Bank for collection of tissues, Mike Fero and the staff of the Stanford Functional Genomics Facility for providing high-quality cDNA microarrays, and Gavin Sherlock and Catherine Ball of the Stanford Microarray Database group for providing outstanding database support. This work was supported by a grant from the National Cancer Institute. P.O.B. is an investigator of the Howard Hughes Medical Institute.
- 9. Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M, et al: Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci USA. 2001, 98: 13790-13795. 10.1073/pnas.191502998.
- 10. Garber ME, Troyanskaya OG, Schluens K, Petersen S, Thaesler Z, Pacyna-Gengelbach M, van De Rijn M, Rosen GD, Perou CM, Whyte RI, et al: Diversity of gene expression in adenocarcinoma of the lung. Proc Natl Acad Sci USA. 2001, 98: 13784-13789. 10.1073/pnas.241500798.
- 11. Lapointe J, Li C, Higgins JP, Van De Rijn M, Bair E, Montgomery K, Ferrari M, Egevad L, Rayford W, Bergerheim U, et al: Gene expression profiling identifies clinically relevant subtypes of prostate cancer. Proc Natl Acad Sci USA. 2004, 101: 811-816. 10.1073/pnas.0304146101.
- 14. Leung SY, Chen X, Chu KM, Yuen ST, Mathy J, Ji J, Chan AS, Li R, Law S, Troyanskaya OG, et al: Phospholipase A2 group IIA expression in gastric adenocarcinoma is associated with prolonged survival and less frequent metastasis. Proc Natl Acad Sci USA. 2002, 99: 16203-16208. 10.1073/pnas.212646299.
- 15. Armstrong SA, Kung AL, Mabon ME, Silverman LB, Stam RW, Den Boer ML, Pieters R, Kersey JH, Sallan SE, Fletcher JA, et al: Inhibition of FLT3 in MLL. Validation of a therapeutic target identified by gene expression based classification. Cancer Cell. 2003, 3: 173-183. 10.1016/S1535-6108(03)00003-5.
- 23. Perou CM, Jeffrey SS, van de Rijn M, Rees CA, Eisen MB, Ross DT, Pergamenschikov A, Williams CF, Zhu SX, Lee JC, et al: Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc Natl Acad Sci USA. 1999, 96: 9212-9217. 10.1073/pnas.96.16.9212.
- 28. Pat Brown's Lab Protocols. [http://brownlab.stanford.edu/protocols.html]
- 30. Stanford Microarray Database. [http://smd.stanford.edu]
- 31. Gene Expression Omnibus (GEO). [http://www.ncbi.nlm.nih.gov/geo]
- 32. EisenLab Software. [http://rana.lbl.gov/EisenSoftware.htm]
- 34. NCBI loc2go. [ftp://ftp.ncbi.nih.gov/refseq/LocusLink/loc2go]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.