Introduction

Rheumatoid arthritis (RA) is a chronic inflammatory disease causing synovial joint damage, disability, and a shortened life expectancy (1,2). An awareness of the destructive potential of RA has led to more aggressive use of disease-modifying anti-rheumatic drugs (DMARDs) (3) and the development of immune therapies targeted to molecules and cells important in the pathogenesis of RA. These include the TNF inhibitors infliximab, etanercept, and adalimumab (4). Synovial joint damage occurs early in the disease course, and many patients demonstrate erosions within a few months after becoming symptomatic (5). Recent evidence suggests that early aggressive therapy (infliximab and methotrexate) yields greater benefit than similar therapy after failure of other drugs (6–8). Initiating early aggressive therapy requires reliable and rapid determination of diagnosis and prognosis. However, the factors used to predict a poor prognosis, including sex, age of onset, multiple joint involvement, rheumatoid factor, and the presence of the shared epitope of HLA-DR4, are not always reliable (9–12).

Gene expression profiling may allow early diagnosis, aid in identifying factors that predict poor prognosis, and help focus early, aggressive, and expensive therapy on those who would benefit the most. Expression analysis of tissues taken at the site of disease within a synovial joint is invasive and impractical on a routine basis. However, recent studies have demonstrated unique gene expression changes in peripheral blood mononuclear cells (PBMCs) from patients with cancer, multiple sclerosis, and lupus (13–17). In this study, a genome-wide scan of PBMCs from RA patients and normal volunteers was performed using oligonucleotide arrays representing 6800 human genes to explore gene expression in the PBMCs of individuals with RA.

Materials and Methods

Patient Selection

Patients with RA, defined by American College of Rheumatology (ACR) criteria (18), were identified in a rheumatology clinic with approval from the local research ethics committee. Demographic data including age, sex, and time since diagnosis were collected. A tender joint count (TJC 0–28), swollen joint count (SJC 0–28), patient’s best global assessment (visual analog scale), and erythrocyte sedimentation rate (ESR) were obtained to calculate a 28-joint disease activity score (DAS28). The presence of rheumatoid factor (RF) and the use of DMARDs were recorded. Blood was also collected from healthy volunteers with no previous diagnosis of RA or other chronic inflammatory diseases.
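The text does not state the DAS28 formula. The following is a minimal sketch, assuming the commonly used ESR-based form, DAS28 = 0.56√TJC28 + 0.28√SJC28 + 0.70 ln(ESR) + 0.014 GH, where GH is the patient global assessment on a 0 to 100 mm visual analog scale. The function and parameter names are illustrative and are not taken from the study.

```python
import math


def das28_esr(tjc28, sjc28, esr_mm_per_h, global_health_mm):
    """Return the ESR-based DAS28 (assumed standard formula, not from the source).

    tjc28, sjc28      -- tender and swollen joint counts, each 0 to 28
    esr_mm_per_h      -- erythrocyte sedimentation rate in mm/h (must be > 0)
    global_health_mm  -- patient global assessment on a 0-100 mm visual analog scale
    """
    if not (0 <= tjc28 <= 28 and 0 <= sjc28 <= 28):
        raise ValueError("joint counts must be between 0 and 28")
    if esr_mm_per_h <= 0:
        raise ValueError("ESR must be positive to take its logarithm")
    return (0.56 * math.sqrt(tjc28)
            + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr_mm_per_h)   # natural logarithm
            + 0.014 * global_health_mm)
```

For instance, a hypothetical patient with 4 tender joints, 9 swollen joints, an ESR of 1 mm/h, and a global assessment of 50 mm would score 1.12 + 0.84 + 0 + 0.70 = 2.66 under this formula.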

Isolation of RNA and Preparation of Labeled Hybridization Solutions

An 8-mL sample of venous blood was collected into CPT Vacutainer cell purification tubes (Becton Dickinson, Franklin Lakes, NJ, USA) and refrigerated immediately. Samples were immediately transferred to the laboratory, and PBMCs from the 9 RA and 13 normal volunteers were separated according to the manufacturer’s recommendations. Briefly, the tube was centrifuged at 1500g (2700 rpm) at room temperature, and PBMCs were isolated before being washed twice in PBS. Total RNA was extracted using the RNeasy minikit (Qiagen, Valencia, CA, USA). For each sample, 2 µg total RNA was used to generate cDNA as described (19). RNA quality was determined by observing distinct 28S and 18S ribosomal bands on an agarose gel. First-strand cDNA synthesis was performed under the following buffer conditions: 1× 1st-strand buffer (Invitrogen Life Technologies, Carlsbad, CA, USA), 10 mM DTT (Gibco/Invitrogen), 500 µM of each dNTP (Invitrogen Life Technologies), 400 units Superscript RT II (Invitrogen Life Technologies), and 40 units RNase inhibitor (Ambion, Austin, TX, USA). The reaction proceeded at 47°C for 1 h. Second-strand cDNA was synthesized with the addition of the following reagents at the final concentrations listed: 1× 2nd-strand buffer (Invitrogen Life Technologies), an additional 200 µM of each dNTP (Invitrogen Life Technologies), 40 units E. coli DNA polymerase I (Invitrogen Life Technologies), 2 units E. coli RNaseH (Invitrogen Life Technologies), and 10 units E. coli DNA ligase. The reaction proceeded at 15°C for 2 h; during the last 5 min of this reaction, 6 units T4 DNA polymerase (New England Biolabs, Beverly, MA, USA) was added. The resulting double-stranded cDNA was purified with the use of BioMag carboxyl-terminated particles as follows: 0.2 mg BioMag particles (Polysciences, Warrington, PA, USA) were equilibrated by washing three times with 0.5 M EDTA and resuspended at a concentration of 22.2 mg/mL in 0.5 M EDTA. 
The double-stranded cDNA reaction was diluted to a final concentration of 10% PEG/1.25 M NaCl, and the bead suspension was added to a final bead concentration of 0.614 mg/mL. The reaction was incubated at room temperature for 10 min. The cDNA/bead complexes were washed with 300 µL of 70% ethanol, the ethanol was removed, and the tubes were allowed to air dry. The cDNA was eluted with the addition of 20 µL of 10 mM Tris-acetate, pH 7.8, and incubated for 2 to 5 min, and the cDNA-containing supernatant was removed.

Purified double-stranded cDNA (10 µL) was added to an in vitro transcription (IVT) solution that contained 1× IVT buffer (Ambion), 5000 units T7 RNA polymerase (Epicentre Technologies, Madison, WI, USA), 3 mM GTP, 1.5 mM ATP, 1.2 mM CTP, and 1.2 mM UTP (Amersham/Pharmacia), 0.4 mM each bio-16 UTP and bio-11 CTP (Enzo Diagnostics, Farmingdale, NY, USA), and 80 units RNase inhibitor (Ambion). The reaction proceeded at 37°C for 16 h. Labeled RNA was purified with the use of an RNeasy kit (Qiagen). The RNA yield was quantified by measuring absorbance at 260 nm.

Hybridization to Affymetrix Microarrays and Detection of Fluorescence

Eleven in vitro synthesized transcripts from segments of bacterial genes were included in each hybridization reaction to generate a global standard curve, which was used to normalize the oligonucleotide microarrays to each other and to estimate the sensitivity of the arrays (20). Purified biotinylated cRNA (10 µg) was hybridized to oligonucleotide arrays comprising 6937 human gene qualifiers (human FL6800 array P/N900183, Affymetrix, Santa Clara, CA).

Raw fluorescent intensity values were collected and reduced with GeneChip v3.2 software (Affymetrix) as described (Affymetrix GeneChip Analysis Suite User Guide). This determined the probability of each gene qualifier represented on the array being absent, present, or marginal, and calculated a specific hybridization intensity value, or average difference, for each transcript. The relative abundances of the 11 bacterial control cRNA transcripts ranged from 1:300,000 (3 ppm) to 1:1000 (1000 ppm), stated in terms of the number of control transcripts per total transcripts. As determined by the signal response from these control transcripts, the sensitivity of detection of the arrays ranged between approximately 1:300,000 and 1:100,000 (about 3 to 10 ppm). The average difference for each transcript was normalized to frequency values as described (20).
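The abundance ratios and ppm frequencies quoted above are two notations for the same quantity. A small sketch of the conversion follows; the helper name is hypothetical.

```python
def ratio_to_ppm(one_in_n):
    """Convert an abundance of 1 transcript in `one_in_n` to parts per million."""
    if one_in_n <= 0:
        raise ValueError("the ratio denominator must be positive")
    return 1_000_000 / one_in_n


# Control transcript abundances and sensitivity limits mentioned in the text.
for denominator in (300_000, 100_000, 1_000):
    print(f"1:{denominator:,} = {ratio_to_ppm(denominator):.1f} ppm")
```

Note that 1:300,000 is about 3.3 ppm, which the text rounds to 3 ppm.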

Transcripts designated absent in all samples were excluded from the analysis; 3295 (49%) of the transcripts remained. Further analysis of the processed data was performed with GeneSpring version 7.1 (Agilent Technologies, Redwood City, CA, USA). To identify transcripts that were increased in the RA samples compared with controls, >50% (at least 5 of 9) of the samples had to be called present, with a frequency of 10 ppm or greater, and have a change in expression, relative to the average expression of the controls, of at least 2-fold. The resulting data set had 324 gene qualifiers. To find gene qualifiers whose expression was decreased, a list was generated of gene qualifiers from normal samples that were called present with a frequency of ≥10 ppm. The resulting list was filtered for an average decrease in expression, relative to the controls, of at least 2-fold in the disease samples. Six gene qualifiers met these criteria; 330 transcripts were used for the analyses. Annotation for each gene was determined based on GO, Entrez Gene, PubMed, and literature searches.
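The filter for increased transcripts can be sketched as follows. This is one reading of the criteria, assuming that an RA sample counts toward the threshold only when it is called present, has a frequency of at least 10 ppm, and is at least 2-fold above the control mean. The data layout, call codes, and function name are hypothetical; the actual filtering was done in GeneSpring.

```python
from statistics import mean

MIN_PPM = 10.0       # minimum frequency, from the text
MIN_FOLD = 2.0       # minimum fold change over the control mean, from the text
MIN_FRACTION = 0.5   # ">50%" of RA samples, i.e. at least 5 of 9


def passes_increase_filter(ra_freqs, ra_calls, control_freqs):
    """Return True if a transcript meets the 'increased in RA' criteria.

    ra_freqs      -- frequencies (ppm) in the RA samples
    ra_calls      -- detection calls per RA sample: "P", "M", or "A" (assumed coding)
    control_freqs -- frequencies (ppm) in the control samples
    """
    control_mean = mean(control_freqs)
    if control_mean <= 0:
        return False  # fold change is undefined without a positive baseline
    qualifying = sum(
        1
        for freq, call in zip(ra_freqs, ra_calls)
        if call == "P" and freq >= MIN_PPM and freq / control_mean >= MIN_FOLD
    )
    return qualifying > MIN_FRACTION * len(ra_freqs)
```

With nine RA samples, the final comparison requires more than 4.5 qualifying samples, which matches the "at least 5 of 9" rule.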

Statistical and Clustering Analyses

An unsupervised hierarchical clustering was performed on the 330 genes to group the samples on the basis of similarity of their expression profiles (21). Statistically significant differences in expression were determined using Welch ANOVA (22) coupled with two different multiple testing corrections. The Benjamini and Hochberg false discovery rate (FDR) (23) was applied with a P value <0.05, with 326 genes passing this criterion. The Bonferroni family-wise error rate (FWER) (24,25) was applied with a P value <0.05, with 189 gene qualifiers passing this criterion. Finally, a class prediction using the k-nearest neighbor method (26) was applied to the filtered data to determine which genes had the highest discrimination between normal and RA samples.
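With only two groups (RA and control), a Welch ANOVA reduces to Welch's unequal-variance t test, with F equal to the square of t. The sketch below computes the statistic and the Welch-Satterthwaite degrees of freedom for a single transcript using only the standard library. It is an illustration, not the GeneSpring implementation, and it stops short of the P value, which requires a t-distribution function (for example, `scipy.stats.ttest_ind` with `equal_var=False`). Whether the values were log-transformed before testing is not stated in the text.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t, df) for Welch's unequal-variance t test.

    Each group needs at least two values and a nonzero combined variance.
    """
    # Squared standard error contributed by each group.
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df
```

In practice the function would be applied to each of the 330 transcripts in turn, and the resulting P values passed to the multiple testing corrections.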

Results

Characteristics of the RA patients used in the study, including demographics, disease activity scores, and DMARD use, are summarized in Table 1. In total, 324 transcripts were increased at least two-fold in the RA subjects relative to the control subjects, and six transcripts were decreased at least two-fold (Table 2).

Table 1 Characteristics of RA patients including demographics, disease activity scores, and DMARD use.
Table 2 Differentially regulated transcripts.

Unsupervised Clustering

An unsupervised clustering analysis was performed on the 330 genes that passed the initial filtration, based on a hierarchical correlation coefficient algorithm (21). Samples were grouped based on similarity of expression. The resulting dendrogram describes the sample relationships by grouping the RA samples and controls by their expression patterns (Figure 1). Figure 1A depicts a region where expression levels in the RA samples were increased compared with the normal samples. This analysis suggests that there are significant differences in the gene expression of RA and control samples.
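The sample clustering can be illustrated with a small agglomerative procedure. This is a sketch under stated assumptions: distance is taken as 1 minus the Pearson correlation between sample profiles, and clusters are joined by average linkage. The text specifies a correlation-based hierarchical algorithm but not the linkage rule, and all names below are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (undefined if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(cluster_a, cluster_b, dist):
    """Mean pairwise distance between the members of two clusters."""
    return mean(dist[frozenset((a, b))] for a in cluster_a for b in cluster_b)


def cluster_samples(profiles):
    """Agglomerate samples; return the merges as (cluster, cluster, distance) tuples.

    profiles -- dict mapping sample name to its list of expression values
    """
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Join the two clusters with the smallest average distance.
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], dist),
        )
        merges.append((c1, c2, average_linkage(c1, c2, dist)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges
```

The list of merges is the information a dendrogram draws: early merges join the most similar samples, and the last merge joins the two most dissimilar groups.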

Figure 1

Unsupervised hierarchical cluster analysis of RNA from 9 RA and 13 control PBMC samples. Total RNA samples were analyzed on oligonucleotide arrays as described. In no case were samples pooled. Genes were selected for analysis if they had a present call, a frequency greater than 10 ppm, and a two-fold change in expression in five of nine RA samples. The expression patterns of 330 genes are displayed in a dendrogram where columns represent each sample and rows represent individual genes. Genes are colored on a gradient (from −10-fold to 10-fold), with those increased in expression relative to the average of the controls in red. Those that are decreased are in blue, and those with little or no change are in yellow. A, region where expression levels in the RA samples were increased compared with the normal samples.

ANOVA Analysis

To minimize the inclusion of genes not related to the disease state, several statistical approaches were used. The 330 transcripts that passed the initial filtration (Table 2) were subjected to a Student t test and a Welch ANOVA with two multiple testing corrections (22). To control the proportion of genes that may appear in the analysis by chance, the Benjamini and Hochberg FDR was applied at a threshold of 5%. This procedure limits the expected proportion of false positives among the transcripts called significant; 326 transcripts were called significant with this analysis (Table 2). In addition, the more stringent Bonferroni FWER correction, using a P value cutoff of 0.05, was performed, with 189 transcripts passing this analysis (Table 2).
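The difference in stringency between the two corrections can be seen in a short sketch. The code implements the standard Benjamini-Hochberg step-up rule and the Bonferroni rule on a list of P values. It is not the GeneSpring code, the function names are illustrative, and the example P values are invented for demonstration.

```python
def bonferroni(pvalues, alpha=0.05):
    """Flag P values that pass the Bonferroni family-wise error rate at `alpha`."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Flag P values that pass the Benjamini-Hochberg false discovery rate at `alpha`."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest P first
    # Step-up rule: find the largest rank k with p_(k) <= k * alpha / m.
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        if pvalues[index] <= rank * alpha / m:
            max_rank = rank
    # Every P value ranked at or below max_rank is called significant.
    significant = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= max_rank:
            significant[index] = True
    return significant


example = [0.01, 0.02, 0.03, 0.04]  # invented P values
print(sum(benjamini_hochberg(example)), "pass FDR;",
      sum(bonferroni(example)), "pass Bonferroni")
```

For the four invented P values, all four pass the FDR rule but only the smallest passes Bonferroni, which parallels the 326 versus 189 transcripts reported here.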

Class Prediction

A k-nearest neighbor analysis was performed to identify a gene set that may distinguish the RA samples from normal samples. The prediction strength was evaluated using the 330 genes shown in Table 2. A list of predictor genes was assembled using the k-nearest neighbor method (26) to organize genes based on normalized expression levels. Cross-validation analyses, in which each sample was compared with the model generated by the remaining samples, were used to optimize the analysis parameters. The optimized parameters were six neighbors and a decision cutoff P value ratio of 0.2 to predict expression patterns in RA vs. controls. Twenty-nine transcripts comprise the prediction gene set. The 29 prediction transcripts were grouped by hierarchical correlation clustering to show their relationships (Figure 2).
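The cross-validation scheme, in which each sample is held out in turn and classified by a model built from the rest, can be sketched with a plain majority-vote k-nearest neighbor classifier. This is an illustration only: it assumes Euclidean distance and simple voting, and it does not reproduce the gene ranking or the P value ratio cutoff used by the class prediction software. All names and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(training, query, k):
    """Predict a label for `query` by majority vote among its k nearest neighbors.

    training -- list of (expression_vector, label) pairs
    """
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]  # ties are not handled specially


def leave_one_out_accuracy(samples, k):
    """Fraction of samples classified correctly when each is held out in turn."""
    correct = 0
    for i, (vector, label) in enumerate(samples):
        remaining = samples[:i] + samples[i + 1:]  # model built without sample i
        if knn_predict(remaining, vector, k) == label:
            correct += 1
    return correct / len(samples)


# Toy two-gene profiles; real input would be the filtered expression data.
toy = [
    ([10.0, 9.0], "RA"), ([11.0, 10.0], "RA"), ([9.0, 11.0], "RA"), ([10.0, 10.0], "RA"),
    ([0.0, 1.0], "normal"), ([1.0, 0.0], "normal"), ([0.0, 0.0], "normal"), ([1.0, 1.0], "normal"),
]
print(leave_one_out_accuracy(toy, k=3))
```

Repeating this evaluation over candidate values of k and candidate gene lists is one way to choose the combination that discriminates best, which is the role cross-validation plays in the analysis described above.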

Figure 2

Class prediction. Using a class prediction algorithm, a list of genes that most consistently distinguished diseased vs. normal samples was generated. Classification was generated by the k-nearest neighbors algorithm (26). The number of neighbors selected was six, with a decision cutoff for P value ratio of 0.2. The final list was determined by an iterative cross-validation process in which the best combination of number of genes and neighbors was found to derive the most discriminating list. In the cross-validation mode, each sample in turn was set aside as the test article, and the remainder of the samples were used to generate the model, which was then evaluated on the test article. (A) Fold change and P values of the 29 prediction genes. (B) Unsupervised hierarchical cluster analysis of the 29 genes. The expression patterns of 29 genes are displayed in a dendrogram where columns represent each sample and rows represent individual genes. Genes are colored on a gradient (from −10-fold to 10-fold), with those increased in expression relative to the average of the controls in red. Those that are decreased are in blue, and those with little or no change are in yellow.

Characterization of the RA Disease-Related Genes

The 330 differentially regulated transcripts were categorized into functional groups and are presented as the average fold change of RA frequency compared with that of the controls (Table 2). This analysis clustered the genes into 19 functional classes and highlighted one chromosomal location. Ten genes with increased expression in the RA PBMCs compared with normal controls map to an RA susceptibility locus, 6p21.3 (27) (Table 3). The functional classes are diverse and include genes involved in calcium binding, chaperones, cytokines, transcription, translation, signal transduction, extracellular matrix, integral to plasma membrane, integral to intracellular membrane, mitochondrial, ribosomal, structural, enzymes, and proteases. Many of these 330 genes or gene products are known to be differentially regulated in RA. Twenty-five genes were classified as unknown because they either coded for a hypothetical protein or were identified as an open reading frame of unknown function.

Table 3 Genes with increased expression in RA compared with normal PBMCs at the RA susceptibility locus 6p21.3

The k-nearest neighbor analysis identified genes that may be preferentially regulated in the RA samples. Of the 29 genes identified by the class prediction analysis (Figure 2B) as differentially expressed in the RA PBMCs compared with the controls, only RELA (NF-κB p65) (28), IGF2 (insulin-like growth factor 2) (29), FTH1 (ferritin heavy chain) (30), and SELP (selectin P) (31) have previously been associated with RA. Furthermore, both NF-κB and selectin P have been used as therapeutic targets in animal models (32,33). INPP5E (inositol polyphosphate-5-phosphatase E), STAB1 (stabilin), AGPAT1 (1-acylglycerol-3-phosphate O-acyltransferase 1), TCIRG1 (T-cell, immune regulator 1, ATPase, H+ transporting, lysosomal V0 protein A isoform 3), HD (Huntingtin), SREBF1 (sterol regulatory element binding factor 1), and IRF3 (interferon regulatory factor 3) are examples of genes that have not previously been associated with RA.

Discussion

In this study, the mRNA levels of 6800 genes were measured in PBMCs from RA patients with active disease and normal individuals. All patients were on DMARD therapy that included methotrexate. Three hundred thirty differentially expressed transcripts were detected in at least 50% of the patients and exhibited a minimum of a two-fold change in expression from normal individuals. A number of genes previously thought to be involved in RA pathogenesis were detected in this study. These include the transcripts for the TNF receptor TNFRSF1B (p75) and CCL5 (RANTES). TNFα has a key role in RA, and the expression of mRNA and protein of TNF receptors is increased in RA synovial membranes and sera (34–36). In murine models, as well as TNFα transgenic and receptor knockout mice, the pathogenic activity of TNF has been well documented. Furthermore, both the soluble form of the TNF receptor and antibodies against TNF are efficacious in animal models and are effective therapies for RA (4,6–8,37,38). CCL5 is a chemokine expressed in the serum and synovial joints of patients with RA and is likely to play important roles in recruitment of inflammatory cells (39). A polyclonal antibody to RANTES improved symptoms in animals with adjuvant-induced arthritis (40). RNA transcripts encoding proteins from a number of signaling pathways, including NF-κB, were present in increased amounts in individuals with RA, and many of these are targets for therapeutic blockade (41). NF-κB (RELA) has important roles in the production of inflammatory cytokines such as IL-1 and TNF (28). The presence of these known genes in the data set further validates the array data and analysis.

A k-nearest neighbor analysis was applied to the data set to identify genes preferentially expressed in the PBMCs from RA patients compared with controls. Twenty-nine genes were identified. Some of these genes have been previously identified as being differentially regulated in RA and include IGF2 (29), FTH1 (30), and SELP (31). SELP contributes to many inflammatory diseases and has been shown to mediate leukocyte interaction with the endothelial cell wall (42). Levels of SELP are increased in the synovial fluid of RA patients (43). In the murine collagen-induced arthritis model, the deletion of SELP resulted in more severe disease compared with wild-type mice (44).

Many genes not previously known to be differentially regulated in RA were also identified, for example, TCIRG1 (T-cell, immune regulator 1), INPP5E (inositol polyphosphate-5-phosphatase E), and STAB1 (stabilin). TCIRG1 is a novel seven-transmembrane T cell protein that plays a role in T cell activation (45). Antibodies to TCIRG1 (TIRC7) prevent human T cell proliferation in vitro and inhibit the type I subset-specific cytokines IFNγ and IL-2, but not the type II subset cytokine IL-4. A TIRC7 antibody prolonged survival in a rat model of acute kidney allograft rejection (45). TIRC7-null mice have disrupted T and B cell responses in vitro and in vivo, suggesting that TIRC7 may play a role in T and B lymphocyte balance (46).

INPP5E, a member of the inositol polyphosphate 5-phosphatase family, similar to INPP5D (Table 2), regulates PI-3 kinase signal transduction (47). AGPAT1 (1-acylglycerol-3-phosphate O-acyltransferase 1) catalyzes the conversion of lysophosphatidic acid (LPA) to phosphatidic acid (PA). LPA and PA are two phospholipids involved in signal transduction and phospholipid synthesis (48). Overexpression of AGPAT-1 in cell lines leads to the expression of both TNF-α and IL-6 in cells stimulated with IL-1β, suggesting that AGPAT-1 overexpression may amplify cellular signaling responses from cytokines (49).

Interestingly, 10 transcripts, including AGPAT1, that were differentially regulated in the RA PBMCs in this study map to chromosome region 6p21.3, the major histocompatibility complex (MHC) class III region (27) (Table 3). Many of the genes in the MHC class III region have fundamental roles in a variety of cellular functions and include the inflammatory cytokines TNFα, LTA, LTB, and the advanced glycation end product receptor, RAGE (AGER) (27). Multifactor interactions contribute to the disease process at several levels. One hypothesis is that dysregulation of genes in a locus could contribute to the etiology of the disease, perhaps through coordinated transcription of regions of a chromosome in response to stress or inflammation. RA is a complex autoimmune disorder, and expression analysis of a larger number of patients may validate this hypothesis.

STAB1 [also known as common lymphatic endothelial and vascular receptor (CLEVER-1 or FEEL-1)] was overexpressed in the RA PBMCs. This gene, identified by the k-nearest neighbor analysis, was expressed in 100% of RA PBMC samples and exhibited the highest fold change in this study (64-fold). Stabilin 1 is a large glycoprotein and multifunctional scavenger receptor. Characterized as FEEL-1, this protein demonstrated a role as a scavenger receptor that binds to advanced glycation end products as well as gram-positive and gram-negative bacteria (50,51). The receptor was shown to be expressed on mononuclear cells, tissue macrophages, and endothelial cells (50–52). An antibody to FEEL-1 produced a marked reduction in cell-to-cell interaction in a Matrigel tube formation assay, suggesting a role for the receptor in angiogenesis (50). CLEVER-1 has been demonstrated to be involved in PBMC transmigration through vascular and lymphatic endothelium (52). The CLEVER-1 gene is encoded by 69 exons, and multiple isoforms are expressed in the endothelium (52). The potential function of CLEVER-1 in RA remains to be elucidated.

Several studies of gene expression in RA have been reported. Devauchelle et al. (53) focused on differences in expression in synovia isolated from RA patients compared with that of synovia from osteoarthritis patients. Watanabe et al. (54) reported on differences in expression between RA and normal synovial fibroblasts, and van der Pouw Kraan et al. (55) identified differences in gene expression in RA synovia, allowing the classification of different disease subtypes. A recent study by Bovin et al. (56), using a 12,000-gene oligonucleotide microarray, examined changes in gene expression between PBMCs from 14 RA patients vs. 7 sex- and age-matched controls and identified 25 genes that were discriminative. Although different filter criteria were applied to the data set presented here and that reported by Bovin et al. (56), nine genes overlapped between the two studies, including S100A12, NCF4, and GNG10. Of the genes that did not overlap, four were not present on the microarray used in this study, three showed changes but did not meet the strict data filtration criteria, and four were not called present in any of the samples. Another study by Olsen et al. (57), using a 4300-gene cDNA microarray, identified a gene expression signature for early-onset rheumatoid arthritis in PBMCs. In that study, the authors segregated the data based on those with longstanding and early-onset disease. There is some overlap between the Olsen et al. (57) study and the results presented here. Of the 44 genes identified, eight from Olsen et al. also appeared in the present study. Of the 30 that do not, four were not on the human FL6800 array, 15 were not called present in any of the samples, and the others were not included due to the filtration criteria.
In the results presented here, patients were selected from the high disease activity cohort, and during analysis, several filtration criteria were applied to the data set, together with several statistical analyses and a minimum expression criterion of detection in at least 50% of the patients. These measures were intended to make the resulting gene signature as robust as possible.

It must be noted that RA patients possess a broad spectrum of disease severity and time of onset, and the comparisons above serve to highlight the multiple differences in patient selection criteria, study materials, protocols, and data analysis that exist in studies so far. Combining the data from our study with those of others, however, does point to several consistent changes in gene expression that would be useful to investigate further. For example, the increased expression of the RAGE ligand S100A12 has been observed in more than one study and, as a result, has highlighted the RAGE pathway as potentially important in RA; it is now subject to further study by our group.

The information from this study can be used in two major ways. First, it allows genes important in the pathogenesis of RA to be identified. These genes can then be investigated in detail to determine their potential roles in disease. Second, the power of DNA microarray profiling, with its ability to monitor the expression of multiple genes simultaneously, may allow the identification of patterns of gene expression associated with RA. This may enable rapid diagnosis of RA and predictions of prognosis, as well as response to, and side effects of, DMARDs. The use of these techniques is most advanced in oncology, where predictions of prognosis can be made for certain cancers (17). This provides clinically useful information that guides decisions about how aggressive a treatment regimen should be for a given patient. There is a marked difference in the clinical features of RA between individuals, and molecular phenotyping (or patient profiling) may identify or characterize different disease subgroups and courses of disease progression.

A weakness of global gene expression analysis techniques lies in identifying the relationship of changes in gene expression to the disease process. Changes in gene expression may either cause a disease process or occur as a consequence of it. The presence of gene expression changes in genes that have been associated with RA validates the data set. However, not all genes are primarily regulated by changes in mRNA levels, with many being subject to posttranscriptional regulation. TNF, the best-validated molecular therapeutic target in RA, does not emerge from this type of analysis. This study examined expression in nine RA patients and identified a set of genes that is preferentially expressed in RA patients compared with controls. Although the data are intriguing, samples from a larger number of patients would aid in a class prediction to determine which genes are most associated with disease state and type of prognosis. It is interesting to note that a recent study of PBMC expression profiles in several autoimmune diseases showed that, whereas all diseases displayed profiles that differed from a normal immune response, not all diseases could be clearly distinguished from each other (58).

Gene expression studies on PBMCs may not exactly represent the situation within the inflamed synovial membranes of RA. RA is a systemic disease, however, and differences in cytokine production and phenotype of PBMCs in RA have been demonstrated (59–61). This approach has the advantage of being a rapid and minimally invasive way of obtaining cells from patients. The usefulness of assaying tissue samples in RA is limited by availability and sampling bias due to regional differences in disease activity in synovia. However, if the diagnostic or predictive value of a gene expression profile can be demonstrated, PBMCs are a readily accessible source of cells.

The use of oligonucleotide microarrays enables a broader view of complex inflammatory diseases, such as RA. The simultaneous measurement of multiple mRNA transcripts allows an increased understanding of the complexity of proteins that may be interacting in a disease state rather than focusing on one or two at a time. This study identified 330 mRNA transcripts that were differentially regulated in the PBMCs from RA patients compared with normal volunteers. Having demonstrated that these techniques can be used with PBMCs, the next step involves looking at patterns of gene expression in individuals over time and detailed phenotypic examination of these individuals to determine patterns of gene expression associated with different features of RA.