Development of a novel splice array platform and its application in the identification of alternative splice variants in lung cancer
Microarrays strategies, which allow for the characterization of thousands of alternative splice forms in a single test, can be applied to identify differential alternative splicing events. In this study, a novel splice array approach was developed, including the design of a high-density oligonucleotide array, a labeling procedure, and an algorithm to identify splice events.
The array consisted of exon probes and thermodynamically balanced junction probes. Suboptimal probes were tagged and considered in the final analysis. An unbiased labeling protocol was developed using random primers. The algorithm used to distinguish changes in expression from changes in splicing was calibrated using internal non-spliced control sequences. The performance of this splice array was validated with artificial constructs for CDC6, VEGF, and PCBP4 isoforms. The platform was then applied to the analysis of differential splice forms in lung cancer samples compared to matched normal lung tissue. Overexpression of splice isoforms was identified for genes encoding CEACAM1, FHL-1, MLPH, and SUSD2. None of these splicing isoforms had been previously associated with lung cancer.
This methodology enables the detection of alternative splicing events in complex biological samples, providing a powerful tool to identify novel diagnostic and prognostic biomarkers for cancer and other pathologies.
KeywordsAlternative Splice Normal Lung Tissue Splice Form Alternative Splice Event Intron Retention
non-small cell lung cancer
carcinoembryonic antigen-related cell adhesion molecule 1
small cell lung cancer
four and a half LIM domains 1
melanophilin or Slac2-a
sushi domain-containing protein 2
heterogeneous nuclear ribonucleoproteins.
Alternative splicing of pre-mRNA is a post-transcriptional modification essential for the regulation of gene expression and function. Through alternative splicing, multiple transcripts are produced from a single mRNA precursor, widely expanding proteome diversity. Deep sequencing applied to diverse human tissues and epithelial cell lines has recently revealed that more than 90% of human genes undergo alternative splicing . A global analysis of alternative splicing in the human transcriptome suggested that exon skipping is the most prevalent form of alternative splicing . Alternative splicing is a tightly regulated process influenced by cell type, developmental stage, external conditions, etc; however, it is also associated with multiple disease conditions, including cancer . For example, cancer-related aberrantly spliced variants have been shown to be actively involved in the initiation and/or progression of some types of cancer . Splicing alterations are the consequence of splice-site mutations, deregulation of splicing regulatory factors, or both . Tumor-specific variations in splicing may generate new epitopes that can serve as a starting point for immune therapy or targeted delivery, as well as for the development of new diagnostic or prognostic tools . Thus, the identification and molecular characterization of alternative splicing variants associated with cancer is currently a very active area of research . In recent years, powerful techniques for genome-wide identification and analysis of alternative splicing isoforms have been developed. These large-scale high-throughput analytical methods have been applied to the identification of differential splicing events in cancer tissues . Exon microarrays, which contain both known and predicted exons, have been recently used for this purpose [9, 10, 11, 12, 13]. However, since they are not specifically designed to examine alternative splicing, they fail to detect events such as the alternative use of 5' or 3' splice sites, intron retention, or the insertion of cryptic exons. Other splicing-specific microarrays have been developed to cover most alternative splicing events. These arrays contain oligonucleotide probes that span exon-exon junctions, and probes positioned within exons to determine individual exon levels and overall transcript expression. The use of splice-junction oligonucleotides to analyze splice events was proposed as early as 1986, when Morgan and Ward used them to identify differential splice forms of minute virus in mice cDNA . In 1996, Lockhart et al. reported one of the first genome-wide microarray studies and suggested the potential of microarrays for the analysis of alternative splicing , but it was not until 2002 that Clark et al. developed the first microarray containing splice-junction oligonucleotides to analyze splice events in yeast . In 2003, Johnson et al. used microarrays containing oligonucleotide probes complementary to exon-exon junction sequences to discover new alternative splice variants in human tissues [17, 18]. Also in 2003, Wang et al. designed an algorithm that aimed to deconvolute the absolute concentrations of each alternative transcript present in a complex mixture starting from the hybridization intensities detected on splice chips . A new algorithm, called SPACE, has recently been developed for estimating the number of different splicing isoforms (known and unknown), and determining their structures and relative concentrations .
Nonetheless, currently available splice arrays still have many limitations, mainly due to problems in the design of the array, the labeling protocol, and data analysis. The development of robust and efficient splice microarrays and data-analysis methods will facilitate progress in the diagnosis, prognosis, and therapy of cancer and other pathologies. In the present work, we describe a novel comprehensive methodology for high-throughput profiling of alternative splicing in complex biological samples. In this methodology, processing of results is based on the array specific design, which is original and thought specifically for alternative splicing-discovery. The strategy consists of optimization of probe design, development of an unbiased amplification protocol that avoids inappropriate transcript coverage due to 3'-biased labeling, and implementation of detailed data processing. Oligonucleotides for the splice array were designed using the Tethys module (Oryzon Genomics, Barcelona, Spain), an inhouse oligo design program, complemented with a new splice-analysis specific module (AltTethys). The algorithm targets the best possible oligonucleotide for each sequence, rather than imposing a strict oligonucleotide quality cutoff. A new labeling protocol was developed to ensure optimal all-length transcript coverage and lineal amplification, working with small amounts of human material. To analyze the data, we developed a novel algorithm for the analysis of two-color arrays that allows for a statistically robust identification of candidate spliced genes in absence of a prior hypothesis about the contributing isoforms. We have applied this technology to the identification of lung cancer-associated splicing variants. Lung cancer is a devastating disease with few therapeutic options or suitable molecular biomarkers for early diagnosis. The results obtained in this study validated the utility of the platform, allowing the identification of new cancer-associated splicing variants with potential utility in the management of lung cancer.
Development and validation of the unbiased labeling protocol
The basis for the reliable detection of splice forms in a microarray format is a high-quality, non-biased labeling procedure. Different labeling protocols have been applied in splicing analyses: (a) Castle et al. developed a PCR + T7 amplification-based protocol ; (b) a commercial kit is available from Stratagene (Fairplay II) to produce a labeled first-strand cDNA using random hexamers which works well if a sufficient amount of starting material is available; (c) some authors have applied standard 3'-biased RNA labeling protocols, ignoring the loss of 5' events.
Performance of the array in the evaluation of artificial splicing forms
In the next step, the hybridization behavior of high-quality oligonucleotides for the yeast controls spiked into the mixture was evaluated and then compared with that of their corresponding thermodynamical half-oligonucleotides. Half-oligonucleotides contain the thermodynamic half of the total control oligonucleotides (located on either the 5' or the 3' end) complemented to the size of the total oligonucleotide with a sequence not expected to hybridize with the yeast control sequence. The half-oligonucleotides thus represent an event in which the DNA on one end of the junction is 100% joined to a different DNA sequence. The use of half-oligonucleotides caused a sharp decrease in signal intensity compared with the complete oligonucleotides (Additional file 1: Figure S1), illustrating that the hybridization temperature and washing conditions were adequate. Nevertheless, most signals did not drop to zero and there was considerable variation in the relative signal intensities of both half-oligonucleotides, indicating that different splice forms can still contribute to the signal of oligonucleotides containing only half of the target sequence of a given transcript.
Identification of alternatively spliced genes in lung cancer and selection of genes for validation
The positive results obtained in the pilot array using spiked-in isoforms of individual genes encouraged us to develop a splice array to identify differential splicing variants in complex biological samples, specifically, in clinical samples from patients with lung cancer. Based on gene-expression databases [22, 23], 7,958 genes expressed in normal and tumor lung tissue were selected and used to design a splice array. The array contained 115,318 exon probes and 105,141 junction probes for the selected genes, control probes for the yeast YML059C, YOR328, and YIL129C transcripts to monitor the labeling procedure, as well as control probes for maize transcripts (Zm48, Exp and Xet), employed in standard gene expression analyses by Oryzon (both positive and negative controls at different concentrations).
Differential splicing in the 7,958 genes was then assessed with the splice array, by hybridization of 20 pairs of Cy5 labeled tumor and Cy3 labeled normal tissue samples (TCy5 vs NCy3) prepared from 20 non-small cell lung cancer (NSCLC) patients (Additional file 2: Table S1), as well as three self to self comparisons (NCy5 vs NCy3). The array data from this study have been submitted to Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo under accession no. GSE18346. The results were analyzed with AltPolyphemus (Additional file 2: Supplementary Methods), which allowed intensity changes of all the oligonucleotides for a given gene to be analyzed with respect to whether these changes reflected gene expression changes or isoform changes. Essentially, after pre-processing (data filtering and normalization) the algorithm first estimates the experimental variability of the microarray analysis platform using the data of the standard deviation of the gene probe data on the replicates of the self to self array.
To identify differences in expression level or splice forms, "change" is first assessed in the self to self hybridization. Any change that is not clearly greater than the inherent variability of the measurement system is considered "no-change". In the self to self hybridization the standard deviation (σs,g) can be calculated from the total gene probe dataset and correlated to the standard deviation of the control probe dataset (σs,g = CF·σc). The data spreading for the total gene probe set was always a little higher than that for the control probes (Additional file 2: Supplementary Methods), which means that the control probes slightly underestimate the experimental variation. In tumor vs normal tissue experiments, it is not possible to measure directly the standard deviation for the no-change situation for the total gene probe data set (as different samples are compared), but the standard deviation on the control dataset (σc) can be measured and the standard deviation for the no-change situation for the total gene probe dataset for the tumor vs normal array can be estimated (σ*s,g = CF·σc). Robust change can then be defined as change below or above the threshold TH = ± 3σ*s,g, although more stringent cut-off can be applied if desired. Once robust change is defined, the algorithm examines whether the ratios of the signal in the Cy5 and Cy3 channels for the exon and junction probes for a given gene (G) fall within the variability of the experiment (reflecting genes with regular differential gene expression) or outside of that variability (potential splice form variation); i.e. below or above μG ± 3σ*s,g, μG being the mean value for the ratios of the signal in the Cy5 and Cy3 channels for all the oligos for gene G. Note that μG for a gene with differential expression will be clearly over the threshold for the detection of change (the individual probes are differentially expressed) but the variation among the different probes will be below that threshold (Additional file 1: Figure S2c). For a gene to be selected as a candidate for differential splicing, at least one probe has to fall outside of the limits of the marked threshold. The algorithm considers two hypotheses: the observed hybridization can be explained by a differential mixture of isoforms or by a whole gene expression change, and calculates the error of both hypotheses. If there is a possible isoform change, the algorithm establishes the "Form Change", an arbitrary and empirically defined figure to relatively rank the candidates. Figure S3 in Additional file 1 shows a typical output from the AltPolyphemus software for a candidate gene susceptible to alternative splicing in lung cancer.
Genes with potential splice variants differentially expressed between normal lung and lung cancer tissues, as determined by the splice array
C4b-binding protein alpha chain
Rho GTPase-activating protein 7
Sushi domain-containing protein 2
Cell division cycle protein 20 homolog
Four and a half LIM domains protein 1
Voltage-dependent calcium channel subunit alpha-2/delta-2
Carcinoembryonic antigen-related cell adhesion molecule 1
Mediator of RNA polymerase II transcription subunit 17
Validation of splice variants differentially expressed in lung cancer
Differences in alternative splicing between primary NSCLC tissue and normal lung tissue in the ten selected genes were validated by PCR and sequencing. Validation was performed with samples from a group of patients included in the array (Additional file 2: Table S1) and an independent series of NSCLC patients (Additional file 2: Table S2). IPO8 was used as the reference gene . Alterations in alternative splicing were confirmed in 4 out of the 10 genes: CEACAM1, FHL1, MLPH, and SUSD2.
This article reports the development of a platform for the analysis of differential alternative splicing in complex biological samples and its application to the discovery of alternative splice forms associated with lung cancer. Microarray-based methods have been described previously for the identification of splicing events in different physiological and pathological conditions [9, 10, 11, 12, 13, 35, 36, 37]. However, in spite of progress in development and interpretation, splice arrays still have many limitations and are far from attaining the level of standardization and robustness achieved with other high-throughput analytical methods, such as expression arrays . These limitations involve several steps of the process, such as the array's design, the labeling protocol, and data analysis.
The detection of alternative splicing using arrays containing only exon probes is based on the idea that a discrete set of exons, some of which are skipped in the event of alternative splicing, constitute the final mRNA. But this view is largely simplified, as exons may be longer or shorter, junctions may form at different positions, and intron sequences between two exons may be retained. In addition, some exons are very small, to the extent that any oligonucleotide designed to detect them would require the inclusion of sequences from flanking exons. Even in the case of splice arrays that contain junction probes, the design of these probes is challenging. Most splice arrays make use of oligonucleotides of constant length or Tm, but they do not consider that the contributions of the two sequences on either side of the junction may be substantially different due to differences in sequence composition. Probe quality is also affected by the strong spatial restrictions on oligonucleotide design required for the analysis of differential splicing. This leads to an inevitable breakdown of the strict thermodynamic and specificity criteria that are usually imposed on the design of an expression microarray. As for the labeling protocols, methods for standard gene-expression analysis are generally based on labeling from the 3' end, followed by detection with 3'-end probes. However, in a splicing analysis, the sequences of the oligonucleotides present in the array need to spread over the complete length of the transcripts, and the quality of the analysis strongly depends on homogeneous labeling of the RNA. If the labeling protocol is inappropriate, hybridization of insufficient material create "absent" values; more importantly, if the intensity of a fraction (but not all) of the probes for a given gene drops below the detection limit, normal gene expression changes will lead to incorrect detections of splice changes. Consequently, the rate of false-positive discovery would be considerable for genes expressed at low levels. Additionally, in a splice array, data processing and the identification of genuine differential splicing events are more complex than in standard gene-expression arrays and require specific analytical algorithms. This is due to the larger variation in thermodynamic conditions and possible cross-hybridization or folding of sub-optimally designed probes. Finally, an additional requirement for the analytical algorithm is the need to distinguish differential splicing from changes in gene expression levels. The methodology described in the present work has addressed all these limitations.
In designing the probes, we applied an oligonucleotide design algorithm that performs an in silico thermodynamic simulation of the hybridization procedure. The algorithm targets the best possible oligonucleotide for each sequence, rather than imposing a strict oligonucleotide quality cutoff. Several control oligonucleotides were also designed and included in the array to control labeling and hybridization processes. The proper design and inclusion of control oligonucleotides, as well as appropriate use of the data generated by these controls in data processing, are especially relevant considering that technical variability can be introduced by the addition of steps in the labeling protocol necessary to avoid labeling biases. To analyze the data, we developed new software to interpret the intensity changes of all the oligonucleotides for a given gene and to decide whether they reflected expression changes or isoform changes.
The efficacy of our new methodology and its potential usefulness in a clinical setting were tested in an application designed to identify genes differentially spliced in primary lung tumors. Lung cancer is the leading cause of cancer deaths worldwide , with the major form, NSCLC, accounting for about 80% of all lung cancers. In spite of advances in early detection and treatment, overall 5-year survival rates for NSCLC remain at about 15% , underlining the need for a better understanding of the molecular pathogenesis of NSCLC. It has been proposed that modifications in the concentration, localization, composition, or activity of RNA-binding proteins acting as splicing regulatory factors induce the splicing alterations characteristic of lung cancer . In this sense, the abnormal expression of heterogeneous nuclear ribonucleoproteins (hnRNP) in NSCLC clinical samples and animal models suggests that tumors develop specific hnRNP profiles [42, 43, 44]. This alteration would generate clinically relevant alternative splice forms contributing to lung carcinogenesis. A recent report presented a genome-wide analysis of alternative splicing events in lung adenocarcinoma . In that study, the authors obtained a list of cancer-related candidate genes showing alternative splicing events and implicated in cancer.
In the present study, the presence of differentially expressed splice variants in NSCLC was evaluated using a splice array designed to detect near 8000 genes known to be expressed in lung tissue. Analysis of the splice array data generated a list of candidates, from which 10 genes were selected for validation. Since one of the main purposes for this selection was to validate the quality of the detection process, no biological criteria were considered in the selection of the candidate genes at this point. RT-PCR experiments, followed by sequencing, were used to validate the results from the array, with changes in alternative splicing confirmed in four genes. As expected, the validation success was below the rates obtained in gene expression studies and was comparable to the rates reported in previous splicing studies [13, 18]. Regardless of the platform and algorithm used to detect differential splicing, by microarray or other hybridization-based analysis, it is important to realize that the technology is inherently sensitive to a number of errors that can lead to the incorrect identification of alternative splicing. For example, low-level expression can lead to the erroneous identification of splice events, due to the fact that not all oligonucleotides generate the same level of signal, and the signal of low-responsive oligonucleotides can drop below the detection limit thereby generating false "form changes" when the overall expression level differs between Cy3 and Cy5 channels. Cross-hybridization, obviously, is another potential cause of the detection of false "form changes". While cross-hybridization can sometimes be suspected when the higher signal of one oligonucleotide compared to the others cannot be justified by a much higher Tm or a sub-optimal design, it will generally go unnoticed until further detailed analysis is performed. Moreover, there is no guarantee that all possible gene-structure changes are analyzed in the validation process, unless a very extensive validation approach is applied for any gene of interest (which may be hampered by the availability of clinical material). The four genes with lung cancer-associated alternative splice forms newly identified in this study were: CEACAM1, FHL1, MLPH, and SUSD2.
Ceacam1 is a CEA-related cell adhesion molecule downregulated in several human cancer types, including prostate, breast, and colorectal cancers . CEACAM1 has been described as a lung tumor marker, and its expression has been associated with the prognosis of lung adenocarcinoma [45, 46, 47]. Two major CEACAM1 isoforms have been described: a long (L-) form and a short (S-) form, which, respectively, include or exclude exon 7. The exclusion of exon 7 generates a proximal stop codon that translates into a shorter cytoplasmic domain. Tumor cells transfected with CEACAM1-1L are less tumorigenic, suggesting that the L-form functions as a tumor suppressor gene . Wang et al. reported that CEACAM1-4S is the predominant isoform in NSCLC tissues, whereas in normal lung tissues the main isoform is CEACAM1-4L . This splice pattern was recently confirmed . In addition to confirming previous data, our analysis predicted other changes in the splicing of CEACAM1 around exons 2 and 5, which were validated by PCR. For the first time, it was demonstrated that lung tumors frequently overexpress three splice isoforms: CEACAM1-1, CEACAM1-3, and CEACAM1-3A. The alternative use of these exons affects different Ig-like structural domains in the extracellular portion of the respective proteins.
The family of four and a half LIM (FHL) proteins, also known as skeletal muscle LIM proteins (SLIM), is characterized by four complete LIM domains preceded by an N-terminal half LIM domain . LIM domains are cysteine-rich, double zinc-finger motifs involved in protein-protein interactions. FHL has been shown to regulate tissue differentiation, proliferation, adhesion, migration, cytoskeletal organization [51, 52], and recently, to play a role in carcinogenesis through a TGF-β-like signaling pathway . Four and a half LIM domains 1 (Fhl1 or Slim1) is a member of this family and has likewise been implicated in skeletal muscle development  as well as in the pulmonary vascular remodeling underlying pulmonary hypertension . Interestingly, Fhl1 is downregulated in many types of solid malignancies and it exhibits tumor suppressor activity [30, 31, 32]. Among the splice variants described for FHL1, in our study the expression of two of them, FHL1 and FHL1B, was identified in lung samples. In agreement with previous reports, clear downregulation in the expression of the FHL1 gene was detected in lung cancer specimens. More importantly, we determined that the downregulation of FHL1 is significantly higher than that of FHL1B. The two proteins are identical over the first three LIM domains but FHL1B contains a distinct C-terminus (96 amino acids) with three potential bipartite nuclear localization signals, a putative nuclear export sequence, and a binding motif for the transcription factor RBP-J [27, 28]. Whereas FHL1 is mainly located at focal adhesions, FHL1B is predominantly a nuclear protein and has unique physiological functions, including the regulation of Notch signaling through its association with RBP-J . Notch signaling profoundly influences the regulation of tumor progression, specifically, tumor cell proliferation, differentiation, apoptosis, and angiogenesis .
Mlph is a member of the synaptotagmin-like protein family and is involved in the transport of melanosomes . These lysosome-related organelles are specialized in the synthesis and distribution of melanin. Mlph is an essential member of the melanosome trafficking complex, acting as a link between Rab27a and myosin Va [57, 58]. It may also be involved in the trafficking of epithelial Na + channel in cells of the collecting duct of the kidney . Mlph contains an N-terminal Slp homology domain (SHD) involved in binding to Rab27a, a myosin-binding domain (MBD) in its middle region, and a C-terminal actin-binding domain (ABD). Here, we demonstrated that lung tissue expresses at least two isoforms of MLPH, one with and one without exon 9. Skipping of this exon generates a protein 28 amino acids shorter than the normal protein, without affecting any of the three characterized functional domains. In lung tumors, there is specific downregulation of the isoform containing exon 9.
The recently identified Susd2 is a single-pass type I membrane protein with an extracellular portion that contains somatomedin B, AMOP, von Willebrand factor type D, and sushi/CCP/SCR domains . Although its physiological function is still unknown, overexpression of Susd2 is thought to suppress tumorigenicity [34, 60]. In agreement with this postulated role for the protein, we observed reduced expression of SUSD2 in lung cancer tissues. Interestingly, intron retention was frequently detected between the last exons of the mRNA. Inclusion of intronic sequences within an mRNA is termed exonization. Although this modification is the rarest type of alternative splicing found in normal cells, exonization events in cancer cells are frequent and may be associated with impairments in splicing regulatory factors . The exonization of introns affects the extracellular portion of SUSD2 in that the translation of intron 11 introduces a premature stop codon which disrupts the von Willebrand factor type D domain at amino acid 631. Exonization of intron 12 generates a protein whose last 41 amino acids are substituted by 68 new amino acids (38 coded by intron 12 and 30 new amino acids translated as consequence of a frame-shift in the reading frame of exon 13). Translation of intron 13 generates a protein with a new sequence of 23 amino acids from position 781 (without affecting any known functional domain), while retention of intron 14 introduces a premature stop codon, eliminating seven amino acids at the C-terminal end.
We have developed and tested a novel platform for high-throughput analysis of alternative splicing events in biological samples. The application of this methodology will aid in understanding the functional relevance of splice variants in pathological conditions and facilitate the identification of new biomarkers and targets for therapy. To prove the usefulness of this platform, this methodology was used to identify cancer-associated splice variants in lung cancer. Differentially expressed splice variants of four genes were identified, with potential utility in the diagnosis of lung cancer. Additional work is in progress to analyze the relevance of these newly characterized cancer-associated isoforms as well as to validate additional candidates from data obtained in the splice array.
Primary tumors and their corresponding normal lung tissues were obtained from patients with non-small cell lung cancer (NSCLC) treated with curative resectional surgery at the Clínica Universidad de Navarra (Pamplona, Spain) or at the Hospital Marqués de Valdecilla (Santander, Spain). None of the patients received chemo- or radiotherapy prior to surgery. The study was approved by the ethics committees of the participating institutions and informed consent was obtained from each patient. Surgically removed samples were immediately frozen in liquid nitrogen and stored at -80°C until use. A portion of each sample was sectioned in a cryostat and mounted onto slides. After fixation, these samples were stained with hematoxylin and eosin, and then carefully examined by two experienced researchers. Samples containing less than 70% tumor cells were discarded. The 42 specimens selected for the study were divided into two groups: one for discovery (n = 20; Additional file 2: Table S1) and one for validation (n = 22; Additional file 2: Table S2).
Lung cancer cell lines
All lung cancer cell lines were obtained from the American Type Culture Collection (ATCC), except HCC44, HCC827, EPLC-272H, and HCC15, which were obtained from the German Collection of Microorganisms and Cell Cultures (DSMZ). Cells were grown in RPMI supplemented with 2 mM glutamine, 10% fetal bovine serum (FBS), 100 U/ml penicillin, and 100 μg/ml streptomycin.
New splice array design
Oligonucleotides for the splice array were designed using Tethys software (Oryzon Genomics, Barcelona, Spain) complemented with a new splice-analysis specific module (AltTethys) generated for the present study. Tethys is an oligonucleotide design algorithm that performs in silico thermodynamic simulation of the hybridization procedure previously used in the design of gene expression or genome hybridization arrays. The array design included exon and junction oligonucleotides to detect genes expressed in the human samples, and oligonucleotides for the detection of artificial spiked-in control genes from yeast. The control oligonucleotides served a dual purpose: to control labeling and hybridization, and to provide relevant statistical information for data analysis.
For exon probes, the target Tm was 75°C and the Tm range 70-80°C, with a length modulation of 30-36 bp and a temperature limit of 60°C for cross-hybridization. If no specific exon probe complied with these conditions, the best possible oligonucleotide (minimal cross-hybridization Tm and secondary structure folding Tm) was selected.
Junction probes were designed using junction target sequences with a total maximum length of 50 bp. Probe sequences with different lengths but with Tm values as similar as possible for the two sections of the oligonucleotide flanking both sides of the junction were selected. The total length of the junction probes ranged from 25 to 42 bp. Among all possible sequences, the oligonucleotide with a Tm as close as possible to the target temperature of 75°C was selected. Finally, the oligonucleotide was checked for Secondary Structure Folding Tm and Maximum Cross-Hybridization Tm using the Tethys oligonucleotide design backend, taking note of the potential for cross-hybridization.
An extensive battery of probes was included to control labeling and hybridization processes and to provide relevant statistical information for data analysis: (a) three yeast artificial target sequences were spiked into the hybridization mixture at three different concentration levels but balanced in the Cy3 and Cy5 channels; (b) optimum oligonucleotides (in terms of specificity and thermodynamics) distributed over the length of the target sequences to assess labeling bias; (c) half oligonucleotides (generated by splitting optimum oligonucleotides into two thermodynamically equivalent halves complemented with stretches of AT at the 5' or 3' end) to simulate differential splicing at the splice-donor or splice-acceptor site. These probes were included more than ten times in the array in order to determine intra-array variability.
In addition, a higher number of positive and negative control probes were distributed over the array surface (maize expansin, ZmMYB42, and xyloglucan endo-transglycosylase). These probes were used to assess detection limits and range, to verify spatial homogeneity, and to determine experimental within-array variation.
Cloning of artificial constructs for VEGF, PCBP4, and CDC6
The performance of the splice array was tested using a pilot array designed to identify different transcripts of three genes: VEGF, PCBP4, and CDC6. Artificial transcripts were generated for three VEGF isoforms (VEGF121, VEGF165, and VEGF185) , two PCBP4 isoforms (PCBP4 and PCBP4a) , and the only known CDC6 isoform.
Preparation of yeast controls
Three yeast sequences of DNA were amplified by PCR from genomic DNA of Saccharomyces cerevisae strain S288C using two chimeric primers, where a primer consisted of a T7 promoter and the gene specific sequence, and the other primer consisted of a tail of 20 timidines and the gene specific sequence. These three genes were: YIL129C (7100 bp), YML059C (4900 bp) and YOR328W (4600 bp). They were amplified by PCR with 32 ng of genomic DNA from yeast and using a combination of two polymerase TaqI:Pfu (20:1). cRNA was generated using an in vitro transcription system (T7 Megascript kit; Ambion) getting the final artificial unique splice forms. The sequences were spiked into the samples prior to labeling.
RNA extraction and labeling
Total RNA from paired normal and tumor samples from lung tissues was extracted using the RNeasy Extraction Kit (Qiagen) according to the manufacturer's instructions, with minor modifications. RNA quality was assessed using an Agilent Bioanalyzer 2100 and quantified using a Nanodrop ND-1000 spectrophotometer. Samples with an RNA integrity number (RIN) below 7 were excluded from further analysis. PolyA + RNA was extracted using Dynabead magnetic particles. To obtain homogeneous labeling of the RNA across the entire length of the transcript, a novel labeling procedure, described in the Results section, was developed. Fifty nanograms of PolyA + RNA from normal tissue was labeled with Cy3 and the same amount of PolyA + RNA from tumor samples with Cy5. In addition, 50 ng of PolyA + RNA from normal tissue was labeled with Cy5. Prior to labeling, artificial yeast transcipts were spiked into all polyA + samples mixtures. The quality of the labeled samples was verified using the Agilent 2100 Bioanalyzer and sample concentration was determined using the Nanodrop ND-1000 spectrophotometer.
Array hybridization and data acquisition
Labeled normal tissue cRNA (4.5 μg) was mixed with the same amount of labeled tumor cRNA from the same patient, and equal quantities of Cy3 and Cy5 labeled Xet and Zm42 cRNA controls were spiked in to serve as hybridization controls. The cRNA was mixed with 25 × fragmentation buffer (Agilent) and incubated at 60°C for 30 min to fragment RNA. Afterwards, 250 μl of 2 × hybridization buffer (Agilent) was added to stop the fragmentation reaction and the mixture was hybridized on the array. Slides were incubated for 17 h at 60°C in an Agilent DNA Hybridization Oven (G2545A) with the rotation setting at 4 rpm. A total of twenty Cy3 labeled normal and Cy5 labeled tumor lung cancer samples, were cohybridized pairwise on the splice array as well as three Cy3 labeled normal and Cy5 labeled normal samples. Raw data were acquired using an Agilent DNA Microarray Scanner and Agilent Feature Extraction Software (V.9.1). The general reproducibility of the hybridization platform (labeling procedure, hybridization, and detection) was assessed by means of self to self hybridization; the standard deviation of the fold change of all oligonucleotides was 0.093.
For data processing, a novel algorithm that distinguished between changes in gene expression and splicing variation was developed. The analysis of differential splice isoforms is more complex than the analysis of differential gene expression, due to a higher variation in thermodynamic conditions and possible cross-hybridization and folding of sub-optimally designed oligonucleotides. In addition, compared with regular gene expression analyses, additional variation can be introduced due to the incorporation of extra steps in the labeling protocol. This gives a special importance to the incorporation of spiked-in controls in the array design and their use in data processing. The data processing procedure (which is detailed in Additional file 2: Supplementary Methods) was divided into four steps: data filtering and normalization, probe spot calibration, gene probe statistical analysis, and isoform analysis. Data processing is discussed more extensively in Additional file 2: Supplementary Methods.
Validation of cancer-associated splice variants by PCR
Results obtained in the splice array were validated by PCR. Two micrograms of RNA from the clinical samples were reverse transcribed. Genomic DNA contamination was controlled in each RNA sample using a reaction mix lacking reverse transcriptase. One microliter of cDNA diluted 1:10 was used for PCR amplification, and the PCR products were electrophoresed in agarose gels. For sequencing, the amplified bands were purified using the Qiagen MinElute PCR Purification Kit and sequenced in an ABI377 sequencer (Perkin-Elmer Applied Biosystems). Real-time PCR was performed using SYBR Green PCR Master Mix (Applied Biosystems) in the Applied Biosystems 7300 Real-Time PCR System. The reactions were carried out according to the manufacturer's instructions. Each sample was analyzed in triplicate. Relative levels of expression were determined by the Ct method using IPO8 as the reference . Primers used for validation are shown in Additional file 2: Table S3.
This work was supported by "UTE project CIMA" and grants from Spanish Ministry of Industry [Programa Ingenio 2010, CENIT Ref. Oncnosis], Fundación MMA, and Red Temática de Investigación Cooperativa en Cáncer [RTICC, RD06/0020/0066], Instituto de Salud Carlos III (ISCIII), Spanish Ministry of Science and Innovation & European Regional Development Fund (ERDF) "Una manera de hacer Europa". We greatly appreciate Cristina Sainz, Amaya Lavin and Ana Remirez for their technical assistance and Uxua Montes for her help in the collection of clinical samples.
- 2.Sultan M, Schulz MH, Richard H, Magen A, Klingenhoff A, Scherf M, Seifert M, Borodina T, Soldatov A, Parkhomchuk D: A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science. 2008, 321 (5891): 956-960. 10.1126/science.1160342.PubMedCrossRefGoogle Scholar
- 7.Gattenlohner S, Stuhmer T, Leich E, Reinhard M, Etschmann B, Volker HU, Rosenwald A, Serfling E, Bargou RC, Ertl G: Specific Detection of CD56 (NCAM) Isoforms for the Identification of Aggressive Malignant Neoplasms with Progressive Development. Am J Pathol. 2009, 174 (4): 1160-1171. 10.2353/ajpath.2009.080647.PubMedCentralPubMedCrossRefGoogle Scholar
- 9.Gardina PJ, Clark TA, Shimada B, Staples MK, Yang Q, Veitch J, Schweitzer A, Awad T, Sugnet C, Dee S: Alternative splicing and differential gene expression in colon cancer detected by a whole genome exon array. BMC Genomics. 2006, 7: 325-10.1186/1471-2164-7-325.PubMedCentralPubMedCrossRefGoogle Scholar
- 10.French PJ, Peeters J, Horsman S, Duijm E, Siccama I, van den Bent MJ, Luider TM, Kros JM, van der Spek P, Sillevis Smitt PA: Identification of differentially regulated splice variants and novel exons in glial brain tumors using exon expression arrays. Cancer Res. 2007, 67 (12): 5635-5642. 10.1158/0008-5472.CAN-06-2869.PubMedCrossRefGoogle Scholar
- 12.Thorsen K, Sorensen KD, Brems-Eskildsen AS, Modin C, Gaustadnes M, Hein AM, Kruhoffer M, Laurberg S, Borre M, Wang K: Alternative splicing in colon, bladder, and prostate cancer identified by exon array analysis. Mol Cell Proteomics. 2008, 7 (7): 1214-1224. 10.1074/mcp.M700590-MCP200.PubMedCrossRefGoogle Scholar
- 13.Xi L, Feber A, Gupta V, Wu M, Bergemann AD, Landreneau RJ, Litle VR, Pennathur A, Luketich JD, Godfrey TE: Whole genome exon arrays identify differential expression of alternatively spliced, cancer-related genes in lung cancer. Nucleic Acids Res. 2008, 36 (20): 6535-6547. 10.1093/nar/gkn697.PubMedCentralPubMedCrossRefGoogle Scholar
- 17.Castle J, Garrett-Engele P, Armour CD, Duenwald SJ, Loerch PM, Meyer MR, Schadt EE, Stoughton R, Parrish ML, Shoemaker DD: Optimization of oligonucleotide arrays and RNA amplification protocols for analysis of transcript structure and alternative splicing. Genome Biol. 2003, 4 (10): R66-10.1186/gb-2003-4-10-r66.PubMedCentralPubMedCrossRefGoogle Scholar
- 20.Anton MA, Gorostiaga D, Guruceaga E, Segura V, Carmona-Saez P, Pascual-Montano A, Pio R, Montuenga LM, Rubio A: SPACE: an algorithm to predict and quantify alternatively spliced isoforms using microarrays. Genome Biol. 2008, 9 (2): R46-10.1186/gb-2008-9-2-r46.PubMedCentralPubMedCrossRefGoogle Scholar
- 22.Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M: Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci USA. 2001, 98 (24): 13790-13795. 10.1073/pnas.191502998.PubMedCentralPubMedCrossRefGoogle Scholar
- 24.Nguewa PA, Agorreta J, Blanco D, Lozano MD, Gomez-Roman J, Sanchez BA, Valles I, Pajares MJ, Pio R, Rodriguez MJ: Identification of Importin 8 (IPO8) as the most accurate reference gene for the clinicopathological analysis of lung specimens. BMC Mol Biol. 2008, 9: 103-10.1186/1471-2199-9-103.PubMedCentralPubMedCrossRefGoogle Scholar
- 26.Cowling BS, McGrath MJ, Nguyen MA, Cottle DL, Kee AJ, Brown S, Schessl J, Zou Y, Joya J, Bonnemann CG: Identification of FHL1 as a regulator of skeletal muscle mass: implications for human myopathy. J Cell Biol. 2008, 183 (6): 1033-1048. 10.1083/jcb.200804077.PubMedCentralPubMedCrossRefGoogle Scholar
- 27.Brown S, McGrath MJ, Ooms LM, Gurung R, Maimone MM, Mitchell CA: Characterization of two isoforms of the skeletal muscle LIM protein 1, SLIM1. Localization of SLIM1 at focal adhesions and the isoform slimmer in the nucleus of myoblasts and cytoplasm of myotubes suggests distinct roles in the cytoskeleton and in nuclear-cytoplasmic communication. J Biol Chem. 1999, 274 (38): 27083-27091. 10.1074/jbc.274.38.27083.PubMedCrossRefGoogle Scholar
- 33.Matesic LE, Yip R, Reuss AE, Swing DA, O'Sullivan TN, Fletcher CF, Copeland NG, Jenkins NA: Mutations in Mlph, encoding a member of the Rab effector family, cause the melanosome transport defects observed in leaden mice. Proc Natl Acad Sci USA. 2001, 98 (18): 10238-10243. 10.1073/pnas.181336698.PubMedCentralPubMedCrossRefGoogle Scholar
- 34.Sugahara T, Yamashita Y, Shinomi M, Yamanoha B, Iseki H, Takeda A, Okazaki Y, Hayashizaki Y, Kawai K, Suemizu H: Isolation of a novel mouse gene, mSVS-1/SUSD2, reversing tumorigenic phenotypes of cancer cells in vitro. Cancer Sci. 2007, 98 (6): 900-908. 10.1111/j.1349-7006.2007.00466.x.PubMedCrossRefGoogle Scholar
- 40.Ries LAG, Melbert D, Krapcho M, Stinchcomb DG, Howlader N, Horner MJ, Mariotto A, Miller BA, Feuer EJ, Altekruse SF: SEER Cancer Statistics Review, 1975-2005. 2008, National Cancer Institute Bethesda, MDGoogle Scholar
- 42.Pino I, Pio R, Toledo G, Zabalegui N, Vicent S, Rey N, Lozano MD, Torre W, Garcia-Foncillas J, Montuenga LM: Altered patterns of expression of members of the heterogeneous nuclear ribonucleoprotein (hnRNP) family in lung cancer. Lung Cancer. 2003, 41 (2): 131-143. 10.1016/S0169-5002(03)00193-4.PubMedCrossRefGoogle Scholar
- 43.Zerbe LK, Pino I, Pio R, Cosper PF, Dwyer-Nield LD, Meyer AM, Port JD, Montuenga LM, Malkinson AM: Relative amounts of antagonistic splicing factors, hnRNP A1 and ASF/SF2, change during neoplastic lung growth: implications for pre-mRNA processing. Mol Carcinog. 2004, 41 (4): 187-196. 10.1002/mc.20053.PubMedCrossRefGoogle Scholar
- 44.Pio R, Zudaire I, Pino I, Castano Z, Zabalegui N, Vicent S, Garcia-Amigot F, Odero MD, Lozano MD, Garcia-Foncillas J: Alpha CP-4, encoded by a putative tumor suppressor gene at 3p21, but not its alternative splice variant alpha CP-4a, is underexpressed in lung cancer. Cancer Res. 2004, 64 (12): 4171-4179. 10.1158/0008-5472.CAN-03-2982.PubMedCrossRefGoogle Scholar
- 47.Lee MK, Kim JH, Lee CH, Kim JM, Kang CD, Kim YD, Choi KU, Kim HW, Kim JY, Park do Y: Clinicopathological significance of BGP expression in non-small-cell lung carcinoma: relationship with histological type, microvessel density and patients' survival. Pathology. 2006, 38 (6): 555-560. 10.1080/00313020601024029.PubMedCrossRefGoogle Scholar
- 60.Sugahara T, Yamashita Y, Shinomi M, Isobe Y, Yamanoha B, Iseki H, Takeda A, Okazaki Y, Kawai K, Suemizu H: von Willebrand factor type D domain mutant of SVS-1/SUSD2, vWD(m), induces apoptosis in HeLa cells. Cancer Sci. 2007, 98 (6): 909-915. 10.1111/j.1349-7006.2007.00467.x.PubMedCrossRefGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.