Analysis of AmpliSeq RNA-Sequencing Enrichment Panels
This study presents a proof of concept for encoding genomic signatures in the AmpliSeq technology. Samples from patients with a disease and from healthy donors were processed using a custom-designed AmpliSeq RNA sequencing kit that includes 290 amplicons, and were sequenced on an IonTorrent machine. The read count data show sufficient coverage for most of the chosen amplicons, which results in good separability between the patients and the healthy donors. In addition, several amplicons allow useful genomic variants (SNPs) to be checked whenever the coverage level permits. The paper evaluates machine-learning classifiers that distinguish patients from healthy donors on the basis of the AmpliSeq panel data. The outcome confirms the potential utility of similar RNA amplicon kits in research and clinical practice for encoding gene expression signatures of diseases and their phenotypes.
Keywords: Genomics, Transcriptomics, Amplicon sequencing, Classification, Genomic signatures
1 Introduction
Next generation sequencing techniques, which have already become the driving force in molecular biology, have recently been introduced into various areas of medical application. To gain focused insight while keeping costs affordable, it is essential to apply specialized kits for the enrichment of particular sequences, e.g. exome kits [1, 2]. Such enrichment kits now exist for both DNA and RNA sequencing and can be ordered in a fully customized way, with the design process run through the specialized web interfaces of the technology providers. Another very popular technique for precise measurements in genomics and transcriptomics is RT-PCR. The primers can be freely designed or ordered in pre-defined panels specific to a given application, e.g. TaqMan. The statistical analysis of such data has been described in the literature. Amplicon enrichment kits for RNA sequencing combine the advantages of both approaches, enriched sequencing and RT-PCR. Amplicon sequencing has recently been performed on sequencing platforms of all three generations; still, combining many amplicons in one PCR run and preparing an RNA sequencing library from the amplified product is a novel technique. An example of this technique is AmpliSeq, introduced by LifeTech in early 2013. As with other products of modern nanotechnology, the biological hardware often precedes the methodologies for in-depth analysis of its data, simply because of the amount and variety of data that it produces. This paper presents a study of the technical applicability of AmpliSeq kits in the area of autoimmune disease, in particular to verify the gene expression signature [6, 7] that differentiates patients with various diseases from healthy donors.
It can also serve as a technical proof that will encourage researchers from other areas of medical research to encode their gene expression signatures in amplicon sequencing panels, especially if the speed, precision and cost of the analysis prove to be competitive.
2 Materials and Methods
2.1 Panel Design
The amplicon panel was designed with 289 amplicons covering 284 genes known from the medical literature to be specific for the disease. Twelve amplicons included a SNP in the coding region.
2.2 RNA Samples
RNA samples were isolated from the blood of 8 patients and 8 healthy donors matched to the patients by age and gender. RNA extraction was done using the RNeasy Mini Kit (Qiagen, cat. no. 74104) with subsequent purification by precipitation and ethanol washes. Concentration and purity were measured with a NanoDrop 1000. RNA integrity was measured with an Agilent 2200 TapeStation, yielding RIN* values between 8.6 and 9.3. The sequencing libraries were prepared with the Ion AmpliSeq RNA Library kit (Life Technologies, cat. no. 4482335) using custom primers (Life Technologies), designed as above, and Ion Xpress Barcode Adapters (Life Technologies, cat. no. 4474518), according to the manufacturer’s protocol. Barcoded libraries were pooled in equimolar amounts, diluted to a concentration of 20 pM and used for subsequent template preparation with the Ion PGM Template OT2 200 Kit (Life Technologies, cat. no. 4480974), according to the manufacturer’s protocol.
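The pooling and dilution arithmetic behind the last step can be illustrated with a short sketch. The library concentrations, the molar amount per library and the final volume below are hypothetical values chosen for illustration only; the manufacturer's protocol remains the authoritative source for the actual volumes.

```python
def equimolar_volumes(concentrations_pM, amount_amol):
    """Volume (in uL) of each library that contributes the same molar amount.

    Since pM x uL = amol, the volume is amount / concentration.
    """
    return [amount_amol / c for c in concentrations_pM]


def dilution_volume(stock_pM, target_pM, final_volume_uL):
    """Solve C1 * V1 = C2 * V2 for V1, the stock volume to dilute."""
    if target_pM > stock_pM:
        raise ValueError("target concentration exceeds stock concentration")
    return target_pM * final_volume_uL / stock_pM


# Hypothetical concentrations of three barcoded libraries (pM).
library_pM = [100.0, 50.0, 200.0]
amount_per_library = 1000.0  # amol contributed by each library (assumed)

volumes = equimolar_volumes(library_pM, amount_per_library)

# Concentration of the pool: total amount divided by total volume.
pool_pM = len(library_pM) * amount_per_library / sum(volumes)

# Pool volume needed for 25 uL at 20 pM (the final volume is an assumption).
stock_needed = dilution_volume(pool_pM, target_pM=20.0, final_volume_uL=25.0)
```

In this toy case the more concentrated libraries contribute smaller volumes, so that each barcode is represented by the same number of molecules in the pool.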
2.3 Sequencing and Data Acquisition
The sequencing reactions were performed using a Personal Genome Machine (PGM) System with the Ion PGM Sequencing 200 Kit v2 (Life Technologies, cat. no. 4482006) and the Ion 318 Chip Kit (Life Technologies, cat. no. 4466617), according to the manufacturer’s instructions. The sequencing data were initially processed using the TorrentServer software (TorrentSuite v 3.6, LifeTech). Read data were extracted from the chip using the FastqCreator plugin v3.6.0-r57238.
2.4 Mapping of Reads and Variant Calling
3 Results and Discussion
The sequencing of 16 samples on the 318 chip yielded 2853777 total reads, of which 2.851M (97 % aligned bases) could be mapped to the list of canonical transcripts. Detection levels. Out of 289 amplicons, 235 reached a detection level of 10 in at least one of the samples. Fig. 1A presents the fraction of amplicons with coverage in a useful range.
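The detection-level check can be sketched as follows. This is a minimal illustration, assuming that a detection level of 10 means at least 10 reads mapped to the amplicon in a sample; the amplicon names and counts are hypothetical toy data, not values from the study.

```python
def detected_amplicons(counts, threshold=10):
    """Return the amplicons with at least `threshold` reads in any sample.

    `counts` maps an amplicon name to its per-sample read counts.
    """
    return [name for name, per_sample in counts.items()
            if max(per_sample) >= threshold]


# Hypothetical read counts for three amplicons across four samples.
counts = {
    "AMPL_A": [0, 3, 12, 7],
    "AMPL_B": [1, 2, 0, 4],
    "AMPL_C": [150, 210, 98, 305],
}

detected = detected_amplicons(counts)
fraction_detected = len(detected) / len(counts)
```

Applied to the full count table, the same check would give the number of amplicons reaching the chosen detection level out of the 289 in the panel.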
3.1 Fold Change and Differential Expression
Fig. 1B presents the distribution of fold changes and a plot of the SAMseq results.
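As an illustration of the fold-change part only, a log2 fold change per amplicon can be computed from group means. The sketch below assumes that the counts are already normalized for library size and adds a pseudocount of 1 to avoid division by zero; both choices, as well as the toy counts, are assumptions, and the sketch does not reproduce the SAMseq test.

```python
import math


def log2_fold_change(patient_counts, healthy_counts, pseudocount=1.0):
    """log2 ratio of mean patient counts to mean healthy-donor counts."""
    mean_patient = sum(patient_counts) / len(patient_counts)
    mean_healthy = sum(healthy_counts) / len(healthy_counts)
    return math.log2((mean_patient + pseudocount) / (mean_healthy + pseudocount))


# Hypothetical normalized counts for three amplicons, two samples per group.
patients = {"AMPL_A": [7, 7], "AMPL_B": [3, 3], "AMPL_C": [0, 0]}
healthy = {"AMPL_A": [1, 1], "AMPL_B": [3, 3], "AMPL_C": [30, 32]}

lfc = {name: log2_fold_change(patients[name], healthy[name]) for name in patients}

# Rank amplicons by absolute log fold change, largest first.
ranked = sorted(lfc, key=lambda name: abs(lfc[name]), reverse=True)
```

Ranking by the absolute value treats strongly up-regulated and strongly down-regulated amplicons alike, which is the criterion needed when selecting a subset by highest absolute log fold change.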
The combined results of the detection level check and the differential expression analysis indicate that the majority of amplicons were designed in such a way that they are useful for differentiating between patients and healthy donors. Detection of SNPs is possible only when the depth of read coverage is sufficient. In the case of RNA this depends on gene expression and is therefore never guaranteed. Nonetheless, some of the SNPs were detected and reported in VCF files in a systematic way.
The utility of the gene expression signature is confirmed by the machine learning approach. The initial clustering (Fig. 2A and B: subsets of the 77 amplicons with the highest absolute log fold change and the 75 with the highest variance) shows that most of the patients and most of the healthy donors cluster together. Gender seems to have some influence on the clustering. There is one outlier: a healthy donor who turned out to come from a different geographical zone than the others and thus could have an immune system trained in a completely different way than the other patients and donors, all of whom come from Europe.
The classification results, summarized in Fig. 3, show that most of the samples are correctly classified. Adding the gender information increases the predictive power of the classification. Similarly, adding an attribute describing the SNPs found increases the number of correctly classified samples. All the results of the analyses described above support the claim that gene signatures can be efficiently encoded in the AmpliSeq panel. The read counts representing the expression levels of genes may be used jointly as predictors, and also in combination with clinical parameters (e.g. gender, age) and with the genotyping results that can be obtained for some genes from the same RNA panel.
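The general idea of selecting a high-variance subset of amplicons and estimating classification accuracy on a small cohort can be sketched as below. The nearest-centroid classifier and the leave-one-out scheme are stand-ins chosen for brevity, not the classifiers evaluated in the study, and the expression values are hypothetical.

```python
import math
from statistics import pvariance


def top_variance_features(matrix, k):
    """Indices of the k columns with the highest variance across samples."""
    n_features = len(matrix[0])
    variances = [pvariance([row[j] for row in matrix]) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(order[:k])


def centroid(rows):
    """Column-wise mean of a list of equally long vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leave_one_out_accuracy(matrix, labels):
    """Fraction of samples correctly classified by nearest class centroid,
    where each sample is held out in turn and the centroids are computed
    from the remaining samples only."""
    correct = 0
    for i, (sample, label) in enumerate(zip(matrix, labels)):
        train = [(r, l) for j, (r, l) in enumerate(zip(matrix, labels)) if j != i]
        centroids = {
            cls: centroid([r for r, l in train if l == cls])
            for cls in sorted({l for _, l in train})
        }
        predicted = min(centroids, key=lambda c: euclidean(sample, centroids[c]))
        correct += predicted == label
    return correct / len(matrix)


# Hypothetical log-scale expression: rows are samples, columns are amplicons.
expr = [
    [8.0, 5.0, 2.0], [7.5, 5.1, 2.5], [8.2, 4.9, 1.8],  # patients
    [3.0, 5.0, 6.0], [2.5, 5.2, 6.5], [3.3, 4.8, 5.9],  # healthy donors
]
labels = ["patient"] * 3 + ["healthy"] * 3

selected = top_variance_features(expr, k=2)
reduced = [[row[j] for j in selected] for row in expr]
accuracy = leave_one_out_accuracy(reduced, labels)
```

Variance-based selection does not use the class labels, so it is done once here on all samples. A label-dependent criterion such as fold change would have to be recomputed inside each leave-one-out round to avoid an optimistic accuracy estimate. Further attributes such as gender or SNP genotypes could be appended as extra columns after suitable numeric encoding.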
The results described above show that such an approach is feasible and may yield medically useful results. There is still room for improvement, especially in the custom design of the panel. In particular, tuning the selection of amplicons can help to distinguish between disease phenotypes in cases that can be diagnosed from peripheral blood samples, as we have shown for the disease studied here.
4 Software Availability
The software is available as the Bioconductor R package ampliQueso.
We are grateful to Kelli Bramlett, Jeoffrey Schageman, and Daniel Williams from LifeTech for discussions on the AmpliSeq technology, to Andreas Tobler for coordinating the collaboration, and to Marzanna Künzli-Gontarczyk, Daria Bochenek and Josias Brito Frazao for help with the sequencing library preparation and for discussions on the laboratory aspects of the study. This work was supported by the Sciex.ch grants (nr. 11.182 to AS and MO, and nr. 12.289 to MW and MO).