Rational design of cancer gene panels with OncoPaD
Profiling the somatic mutations of genes that may provide information on tumor evolution, prognosis and treatment is becoming a standard tool in clinical oncology. Commercially available cancer gene panels rely on manually gathered cancer-related genes, in a “one-size-fits-many” solution. Designing new panels requires a laborious search of the literature and cancer genomics resources, and their performance on cohorts of patients is difficult to estimate.
We present OncoPaD, to our knowledge the first tool aimed at the rational design of cancer gene panels. With a friendly interface and intuitive input, OncoPaD suggests relevant sets of genes to include in the panel, because prior knowledge or analyses indicate that their mutations either drive tumorigenesis or function as biomarkers of drug response. OncoPaD estimates the cost-effectiveness of the designed panel on a cohort of tumors and provides reports on the importance of individual mutations for tumorigenesis or therapy, which support the interpretation of the results obtained with the designed panel. We demonstrate in silico that OncoPaD-designed panels are more cost-effective (i.e. they detect a maximum fraction of tumors in the cohort by sequencing a minimum quantity of DNA) than available panels.
With its unique features, OncoPaD will help clinicians and researchers design tailored next-generation sequencing (NGS) panels to analyze circulating tumor DNA or biopsy specimens, thereby facilitating early and accurate detection of tumors, genomics-informed therapeutic decisions, patient follow-up and timely identification of resistance mechanisms to targeted agents. OncoPaD may be accessed through http://www.intogen.org/oncopad.
Keywords: Cancer panels; Panels cost-effectiveness; Rational design of panels; Tumor early detection; Drug profiling of tumor cohorts; Cancer driver genes; Anti-cancer drug response biomarkers
Abbreviations
bps: DNA base pairs
CLL: Chronic lymphocytic leukemia
CMF: Cumulative mutations frequency
NGS: Next generation sequencing
TCGA: The Cancer Genome Atlas
Background
Profiling somatic mutations in the coding sequence of genes that have predictive, prognostic or diagnostic value is becoming a standard tool in clinical oncology [1, 2]. Gene panels present advantages over whole-exome sequencing in the clinical and translational research settings that extend beyond cost-effectiveness. For example, they have a higher sensitivity to detect variants and are less prone to detecting false-positive somatic mutations, which are key requirements if mutations detected via gene panel sequencing are to guide targeted cancer therapies or early cancer screening via liquid biopsies.
Several commercial solutions are currently available to meet the growing need for cancer gene panels. All currently available commercial and in-house cancer gene panels rely on manually gathered cancer-related genes and/or alterations that are known biomarkers of sensitivity or resistance to targeted agents, and constitute “one-size-fits-many” solutions. In both translational and basic research, investigators may need to design gene panels tailored to particular questions (see, for example, [1, 5, 6]). Designing such panels requires a laborious search of the literature and cancer genomics resources. Furthermore, whether the panel comes from a commercial source or is designed by the researcher, its cost-effectiveness on a cohort of cancer patients is very difficult to estimate.
Our previous systematic analysis of large cancer cohorts, which produced comprehensive catalogs of driver genes across 28 cancer types, together with an in-house expert-curated compilation of tumor alterations relevant to tumorigenesis or influencing drug effect, provides an opportunity to overcome these hurdles. Here, we present OncoPaD (http://intogen.org/oncopad), to our knowledge the first web-based tool aimed at the rational design of cancer gene panels, which dynamically estimates their cost-effectiveness in profiling large cohorts of tumors of 28 cancer types.
Cancer cohort data
Mutational cancer data were obtained from the cohort of 6792 samples from 28 cancer types collected by Rubio-Perez and Tamborero et al. (see that reference for details on data collection). We added a cohort of 506 chronic lymphocytic leukemias (CLL) from Puente et al.
A panel can be designed to profile any of the 28 cancer types (i.e. a comprehensive solid and hematologic panel), for a group of them (e.g. a panel only for hematologic malignancies or for lung carcinomas) or for an individual cancer type (e.g. a panel for breast cancer). Additional file 1: Table S1 presents a list of all cohorts included and cohort groups pre-built in OncoPaD.
Integrating lists of known cancer driver genes
The Cancer Gene Census.
Genes with validated oncogenic mutations in specific cancer types from a manual in-house compilation (see below).
CLL-specific drivers from Puente et al. (CLL is underrepresented in the cohorts in (1)).
We integrated these four lists into a complete and reliable catalog of cancer driver genes that serves as input to OncoPaD. Although the four lists share several genes, they are complementary because each is generated through a different approach (see Additional file 2: Supplementary Methods for more details; Additional file 3: Table S2 contains the driver genes comprised in each list).
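The integration step can be pictured as a union of the lists that remembers which list supports each gene. The sketch below is a minimal illustration, assuming each list is available as a collection of gene symbols; the function name `integrate_driver_lists`, the source labels and the placeholder genes are hypothetical and do not reflect the actual contents of the four lists.

```python
from collections import defaultdict


def integrate_driver_lists(lists):
    """Merge driver-gene lists into one catalog, recording which lists support each gene.

    lists: dict mapping a source label to an iterable of gene symbols.
    Returns a dict mapping each gene symbol to the sorted list of its sources.
    """
    catalog = defaultdict(set)
    for source, genes in lists.items():
        for gene in genes:
            # Normalize symbols so that "gene2" and "GENE2" count as one gene
            catalog[gene.strip().upper()].add(source)
    return {gene: sorted(sources) for gene, sources in sorted(catalog.items())}


# Placeholder genes and source labels, for illustration only
lists = {
    "cohort_drivers": ["GENE1", "GENE2"],
    "cancer_gene_census": ["GENE1", "GENE3"],
    "validated_mutations": ["gene2"],
    "cll_puente": ["GENE4"],
}
catalog = integrate_driver_lists(lists)
print(catalog["GENE1"])  # ['cancer_gene_census', 'cohort_drivers']
```

Keeping the provenance of each gene makes the complementarity of the lists visible, for instance by counting how many genes are supported by a single source.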
Prioritization of panel candidates
Tier 1 candidates: genes and/or mutational hotspots that contribute the most to the slope of the CMF distribution, i.e. to the mutational coverage of the panel.
Tier 2 candidates: their contribution to the CMF distribution is smaller than that of genes and/or mutational hotspots of Tier 1.
Tier 3 candidates: all other genes and/or mutational hotspots included in the panel. Their contribution to the coverage of the panel is negligible.
Tier 1 candidates are preferred when designing the panel. Tier 2 candidates may be included if maximum coverage of the mutations in the cohort is desired, although their inclusion may reduce sequencing depth. If Tier 1 comprises a long list of candidates, users may fine-tune it with the Tier 1 stringent classification option (see Additional file 2: Supplementary Methods).
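The text does not state how the slope cut-offs between tiers are set, so the following is only a sketch under explicit assumptions: genes are ordered greedily by the number of not-yet-covered samples they add, the CMF after each gene is the fraction of samples with at least one mutation in the genes ranked so far, and the "slope" is that per-gene increment. The function `prioritize` and the cut-offs `tier1_min` and `tier2_min` are hypothetical and are not OncoPaD's actual parameters.

```python
def prioritize(gene_samples, n_samples, tier1_min, tier2_min):
    """Rank genes by their contribution to the cumulative mutation frequency (CMF).

    gene_samples: dict mapping gene -> set of sample IDs with a mutation in it.
    n_samples: total number of samples in the cohort.
    tier1_min, tier2_min: hypothetical slope cut-offs (fraction of the cohort).
    Returns a list of (gene, CMF after adding the gene, tier).
    """
    covered = set()
    remaining = dict(gene_samples)
    ranking = []
    while remaining:
        # Greedy step: pick the gene adding the most not-yet-covered samples
        # (ties broken alphabetically, since max() keeps the first maximum)
        gene = max(sorted(remaining), key=lambda g: len(remaining[g] - covered))
        slope = len(remaining.pop(gene) - covered) / n_samples
        covered |= gene_samples[gene]
        cmf = len(covered) / n_samples
        if slope >= tier1_min:
            tier = 1
        elif slope >= tier2_min:
            tier = 2
        else:
            tier = 3
        ranking.append((gene, cmf, tier))
    return ranking


# Toy cohort of 10 samples, for illustration only
cohort = {"A": {1, 2, 3, 4, 5}, "B": {4, 5, 6}, "C": {7}, "D": {1, 2}}
for row in prioritize(cohort, n_samples=10, tier1_min=0.2, tier2_min=0.1):
    print(row)
# ('A', 0.5, 1)
# ('B', 0.6, 2)
# ('C', 0.7, 2)
# ('D', 0.7, 3)
```

With a greedy ordering the per-gene increments never increase, so the three tiers form contiguous blocks along the CMF curve; gene D adds nothing once A is in the panel and lands in Tier 3.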
Identification of hotspots with high density of mutations
We designed a simple algorithm for the identification of mutational hotspots. Briefly, it iteratively identifies the minimum number of regions (each spanning at most 100 bps) across the sequence of the gene that contain most of its mutations. In each iteration, the region with the highest number of mutations is identified as a hotspot, and its mutations are removed from the gene before the next iteration. The search stops when all sites left in the gene contain fewer than two mutations. After all hotspots are identified, the algorithm checks whether they jointly account for at least a minimum fraction of all the mutations in the gene (80 % by default, configurable by the user). If so, all identified hotspots are incorporated individually into the panel (see Additional file 2: Figure S1); otherwise, all the exons of the gene are incorporated into the panel.
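A minimal Python sketch of this procedure follows. It assumes mutations are given as coding-sequence positions (one entry per mutation, so recurrent positions repeat) and reads the stopping rule as "stop when no region of at most 100 bps holds two or more of the remaining mutations". The function name `find_hotspots` and its return format are hypothetical; the actual OncoPaD implementation may differ.

```python
from bisect import bisect_right


def find_hotspots(positions, max_len=100, min_fraction=0.8):
    """Greedy search for mutation-dense regions of at most `max_len` bp.

    positions: coding-sequence positions of the gene's mutations
        (a position appears once per mutation observed there).
    Returns (hotspots, use_whole_gene), where hotspots is a list of
    (start, end, n_mutations) tuples.
    """
    remaining = sorted(positions)
    total = len(remaining)
    hotspots = []
    while remaining:
        best = (None, None, 0)
        for i, start in enumerate(remaining):
            # Last mutation lying within max_len bp of `start` (inclusive window)
            j = bisect_right(remaining, start + max_len - 1) - 1
            if j - i + 1 > best[2]:
                best = (start, remaining[j], j - i + 1)
        if best[2] < 2:
            break  # no region holds two or more of the remaining mutations
        start, end, count = best
        hotspots.append(best)
        # Remove this hotspot's mutations before the next iteration
        remaining = [p for p in remaining if not start <= p <= end]
    covered = sum(n for _, _, n in hotspots)
    if total and covered / total >= min_fraction:
        return hotspots, False  # incorporate hotspots individually
    return [], True  # fall back to sequencing the whole gene


hotspots, whole_gene = find_hotspots([100] * 5 + [150] * 3 + [1000] * 4 + [5000])
print(hotspots, whole_gene)  # [(100, 150, 8), (1000, 1000, 4)] False
```

In the example the two hotspots hold 12 of the 13 mutations (about 92 %), above the 80 % default, so they are kept individually. Sliding the window start only over observed mutation positions is sufficient, because any window can be shifted right until it starts at its leftmost mutation without losing mutations.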
Resources used to annotate mutations and genes in the panel
A list of validated oncogenic mutations, obtained from the catalog of driver mutations of Tamborero et al. (in preparation, available at www.intogen.org/downloads), which contains somatic and germline mutations whose role in oncogenesis has been experimentally validated in different cancer types.
A list of mutations known to predict sensitivity or resistance to anti-cancer drugs, obtained from the Cancer bioMarkers database by Tamborero et al. (in preparation, available at https://www.cancergenomeinterpreter.org/biomarkers), which contains expert-curated annotations of genomic biomarkers associated with a drug effect on tumors, either drug “response” or “resistance.”
At the gene level, OncoPaD adds information on the mode of action of the gene in cancer (i.e. a prediction of whether it acts through loss of function or activation) and on the tendency of mutations in the gene to occur in the major clone in specific cancer type(s), according to the Cancer Drivers Database. Data retrieved from all the aforementioned resources will be continuously updated as new releases become available.
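Conceptually, the mutation-level annotation is a lookup of each panel mutation in the two catalogs listed above. The sketch below assumes both catalogs have been loaded into in-memory Python structures keyed by (gene, protein change); the function `annotate_mutations` and the toy entries are illustrative and do not reproduce the format of the actual downloads.

```python
def annotate_mutations(panel_mutations, validated_oncogenic, drug_biomarkers):
    """Attach catalog annotations to each (gene, protein change) in a panel.

    validated_oncogenic: set of (gene, change) pairs with a validated oncogenic role.
    drug_biomarkers: dict (gene, change) -> list of (drug, effect) pairs,
        where effect is "response" or "resistance".
    """
    report = []
    for gene, change in panel_mutations:
        key = (gene, change)
        report.append({
            "gene": gene,
            "change": change,
            "validated_oncogenic": key in validated_oncogenic,
            # Copy so that edits to the report never alter the catalog
            "drug_effects": list(drug_biomarkers.get(key, [])),
        })
    return report


# Toy catalog entries, for illustration only
validated = {("BRAF", "V600E")}
biomarkers = {
    ("BRAF", "V600E"): [("vemurafenib", "response")],
    ("EGFR", "T790M"): [("gefitinib", "resistance")],
}
for row in annotate_mutations([("BRAF", "V600E"), ("EGFR", "T790M")], validated, biomarkers):
    print(row)
```

Gene-level annotations such as the mode of action or clonality could be attached in the same way, with a second dictionary keyed by gene symbol.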
Design and implementation of the OncoPaD web service
OncoPaD imposes no computational burden on its users beyond the use of a reasonably modern web browser; no browser plugins are needed. Users are required to register via the Mozilla Persona service, solely to keep track of visits and jobs run on the server.
Results and discussion
OncoPaD is a tool for the rational design of gene panels
Comparison of OncoPaD with other resources. Six features are included: (1) the input genes for panel design; (2) whether the resource allows users to estimate (and fine-tune) the cost-effectiveness of the designed panel; (3) whether the resource provides additional ancillary annotations for the mutations included in the panel; (4) whether the tool is a web service that is easy to maintain, evolve and use, or a static resource; (5) the type of output provided to the user; and (6) the level of customization of the panel that the user can attain.
Martinez et al. 
• Genes with HIMs from COSMIC
• User’s mutation list
Genes with NSMs in at least 4 % of samples in cohort 1
User’s gene list
• Driver genes in 28 cancer types
• Genes with drug biomarkers
• User’s gene list
In silico performance
Fraction of tumors from cohort 1 with NSMs
Kbps included in the panel
• Fraction of tumors from cohort 2 with PAMs
• Kbps included in the panel
Metadata of panel mutations
Functional impact (SIFT and Polyphen)
• Validated oncogenic mutations
• Drug biomarker mutations
Type of resource
Json file with selection of genes
List of ranked pan-cancer and per cancer type genes
• Bed file
• Panel primers
• Reports with information on mutations included in the panel and performance (interactive HTML/PDF/Excel/Bed file)
User customization options
• Filter by genes with HIMs
• Filter by genes found in COSMIC
• Add/remove genes
Input gene list
• Cancer type(s) to design the panel
• Panel input genes (pre-compiled lists of drivers/biomarkers and/or user defined).
• Fine-tune the design of the panel
Note that OncoPaD, like TEAM and the approach presented by Martinez et al., aims to design gene panels that detect exclusively protein-coding point mutations and small indels. This is a limitation of all three methods, since copy number alterations, translocations and non-coding mutations, which may be relevant for cancer development and the response to anti-cancer treatments, are not targeted for detection. This limitation reflects several decades of cancer research focused overwhelmingly on coding point mutations. As more information on other driver alterations (in particular from the analysis of whole tumor genomes) becomes available, we will include it in OncoPaD to support the design of more comprehensive cancer gene panels.
OncoPaD designs highly cost-effective panels
We compared the cost-effectiveness of OncoPaD-designed panels with that of several available panels in three research scenarios. To carry out the comparisons, we first defined (and computed in silico) the cost-effectiveness of a gene panel as the balance between the fraction of samples in a cohort carrying mutations in the genes it contains (coverage) and the total amount of DNA it targets (Kbps). We used this in silico representation as a proxy for the real-life cost-effectiveness of a gene panel.
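Both quantities in this definition are straightforward to compute once the panel regions and the per-sample mutations are known. The following is a minimal sketch, assuming each sample is represented by the set of candidate regions (whole genes or hotspots) in which it carries a mutation; `panel_cost_effectiveness`, the region names and the toy sizes are hypothetical.

```python
def panel_cost_effectiveness(panel, region_sizes_bp, sample_regions):
    """Return (coverage, size_kbps) of a panel on a cohort.

    panel: set of region IDs (whole genes or hotspots) targeted by the panel.
    region_sizes_bp: dict region ID -> targeted length in base pairs.
    sample_regions: dict sample ID -> set of region IDs mutated in that sample.
    """
    # A sample counts as covered if it has at least one mutation inside the panel
    n_covered = sum(1 for regions in sample_regions.values() if regions & panel)
    coverage = n_covered / len(sample_regions)
    size_kbps = sum(region_sizes_bp[r] for r in panel) / 1000
    return coverage, size_kbps


# Toy cohort and region sizes, for illustration only
sizes = {"GENE1_hotspot": 100, "GENE2": 1200, "GENE3": 8500}
cohort = {
    "s1": {"GENE1_hotspot"},
    "s2": {"GENE3"},
    "s3": {"GENE2", "GENE3"},
    "s4": set(),
}
print(panel_cost_effectiveness({"GENE1_hotspot", "GENE2"}, sizes, cohort))  # (0.5, 1.3)
```

Adding GENE3 to this toy panel raises coverage to 0.75 but increases its size to 9.8 Kbps, which is exactly the trade-off the two axes of the comparison are meant to expose.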
We speculated that the cost-effectiveness of OncoPaD panels should increase with the homogeneity of the screened cohort in terms of the cancer types represented, because their design relies on tumor type-specific drivers. Therefore, we next compared the cost-effectiveness of OncoPaD and commercially available panels screening only the subset of solid tumors within the pan-cancer cohort (Fig. 2b, Additional file 4: Table S3B). Here, the advantage of OncoPaD panels over all others evaluated is more apparent. Specifically, a panel designed with OncoPaD from Tier 1 hotspots would cover the highest fraction of solid tumors in the cohort (83 %) while sequencing only 291 Kbps of DNA. To stratify solid tumors potentially responsive to anti-cancer agents, three OncoPaD designs would provide information on all tumors in the cohort, followed by the OncoVantage Solid Tumor Mutation Analysis (Quest Diagnostics) (97 %).

Finally, we compared the cost-effectiveness of panels in screening tumor type-specific cohorts (Fig. 2c, Additional file 4: Table S3C). While all assayed panels would detect between three-quarters and four-fifths of breast carcinomas, between three-quarters and nine-tenths of glioblastomas and virtually all colorectal adenocarcinomas, OncoPaD-designed panels would do so by sequencing a dramatically smaller amount of DNA. For instance, the Comprehensive Cancer Panel (Ion AmpliSeq™) would cover 99 % of the tumors in the colorectal cohort by sequencing 862.21 Kbps of DNA, whereas an OncoPaD whole-gene Tier 1 panel would cover 97 % with only 21.61 Kbps (about 40 times less DNA), consequently increasing the number of samples that can be analyzed in parallel and/or the sequencing depth. It is also important to bear in mind that while the genes in all OncoPaD panels are drivers in the tumor type(s) of the panel cohort, other panels include genes that are not implicated in tumorigenesis in those tumor types (or in any tumor type).
This would increase their likelihood of detecting false-positive mutations (either germline or somatic unrelated to tumorigenesis), a drawback that may become critical when the sequenced material comes from a paraffin-embedded sample with no matched normal DNA against which to filter the patient's germline variants.
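The "40 times less" figure for the colorectal cohort follows directly from the numbers reported above. The short check below reproduces it and also expresses each panel's yield as coverage per Kbps sequenced, a ratio introduced here only for illustration; it is not a metric defined by OncoPaD.

```python
# Colorectal cohort figures reported in the text above
ampliseq_cov, ampliseq_kbps = 99.0, 862.21  # Comprehensive Cancer Panel (Ion AmpliSeq)
oncopad_cov, oncopad_kbps = 97.0, 21.61     # OncoPaD whole-gene Tier 1 panel

fold_less_dna = ampliseq_kbps / oncopad_kbps
# Illustrative yield: percentage of tumors covered per Kbps sequenced
ampliseq_yield = ampliseq_cov / ampliseq_kbps
oncopad_yield = oncopad_cov / oncopad_kbps

print(f"{fold_less_dna:.1f}x less DNA")                          # 39.9x less DNA
print(f"{ampliseq_yield:.3f} vs {oncopad_yield:.2f} % per Kbps")  # 0.115 vs 4.49 % per Kbps
```

Assuming a fixed sequencing throughput and target depth, the number of samples that can be multiplexed in one run scales roughly inversely with panel size, which is why a smaller footprint translates into more samples analyzed in parallel.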
Additionally, we assessed the cost-effectiveness of available solid tumor panels (see above) and OncoPaD solid tumor panels on a cohort of cervical and endocervical cancers, which is not currently included in the OncoPaD pan-cancer cohort (Additional file 2: Figure S2), to evaluate how well the catalog of driver genes included in the tool extrapolates to cancer types it does not yet cover. An OncoPaD panel of Tier 1 genes exhibited the highest cost-effectiveness; the Centogene panel achieved a greater coverage of the tumors in the cohort, but at the expense of sequencing four times more DNA. Note that OncoPaD will be continuously updated as new sequenced tumor cohorts and lists of novel cancer driver genes and drug biomarkers become available.
In summary, OncoPaD-designed panels present better cost-effectiveness than their currently available counterparts. Furthermore, the availability of several lists of genes relevant to tumorigenesis in different cancer types or specifically informative of the response to anti-cancer drugs gives OncoPaD a versatility that available one-size-fits-many solutions lack.
Use case: designing a panel with OncoPaD to screen the drug response of a cohort of lung carcinomas
OncoPaD will help clinicians and researchers design NGS panels to analyze circulating tumor DNA or biopsy specimens, thereby facilitating early and accurate detection of tumors, genomics-informed therapeutic decisions and patient follow-up, with timely identification of resistance mechanisms to targeted agents (researchers conducting studies such as those exemplified in the “Background” section are the natural users of OncoPaD). We illustrate its use in three specific real-life research questions through tutorials available at http://www.intogen.org/oncopad/case_studies.
We have presented OncoPaD, to our knowledge the first tool aimed at the rational design of cancer gene panels. The estimated cost-effectiveness of OncoPaD designed panels surpasses that of their currently available counterparts. The intuitive design and versatility of the tool will aid clinicians and researchers in the design of panels to address a variety of translational and basic research questions.
We thank Rodrigo Dienstmann for his helpful comments on the manuscript and the web tool. The results published here are partly based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/.
A.G.-P. is supported by a Ramón y Cajal contract (RYC-2013-14554), which also funds the publication of this article. We also acknowledge funding from the Spanish Ministry of Economy and Competitiveness (grant no. SAF2012-36199), the Marató de TV3 Foundation, and the Spanish National Institute of Bioinformatics (INB). C.R.-P. is supported by an FPI fellowship. D.T. is supported by the People Programme (Marie Curie Actions) of the Seventh Framework Programme of the European Union (FP7/2007-2013) under REA grant agreement no. 600388 and by the Agency of Competitiveness for Companies of the Government of Catalonia, ACCIÓ.
Availability of data and materials
Project name: OncoPaD
Project home page: http://www.intogen.org/oncopad
License: Open without restriction to all academic use
CR-P implemented the OncoPaD gene tier prioritization algorithm and the web server, performed the cost-effectiveness comparison, built the manuscript figures and supplementary data, and contributed to drafting the manuscript. JD-P implemented the OncoPaD hotspot detection algorithm, improved the performance of the OncoPaD web server and is responsible for its maintenance. DT helped design the project and provided support on manuscript revision. NL-B and AG-P designed and supervised the project, and AG-P drafted the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.