Profiling lung adenocarcinoma by liquid biopsy: can one size fit all?
Cancer is first and foremost a disease of the genome. Specific genetic signatures within a tumour are prognostic of disease outcome, reflect subclonal architecture and intratumour heterogeneity, inform treatment choices and predict the emergence of resistance to targeted therapies. Minimally invasive liquid biopsies can give temporal resolution to a tumour’s genetic profile and allow the monitoring of treatment response through levels of circulating tumour DNA (ctDNA). However, the detection of ctDNA in repeated liquid biopsies is currently limited by economic and time constraints associated with targeted sequencing.
Here we bioinformatically profile the mutational and copy number spectrum of The Cancer Genome Atlas Network’s lung adenocarcinoma dataset to uncover recurrently mutated genomic loci.
We build a panel of 400 hotspot mutations and show that the coverage extends to more than 80% of the dataset at a median depth of 8 mutations per patient. Additionally, we uncover several novel single-nucleotide variants present in more than 5% of patients, often in genes not commonly associated with lung adenocarcinoma.
With further optimisation, this hotspot panel could allow molecular diagnostics laboratories to build curated primer banks for ‘off-the-shelf’ monitoring of ctDNA by droplet-based digital PCR or similar techniques, in a time- and cost-effective manner.
Keywords: Lung adenocarcinoma; Cancer genomics; Mutation; Tumour suppressor; Oncogene; SNV; Circulating tumour DNA; ctDNA; Liquid biopsy
ctDNA: circulating tumour DNA
EGFR: epidermal growth factor receptor
MAF: mutation annotation format
MET: hepatocyte growth factor receptor
PCR: polymerase chain reaction
SNV: single nucleotide variation
TCGA: The Cancer Genome Atlas
Cancer is a disease of the genome: it is initiated by perturbations in the structure and function of DNA (e.g. somatic mutations and epigenetic modifications) and driven by the sequential accumulation of these perturbations (Hanahan and Weinberg 2011). The study of genomic aberrations and the identification of somatic mutations that drive a particular malignancy are, therefore, fundamental to the understanding of tumour biology. In addition, targeted therapies developed to inhibit the growth of a tumour are almost exclusively stratified to patients harbouring specific mutational profiles (Huang et al. 2014). For example, cetuximab, an anti-epidermal growth factor receptor (EGFR) therapy, is only truly effective in patients with EGFR amplifications (Yang et al. 2013). Clinicians therefore need tumour genotype information on a per-patient basis.
Resistance to targeted therapies often emerges during a treatment regimen. Two mechanisms of resistance have been described: pre-existing resistant populations in a treatment-naïve tumour, and resistant populations acquired de novo during therapy. Bhang and colleagues have recently traced the emergence of erlotinib resistance in a model of lung adenocarcinoma, identifying a pre-existing MET-amplified clonal population responsible for in vitro recurrence (Bhang et al. 2015). In a separate lung cancer model, Hata et al. showed that EGFR T790M mutations could be acquired during EGFR inhibitor therapy and drive the inhibitor-resistant phenotype (Hata et al. 2016). Thus, including temporal resolution in cancer genomic information will better inform treatment decisions.
Because of the clinical importance of tumour genomics, it is unsurprising that the sequencing of tumour biopsies prior to, and during, treatment regimens has become commonplace over the past several years. However, spatial heterogeneity within a tumour can lead to an under-representation of intratumour heterogeneity and an inaccurate reporting of tumour genotypic information gleaned from punch biopsies (Sottoriva et al. 2013; de Bruin et al. 2014). Moreover, such biopsies are relatively invasive for solid tumours. Thus, many researchers and clinicians alike have turned to so-called ‘liquid biopsies’ in an attempt to identify circulating mutant tumour DNA (ctDNA) in a patient’s blood (Newman et al. 2014; Ma et al. 2015). By deep molecular characterisation of this ctDNA across multiple sequential biopsies, it is hoped that researchers and oncologists will gain a better picture of a cancer’s genetic makeup and how this evolves over time, without the considerations associated with spatial heterogeneity.
Typically, profiling of ctDNA is achieved through deep or targeted amplicon sequencing (Newman et al. 2014). However, this approach is limited in terms of cost and throughput. For some of the more immediate clinical applications of ctDNA, such as tracking treatment response, temporal resolution of a tumour’s evolution may be as useful as a deep understanding of its molecular drivers. Thus, many approaches for sequential monitoring of ctDNA have focussed on high-throughput techniques such as droplet and digital PCR to trace individual mutations in a patient’s blood over time (Zheng et al. 2016). In this study, we sought to determine whether a panel of recurrently mutated genomic loci (hereafter ‘hotspots’) could be developed which would give suitable coverage over the entirety of the intertumour heterogeneity seen in human malignancies. As a test case, we focus on lung adenocarcinomas: a malignancy that is not well suited to typical punch biopsy techniques and that has substantial genomic heterogeneity amongst the clinical population.
Lung adenocarcinomas are characterised by genomic aberrations in 23 driver genes
Next, we performed the same analysis on nine previously described tumour suppressor genes frequently altered in lung adenocarcinoma (Fig. 1, lower panel). Unsurprisingly, TP53 was the most frequently mutated tumour suppressor gene with >40% of patients harbouring a missense or truncating mutation. CDKN2A was the next most frequently altered gene with >20% of patients carrying copy number losses. Altogether, the nine tumour suppressors profiled were altered in 61% of the 230 test cases.
In total, 93% of the 230 patients possessed at least one genomic aberration in our panel of 23 drivers, with >50% having alterations in two or more genes (a ‘depth’ of two per patient). Although the detection of copy number aberrations in ctDNA is possible (Bettegowda et al. 2014), we elected to focus the rest of the analysis on SNVs and frameshift mutations, which can be detected with greater confidence across a wider range of techniques.
Hotspots in frequently mutated drivers are relatively rare
Most techniques aimed at detecting mutational events within a gene, such as droplet digital PCR or SNV array technologies, detect specific base-pair substitutions or frameshift mutations at a defined genomic locus rather than across the entire gene length. Although over 30% of patients harbour a missense mutation in KRAS, many of these mutations could be missed by an assay that is not directed at the right loci. Thus, it is important to identify specific hotspot loci within driver genes to create a targeted panel.
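To illustrate what identifying a hotspot locus involves, the following is a minimal sketch that ranks loci by the number of distinct patients carrying exactly the same change. It is not the authors' pipeline: the tuple layout, the function name `rank_hotspots` and the toy records (including the positions) are assumptions made purely for illustration.

```python
from collections import defaultdict


def rank_hotspots(mutations):
    """Rank loci by the number of distinct patients sharing the same change.

    `mutations` is an iterable of (patient, gene, chrom, pos, ref, alt)
    tuples. This layout is hypothetical and chosen for illustration only.
    """
    patients_by_locus = defaultdict(set)
    for patient, gene, chrom, pos, ref, alt in mutations:
        # A set ensures that duplicate calls in one patient count once.
        patients_by_locus[(gene, chrom, pos, ref, alt)].add(patient)
    # Most recurrent first; ties are broken by the locus key for a stable order.
    return sorted(
        ((locus, len(carriers)) for locus, carriers in patients_by_locus.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Toy records with made-up positions (not TCGA data).
toy_mutations = [
    ("P1", "KRAS", "12", 1001, "C", "A"),
    ("P2", "KRAS", "12", 1001, "C", "A"),
    ("P2", "KRAS", "12", 1001, "C", "A"),  # duplicate call, counted once
    ("P3", "KRAS", "12", 1002, "C", "T"),
    ("P3", "TP53", "17", 2001, "C", "T"),
]
hotspots = rank_hotspots(toy_mutations)
top_locus, top_count = hotspots[0]
```

In this toy input, three patients carry a KRAS mutation but only two share the same locus and base change, which is the distinction between gene-level frequency and locus-level recurrence drawn above.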
Profiling the recurrently mutated tumour suppressor genes, TP53 and ANK5 (Fig. 2a, lower panels) revealed a near even distribution of missense and truncating mutations. This supports the longstanding observation that tumour suppressor genes do not tend to have hotspot regions that confer a change in catalytic activity but rather tend to be truncated or deleted in late-stage malignancies. Indeed, the observation that tumour suppressors do not tend to have recurrent hotspot regions is the basis of the 20/20 rule often used to define tumour drivers (Vogelstein et al. 2013). Analysis of the tendency for mutations in our 23 driver genes to co-occur across multiple patients revealed the same pattern. Whilst a number of mutations do co-occur, the majority are mutually exclusive (Fig. 2b). Thus, it is likely that a panel of recurrently mutated regions in our 23 driver genes would not be enough to cover a substantial proportion of lung adenocarcinoma patients to a high depth.
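One common way to quantify whether alterations in two genes tend to be mutually exclusive is a one-sided Fisher's exact test on a 2×2 patient table. The sketch below assumes that approach; the source does not state which test underlies Fig. 2b, and the function names and toy patient sets are hypothetical.

```python
from math import comb


def contingency(patients, altered_a, altered_b):
    """Return (both, a_only, b_only, neither) patient counts for two genes.

    `altered_a` and `altered_b` are sets of patient ids and are assumed
    to be subsets of `patients`.
    """
    both = len(altered_a & altered_b)
    a_only = len(altered_a - altered_b)
    b_only = len(altered_b - altered_a)
    neither = len(patients) - both - a_only - b_only
    return both, a_only, b_only, neither


def exclusivity_p_value(both, a_only, b_only, neither):
    """One-sided Fisher's exact p-value for seeing `both` or fewer
    co-altered patients, i.e. evidence for mutual exclusivity."""
    n = both + a_only + b_only + neither
    n_a = both + a_only  # patients altered in gene A
    n_b = both + b_only  # patients altered in gene B
    lowest = max(0, n_a + n_b - n)  # smallest possible overlap
    # Hypergeometric probability of each overlap k, summed over the left tail.
    tail = sum(
        comb(n_a, k) * comb(n - n_a, n_b - k) for k in range(lowest, both + 1)
    )
    return tail / comb(n, n_b)


# Toy cohort: ten patients, two genes altered in non-overlapping groups.
patients = {f"P{i}" for i in range(1, 11)}
gene_a = {"P1", "P2", "P3", "P4"}
gene_b = {"P5", "P6", "P7", "P8"}
table = contingency(patients, gene_a, gene_b)
p_exclusive = exclusivity_p_value(*table)
```

For this toy cohort the p-value is 15/210 (about 0.07), showing that even complete non-overlap is weak evidence in a small sample. Screening all gene pairs in this way would also require correction for multiple testing.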
Genome-wide panels of recurrently mutated regions cover >80% of patients
As our panel of 100 hotspots only covered 59% of TCGA patients, we examined the panel size needed to cover a majority of patients at a relatively high depth. Figure 3b shows the relationship between the size of the mutational panel and overall coverage of the dataset for four different representative depths. We start to see diminishing returns in coverage at a hotspot panel size of 1000 mutations. Therefore, covering the majority of patients at a depth greater than 10 mutations is unlikely. This highlights the intertumour heterogeneity seen between patients with lung adenocarcinoma (Zhang et al. 2014). However, a 400-mutation panel gives a median coverage of 7.9 mutations per patient (Fig. 3b, right panel) with 82.8% of patients covered by at least one mutation and 57.6% of patients covered by two or more mutations. The 400-mutation panel is dominated by insertions and SNVs (Fig. 3c, left) and is balanced in terms of specific base-pair changes (Fig. 3c, right). Although the 400-mutation panel does not cover the entirety of TCGA lung adenocarcinoma patients, its scale is feasible for a molecular diagnostics lab. Thus, probes for these 400 mutations could be optimised for off-the-shelf use in clinics, with the addition of more targeted probes for specific patients.
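The relationship between panel size and coverage at a given depth can be sketched as follows. This assumes the panel is simply the N most recurrent loci, which may differ from how the panels in Fig. 3b were actually built; the function names, the `carriers` mapping and the toy data are hypothetical.

```python
def coverage_at_depths(panel, carriers, patients, depths=(1, 2)):
    """Fraction of patients carrying at least `d` panel mutations, per depth d.

    `carriers` maps each locus to the set of patient ids carrying it.
    """
    hits = {patient: 0 for patient in patients}
    for locus in panel:
        for patient in carriers.get(locus, ()):
            if patient in hits:
                hits[patient] += 1
    return {
        d: sum(1 for count in hits.values() if count >= d) / len(hits)
        for d in depths
    }


def coverage_curve(carriers, patients, sizes, depths=(1, 2)):
    """Coverage of the top-N most recurrent loci for each N in `sizes`."""
    ranked = sorted(carriers, key=lambda locus: (-len(carriers[locus]), locus))
    return {
        n: coverage_at_depths(ranked[:n], carriers, patients, depths)
        for n in sizes
    }


# Toy cohort of five patients and three recurrent loci (invented data).
toy_patients = ["P1", "P2", "P3", "P4", "P5"]
toy_carriers = {
    "m1": {"P1", "P2", "P3"},
    "m2": {"P1", "P4"},
    "m3": {"P2"},
}
curve = coverage_curve(toy_carriers, toy_patients, sizes=(1, 2, 3))
```

In the toy cohort, adding the third locus raises coverage at a depth of two but not at a depth of one, a small-scale version of the diminishing returns described above; patient P5 carries none of the loci and is never covered.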
400 SNV hotspot panel covers >55% of 183 patients in Broad validation set
Over the past several years, renewed effort in cancer research has yielded a myriad of molecular drivers of and contributors to tumour progression. Alongside the most often cited contributors, there are changes in stromal cell infiltrates (Kalluri and Zeisberg 2006), alterations in receptor prevalence or cell signalling (O’Neill et al. 2016), and nanotopographical changes to the cancer cell’s niche (Cassidy 2014; Cassidy et al. 2014). However, cancer is fundamentally a disease of the genome and only by understanding the patterns of clonal dynamics and evolution of genomic clones will the disease be fully understood.
As the need for accurate and temporally specific genomic information makes its way into the clinical setting, we must adopt new methodologies of profiling a tumour’s genome in a non-invasive and low-cost manner. Analysis of ctDNA has shown much promise in this regard, being used in many pioneering studies for monitoring treatment response, predicting relapse, and profiling intratumour heterogeneity (Bettegowda et al. 2014; Ma et al. 2015; Zheng et al. 2016). However, analysis of ctDNA is often initially based on targeted sequencing, which is both expensive and time consuming. Typically, specific primers can be designed after initial sequencing and ctDNA levels in the blood can be followed by less-demanding techniques, such as droplet digital PCR (Zheng et al. 2016). In this study, we set out to identify a panel of recurrent mutations in lung adenocarcinoma that would cover the majority of patients. Primers could then be designed and optimised for this panel ready for ‘off-the-shelf’ use in molecular diagnostic laboratories.
Lung adenocarcinoma is particularly heterogeneous and, even with a panel of 400 recurrent hotspots, coverage of 1× was only possible in ~80% of patients (Fig. 3b). This is particularly problematic as many of these mutations are likely passengers and therefore not necessarily clonal to the whole tumour. Thus, with a coverage of 1× we could not be sure that ctDNA levels were truly representative of the tumour bulk as a whole. However, this panel could be substantially refined in the future given the prevalence of recurrent copy number aberrations in driver genes seen in Fig. 1, and the recurrent promoter methylation in lung cancer (Belinsky 2004), which is recapitulated in ctDNA (Mishima et al. 2015; Warton et al. 2016). Care should also be taken to include likely ‘truncal’ genomic aberrations common to the tumour as a whole and not restricted to minor subclonal populations. Differences in TCGA and Broad datasets (Fig. 4) reflect tumour heterogeneity in lung adenocarcinomas and suggest that recurrently methylated CpG sites may also require inclusion in such panels. However, if such efforts relied on bisulfite conversion of CpG islands, we may see a loss of resolution for “C to T” SNVs at these sites.
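The loss of resolution for C to T SNVs can be seen in a toy in silico conversion. Bisulfite treatment converts unmethylated cytosines so that they are read as thymine, while methylated cytosines are protected. The sketch below is a forward-strand simplification with invented sequences and a hypothetical helper, `bisulfite_convert`.

```python
def bisulfite_convert(seq, methylated=frozenset()):
    """Read-out of `seq` after bisulfite treatment (forward strand only).

    Unmethylated C is reported as T; 0-based positions listed in
    `methylated` are protected and remain C.
    """
    return "".join(
        "T" if base == "C" and index not in methylated else base
        for index, base in enumerate(seq)
    )


reference = "ACGTCA"
variant = "ATGTCA"  # carries a C>T SNV at index 1

# Unmethylated site: reference and variant give the same read-out.
converted_ref = bisulfite_convert(reference)
converted_var = bisulfite_convert(variant)
snv_masked = converted_ref == converted_var

# If the reference cytosine is methylated it is protected, so the two
# sequences remain distinguishable after conversion.
protected_ref = bisulfite_convert(reference, methylated={1})
snv_visible = protected_ref != converted_var
```

At an unmethylated cytosine the converted reference and the genuine C to T variant are identical, so the SNV cannot be called from converted reads alone.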
The need for rapid identification of ctDNA in the time- and cost-constrained environment of clinical oncology is clear, and lung adenocarcinoma is of particular interest due to the difficulty in collecting recurrent solid biopsies. Our study aimed to identify a targeted hotspot panel for lung adenocarcinoma. We described mutation patterns in known genetic drivers of lung adenocarcinoma and profiled genome-wide recurrently mutated loci. Moreover, this work has identified several novel recurrent mutations in genes not typically associated with lung adenocarcinoma, which are each present in a significant subset of TCGA lung adenocarcinoma patients (Fig. 3a, e.g. IL32, LOC650368, HSD17B7P2 and RPSA). Whilst our panels were informative, they did not provide sufficient coverage and depth to be clinically useful. Future work should refine our initial panel to include recurrent copy number aberrations and hyper-methylated promoter regions.
ctDNA shows great promise for minimally invasive serial monitoring of tumour burden and heterogeneity through treatment cycles. However, current ctDNA detection techniques rely on next-generation sequencing, which is time consuming, expensive and requires bioinformatics expertise and access to specialist sequencing facilities. Tracing ctDNA through serial biopsy is better suited to high-throughput and low-cost techniques such as droplet digital PCR. In this scenario, a molecular diagnostics laboratory would first deeply sequence a patient’s ctDNA and then design primers for subsequent droplet digital PCR. In this study, we sought to define a panel of common hotspot mutations in lung adenocarcinoma to allow molecular diagnostic laboratories to design and optimise primers to cover the majority of patients. Although our 400-hotspot panel showed good coverage and depth in the TCGA dataset, not all patients could be covered. The difficulties in finding hotspots common to all patients reflect the profound intertumour heterogeneity seen in all cancers (Cassidy and Bruna 2016) and in particular lung adenocarcinomas. Further work is needed to optimise the panel design prior to use in the clinic, alongside continued collection of whole genome sequencing data from lung adenocarcinoma patients. Beyond mutations, efforts should be made to include recurrently methylated CpGs and copy number aberrations in such panels.
Primary mutational analysis was carried out using cBioPortal (cbioportal.org) (Cerami et al. 2012; Gao et al. 2013). Lollipop plots were constructed using the ‘lollipops’ tool (github.com/pbnjay/lollipops), with pathway data obtained from Cytoscape 3.2.1 (cytoscape.org) (Lopes et al. 2010). Called somatic mutations (SNVs) and clinical metadata were downloaded from the TCGA Data Portal (tcga-data.nci.nih.gov) (Network 2014). The validation dataset from the Broad Institute was downloaded from dbGaP (Imielinski et al. 2012). Mutation annotation format (MAF) files were manipulated in RStudio (Mac) 0.99.484 (rstudio.com). Combined data were analysed in Microsoft Excel (Mac 14.4.3) and RStudio, with results plotted in GraphPad Prism 6 (Mac) and R (3.3.1 Unix; r-project.org).
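The study handled MAF files in R; purely as an illustration of the file layout, the sketch below reads a MAF-style table in Python. It assumes a tab-delimited file with optional ‘#’ comment lines and the standard MAF column names shown, and it takes the first 12 characters of the TCGA sample barcode as the patient identifier. The function name `read_maf` and the single toy record are hypothetical.

```python
import csv
import io


def read_maf(handle):
    """Yield one dict per mutation from a tab-delimited MAF file.

    Lines starting with '#' (e.g. a version header) are skipped.
    """
    rows = (line for line in handle if not line.startswith("#"))
    for row in csv.DictReader(rows, delimiter="\t"):
        yield {
            # The first 12 characters of a TCGA barcode identify the patient.
            "patient": row["Tumor_Sample_Barcode"][:12],
            "gene": row["Hugo_Symbol"],
            "chrom": row["Chromosome"],
            "pos": int(row["Start_Position"]),
            "ref": row["Reference_Allele"],
            "alt": row["Tumor_Seq_Allele2"],
            "effect": row["Variant_Classification"],
        }


# Toy MAF content with an invented gene, position and barcode.
toy_maf = (
    "#version 2.4\n"
    "Hugo_Symbol\tChromosome\tStart_Position\tVariant_Classification\t"
    "Reference_Allele\tTumor_Seq_Allele2\tTumor_Sample_Barcode\n"
    "GENE1\t1\t1000\tMissense_Mutation\tC\tT\tTCGA-00-0000-01A-01D-0000-00\n"
)
records = list(read_maf(io.StringIO(toy_maf)))
```

Collapsing sample barcodes to patient identifiers matters for recurrence counts, since a patient with several sequenced aliquots should contribute to a hotspot only once.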
JWC designed the study. HWC, EST, APC and JWC carried out all analysis. JWC drafted the manuscript. CV, HLO and EH aided in interpretation of results and manuscript preparation. All authors contributed to this manuscript. All authors read and approved the final manuscript.
The authors are grateful to Ms. Rosemarie Truman and Mr. Jonathan Lui from the Centre for Advancing Innovation for valuable discussions and guidance. This work relies on open source data provided by The Cancer Genome Atlas Network, and would not have been possible without their free access principles.
HWC, EST, NP, EH and JWC hold stock in OneTest diagnostics.
No external funding was sought for the study.
- Cassidy JW, Roberts JN, Smith C-A, Robertson M, White K, Biggs MJ, Oreffo ROC, Dalby MJ. Osteogenic lineage restriction by osteoprogenitors cultured on nanometric grooved surfaces: the role of focal adhesion maturation. Acta Biomater. 2014;10:651–60. doi: 10.1016/j.actbio.2013.11.008.
- Hata AN, Niederst MJ, Archibald HL, Gomez-Caraballo M, Siddiqui FM, Mulvey HE, Maruvka YE, Ji F, Bhang HC, Krishnamurthy Radhakrishna V, et al. Tumor cells can follow distinct evolutionary paths to become resistant to epidermal growth factor receptor inhibition. Nat Med. 2016;22:262–9.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.