SurvivalGWAS_Power: a user friendly tool for power calculations in pharmacogenetic studies with “time to event” outcomes
Abstract
Background
Power calculators are currently available for the design of genetic association studies of binary phenotypes and quantitative traits, but not for “time to event” outcomes, which are of particular relevance in pharmacogenetics. With the rapid emergence of pharmacogenetic association studies of single nucleotide polymorphisms (SNPs), and the complexity of clinical outcomes they consider, there is a need for software to perform power calculations of time to event data over a range of design scenarios and analytical methodologies.
Results
We have developed the user friendly software tool SurvivalGWAS_Power to perform power calculations for time to event outcomes over a range of study designs and different analytical approaches. The software calculates the power to detect SNP association with a time to event outcome over a range of study design scenarios. The software enables analyses under a Cox proportional hazards model or Weibull regression model, and can account for treatment and SNPtreatment interaction effects. Simulated data sets can also be generated by SurvivalGWAS_Power to enable analyses with methods that are not currently supported by the power calculator, thereby increasing the flexibility of the software.
Conclusions
SurvivalGWAS_Power addresses the need for flexible and userfriendly software for power calculations for genetic association studies of time to event outcomes, with particular design features of relevance in pharmacogenetics.
Keywords
Pharmacogenetics Power calculation Time to event Cox proportional hazards Weibull regression Simulation Censoring SNPtreatment interactionAbbreviations
 GWAS
Genomewide association studies
 MAF
Minor allele frequency
 SNPs
Single nucleotide polymorphisms
Background
Power calculations are an essential component of study design, and are readily available for genomewide association studies (GWAS) of binary phenotypes and quantitative traits [1]. However, as yet, software is not available to determine adequate sample size for GWAS of “time to event” outcomes. Such outcomes are of particular relevance in the emerging field of pharmacogenetics, where the event could be death, disease remission, or the occurrence of an adverse drug reaction, for example, after treatment intervention. The most appropriate approach to the analysis of time to event data is through survival modelling, which can allow for censoring because the event has not occurred, or due to patient dropout, before the end of the study. Software is thus required to enable power calculations for GWAS of time to event outcomes, for alternative analytical models, particularly for pharmacogenetic studies, where the impact of alternative treatments, and potentially their interaction with single nucleotide polymorphisms (SNPs), is often of relevance in the study design.
The objective of GWAS is to identify SNPs that are associated with outcome, typically at a stringent level of genomewide significance (defined as p < 5 × 10^{−8}) [2]. SNPs are variations in the DNA sequence and are encoded by genotypes. Genotypes can be represented as AA, AB and BB, where A and B are the two alternative alleles at the SNP. Association of a SNP with outcome can then be assessed within a generalised linear modelling framework, accounting for relevant confounding variables as covariates. It is typical to assume an “additive” genetic model, whereby the effect of the heterozygous AB genotype is intermediate between homozygous AA and BB genotypes. Under this additive model, genotypes are coded according to the number of B alleles carried, such that AA = 0, AB = 1 and BB = 2.
Currently, software to determine power for GWAS of binary phenotypes and quantitative traits is available. These include the freely available ‘Genetic Power Calculator’, developed by Purcell et al. [1], which is a web based platform, and ‘genomeSIM’ developed by Dudek et al. [3] for the simulation of largescale genomic data in population based casecontrol samples. However, as yet, software for power calculations for GWAS of time to event outcomes is not available. In response to this analytical bottleneck, we have developed the user friendly tool, SurvivalGWAS_Power, which is the first program to implement both data generation and power calculations for GWAS of time to event outcomes. The software is of particular relevance to pharmacogenetic studies, where the design will likely include alternative treatment interventions, and analyses are likely to consider SNPtreatment interaction effects.
Implementation
SurvivalGWAS_Power was built using C# and developed as a Windows application, utilising predesigned frameworks Math.NET and Accord.NET [4], for the generation of pharmacogenetic data and statistical analyses, respectively. SurvivalGWAS_Power requires specification of genetic parameters, such as the magnitude of the SNP effect on the outcome and the minor allele frequency (MAF). The varied collection of design scenarios includes adding a recruitment period, SNPtreatment interactions, and/or different censoring options (for example, withdrawal due to an adverse treatment event). We created these pharmacogenomic study designs after a thorough examination of published studies in the literature [5], including Charland et al. [6], Depta et al. [7], Wiese et al. [8], and Absenger et al. [9]. The power calculation is performed by simulating multiple datasets based on the user specified parameter settings and study design options, specifically testing for SNP associations (and SNPtreatment interactions, if required) with the time to event outcome across all simulated datasets. For GWAS, the usual threshold for “genomewide significance” is p < 5 × 10^{−8}. However, the software is equally applicable to power calculations for individual SNPs, for which a nominal significance threshold of p < 0.05 is appropriate.
User interface
The main window consists of two panels, the first for design, analysis and parameter inputs, and the second for all output. The menu bar has a “Save Sample Data” option, as well as another option to store all the datasets from every simulation run. This is useful for those who want to test power for methods not supported by the program. The data are saved as a text file, in R statistical software friendly readable format. The interface has been designed to be user friendly; there are various help buttons to navigate the user through the program in the form of tooltips, and an example of a commonly used pharmacogenetic study design is available as a guide. The inputs are split into two sections: (i) data generation inputs; and (ii) statistical analysis inputs. The user defined parameter inputs are submitted in text boxes.
Data simulation
For each replicate of data, a SNP genotype (coded as 0, 1 or 2) is generated for each individual from a binomial distribution dependent on the MAF and assuming HardyWeinberg equilibrium. The user is given the option of incorporating an active treatment against a placebo. Treatment allocation is also simulated using a binomial distribution.
Time to event for each individual is then simulated on the basis of specified model parameters from a Weibull distribution, which allows for the possibility of a deviation from a proportional hazards assumption. The value of the shape parameter, a, of the Weibull distribution is specified by the user. A value of a < 1 indicates that the failure rate decreases over time. A value of a = 1 indicates that the failure rate is constant over time, resulting in proportional hazards. A value of a < 1 indicates that the failure rate increases with time. The scale parameter of the Weibull distribution is parameterized to incorporate SNP, treatment, and SNPtreatment interaction effects in generating time to event for each individual. Specifically, the scale parameter for the ith individual is given by, \( {b}_i={d}_0{e}^{\beta_s{S}_i+{\beta}_x{x}_i+{\beta}_{\gamma }{S}_i{x}_i} \), where x _{ i } is the treatment covariate (coded as 0/1 for placebo/active), S _{ i } is the SNP genotype coded under an additive model for the minor allele. The value of the “baseline” scale parameter d _{0}, is specified by the user. The parameters β _{ s } and β _{ x } are the effect on loghazard of the minor allele at the SNP, and the treatment effect, respectively, and β _{ γ } is the interaction effect between the SNP and treatment. The values of each of these parameters are also specified by the user.

Scenario 1End of study censoring.
This scenario is designed based on a user specified and fixed end of study time, Z. If the event occurs before the end of the study and due to this scenario not including censoring before time Z, the observed event time for the ith individual is δ _{ i } = t _{ i }; otherwise δ _{ i } = Z.

Scenario 2Censoring during the study period and at end of study.
The censoring time of the ith individual, c _{ i }, is simulated from a Weibull distribution with a user defined scale parameter and a fixed shape parameter of 1. Small values of the scale parameter will generate more censored observations. If censoring occurs before the end of the study, the individual is assumed to have dropped out at that time, thus δ _{ i } = c _{ i }. If censoring occurs after the end of the study, yet the event occurred before the end of the study, then the observed event time for the ith individual is δ _{ i } = t _{ i } − r _{ i }; otherwise δ _{ i } = Z − r _{ i }.

Scenario 3Recruitment period and end of study censoring.
The recruitment time, r _{ i }, is simulated from a discrete uniform distribution between 0 and a specified end time. There is no censoring before the end of the study and if the event occurs before the end of the study the observed event time for the ith individual is δ _{ i } = t _{ i } − r _{ i }; otherwise δ _{ i } = Z − r _{ i }.

Scenario 4Censoring during the study period and at the end of study with a recruitment period.
The censoring time of the ith individual, c _{ i }, is simulated from a Weibull distribution with a user defined scale parameter and a fixed shape parameter of 1. If censoring does not occur before the end of study, and an event has occurred, an individual will have observed time δ _{ i } = t _{ i }–r _{ i }, unless censored at the end of study, then δ _{ i } = Z − r _{ i }. If censoring does occur during the study period, then δ _{ i } = c _{ i } − r _{ i }.
Analysis
There are two options to compare when running the analysis; (i) a Cox proportional hazards model and (ii) a Weibull regression model. Users can select between running their choice of analysis model by fitting: (i) the SNP alone; (ii) the SNP and treatment; or (iii) the SNP, treatment and SNPtreatment interaction.
The Cox proportional hazards regression model is the most extensively used analysis of time to event outcomes. It is a semiparametric model that assumes that the hazard functions of individuals are proportional over time. The framework Accord.Net has a built in Cox proportional hazards function, which calculates the partial likelihood and obtains parameter estimates and Wald test pvalue.
The Weibull regression model is parametric with completely specified hazard and survivor functions, and is beneficial in scenarios where the hazard is not proportional or has an accelerated failure time feature. We obtain maximum likelihood estimates of model parameters using an iterative NewtonRaphson method, the full derivation of which is presented in Additional file 1.
Validation
Output
The output comprises of a sample dataset, a table of the analysis output for each simulation run and two histograms of parameters across simulations: (i) coefficient values for the SNP effect from the regression model; and (ii) ‐ log _{10} Wald pvalues for the SNP effect. All histograms can be saved by right clicking the graph and selecting “save as image”. Power, at the specified significance threshold, φ, is approximated by the proportion of replicates for which p < φ for the SNP effect on outcome. Power, at the same significance threshold, is also calculated for the SNPtreatment interaction effect, if this term is included in the analysis model.
Results
SurvivalGWAS_Power can simulate a large number of datasets to enable efficient estimation of power based on specified model parameters and design scenarios. We present the results of example power calculations for two similar scenarios to demonstrate the utility of the software.
Performance
Conclusions
SurvivalGWAS_Power effectively and efficiently estimates the power of the Cox proportional hazards model and Weibull regression model under a variety of pharmacogenetic settings. Specifically, we allow for testing of SNP main effects (i.e. testing the null hypothesis H _{0} : β _{ s } = 0 against the alternative H_{A} : β _{ s } ≠ 0) and SNPtreatment interaction effects (i.e. testing the null hypothesis H_{0} : β _{ γ } = 0 against the alternative H_{A} : β _{ s } ≠ 0).
The software offers the option for users to simulate data and use other programs such as R for analysis. For example, Uno et al. [10] have demonstrated that, where the proportional hazards assumption is invalid, the use of the Cox proportional hazards method will produce a loss in power to detect associations. They propose using alternative robust measures for the difference between survival curves instead of parametric models. The flexibility of our software enables generation of time to event data under models with nonproportional hazards that can be exported for association testing with methods not supported by our power calculator.
SurvivalGWAS_Power is important as it is the first genetic data simulator with time to event outcomes, and the first to enable estimation of power for multiple pharmacogenetic designs and analysis methods. However, the software is not limited to pharmacogenetic designs: for example, the treatment can be used to represent any binary covariate. This adds flexibility to the software for application to general GWAS of time to event outcomes.
In summary, this application offers a much needed userfriendly and flexible software tool for power calculations for time to event outcomes in pharmacogenetic association study designs.
Notes
Acknowledgements
We thank Duncan Appelbe of the Department of Biostatistics at the University of Liverpool for his help and guidance with C#.
Funding
APM is a Wellcome Trust Senior Fellow in Basic Biomedical Science (under award WT098017). Funding for open access charge: Wellcome Trust.
Availability of data and materials
Project name: SurvivalGWAS_Power (Pharmacogenetic Power Calculator for Time to Event Studies).
Project home page: https://www.liverpool.ac.uk/translationalmedicine/research/statisticalgenetics/software/.
Operating system(s): Windows.
Programming language: C#.
Other requirements: None.
License: GNU General Public License, version 3 (GPL3.0).
Any restrictions to use by nonacademics: None.
Authors’ contributions
HS carried out the investigation of commonly used pharmacogenetic study designs, coded and developed the software, debugged and tested the software and drafted the manuscript. ALJ tested the software and helped to draft the manuscript. APM participated in the design and coordination of the software and helped to draft the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not Applicable.
Ethics approval and consent to participate
Not Applicable.
Supplementary material
References
 1.Purcell S, Cherny SS, Sham PC. Genetic power calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics. 2003;19(1):149–50.CrossRefPubMedGoogle Scholar
 2.Clarke GM, Anderson CA, Pettersson FH, Cardon LR, Morris AP, Zondervan KT. Basic statistical analysis in genetic casecontrol studies. Nat protoc. 2011;6:121–33.CrossRefPubMedPubMedCentralGoogle Scholar
 3.Dudek M, Motsinger AA, Velez DR, Williams SM, Ritchie MD. Data simulation software for wholegenome association and other studies in human genetics. Scott Pacific Symposium Biocomputing. 2006;11:499–510.Google Scholar
 4.Souza CR, “The Accord.NET Framework,” http://accordframework.net. São Carlos, Brazil. December 2014.
 5.Syed H, Jorgensen AL, Morris AP. Evaluation of methodology for the analysis of “timetoevent” data in pharmacogenomic genomewide association studies. Pharmacogenomics. 2016; doi: 10.2217/pgs.16.19.
 6.Charland SL, Agatep BC, Herrera V, Schrader B, Frueh FW, Ryvkin M, Shabbeer J, Devlin JJ, Superko HR, Stanek EJ. Providing patients with pharmacogenomic test results affects adherence to statin therapy: results of the additional KIF6 risk offers better adherence to statins (AKROBATS) trial. Pharmacogenomics J. 2014;14:272–80.CrossRefPubMedGoogle Scholar
 7.Depta JP, Lenzini PA, Lanfear DE, Wang TY, Spertus JA, Bach RG, Cresci S. Clinical outcomes associated with proton pump inhibitor use among clopidogreltreated patients within CYP2C19 genotype groups following acute myocardial infarction. Pharmacogenomics J. 2015;15:20–5.CrossRefPubMedGoogle Scholar
 8.Wiese MD, Alotaibi N, O’Doherty C, Sorich MJ, Suppiah V, Cleland LG, Proudman SM. Pharmacogenomics of NAT2 and ABCG2 influence the toxicity and efficacy of sulphasalazine containing DMARD regimens in early rheumatoid arthritis. Pharmacogenomics j. 2014;14:350–5.CrossRefPubMedGoogle Scholar
 9.Absenger G, Benhaim L, Szkandera J, Zhang W, Yang D, Labonte MJ, Pichler M, Stotz M, Samonigg H, Renner W, Gerger A, Lenz HJ. The cyclin D1 (CCND1) rs9344 G4A polymorphism predicts clinical outcome in colon cancer patients treated with adjuvant 5FUbased chemotherapy. Pharmacogenomics j. 2014;14:130–4.CrossRefPubMedGoogle Scholar
 10.Uno H, Claggett B, Tian L, Inoue E, Gallo P, Miyata T, Schrag D, Takeuchi M, Uyama Y, Zhao L, Skali H, Solomon S, Jacobus S, Hughes M, Packer M, Wei LJ. Moving beyond the hazard ratio in quantifying the betweengroup difference in survival analysis. J clin oncol. 2014;32:2380–5.CrossRefPubMedPubMedCentralGoogle Scholar
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.