A Simulation Study Comparing SNP Based Prediction Models of Drug Response
Lack of replication of findings and missing heritability are two of the major challenges in pharmacogenetics (PGx) studies. Recently developed statistical methods for genome-wide association studies offer greater power both to identify relevant genetic markers and to predict drug response or phenotype based on these markers. However, the relative performance of these methods has not been thoroughly studied. Here, we present three simulation studies that compare the performance of these analysis methods. In the first simulation, we compared five approaches: Elastic Net (EN), Genome-wide Association Study (GWAS)+EN, Principal Component Regression (PCR), Random Forest (RF) and Support Vector Machine (SVM). EN had the smallest test mean squared error (MSE) and the highest proportion of causal SNPs among the identified SNPs. In the second simulation, we compared three approaches: GWAS+EN, GWAS+RF and GWAS+SVM. GWAS+RF had the smallest test MSE and the highest percentage of causal SNPs among the identified SNPs. In the third simulation, we compared two cross validation procedures: GWAS+EN with standard cross validation versus GWAS+EN with a modified learn-and-confirm cross validation. The latter approach demonstrated better prediction accuracy at the expense of greatly increased computational time.
Keywords: Genomics, GWAS, Predictive modeling, Machine learning, Cross validation
We gratefully acknowledge useful discussions with Dr. Zheng Zha and reviews by Dr. Yu-chen Su at the Takeda Pharmaceutical Development Center.
Conflict of Interest
The project was carried out while Dr. Pingye Zhang was a summer intern at the Takeda Development Center in Deerfield, IL, USA. All other authors were Takeda employees at the time. Because the research is a comparison of statistical methodologies and cross validation procedures, there is no conflict of interest.