Statistical Methods for Drug Discovery

Kuhn, Max; Yates, Phillip; Hyde, Craig

doi:10.1007/978-3-319-23558-5_4

Part of the book series: Statistics for Biology and Health (SBH)

Abstract

This chapter is a broad overview of the drug discovery process and of the areas where statistical input can have a key impact. The focus is primarily on a few key areas: target discovery, compound screening/optimization, and the characterization of important properties. Special attention is paid to working with assay data and phenotypic screens. A discussion of important skills for a nonclinical statistician supporting drug discovery concludes the chapter.

Notes

1. http://www.pngu.mgh.harvard.edu/purcell/plink/.
2. The term “Mendelian Randomization” refers to the notion that we are randomized at birth to the genetic “treatment” of the SNP.
3. Thankfully, the academic community has been highly cooperative in creating large consortia that produce meta-analyses of many smaller GWAS, which together total hundreds of thousands of subjects.
4. http://www.iconplc.com.
5. http://www.certara.com.
6. http://www.simcyp.com/.
7. http://www.simulations-plus.com/.
8. http://www.mbswonline.com.
9. http://bit.ly/1qilzvh.

Acknowledgements

We would like to thank David Potter and Bill Pikounis for providing feedback on a draft of this chapter.

Author information

Authors and Affiliations

Pfizer Global R&D, Groton, CT, USA
Max Kuhn, Phillip Yates & Craig Hyde
Pfizer’s BioTherapeutics Statistics Group, Groton, CT, USA
Phillip Yates

Corresponding author

Correspondence to Max Kuhn.

Editor information

Editors and Affiliations

Nonclinical Statistics, Abbvie Inc, North Chicago, Illinois, USA
Lanju Zhang

Cite this chapter

Kuhn, M., Yates, P., Hyde, C. (2016). Statistical Methods for Drug Discovery. In: Zhang, L. (ed) Nonclinical Statistics for Pharmaceutical and Biotechnology Industries. Statistics for Biology and Health. Springer, Cham. https://doi.org/10.1007/978-3-319-23558-5_4

DOI: https://doi.org/10.1007/978-3-319-23558-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23557-8
Online ISBN: 978-3-319-23558-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)