Abstract
Screening is about making decisions on the modulating activity of a particular compound on a biological system. When a compound testing experiment is repeated under the same conditions, or as close to the same conditions as possible, the observed results are never exactly the same: there is an apparently random and uncontrolled source of variability in the system under study. Nevertheless, randomness is not haphazard. In this context, we can see statistics as the science of decision making under uncertainty. Statistical tools are therefore the right approach to interpreting screening data, with the aim of making them meaningful and converting them into valuable information that supports sound decision making.
In the HTS workflow, there are at least three stages where key decisions have to be made based on experimental data: (1) assay development (i.e. assessing whether the assay is good enough to be put into screening production for the identification of modulators of the target of interest), (2) the HTS campaign process (i.e. monitoring that the screening process is performing at the expected quality, and assessing possible patterns in the experimental response that may bias and mislead hit identification) and (3) data analysis of primary HTS data (i.e. flagging which compounds give a positive response in the assay, namely hit identification).
In this chapter we focus on how some statistical tools can help to cope with these three aspects. Assessment of assay quality is reviewed in other chapters, so Section 1 only adds some brief further considerations. Section 2 reviews statistical process control, Section 3 covers methodologies for detecting and dealing with HTS patterns and Section 4 describes approaches for statistically guided selection of hits in HTS.
Abbreviations
- EDA: Exploratory Data Analysis
- IQR: Inter-Quartile Range
- M: Mean
- MSR: Minimum Significant Ratio
- PR: Pattern Recognition
- QA: Quality Assurance
- QC: Quality Control
- QSAR: Quantitative Structure Activity Relationship
- SD: Standard Deviation
- SDI: Standard Deviation of Inactives
- SEL: Systematic Error Level
- SPC: Statistical Process Control
- SQC: Screening Quality Control
- uHTS: ultra-High-Throughput Screening
- VEP: Variance Explained by the Patterns
Acknowledgements
The authors are greatly indebted to Ricardo Macarron, Mike Snowden, Mark Lennon, Gavin Harper, Martin Everett, Liz Clark, Glenn Hofmann, Geoff Mellor, Chris Molloy, Andy Vines, Dave Bolton and Javier Sanchez-Vicente for all the productive discussions about how to best implement statistical methodologies in the HTS process at GlaxoSmithKline. Likewise, we would like to thank many other colleagues in IT and Screening for their ideas and experimental data. SQC software has been the result of a joint collaborative effort with Tessella. We are also grateful to Robert Hertzberg, Stephen Pickett and Emilio Diez for their support in the writing of this manuscript.
Appendix 1: Estimation of the data centre in ASDIC
For the results of an HTS campaign, we use the following notation:
\(n\) | size of the sample (number of compounds or pools) |
\((x_1 ,x_2 ,...,x_n )\) | activity values |
\((x_{1:n} ,x_{2:n} ,...,x_{n:n} )\) | ordered activity values |
\(x_{i:n}\) | \(i\)th value in the ordered sample |
\(\hat \theta\) | location estimator |
\(r_i = \left( {x_i - \hat \theta } \right)\) | residuals |
\(\left( {r^2 } \right)_{i:n}\) | ordered squared residuals |
- The mean is the LS (least squares) estimator, because it minimises the expression
$$\min_{\hat \theta } \sum_{i = 1}^n r_i^2$$
- The LMS (least median of squares) estimator minimises the expression
$$\min_{\hat \theta } \left( \operatorname*{median}_{i = 1,...,n} \left( r_i^2 \right) \right)$$
- The LTS (least trimmed squares) estimator minimises the expression
$$\min_{\hat \theta } \sum_{i = 1}^h \left( r^2 \right)_{i:n}$$
where \(h = \left[ n/2 \right] + 1\) is the half sample size.
- The LTSq (least trimmed squares quarter) estimator minimises the expression
$$\min_{\hat \theta } \sum_{i = 1}^q \left( r^2 \right)_{i:n}$$
where \(q = \left[ n/4 \right] + 1\) is the quarter sample size.
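The four estimators above can be compared on a small sample. The following is a minimal Python sketch of the definitions in this appendix, not the ASDIC implementation itself, which is not described here. The function names are hypothetical. It relies on two standard properties of the one-dimensional case: the trimmed sum of squares is minimised by the mean of one contiguous window of ordered values, and the LMS criterion is minimised by the midpoint of the shortest window covering \(h\) ordered values. As an assumption, the median of the squared residuals is taken to be the \(h\)th ordered squared residual, which coincides with the usual median when \(n\) is odd.

```python
from statistics import mean


def trimmed_ls_location(values, k):
    """Location minimising the sum of the k smallest squared residuals.

    In one dimension the optimum is the mean of the contiguous window of
    k ordered values with the smallest sum of squared deviations.
    """
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the sample size")
    best_centre, best_ss = None, float("inf")
    for start in range(n - k + 1):
        window = xs[start:start + k]
        centre = mean(window)
        ss = sum((x - centre) ** 2 for x in window)
        if ss < best_ss:  # ties keep the first (lowest) window
            best_centre, best_ss = centre, ss
    return best_centre


def lms_location(values):
    """Midpoint of the shortest window covering h = [n/2] + 1 ordered values.

    Assumption: the median squared residual is the h-th ordered one.
    """
    xs = sorted(values)
    n = len(xs)
    h = n // 2 + 1
    start = min(range(n - h + 1), key=lambda i: xs[i + h - 1] - xs[i])
    return (xs[start] + xs[start + h - 1]) / 2


def data_centres(values):
    """Return the LS, LMS, LTS and LTSq location estimates of a sample."""
    n = len(values)
    return {
        "LS": mean(values),                              # ordinary mean
        "LMS": lms_location(values),
        "LTS": trimmed_ls_location(values, n // 2 + 1),  # h = [n/2] + 1
        "LTSq": trimmed_ls_location(values, n // 4 + 1), # q = [n/4] + 1
    }


if __name__ == "__main__":
    # Illustrative values only: four similar responses and one extreme value.
    sample = [0.0, 1.0, 1.4, 2.0, 50.0]
    for name, centre in data_centres(sample).items():
        print(f"{name}: {centre:.3f}")
```

With the illustrative sample above, the LS estimate is pulled towards the single extreme value, while the LMS, LTS and LTSq estimates stay within the bulk of the data. The exhaustive window search is quadratic in the worst case, so a real campaign-sized data set would need a more efficient algorithm.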
Copyright information
© 2009 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Coma, I., Herranz, J., Martin, J. (2009). Statistics and Decision Making in High-Throughput Screening. In: Janzen, W., Bernasconi, P. (eds) High Throughput Screening. Methods in Molecular Biology, vol 565. Humana Press. https://doi.org/10.1007/978-1-60327-258-2_4
DOI: https://doi.org/10.1007/978-1-60327-258-2_4
Publisher Name: Humana Press
Print ISBN: 978-1-60327-257-5
Online ISBN: 978-1-60327-258-2
eBook Packages: Springer Protocols