
What Do We Know?: Simple Statistical Techniques that Help

  • Protocol
Chemoinformatics and Computational Chemical Biology

Part of the book series: Methods in Molecular Biology (MIMB, volume 672)

Abstract

An understanding of simple statistical techniques is invaluable in science and in life. Despite this, and despite the sophistication many bring to the methods and algorithms of molecular modeling, statistical analysis in the field is rare and often unconvincing. I present here some basic approaches that have proved useful in my own work, along with examples drawn from the field. In particular, the statistics of evaluations of virtual screening are carefully considered.




Appendices

Appendix 1: Using logit to Get Error Bounds

In the section on enrichment as a metric for virtual screening we arrived at an approximate formula for the standard error of the enrichment:

$$ {\text{SEM}}(E({f_{\text{i}}})) \approx \frac{1}{{{f_{\text{i}}}}}\sqrt {{\frac{{{f_{\text{a}}}(1 - {f_{\text{a}}})}}{{{N_{\text{a}}}}}}} \left( {1 + \frac{1}{2} s{{({f_{\text{i}}})}^2}\frac{1}{{RE}}\frac{{(1 - {f_{\text{i}}})}}{{(1 - {f_{\text{a}}})}}} \right) $$
(60)

If we assume that R, the ratio of inactives to actives, is very large, the correction term vanishes and we are left with:

$$ {\text{SEM}}(E({f_{\text{i}}})) = \frac{1}{{{f_{\text{i}}}}}\sqrt {{\frac{{{f_{\text{a}}}(1 - {f_{\text{a}}})}}{{{N_{\text{a}}}}}}} $$
(61)
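The naive symmetric interval built from Eq. (61), and the number of actives below which its lower bound goes negative, can be computed as in the following sketch. The function names are hypothetical; 1.96 is the usual two-sided 95% point of the standard normal.

```python
import math

Z_95 = 1.96  # two-sided 95% point of the standard normal


def naive_enrichment_interval(f_i: float, f_a: float, n_a: int, z: float = Z_95):
    """Symmetric interval E +/- z * SEM, using the large-R SEM of Eq. (61)."""
    enrichment = f_a / f_i
    half_width = z * math.sqrt(f_a * (1.0 - f_a) / n_a) / f_i
    return enrichment - half_width, enrichment + half_width


def min_actives_for_positive_lower_bound(f_i: float, f_a: float, z: float = Z_95) -> int:
    """Smallest integer N_a for which E - z * SEM > 0.

    The half-width is c / sqrt(N_a) with c = z * sqrt(f_a (1 - f_a)) / f_i,
    so the lower bound is positive only when N_a > (c / E)^2.
    """
    enrichment = f_a / f_i
    c = z * math.sqrt(f_a * (1.0 - f_a)) / f_i
    return math.floor((c / enrichment) ** 2) + 1


lo, hi = naive_enrichment_interval(0.01, 0.05, 25)
print(f"naive 95% interval for N_a = 25: [{lo:.2f}, {hi:.2f}]")
print("N_a needed for a positive lower bound:", min_actives_for_positive_lower_bound(0.01, 0.05))
```

Computing the threshold directly from the formula, rather than from a rounded coefficient, keeps it exact.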

In our example, the ROC enrichment was fivefold at 1% inactives, because the fraction of actives recovered at that point was 0.05. The 95% error bound is then:

$$ \begin{gathered} {\text{Err}}(95\% |{f_{\text{i}}} = 0.01) = \pm 1.96 \times 100\sqrt {\frac{0.05 \times 0.95}{N_{\text{a}}}} \hfill \\ {\text{Err}}(95\% |{f_{\text{i}}} = 0.01) = \pm 42.7/\sqrt {N_{\text{a}}} \hfill \\ \end{gathered} $$
(62)

Now the enrichment is 5.0. If N_a < (42.7/5.0)^2, i.e., N_a < 73, the lower error bound becomes negative, which is a nonsense value. The problem, as mentioned in the text, is that the quantity of interest, the fraction f_a of actives, is bounded between 0 and 1, whereas a Gaussian is unbounded. If we transform f_a with the logit function, however, it also becomes unbounded:

$$ y = l(x) = \log \left(\frac{x}{1 - x}\right) $$
(63)

Here log is the natural logarithm. Under this transformation the fraction f_a = 0.05 becomes:

$$ l(0.05) = \log \left(\frac{0.05}{1 - 0.05}\right) = - 2.944 $$
(64)

This is our new mean. Now we have to recalculate the variance in logit space. To first order, the variance is multiplied by the square of the derivative of l(x) evaluated at the mean, and that derivative is 1/(x(1 − x)). Here var_x = f_a(1 − f_a) is the single-observation variance; the factor 1/N_a is applied afterward:

$$ \begin{gathered} {\rm var}_l \approx \left( \frac{1}{\langle x \rangle (1 - \langle x \rangle)} \right)^2 {\rm var}_x \\ = \left( \frac{1}{0.05(1 - 0.05)} \right)^2 0.05 \times (1 - 0.05) \\ = 21.05 \\ \end{gathered} $$
(65)

The 95% error bounds on the logit of f_a are therefore:

$$ \begin{gathered} l(f_{\text{a}} | f_{\text{i}} = 0.01) = - 2.944 \pm 1.96\sqrt {\frac{21.05}{N_{\text{a}}}} \hfill \\ = - 2.944 \pm 8.99/\sqrt {N_{\text{a}}} \hfill \\ \end{gathered} $$
(66)

Suppose we set N_a well below the “silly” threshold of 73; let’s make it 25. In non-logit space the 95% range of enrichments is:

$$ \begin{gathered} E(f_{\text{i}} = 0.01) = \left[ 5.0 - 42.7/\sqrt{25},\ 5.0 + 42.7/\sqrt{25} \right] \\ = \left[ -3.5,\ 13.5 \right] \end{gathered} $$
(67)

Clearly, the lower bound is nonsense. Now consider the range of f_a in logit space:

$$ \begin{gathered} l(f_{\text{a}} | f_{\text{i}} = 0.01) = \left[ -2.944 - 8.99/\sqrt{25},\ -2.944 + 8.99/\sqrt{25} \right] \\ = \left[ -4.74,\ -1.15 \right] \end{gathered} $$
(68)

Here it is perfectly acceptable for the bounds to be negative, because the logit function ranges from negative to positive infinity. The final step is to transform these values back to fractions using the inverse logit function:

$$ {l^{ - 1}}(y) = \frac{1}{1 + e^{ - y}} $$
(69)

and then to divide by f_i to get the enrichment. This gives:

$$ \begin{gathered} f_{\text{a}}(f_{\text{i}} = 0.01) = \left[ 0.0086,\ 0.241 \right] \\ E(0.01) = \left[ 0.86,\ 24.1 \right] \end{gathered} $$
(70)

These are large error bounds; they even include an enrichment below 1, i.e., worse than random. However, they are not negative, and they reflect how difficult it is to pin down the enrichment with so few actives. Even if we repeat the analysis with four times as many actives, i.e., N_a = 100, the 95% range is still [2.1, 11.5], whereas the untransformed range for N_a = 100 is approximately [0.7, 9.3].
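The logit procedure of Eqs. (63) to (70) reduces to a few lines of Python. This is a sketch under the same first-order (delta-method) approximation used in Eq. (65); the function names are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Eq. (63), using the natural logarithm."""
    return math.log(p / (1.0 - p))


def inv_logit(y: float) -> float:
    """Eq. (69)."""
    return 1.0 / (1.0 + math.exp(-y))


def logit_enrichment_interval(f_i: float, f_a: float, n_a: int, z: float = 1.96):
    """Enrichment interval built symmetrically in logit space, as in Eqs. (63)-(70).

    First-order propagation gives var(logit f_a) ~ 1 / (N_a f_a (1 - f_a)),
    i.e. Eq. (65) divided by N_a. The bounds are mapped back with the inverse
    logit and divided by f_i to turn fractions of actives into enrichments.
    """
    center = logit(f_a)
    half_width = z * math.sqrt(1.0 / (n_a * f_a * (1.0 - f_a)))
    return inv_logit(center - half_width) / f_i, inv_logit(center + half_width) / f_i


for n_a in (25, 100):
    lo, hi = logit_enrichment_interval(0.01, 0.05, n_a)
    print(f"N_a = {n_a}: logit-based 95% enrichment interval [{lo:.2f}, {hi:.2f}]")
```

Because the interval is symmetric in logit space rather than in enrichment space, the back-transformed bounds are asymmetric about E = 5 and can never fall below zero.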

Appendix 2: Why Variances Add

Suppose we have two sources of error that can move the measured value away from its true mean, and let’s suppose that mean value is zero for simplicity. The CLT tells us that each source alone will produce a distribution of values according to the number of observations and the intrinsic variance of each source:

$$ {\text{pdf}}_\alpha(x) = \sqrt{\frac{N}{2\pi\sigma_\alpha^2}}\, e^{-x^2 N/(2\sigma_\alpha^2)};\quad \quad {\text{pdf}}_\beta(y) = \sqrt{\frac{N}{2\pi\sigma_\beta^2}}\, e^{-y^2 N/(2\sigma_\beta^2)} $$
(71)

Because x and y are independent deviations from the mean, the probability density for observing an error x from the first source and an error y from the second is the joint density, i.e., the product of the two pdfs:

$$ {\text{pdf}}_{\alpha,\beta}(x,y) = \frac{N}{2\pi\sigma_\alpha\sigma_\beta}\, e^{-N\left(x^2/2\sigma_\alpha^2 + y^2/2\sigma_\beta^2\right)} $$
(72)

For such a combination of errors, the total error is just (x + y). So what is the average square of the error, i.e., the variance, over all possible x and y? It is the two-dimensional average (i.e., integral) of (x + y)^2, weighted by pdf_{α,β}(x, y):

$$ {\rm var}(x + y) = \frac{N}{2\pi\sigma_\alpha\sigma_\beta}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x + y)^2\, e^{-(N/2)(x^2/\sigma_\alpha^2 + y^2/\sigma_\beta^2)}\,{\rm d}x\,{\rm d}y $$
(73)

We can split this into three integrals by expanding (x + y)^2:

$$ \begin{aligned} {\rm var}(x + y) ={}& \frac{N}{2\pi\sigma_\alpha\sigma_\beta}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^2\, e^{-(N/2)(x^2/\sigma_\alpha^2 + y^2/\sigma_\beta^2)}\,{\rm d}x\,{\rm d}y \\ &+ \frac{N}{\pi\sigma_\alpha\sigma_\beta}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, e^{-(N/2)(x^2/\sigma_\alpha^2 + y^2/\sigma_\beta^2)}\,{\rm d}x\,{\rm d}y \\ &+ \frac{N}{2\pi\sigma_\alpha\sigma_\beta}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y^2\, e^{-(N/2)(x^2/\sigma_\alpha^2 + y^2/\sigma_\beta^2)}\,{\rm d}x\,{\rm d}y \end{aligned} $$
(74)

We can rewrite the first term as:

$$ \begin{aligned} & \frac{N}{2\pi\sigma_\alpha\sigma_\beta}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^2\, e^{-(N/2)(x^2/\sigma_\alpha^2 + y^2/\sigma_\beta^2)}\,{\rm d}x\,{\rm d}y \\ &= \frac{N}{2\pi\sigma_\alpha\sigma_\beta}\int_{-\infty}^{\infty} x^2\, e^{-(N/2)(x^2/\sigma_\alpha^2)}\,{\rm d}x \int_{-\infty}^{\infty} e^{-(N/2)(y^2/\sigma_\beta^2)}\,{\rm d}y \end{aligned} $$
(75)

The integral over y therefore factors out and can be evaluated on its own; it equals √(2πσ_β²/N). Doing the same for the integral over x in the third term leads to:

$$ \begin{aligned} {\rm var}(x + y) ={}& \sqrt{\frac{N}{2\pi\sigma_\alpha^2}}\int_{-\infty}^{\infty} x^2\, e^{-(N/2)(x^2/\sigma_\alpha^2)}\,{\rm d}x \\ &+ \frac{N}{\pi\sigma_\alpha\sigma_\beta}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, e^{-(N/2)(x^2/\sigma_\alpha^2 + y^2/\sigma_\beta^2)}\,{\rm d}x\,{\rm d}y \\ &+ \sqrt{\frac{N}{2\pi\sigma_\beta^2}}\int_{-\infty}^{\infty} y^2\, e^{-(N/2)(y^2/\sigma_\beta^2)}\,{\rm d}y \end{aligned} $$
(76)

Since the mean is zero, the first term is just the integral for the variance due to x, and the third term is the integral for the variance due to y. The second term must be zero: it separates into the product of two integrals, each of which computes the mean of x or of y, and both means are zero. Therefore:

$$ {\rm var} (x + y) = {\rm var} (x) + {\rm var} (y) $$
(77)
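As a numerical sanity check on Eqs. (72) to (77), the double integral of Eq. (73) can be approximated with a simple midpoint rule on a grid. This is a sketch with hypothetical names; the grid size and the ±8 standard-deviation cutoff are arbitrary choices that make the truncated tails negligible.

```python
import math


def gaussian_pdf(x: float, var: float) -> float:
    """Zero-mean normal density with variance `var` (Eq. (71) with var = sigma^2 / N)."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def variance_of_sum_by_quadrature(sigma_a, sigma_b, n, steps=400, width=8.0):
    """Midpoint-rule estimate of Eq. (73): the average of (x + y)^2 under Eq. (72)."""
    var_a, var_b = sigma_a**2 / n, sigma_b**2 / n
    half_a, half_b = width * math.sqrt(var_a), width * math.sqrt(var_b)
    ha, hb = 2.0 * half_a / steps, 2.0 * half_b / steps
    xs = [-half_a + (i + 0.5) * ha for i in range(steps)]
    ys = [-half_b + (j + 0.5) * hb for j in range(steps)]
    # Quadrature weights: density times cell width; the joint density is their product
    wx = [gaussian_pdf(x, var_a) * ha for x in xs]
    wy = [gaussian_pdf(y, var_b) * hb for y in ys]
    total = 0.0
    for x, px in zip(xs, wx):
        for y, py in zip(ys, wy):
            total += (x + y) ** 2 * px * py
    return total


sigma_a, sigma_b, n = 1.0, 2.0, 4  # arbitrary illustrative values
print(variance_of_sum_by_quadrature(sigma_a, sigma_b, n), sigma_a**2 / n + sigma_b**2 / n)
```

The two printed numbers should agree to many decimal places, as Eq. (77) predicts.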

The astute reader will notice that we could have performed the same sequence of operations with any pdf, not just a Gaussian, and arrived at the same conclusion. The key steps are multiplying the individual pdfs together (valid because the two sources are independent) and separating the resultant integral into three integrals, two of which are the individual variances, while the third must be zero because we defined each pdf to have a mean of zero. That is, this is a general result for independent errors, not one that pertains only to Gaussian forms of the distribution function.
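The generality claimed above can be checked by simulation. The sketch below draws two independent, zero-mean, non-Gaussian error sources (a uniform and a shifted exponential, chosen arbitrarily for illustration) and compares the variance of their sum with the sum of their variances. Names, seed, and sample size are hypothetical.

```python
import math
import random


def pvariance(values):
    """Population variance, computed with math.fsum for accuracy."""
    mean = math.fsum(values) / len(values)
    return math.fsum((v - mean) ** 2 for v in values) / len(values)


def simulated_variances(n_samples: int = 200_000, seed: int = 1):
    rng = random.Random(seed)
    # Two independent, zero-mean, deliberately non-Gaussian error sources
    x = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]     # variance 1/3
    y = [rng.expovariate(1.0) - 1.0 for _ in range(n_samples)]  # variance 1
    total = [a + b for a, b in zip(x, y)]
    return pvariance(x), pvariance(y), pvariance(total)


var_x, var_y, var_sum = simulated_variances()
print(f"var(x) + var(y) = {var_x + var_y:.4f}; var(x + y) = {var_sum:.4f}")
```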

Appendix 3: Deriving the Hanley Formula for AUC Error

Recall that we have the following equation for the variance of either the actives or the inactives, where w = AUC for the former and w = 1 − AUC for the latter:

$$ {\text{Var}} = \frac{{{w^2}(1 - w)}}{{1 + w}} $$
(78)

Hanley's assumption is that the scores of both actives and inactives follow exponential distributions, i.e.,

$$ \begin{gathered} x \geqslant 0 \hfill \\ {p_{\text{active}}}(x) = {\lambda_{\text{a}}}{e^{ - {\lambda_{\text{a}}}x}} \hfill \\ {p_{\text{inactive}}}(x) = {\lambda_{\text{i}}}{e^{ - {\lambda_{\text{i}}}x}} \hfill \\ x < 0 \hfill \\ {p_{\text{active}}}(x) = {p_{\text{inactive}}}(x) = 0 \hfill \\ \end{gathered} $$
(79)

Here, x is a score for either an active or an inactive that determines its rank (higher = better). Each form integrates to 1 over x from 0 to positive infinity, as a pdf must. Since we can always rescale x by a constant without changing the rankings, let’s set the lambda for inactives to 1, i.e.,

$$ \begin{gathered} {p_{\text{active}}}(x) = \lambda {e^{ - \lambda x}} \hfill \\ {p_{\text{inactive}}}(x) = {e^{ - x}} \hfill \\ \end{gathered} $$
(80)

Given these probability density functions we can write down an expression for the AUC either in terms of the fraction of inactives of lower score than each active, or as the fraction of actives higher than each inactive. The math is a little cleaner if we do the latter:

$$ {\text{AUC}} = \int\limits_{x = 0}^{x = + \infty } {{p_{\text{inactive}}}(x)\int\limits_{y = x}^{y = + \infty } {{p_{\text{active}}}(y){\rm d} y{\rm d} x} } $$
(81)

The first term in the integral is the density of inactives at score x and this is multiplied by the fraction of actives with a score greater than x. If we substitute the Hanley pdfs we get:

$$ \begin{gathered} {\text{AUC}} = \int\limits_{x = 0}^{x = \infty } {{e^{ - x}}} \int\limits_{y = x}^{y = \infty } {\lambda {e^{ - \lambda y}}} {\rm d} y{\rm d} x \\ = \int\limits_{x = 0}^{x = \infty } {{e^{ - x}}} {e^{ - \lambda x}}{\rm d} x \\ = \frac{1}{{1 + \lambda }} \\ \end{gathered} $$
(82)
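Eq. (82) can be checked by sampling scores from the Hanley model of Eq. (80) and counting, as in Eq. (81), the fraction of active and inactive pairs in which the active scores higher. The sample sizes, seed, and function name below are arbitrary choices for this sketch.

```python
import bisect
import random


def empirical_auc(lam: float, n_actives: int = 5000, n_inactives: int = 5000, seed: int = 0) -> float:
    """Fraction of (active, inactive) pairs in which the active scores higher.

    Scores follow Eq. (80): actives ~ Exp(lam), inactives ~ Exp(1).
    """
    rng = random.Random(seed)
    actives = sorted(rng.expovariate(lam) for _ in range(n_actives))
    inactives = [rng.expovariate(1.0) for _ in range(n_inactives)]
    # For each inactive score x, count the actives with a score greater than x
    above = sum(n_actives - bisect.bisect_right(actives, x) for x in inactives)
    return above / (n_actives * n_inactives)


for lam in (0.5, 1.0, 2.0):
    print(f"lambda = {lam}: simulated AUC {empirical_auc(lam):.3f}, Eq. (82) gives {1 / (1 + lam):.3f}")
```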

This behaves as expected: if lambda is greater than one, the scores of the actives fall off more quickly than those of the inactives and the AUC is less than 0.5; if lambda is less than one, the actives have a longer tail of high scores and the AUC is greater than 0.5. Now let’s consider the variance for the inactives:

$$ {\text{Va}}{{\text{r}}_{\text{inactives}}} = \int\limits_{x = 0}^{x = \infty } {{e^{ - x}}} {\left[ {\int\limits_{y = x}^{y = \infty } {\lambda {e^{ - \lambda y}}{\rm d} y} } \right]^2}{\rm d} x - {\text{AU}}{{\text{C}}^2} $$
(83)

This is just the familiar ⟨p²⟩ − ⟨p⟩² form of a variance, where p(x) is the fraction of actives scoring above x, except that the average is taken over the pdf for inactives. Evaluating the inner integral, squaring it, and integrating over x gives:

$$ \begin{gathered} {\text{Va}}{{\text{r}}_{\text{inactives}}} = \int\limits_{x = 0}^{x = \infty } {{e^{ - x}}{e^{ - 2\lambda x}}{\rm d} x - {\text{AU}}{{\text{C}}^2}} \\ = \frac{1}{{1 + 2\lambda }} - {\text{AU}}{{\text{C}}^2} \\ \end{gathered} $$
(84)

The convenient feature of Hanley's choice is that we can express lambda in terms of the AUC and substitute it, i.e.,

$$ \begin{gathered} {\text{AUC}} = \frac{1}{{1 + \lambda }} \\ \lambda = \frac{{1 - {\text{AUC}}}}{\text{AUC}} \\ {\text{Va}}{{\text{r}}_{\text{inactives}}} = \frac{\text{AUC}}{{2 - {\text{AUC}}}} - {\text{AU}}{{\text{C}}^2} \\ = \frac{{{\text{AUC}}{{(1 - {\text{AUC}})}^2}}}{{2 - {\text{AUC}}}} \\ \end{gathered} $$
(85)
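Eq. (85) can be checked the same way. Under Eq. (80), an inactive with score x has a fraction e^(−λx) of actives above it, so the variance of that fraction over simulated inactive scores should approach AUC(1 − AUC)²/(2 − AUC). A sketch, with hypothetical names and an arbitrary sample size:

```python
import math
import random


def hanley_var_inactives(auc: float) -> float:
    """Eq. (85): variance over inactives of the fraction of actives scoring higher."""
    return auc * (1.0 - auc) ** 2 / (2.0 - auc)


def simulated_var_inactives(auc: float, n_inactives: int = 200_000, seed: int = 0) -> float:
    lam = (1.0 - auc) / auc  # lambda expressed through the AUC, as in Eq. (85)
    rng = random.Random(seed)
    # Fraction of actives above an inactive scored x ~ Exp(1) is exp(-lam * x)
    p = [math.exp(-lam * rng.expovariate(1.0)) for _ in range(n_inactives)]
    mean = math.fsum(p) / len(p)
    return math.fsum((v - mean) ** 2 for v in p) / len(p)


for auc in (0.6, 0.8, 0.95):
    print(f"AUC {auc}: Eq. (85) {hanley_var_inactives(auc):.5f}, "
          f"simulated {simulated_var_inactives(auc):.5f}")
```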

Setting w = 1 − AUC, we get:

$$ {\text{Va}}{{\text{r}}_{\text{inactives}}} = \frac{{{w^2}(1 - w)}}{{1 + w}} $$
(86)

This is the required result. A further nice feature of the Hanley pdfs is that they give a simple expression for the ROC curve. The fraction f of inactives or of actives with a score greater than z is:

$$ \begin{gathered} f_{\text{inactive}}(z) = \int\limits_{y = z}^{y = \infty} e^{-y}\,{\rm d}y = e^{-z} \\ f_{\text{active}}(z) = e^{-\lambda z} \end{gathered} $$
(87)

But (f_inactive(z), f_active(z)) are exactly the points on the ROC curve, parameterized by z. Therefore, to express one in terms of the other we simply have:

$$ \begin{gathered} x = {e^{ - z}} \hfill \\ y = {e^{ - \lambda z}} \hfill \\ \therefore y = {x^\lambda } \hfill \\ y = {x^{\frac{{1 - {\text{AUC}}}}{\text{AUC}}}} \hfill \\ \end{gathered} $$
(88)

This is the form of the Hanley ROC curve for a given AUC value. It can be a pretty good fit to real data!
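The Hanley ROC curve of Eq. (88) is easy to tabulate, and integrating it over the fraction of inactives should return the AUC it was built from. A minimal sketch (names and step count are arbitrary):

```python
def hanley_roc(auc: float, fpr: float) -> float:
    """Eq. (88): fraction of actives recovered at fraction `fpr` of inactives."""
    return fpr ** ((1.0 - auc) / auc)


def area_under_hanley_roc(auc: float, steps: int = 100_000) -> float:
    """Midpoint-rule integral of the Hanley ROC curve over fpr in [0, 1]."""
    h = 1.0 / steps
    return h * sum(hanley_roc(auc, (k + 0.5) * h) for k in range(steps))


for auc in (0.6, 0.75, 0.9):
    print(f"AUC {auc}: area under Eq. (88) curve = {area_under_hanley_roc(auc):.4f}")
```

Since the integral of x^λ over [0, 1] is 1/(1 + λ), this is the same relation as Eq. (82).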


Copyright information

© 2011 Humana Press

About this protocol

Cite this protocol

Nicholls, A. (2011). What Do We Know?: Simple Statistical Techniques that Help. In: Bajorath, J. (eds) Chemoinformatics and Computational Chemical Biology. Methods in Molecular Biology, vol 672. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60761-839-3_22


  • DOI: https://doi.org/10.1007/978-1-60761-839-3_22


  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-60761-838-6

  • Online ISBN: 978-1-60761-839-3

  • eBook Packages: Springer Protocols
