Abstract
Twenty years ago, drug discovery was a somewhat plodding and scholastic endeavor; those days are gone. The intellectual challenges are greater than ever, but the pace has changed. Although there are more opportunities for therapeutic targets than ever before, the costs and risks are great, and the increasingly competitive environment makes the pace of pharmaceutical drug hunting range from exciting to overwhelming. These changes are catalyzed by major shifts in drug discovery processes through the application of rapid parallel synthesis of large chemical libraries and high-throughput screening. These techniques generate huge volumes of data for use in decision making. Besides the size and complexity of biological and chemical data sets and the many sources of data “noise”, the needs of business produce many, often conflicting, decision criteria and constraints such as time, cost, and patent caveats. The drive is still to find potent and selective molecules but, in recent years, key aspects of drug discovery have been shifted to earlier in the process. Discovery scientists are now concerned with building molecules that have not only good stability but also reasonable properties of absorption into the bloodstream, distribution and binding to tissues, metabolism and excretion, low toxicity, and reasonable cost of production. These requirements result in a high-dimensional decision problem with conflicting criteria and limited resources. This chapter gives an overview of the broad range of issues and activities involved in pharmaceutical screening, along with references for further reading.
Copyright information
© 2006 Springer Science+Business Media, Inc.
About this chapter
Cite this chapter
Cummins, D.J. (2006). Pharmaceutical Drug Discovery: Designing the Blockbuster Drug. In: Dean, A., Lewis, S. (eds) Screening. Springer, New York, NY. https://doi.org/10.1007/0-387-28014-6_4
DOI: https://doi.org/10.1007/0-387-28014-6_4
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-28013-4
Online ISBN: 978-0-387-28014-1
eBook Packages: Mathematics and Statistics (R0)