Drugs and Drug-Like Compounds: Discriminating Approved Pharmaceuticals from Screening-Library Compounds

  • Amanda C. Schierz
  • Ross D. King
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5780)


Compounds in drug screening libraries should resemble pharmaceuticals. To test this operationally, we analysed the compounds against known drug-like filters and developed a novel machine learning method to discriminate approved pharmaceuticals from “drug-like” compounds. This method uses both structural features and molecular properties for discrimination. The method has an estimated accuracy of 91% in discriminating between the Maybridge HitFinder library and approved pharmaceuticals, and 99% between the NATDiverse collection (from Analyticon Discovery) and approved pharmaceuticals. These results show that Lipinski’s Rule of 5 for oral absorption is not sufficient to describe “drug-likeness”, nor to serve as the main basis of screening-library design.
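For readers unfamiliar with the Rule of 5 mentioned above, the following is a minimal sketch of such a filter, not the authors' implementation. It assumes the four commonly cited thresholds (molecular weight above 500, logP above 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors) and the common convention of tolerating at most one violation. The `Descriptors` container and both function names are hypothetical, and the property values are assumed to be computed elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular properties (hypothetical container for this sketch)."""
    mol_weight: float      # molecular weight in daltons
    log_p: float           # octanol-water partition coefficient
    h_bond_donors: int     # number of hydrogen-bond donors
    h_bond_acceptors: int  # number of hydrogen-bond acceptors


def rule_of_5_violations(d: Descriptors) -> int:
    """Count how many of the four commonly cited Rule of 5 limits are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_5(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: a compound passes with at most one violation."""
    return rule_of_5_violations(d) <= max_violations
```

A filter of this kind looks only at four bulk properties, which is consistent with the abstract's point that it cannot by itself separate approved drugs from screening-library compounds that also satisfy it.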


Keywords: Inductive Logic Programming, drug-likeness, machine learning, Rule of 5, compound screening library



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Amanda C. Schierz (1)
  • Ross D. King (2)
  1. Software Systems Research Group, Bournemouth University, Poole
  2. Computational Biology Research Group, Aberystwyth University, Aberystwyth
