Journal of Computer-Aided Molecular Design, Volume 30, Issue 5, pp 425–446

Ring system-based chemical graph generation for de novo molecular design



Generating chemical graphs in silico by combining building blocks is a fundamental task in virtual combinatorial chemistry. A premise in this area is that the generated structures should be irredundant (free of duplicates) as well as exhaustive. In this study, we develop structure generation algorithms that combine ring systems as well as atom fragments. The proposed algorithms consist of three parts. First, chemical structures are generated through a canonical construction path. During structure generation, ring systems can be treated as reduced graphs that have fewer vertices than the original graphs. Second, diversified structures are generated by a simple rule-based generation algorithm. Third, the number of structures to be generated can be estimated with adequate accuracy without actual exhaustive generation. The proposed algorithms were implemented in the structure generator Molgilla. As a practical application, Molgilla generated chemical structures mimicking rosiglitazone in terms of a two-dimensional pharmacophore pattern. The strength of the algorithms lies in their simplicity and flexibility. Therefore, they may be applied to various computer programs that generate structures by combining building blocks.
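The requirement that generation be both irredundant and exhaustive can be illustrated with a minimal Python sketch. It is not the algorithm implemented in Molgilla: it removes duplicates by comparing canonical codes after the fact, which is simpler (and slower) than generation along a canonical construction path, and it handles only tree-like structures. The block table `BLOCKS`, its attachment limits, and the function names are assumptions made for illustration; the `"Ph"` entry stands for a ring system collapsed into a single reduced-graph vertex.

```python
# Hypothetical building blocks: label -> maximum number of attachments.
# "Ph" stands for a ring system collapsed to one reduced-graph vertex.
BLOCKS = {"C": 4, "O": 2, "Ph": 1}


def rooted_code(adj, labels, node, parent):
    """Canonical string of the subtree rooted at `node` (children sorted)."""
    children = sorted(
        rooted_code(adj, labels, child, node)
        for child in adj[node]
        if child != parent
    )
    return "(" + labels[node] + "".join(children) + ")"


def canonical(adj, labels):
    """Canonical code of an unrooted tree: smallest rooted code over all roots."""
    return min(rooted_code(adj, labels, root, None) for root in range(len(labels)))


def extend(adj, labels):
    """Yield every tree made by attaching one new block to a vertex with free capacity."""
    for v, lab in enumerate(labels):
        if len(adj[v]) >= BLOCKS[lab]:
            continue  # vertex v has no free attachment point
        for new_lab in BLOCKS:
            new_adj = [list(neigh) for neigh in adj] + [[v]]
            new_adj[v].append(len(labels))
            yield new_adj, labels + [new_lab]


def generate(n):
    """Return the sorted canonical codes of all distinct trees built from n blocks."""
    level = {"(" + lab + ")": ([[]], [lab]) for lab in BLOCKS}
    for _ in range(n - 1):
        nxt = {}
        for adj, labels in level.values():
            for new_adj, new_labels in extend(adj, labels):
                # Keep one representative per canonical code (irredundancy).
                nxt.setdefault(canonical(new_adj, new_labels), (new_adj, new_labels))
        level = nxt
    return sorted(level)


if __name__ == "__main__":
    for size in (1, 2, 3):
        print(size, len(generate(size)))
```

Exhaustiveness follows because removing a leaf from any valid tree leaves a smaller valid tree, so every structure is reachable by one-block extensions. Because each ring system occupies a single vertex, the canonical codes are computed on a smaller graph than the full atom-level structure, which is the practical appeal of the reduced-graph view described in the abstract.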


Keywords: Ring systems; Structure generator; Inverse QSPR/QSAR; De novo design



The authors are grateful to G. Schneider and D. Reker at the Department of Chemistry and Applied Biosciences, Institute of Pharmaceutical Sciences, ETH Zurich. G. Schneider supported the authors with valuable advice on improving our structure generation algorithms, particularly on descriptor calculation and on how to generate structures that are feasible from a chemistry point of view. D. Reker discussed with the authors how to develop diversity-oriented generation algorithms. The authors also acknowledge the support of the Core Research for Evolutionary Science and Technology (CREST) Project ‘Development of a knowledge-generating platform driven by big data in drug discovery through production processes’ of the Japan Science and Technology Agency (JST). T.M. is a JSPS Research Fellow.

Supplementary material

Supplementary material 1: 10822_2016_9916_MOESM1_ESM.pdf (PDF, 175 kb)
Supplementary material 2: 10822_2016_9916_MOESM2_ESM.sdf (SDF, 18046 kb)
Supplementary material 3: 10822_2016_9916_MOESM3_ESM.sdf (SDF, 751 kb)
Supplementary material 4: 10822_2016_9916_MOESM4_ESM.sdf (SDF, 34 kb)
Supplementary material 5: 10822_2016_9916_MOESM5_ESM.sdf (SDF, 10723 kb)



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Tomoyuki Miyao (1)
  • Hiromasa Kaneko (1)
  • Kimito Funatsu (1)
  1. Department of Chemical System Engineering, The University of Tokyo, Bunkyo-ku, Japan
