Analyzing molecular landscapes using random walks and information theory
Keywords: Search Space; Simulated Annealing; Correlation Length; Search Heuristic; Mutation Operator
Search heuristics for the in silico discovery of drug candidates have recently received increased attention. Most of these heuristics, such as Simulated Annealing and Evolutionary Algorithms, gradually improve molecules by exploiting the assumption that similarity in molecular structure often corresponds to similarity in molecular properties, such as activity against a target. However, it often remains unproven whether such continuity assumptions actually hold. More generally, the properties of molecular search landscapes need to be better understood and assessed in order to design or choose appropriate optimization methods for searching these spaces.
The theory of combinatorial landscape analysis aims to provide such analysis tools. However, many of the methods proposed in this field require complete knowledge of the landscape and are thus unsuitable for analyzing the huge search spaces of chemical structures. If the size of the search space forbids enumeration, statistical landscape analysis methods are the only available tools.
Following the approach of Vassilev et al., we propose to estimate landscape properties from random walks using the variation operator of the search heuristic. Once a search heuristic is built, these random walks can be generated with little extra effort. The precision of the obtained results scales with the number and length of the random walks available. Given the data from random walks, the following analysis methods can be used: (1) Correlation Length Analysis, which tests the validity of the continuity assumption; (2) Information Complexity, which reveals the structural diversity of the search landscape; (3) Multimodality measures, which estimate the frequency of local optima for different neighbourhood radii; and (4) Neutrality measures, which account for the size distribution of plateaus in the landscape. Each measure indicates difficulties that optimization routines may encounter when optimizing the objective function.
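To make the first two measures concrete, the following minimal Python sketch runs a random walk with a mutation operator and estimates a correlation length and an entropy-based information measure from the recorded objective values. Everything in it is an illustrative assumption rather than the setup used in this work: the search space is a toy space of 64-bit strings, not molecules; `flip_bit`, `smooth_fitness` and `rugged_fitness` are hypothetical stand-ins for a mutation operator and an activity model; the correlation length uses the common estimate -1/ln r(1) from the lag-1 autocorrelation; and `information_content` follows one common formulation of the entropy over pairs of consecutive slope symbols. Multimodality and neutrality measures are not covered.

```python
import math
import random

N_BITS = 64  # size of the toy genotype (assumption, not a molecular space)


def flip_bit(x, rng):
    """Hypothetical mutation operator: flip one randomly chosen bit."""
    return x ^ (1 << rng.randrange(N_BITS))


def smooth_fitness(x):
    """Toy smooth objective: number of set bits."""
    return bin(x).count("1")


def rugged_fitness(x):
    """Toy rugged objective: a pseudo-random value per genotype."""
    return random.Random(x).random()


def random_walk(start, mutate, fitness, steps, rng):
    """Return the objective values observed along a random walk."""
    x = start
    series = [fitness(x)]
    for _ in range(steps):
        x = mutate(x, rng)
        series.append(fitness(x))
    return series


def autocorrelation(series, lag):
    """Empirical autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((f - mean) ** 2 for f in series) / n
    if var == 0:
        return 0.0  # undefined for a constant series; reported as 0 here
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag)) / (n - lag)
    return cov / var


def correlation_length(series):
    """Estimate -1 / ln r(1); larger values suggest a smoother landscape."""
    r1 = autocorrelation(series, 1)
    if r1 <= 0:
        return 0.0
    if r1 >= 1:
        return math.inf
    return -1.0 / math.log(r1)


def information_content(series, eps=0.0):
    """Entropy (log base 6) over pairs of differing consecutive slope symbols."""
    symbols = []
    for a, b in zip(series, series[1:]):
        d = b - a
        symbols.append(-1 if d < -eps else (1 if d > eps else 0))
    pairs = list(zip(symbols, symbols[1:]))
    if not pairs:
        return 0.0
    counts = {}
    for p, q in pairs:
        if p != q:  # only pairs whose symbols differ contribute
            counts[(p, q)] = counts.get((p, q), 0) + 1
    total = len(pairs)
    return -sum((c / total) * math.log(c / total, 6) for c in counts.values())


rng = random.Random(0)
start = rng.getrandbits(N_BITS)
for name, fitness in [("smooth", smooth_fitness), ("rugged", rugged_fitness)]:
    walk = random_walk(start, flip_bit, fitness, 5000, rng)
    print(name, correlation_length(walk), information_content(walk))
```

On the smooth toy objective the estimated correlation length should come out much larger than on the rugged one, which is the kind of contrast a correlation length analysis is meant to expose. In a real application the genotype, the mutation operator and the objective function would be replaced by those of the search heuristic under study.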
We apply random-walk-based landscape analysis to two search spaces in the context of de novo drug design. First, we study the properties of the search space induced by the mutation operators of the Molecule Evoluator™ with an activity model as the objective function. Second, we study the properties of a peptide design problem using the software MOE, where the aim is to find a ligand that binds tightly to a 14-3-3 isoform.
This article is published under license to BioMed Central Ltd.