The nature of ligand efficiency
Keywords: Drug design; Fragment-based lead discovery; Group efficiency; Ligand efficiency; Maximal binding affinity; Molecular interactions; Molecular recognition; Property-based design; Structure–activity relationship; Thermodynamics
FBLD: fragment-based lead discovery
IC50: half maximal inhibitory concentration
LE_Scale: estimate for maximal ligand efficiency as function of NnH
LLE: ligand lipophilicity efficiency or lipophilic ligand efficiency
logD: base 10 logarithm of octanol/water distribution coefficient
logP: base 10 logarithm of octanol/water partition coefficient
MSE: molecular size efficiency
NnH: number of non-hydrogen atoms in a molecular structure
P: octanol/water partition coefficient
experimentally measured pKD
value of pKD predicted by model
Ro3: rule of 3
Ro5: rule of 5
TIP: target interaction potential
Δg: ligand efficiency calculated from standard free energy of binding
ΔG°: standard free energy of binding
ΔN: change in number of chemical species
ηbind: ligand efficiency calculated from logarithmically expressed KD without energy units
“I know I could,” the VP of Discovery responded tartly. “But what do you think you’re here for? I could order my own consumables, too, but that’s Milo’s job. Your job is to lead us in prayers, and from now on you’re going to lead us in a prayer for more ligand efficiency in every project. Is that clear? I think more ligand efficiency is something really worth praying for”.
Adapted from Joseph Heller, Catch-22
Ligand efficiency (LE) is, in essence, a good concept that is poorly served by a bad metric. It was introduced as “a useful metric for lead selection”, has been discussed at length in reviews [2, 3, 4, 5, 6] and is routinely tracked in drug discovery projects. There are actually two ligand efficiencies in drug discovery and these can be seen as different manifestations of what might be called molecular size efficiency (MSE). The first is the LE concept, sometimes summarized as ‘bang for buck’, which can be expressed in terms of the sensitivity of affinity to an increase in molecular size. The second is the compound-level LE metric (more accurately, family of metrics) that was introduced with a view to normalizing affinity of compounds with respect to molecular size. While the LE concept has a solid basis, the LE metric cannot be regarded as physically meaningful because perception of efficiency varies with the concentration unit in which affinity is expressed [7, 8]. The difficulty stems from the inability of the logarithm function to take a dimensioned argument, which means that it is necessary to scale a KD value by an arbitrary concentration unit to enable its logarithm to be calculated.
Drug design is incremental in nature. This reflects a view [10, 11, 12] that it is easier to understand and predict differences in chemical behavior between structurally-related compounds than it is to make absolute predictions directly from molecular structure. Drug action is driven by concentration and affinity determines sensitivity of response to this driving force. Drug design is a multi-objective  endeavor and some objectives, such as maximization of affinity against the therapeutic target(s) and minimization of affinity against anti-targets, can be defined clearly. Other objectives, such as controllability of exposure are much more difficult to define and this means that drug design is typically indirect. One significant difficulty  in drug design is that unbound intracellular concentration [15, 16] cannot generally be measured for drugs in vivo.
Most chemical starting points for design lack the affinity required to function as drugs and optimization typically results in increased lipophilicity, molecular size and molecular complexity [17, 18]. This is the essence of lead-likeness . The Rule of 5 (Ro5)  highlights excessive molecular size and lipophilicity as primary design risk factors. Risks associated with molecular complexity  are more likely to be encountered in the screening phase of a project. Molecular complexity can also be seen inversely as the degree to which a compound is structurally prototypical [21, 22] (e.g., minimally substituted) and might also be defined in terms of molecular shape  or roughness [24, 25] of the molecular surface. Molecular recognition  provides much of the conceptual framework for drug design and many medicinal chemists consider molecular interactions  when elaborating chemical start points. While a structure–activity relationship (SAR) can point to the importance of individual interactions, the contribution of a protein–ligand contact to affinity is not, in general, an experimental observable [8, 28].
In property-based design [29, 30], risks associated with structural elaboration, such as poor oral absorption, are assessed according to physicochemical criteria. Within this framework, the most efficient optimization paths are those for which the necessary potency gains are accompanied by the smallest increases in perceived risk. One general objective of optimization projects has been stated  as “ensuring that any additional molecular weight and lipophilicity also produces an acceptable increase in affinity”. Efficiency can be seen as sensitivity of affinity to increased risk and this is the basis of what might be termed the LE concept. Kuntz et al.  examined the response of maximal affinity to number of non-hydrogen atoms and Hajduk  noted that “along the path of ideal optimization, an increase of 1 pKD unit can be expected for every 64 mass units”. Saxty et al.  defined group efficiency (GE) for substitutions by scaling the change in affinity resulting from addition of a substituent by the number of non-hydrogen atoms added. The idea of quantifying sensitivity of chemical behavior to changes in molecular structure can be traced to the work of Hammett [35, 36] and the activity cliff [37, 38] concept can be seen as part of the same general framework.
Although a quantity derived by scaling ΔG° by a risk factor does not have physical significance, offsetting affinity by a risk factor may give a physically meaningful quantity . Provided that ligand ionization is insignificant, ligand lipophilicity efficiency (LLE) , which is also known as lipophilic ligand efficiency (LLE)  and lipophilic efficiency (LipE) , can be interpreted as the ease of transfer of a ligand from 1-octanol to its binding site . Furthermore, some of the limitations of the 1-octanol/water partitioning system become less significant when working within structural series, as is usually the case for lead optimization . While physical interpretability is certainly a desirable feature for a drug design metric, this alone does not guarantee that a metric will be usefully predictive in drug design.
The principal objectives of this study are to provide an in-depth analysis of LE (and its variants) and to highlight ways in which consideration of LE as a concept might address the serious deficiencies of the compound-level metric. LE is discussed in terms of molecular interactions and binding thermodynamics and some of this discussion is likely to be generally relevant to drug design. A recurring theme in this study is a view that it is generally better to observe the response of affinity to molecular size directly rather than through the distorting lens of a flawed LE metric.
Molecular size and design risk
It is important that drug discovery scientists be fully aware of the assumptions on which the LE metric is based and that they carefully consider their motivation for using LE (or indeed any design guidelines). Property-based design [29, 30] can be seen in terms of balancing the risk associated with poor physicochemical characteristics against the risk of not being able to achieve the necessary level of affinity. Ro5  is based on analysis of property distributions of drugs (defined as compounds that had progressed into Phase 2 trials) and the assessment of risk is indirect because non-drugs were not included in the original analysis. Ro5  neither takes account of correlations between risk factors nor does it provide a means to deconvolute the risks associated with excessive molecular size and lipophilicity. The LE metric can be seen as a simple means with which to balance risk and there are more rigorous and sophisticated ways for doing this . Simple drug design guidelines based on molecular size and/or lipophilicity typically become progressively less useful as more measured data become available to the drug discovery team.
Drug design guidelines are typically based on trends observed in data and the strengths of these trends indicate how rigidly guidelines should be adhered to. While excessive molecular size and lipophilicity are widely accepted as primary risk factors in design, it is unclear how directly predictive they are of more tangible risks such as poor oral absorption, inadequate intracellular exposure and rapid turnover by metabolic enzymes. This is an important consideration because the strength of the rationale for using LE depends on the degree to which molecular size is predictive of risk. Drug discovery scientists need to be wary of correlation inflation  (the term voodoo correlation  is also used) which can be loosely defined as presentation or analysis of data in any way that makes trends appear to be stronger than they actually are. Correlation inflation is a particular concern when analysis of proprietary data is presented in support of a view that a set of guidelines is especially useful or predictive. Published analyses of relationships [41, 46] between pharmacological promiscuity and molecular size and lipophilicity exemplify the problem. Comparison of average values without taking account of variance is one way in which trends can be made to appear to be stronger than they actually are and correlation inflation is acknowledged [47, 48] as an issue in drug design. Variance in the dependent variable can also be hidden by representing a distribution (e.g., aqueous solubility of compounds with property forecast index in the range 6–7) by a single percentile (e.g., percentage of those compounds with aqueous solubility > 200 µM) .
The relevance of data must also be considered when using physicochemical characteristics such as molecular size to assess risk. For example, an activity threshold  of > 30% inhibition at 10 µM for promiscuity analysis is not especially relevant if considering the likelihood of off-target effects for a drug with a peak unbound plasma concentration of 100 nM. Sample bias can be significant, even in large datasets, as exemplified by divergent conclusions of two apparently similar studies [41, 46] with respect to the relationship between pharmacological promiscuity and molecular size. The observation that average molecular weight appears to decrease  with promiscuity is particularly relevant to the use of LE because promiscuity would generally be considered  to be an undesirable characteristic for a compound. Drug designers should not automatically assume that conclusions drawn from analysis of large, structurally-diverse data sets are necessarily relevant to the specific drug design projects on which they are working.
Thermodynamic aspects of ligand–protein association
The LE metric  was introduced in thermodynamic terms and it is sometimes believed that it measures the degree to which molecular interactions between ligand and target are optimal. For example, it has been asserted  “Because of these optimal interactions, fragments are very ‘atom efficient’ binders, demonstrated by high ligand efficiency”. This section focuses on thermodynamic [7, 39] aspects of protein–ligand association most relevant to LE and to the interpretation of affinity in terms of molecular interactions [26, 27].
It has been asserted that “Ligand efficiency can be recast as a special case of group additivity where ΔG/HA is the group equivalent” but this does not properly account for the stoichiometry of the binding. Unlike in Eq. (9), there is no A0 term when ΔG° (ΔN = − 1) is decomposed into a sum of NnH equal atom-based terms and this leads to significant difficulties. Specifically, each term in the sum must have an identical dependence on C° while the sum of terms needs to reproduce the dependence of ΔG° on C°. While this can be achieved algebraically by assigning a fractional stoichiometry to each atom-based term, the physical meaning of the resulting atom-based terms remains obscure. For example, the numerical values that result from dividing ΔG° values of 5 kcal/mol and 10 kcal/mol by 10 and 20, respectively, are identical. However, the two quantities cannot be equated because they differ in their dependence on C°. In contrast, the enthalpy of binding, ΔH, does not depend on C° and so enthalpic efficiency can be defined unambiguously.
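The different dependence on C° of the two per-atom quantities can be made explicit with a short derivation. This is a sketch that assumes the usual relation ΔG° = RT ln(KD/C°) for 1:1 binding; the primed symbols denote an alternative standard concentration and are introduced here purely for illustration.

```latex
\begin{align}
\Delta G^{\circ} &= RT \ln\!\left(\frac{K_{\mathrm{D}}}{C^{\circ}}\right) \\
\Delta G^{\circ\prime} &= RT \ln\!\left(\frac{K_{\mathrm{D}}}{C^{\circ\prime}}\right)
  = \Delta G^{\circ} - RT \ln\!\left(\frac{C^{\circ\prime}}{C^{\circ}}\right) \\
\frac{\Delta G^{\circ\prime}}{N_{\mathrm{nH}}} &= \frac{\Delta G^{\circ}}{N_{\mathrm{nH}}}
  - \frac{RT}{N_{\mathrm{nH}}} \ln\!\left(\frac{C^{\circ\prime}}{C^{\circ}}\right)
\end{align}
```

The shift in ΔG° is the same for every ligand, but the shift in the per-atom quantity scales as 1/NnH. Two ratios that coincide at one value of C° (for example, 5/10 and 10/20) therefore separate as soon as C° is changed.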
The need to properly account for stoichiometry is one reason that the contribution of an intermolecular contact (or a substructure) to affinity is not an experimental observable  although this appears to be the case even when stoichiometry is not an issue . Some of the entropy of binding results from molecular interactions (e.g., between water molecules) that are non-local with respect to protein–ligand contacts. Some contributions to binding enthalpy, such as the enthalpic penalties associated with ligand and target adopting their bound conformations are also inherently non-local. A less obvious example of a non-local effect would be substitution at one position of a molecular structure preventing a substituent at another position from forming optimal interactions with the target. When interpreting binding thermodynamics in terms of molecular interactions, it should always be kept in mind that intermolecular contacts (e.g., between unbound ligand and solvent) that are not present in the protein–ligand complex also influence ΔH and ΔS°.
Target interaction potential (TIP) can be a helpful concept when considering association of ligands with their targets. TIP takes account of both the nature of the interactions (e.g., hydrogen bonds) and the fact that ligand-target association takes place in an aqueous environment. Hotspots  on the molecular surface of a target can be seen as regions of high TIP while ligandability  is determined both by the magnitude of TIP and the extent to which it can be exploited. An ability to reversibly form covalent bonds (e.g., catalytic cysteine thiol or a protein-bound metal cation such as zinc) with ligands would generally be associated with high TIP as would depletion  of water from a binding pocket or the “frustrated” hydration  resulting from the overlap of solvation spheres of adjacent hydrogen bond donors (or acceptors). A key challenge in drug design is to determine whether inadequate affinity is due to low TIP (i.e., target is the problem) or underexploited TIP (i.e., compound is the problem).
Perception of efficiency varies with concentration unit
Although the implications for LE of the dependence of ΔG° on C° were first highlighted by Zhou and Gilson , they have been overlooked by LE advocates. For example, Murray et al.  discussed the validity of LE but demonstrated no awareness of the relevance of the dependence of ΔG° on C° or the fact that the logarithm function cannot take dimensioned arguments . A Future Medicinal Chemistry editorial  claimed that “Ligand efficiency validated fragment-based design” while reassuring its readers that “There is no need to become overly concerned with noisy arguments for or against ligand efficiency metrics being exchanged in the literature.” However, this editorial  neither makes reference to criticism  of LE made in 2009 nor does it address the implications  of the nontrivial dependence of the metric on what is an entirely arbitrary concentration unit.
Table 1. Dependence of ligand efficiency on C° (column headers: NnH; C° = 0.1 M; C° = 1 M; C° = 10 M)
The change in perception of efficiency that results from a change in C° shows that neither ηbind nor Δg has thermodynamic significance. A necessary, but not sufficient, condition for validity of thermodynamic analysis is that conclusions drawn from the analysis cannot depend on the choice of C°. Although C° is an integral component of the framework of solution thermodynamics, it can also be seen simply as a unit used to express affinity so that a logarithm can be calculated for KD. A physical quantity that is expressed in different units is still the same quantity. If perception changes when a quantity is expressed using a different unit then neither the change in perception nor the quantity itself can be regarded as physically meaningful. Provided that C° is known, − log(KD/C°) is physically meaningful and the effect of a change in C° is both constant and calculable. In contrast, knowing values of ηbind and C° does not allow us to calculate the ηbind value corresponding to another value of C°. Furthermore, the results presented in Table 1 show that a single ηbind value can transform to more than one ηbind value in response to a change in C°.
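The effect described above can be reproduced with a few lines of arithmetic. The sketch below assumes that ηbind is computed as −log10(KD/C°)/NnH, which matches the abbreviation list; the two compounds and the function name `eta_bind` are hypothetical and are not the entries of Table 1.

```python
import math


def eta_bind(kd_molar: float, n_nh: int, c_std_molar: float = 1.0) -> float:
    """LE without energy units: -log10(KD / C°) / NnH."""
    return -math.log10(kd_molar / c_std_molar) / n_nh


# Hypothetical compounds (illustrative values only): name -> (KD in M, NnH)
compounds = {"A": (1e-3, 10), "B": (1e-6, 20)}

for c_std in (0.1, 1.0, 10.0):
    values = {
        name: round(eta_bind(kd, n_nh, c_std), 3)
        for name, (kd, n_nh) in compounds.items()
    }
    print(f"C° = {c_std} M: {values}")
```

With these inputs the two compounds appear equally efficient at C° = 1 M, the larger compound B appears more efficient at C° = 0.1 M, and the smaller compound A appears more efficient at C° = 10 M. A single ηbind value at one C° has therefore transformed to two different values at another.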
The change in perception resulting from a change of unit raises the question of whether or not LE can accurately be described as a metric. The defining characteristic of a metric is that it measures and it is necessary to state clearly what a quantity measures if claiming that the quantity is a metric. While units are essential for measurement, a valid and credible framework for measurement must allow for quantities to be expressed in different units (e.g. µM and nM). For example, readers might consider their likely responses to a hypothetical report that the space group for a crystal structure differed according to whether unit cell parameters were expressed in Ångstrom or in nanometer units. There are two reasons that LE should not be considered to be a metric. First, it is not clear what LE measures since neither the extent to which molecular interactions are optimal nor interaction quality are experimental observables. Second, LE has a unit (1 M) built into it and perception of efficiency is altered (Table 1) when another concentration unit is used. It would actually be more accurate to describe LE as a simple predictor of potency in cell-based assays or of in vivo activity that, like property forecast index , has neither been optimized nor validated for prediction.
LE is used to specify affinity cutoffs as a function of molecular size and a Δg value of 0.3 kcal/mol per non-hydrogen atom has been suggested . Specification of affinity cutoffs in this manner forces the line defining acceptable affinity to intersect the affinity axis at a point corresponding to a KD value of 1 M. This causes considerable difficulties when the range in NnH is large as is the case for beyond rule of 5 (bRo5)  compounds. The minimum Δg value of 0.12 kcal/mol per non-hydrogen atom recommended  for bRo5 compounds can be translated (C° = 1 M; T = 300 K) to pKD values corresponding to the lower (700 Da; NnH ≈ 50) and upper (3000 Da; NnH ≈ 214) limits for bRo5 space. The lower (pKD = 4.4) of these two values would not appear to be a useful design criterion while the higher value (pKD = 18.7) would not generally be measurable. In general, affinity acceptability thresholds should be specified directly and LE should only be used for this purpose if supported by the data.
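The translation of a Δg cutoff into pKD values can be checked directly. The sketch below assumes C° = 1 M and T = 300 K as stated in the text, and a gas constant of 1.987204 × 10⁻³ kcal/(mol K); the function name `pkd_cutoff` is introduced here for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def pkd_cutoff(delta_g_per_atom: float, n_nh: int, temperature_k: float = 300.0) -> float:
    """pKD (C° = 1 M) implied by an LE cutoff given in kcal/mol per non-hydrogen atom."""
    return delta_g_per_atom * n_nh / (R_KCAL * temperature_k * math.log(10))


# Lower and upper limits of bRo5 space quoted in the text (NnH ≈ 50 and NnH ≈ 214)
for n_nh in (50, 214):
    print(n_nh, round(pkd_cutoff(0.12, n_nh), 1))
```

This prints 4.4 for NnH = 50 and 18.7 for NnH = 214, in agreement with the values quoted above. Because the cutoff line is forced through pKD = 0 at NnH = 0, the implied threshold grows in direct proportion to NnH.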
LE was introduced  with the claim that it was useful but it is rarely, if ever, shown to be predictive of pharmaceutically-relevant behavior. As such, the utility of LE as a design metric hinges on it being meaningful and this places a burden of proof on those who advocate the use of LE to demonstrate that their choice of unit is universally appropriate. The importance of physicochemical properties is widely accepted in drug design and many medicinal chemists would regard it as routine to monitor progress in projects by plotting potency against molecular size or lipophilicity. A critique of LE metrics actually emphasized the importance of modeling relationships between affinity and risk factors for compounds of interest . However, a depiction  of an optimization path for a project that has achieved a satisfactory endpoint is not direct evidence that consideration of molecular size or lipophilicity made a significant contribution toward achieving that endpoint. Furthermore, explicit consideration of lipophilicity and molecular size in design does not mean that efficiency metrics were actually used for this purpose. Design decisions in lead optimization are typically supported by assays for a range of properties such as solubility, permeability, metabolic stability and off-target activity as well as pharmacokinetic studies. This makes it difficult to assess the extent to which efficiency metrics have actually been used to make decisions in specific projects, especially given the proprietary nature of much project-related data.
Ligand efficiency and fragment-based design
LE features prominently in the literature of fragment-based lead discovery (FBLD) [64, 65, 66, 67, 68, 69] to the extent that it is sometimes presented as an important rationale for screening fragments. For example, it has been claimed  that “fragment hits typically possess high ‘ligand efficiency’ (binding affinity per heavy atom) and so are highly suitable for optimization into clinical candidates with good drug-like properties”. It has been asserted  that “fragment hits form high-quality interactions with the target” although it is not clear if interaction quality involves aesthetic aspects in addition to the physical forces more usually associated with molecular recognition [26, 27]. I would argue that the rationale for screening fragments against targets of interest is actually based on two conjectures. First, chemical space can be covered most effectively by fragments because compounds of low molecular complexity [18, 21, 22] allow TIP to be explored [70, 71, 72, 73, 74] more efficiently and accurately. Second, a fragment that has been observed to bind to a target may be a better starting point for design than a higher affinity ligand whose greater molecular complexity prevents it from presenting molecular recognition elements to the target in an optimal manner. While proving either conjecture definitively is difficult, the success  of fragment-based approaches indicates that the underlying assumptions are reasonable.
Table 2. Dependence on C° of mean changes in ligand efficiency for surveyed F2L programs (column headers: Mean Δηbind; SE Δηbind; Prob > |t|)
An observation that can be made about the Johnson et al.  analysis is that the start points for 15 of the 28 F2L projects surveyed do not appear to comply with the rule of 3 (Ro3)  if Ro5  hydrogen bond definitions are used. This would appear to contradict the claim  that “Most libraries consist of molecules that adhere to the ‘rule of three’.” It has been suggested [22, 79] that Ro3  may be overly restrictive and applying the rule would eliminate carboxylic acid bioisosteres  such as tetrazole  and N-acylsulfonamide  as well as the isocytosine fragment hit  that led to the discovery of potent β-secretase inhibitors . All lead compounds surveyed in Johnson et al.  are of greater molecular size than the corresponding fragments but this is not the case for lipophilicity. Calculated logP values for five of the leads were lower than for the fragment hits from which they were derived, suggesting that a logP cutoff value of 3 may be overly restrictive for design of compound libraries for FBLD. The Ro5  cutoff values for molecular weight (500 Da) and logP (5) were directly derived from the relevant data since they correspond to specific percentiles in the distributions observed for these quantities. However, it should not automatically be assumed that there is an analogous correspondence between the Ro3  cutoff values for molecular weight (300 Da) and logP (3).
Saxty et al. reported a GE value of 1.5 kcal/mol per non-hydrogen atom for the structural prototype 1. The substructural transformation leading to 1 poses special difficulties for ΔΔG calculation since this requires that an affinity value be assigned to a species of zero molecular size. The ΔΔG value for this transformation was derived by subtracting an estimate for rigid body entropy (ΔGrigid = 4.2 kcal/mol) lost on binding from the ΔG° value for 1. The large GE value calculated for 1 is presented as evidence that the interactions of the pyrazole substructure with PKB make a particularly large contribution to affinity. One interpretation of the analysis presented in Saxty et al. appears to be that the molecular interactions of the pyrazole substructure of 6 are assigned full credit for overcoming the penalties resulting from loss of translational and rotational entropy. This interpretation appears to be based on two assumptions. First, 1 and 6 lose identical amounts of rigid body entropy when they bind to PKB. Second, the pyrazole substructure makes identical interactions with the protein when 1 and 6 bind to PKB.
In SAR analysis, it would not be considered generally feasible to infer the importance of a substructure as a determinant of affinity using only measurements for compounds in which the substructure is conserved. Calculation of GE for 1 appears to require that a value of KD be assigned to a species of zero molecular size. The value of GE derived in this manner is determined just as much by the affinity assumed for the zero molecular size species as by the affinity that is actually measured for the compound. The ΔG° values for 7 (− 5.9 kcal/mol) and 6 (− 10.6 kcal/mol) can also be used to derive a GE value of 0.9 kcal/mol per non-hydrogen atom for the addition of the pyrazole to give 6. It is unclear why the GE value of 1.5 kcal/mol per non-hydrogen atom is preferred to the value of 0.9 kcal/mol per non-hydrogen atom that can be derived from the ΔG° values measured for 6 and 7.
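The GE value derived from the measured ΔG° values can be reproduced as follows. This is a minimal sketch that assumes 6 differs from 7 only by the pyrazole ring and that the ring contributes five non-hydrogen atoms (three carbons and two nitrogens); neither the atom count nor the function name `group_efficiency` is given in the text.

```python
def group_efficiency(dg_before: float, dg_after: float, atoms_added: int) -> float:
    """GE = -(ΔG°_after - ΔG°_before) / number of non-hydrogen atoms added."""
    return -(dg_after - dg_before) / atoms_added


# ΔG° values quoted in the text for 7 and 6 (kcal/mol).
# Assumption: the pyrazole adds 5 non-hydrogen atoms.
ge_pyrazole = group_efficiency(dg_before=-5.9, dg_after=-10.6, atoms_added=5)
print(round(ge_pyrazole, 2))  # 0.94
```

Under these assumptions the result rounds to the 0.9 kcal/mol per non-hydrogen atom quoted above. Because GE is built from a difference of ΔG° values, the C° term cancels and no affinity needs to be assigned to a species of zero molecular size.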
The F2L optimization reported by Saxty et al. is essentially a sequence of substitutions and ΔΔG values can be associated with structural modifications in a consistent manner. Drug design frequently consists of optimization of groups at two or more substitution sites on a scaffold and non-additive [87, 88, 89, 90, 91, 92] SAR needs to be considered. Free and Wilson were fully aware of the problems that can result from non-additive SAR and it is not possible to assign ΔΔG values (and therefore GE values) in a consistent manner to individual structural modifications if SAR is non-additive. Subadditive SAR should be anticipated whenever there is a high degree of constraint in the system and might be considered to be a natural consequence of high molecular complexity. Structural features likely to constrain ligand-target binding include conformational rigidity and multiple hydrogen bonds between ligand and target.
Maximal affinity of ligands and fit quality
Kuntz et al. explored the limits that protein structure may impose on affinity and their study is also widely regarded as having introduced the LE concept. Kuntz et al. used affinity measurements against multiple targets and medicinal chemists should not automatically assume that this study is directly relevant to specific targets on which they may be working. Put another way, if a micromolar activity against a target of interest has been observed for a compound, how useful is it to know that another compound of comparable molecular size has shown picomolar affinity against another target? Sample size is an important consideration in studies of biophysical limits of affinity since the observation of maximal affinity can be regarded as a relatively rare event. How many observations of affinity against how many targets would one need to make for compounds with 20 non-hydrogen atoms in order to have a 95% chance of observing affinity within 1.4 kcal/mol of a theoretical affinity limit for compounds of this molecular size?
While the response of ΔG° to NnH (− 0.44 kcal/mol per non-hydrogen atom) shown in Fig. 3b is much less steep than the initial response of ΔΔGbinding to NnH (− 1.5 kcal/mol per non-hydrogen atom) reported by Kuntz et al., its linearity appears to be maintained over the entire molecular size range (5 ≤ NnH ≤ 21) for the data. The findings from Kuntz et al. do not appear to provide insights that would be useful for the interpretation of the results shown in Fig. 3b and this reflects the fact that affinity measured against multiple targets was used for the analysis in that study. While the ΔΔGbinding values used in Kuntz et al. do not depend on C°, the line describing the initial response of ΔΔGbinding to NnH was constrained to pass through the point (NnH = 0; ΔΔGbinding = 0). Shultz noted that imposition of this constraint (equivalent to assuming that KD = 1 M for zero molecular size) is likely to have biased the estimate for the steepness of the initial response and others subsequently made similar observations.
If we ignore simple cations and anions, the data show a sharp improvement in binding free energy until ≈ 15 heavy atoms per molecule. The ΔΔGbinding of the tightest-binding ligands then plateaus at ≈ − 15 kcal/mol (i.e., picomolar dissociation constants). The initial slope is approximately − 1.5 kcal/mol per atom.
The response of maximal affinity to molecular size shown in Kuntz et al. might be anticipated from consideration of molecular complexity and it provides support for the view that additivity [87, 88, 89, 90, 91, 92] in SAR decreases with molecular size. Although the choice of intercept in Kuntz et al. has been criticized [8, 94], the response of maximal affinity to molecular size was modeled directly in the study. In contrast, Reynolds et al. modeled the response of maximal LE to molecular size. Reynolds et al. asserted that “ligand efficiency is dependent on ligand size with smaller ligands having greater efficiencies, on average, than larger ligands” and Murray et al. repeated this assertion. As shown in Fig. 1b, the apparently greater efficiency of smaller ligands can reflect the choice of unit used to express affinity and, therefore, should not be interpreted as having any special significance.
Reynolds et al.  used fit quality (FQ) to normalize LE with respect to molecular size and claimed that “the fit quality score provides a simple method for directly measuring how optimally a ligand binds relative to other ligands of any size”. However, the results presented in Table 1 show that it is not valid to claim that LE measures how optimally a ligand binds, even to a single protein, since rankings of compounds can vary with the concentration unit in which KD is expressed. Given that the degree to which a ligand binds optimally has not been shown to be an experimental observable, it would not be valid to make a claim for direct measurement even if perception of efficiency was independent of C°. FQ was introduced to address a perceived deficiency of LE and it has been stated  that “LE can break down when comparing ligands of disparate size (LLE, FQ and size independent ligand efficiency [SILE] are better)”.
The calculation of FQ involves first deriving the LE_Scale function by modelling the maximal LE as a function of NnH to provide a reference for scaling LE values. FQ is defined as the ratio of LE to LE_Scale, which means that it is simply a ratio of ΔG° values and therefore dependent on C°. This is a separate issue from the dependence of LE on C° since the comparison between LE and LE_Scale is made using the same value of NnH. Although it should be possible to address the problems associated with using ΔG° ratios by using ΔΔG, there remains the issue that affinity values used for the calculation of LE and LE_Scale do not generally correspond to the same protein. This means that a low value of FQ could just as plausibly be explained by low TIP of the target as by suboptimal interactions with the target.
The analysis presented in Reynolds et al.  can also be criticized from a general cheminformatic perspective. While the dependence of maximal binding affinity on molecular size may be of interest to drug discovery scientists, there are a number of reasons why this relationship would be better modelled directly with ΔG° (or pKD) as the dependent variable and NnH as the independent variable. First, using affinity as the dependent variable means that there are none of the difficulties caused by the dependence of LE on C° since a change in C° simply shifts affinity by a constant amount that is independent of molecular size (see Fig. 1a). Second, it is not generally possible to assess quality of fit in a meaningful manner when fitting a quantity (e.g., pKD/NnH) that depends explicitly on the independent variable (e.g., NnH). This is because, to some extent, the modelling process involves fitting the independent variable to itself. Third, scaling affinity by molecular size also scales the uncertainty in the affinity by molecular size and this needs to be properly accounted for when performing regression analysis. Sheridan has debunked the suggestion that LE is inherently more predictable than affinity .
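The second point can be illustrated numerically. The sketch below uses synthetic data (not data from Reynolds et al.) in which pKD is drawn independently of NnH, so that any trend seen after scaling by NnH is produced by the scaling itself; the seed, ranges and the helper `pearson_r` are arbitrary choices made for illustration.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from first principles."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_nh = [rng.randint(10, 40) for _ in range(500)]
pkd = [rng.uniform(4.0, 9.0) for _ in range(500)]  # independent of size by construction

r_direct = pearson_r(n_nh, pkd)                                   # pKD versus NnH
r_scaled = pearson_r(n_nh, [p / n for p, n in zip(pkd, n_nh)])    # pKD/NnH versus NnH

print(f"pKD vs NnH:     r = {r_direct:.2f}")
print(f"pKD/NnH vs NnH: r = {r_scaled:.2f}")
```

The direct fit shows essentially no relationship, as expected, whereas the scaled quantity is strongly and negatively correlated with NnH even though affinity carries no information about molecular size in this data set.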
Alternatives to ligand efficiency for normalization of affinity
Despite the criticisms made of the LE metric and its variants, the view that the best compounds punch above their weight is still valid. While it does not appear possible to define LE objectively in an absolute sense, the Hajduk and Saxty et al. studies showed that efficiency can be defined in relative terms. With appropriate data analysis, it might prove possible to establish a particular value of (ΔpKD/ΔNnH) as indicative that two compounds bind with equal efficiency.
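A minimal sketch of the contrast between absolute and relative efficiency follows. The two compounds are hypothetical (a 1 mM binder with 10 non-hydrogen atoms and a 1 µM binder with 25), and the function names are illustrative. The compound-level quantity pKD/NnH changes its ranking of the pair when the concentration unit changes, whereas ΔpKD/ΔNnH does not change.

```python
import math

def pkd(kd_molar: float, c_std_molar: float) -> float:
    """Return -log10(KD / C°) for a given standard concentration C°."""
    return -math.log10(kd_molar / c_std_molar)

# Hypothetical compounds: (KD in mol/L, number of non-hydrogen atoms).
KD_A, NNH_A = 1e-3, 10
KD_B, NNH_B = 1e-6, 25

def le_like(kd: float, nnh: int, c_std: float) -> float:
    """Compound-level pKD/NnH (LE without energy units)."""
    return pkd(kd, c_std) / nnh

def relative_efficiency(c_std: float) -> float:
    """Delta pKD / Delta NnH for the A -> B comparison."""
    return (pkd(KD_B, c_std) - pkd(KD_A, c_std)) / (NNH_B - NNH_A)

for c_std, label in ((1.0, "1 M"), (1e-3, "1 mM")):
    print(label,
          round(le_like(KD_A, NNH_A, c_std), 3),
          round(le_like(KD_B, NNH_B, c_std), 3),
          round(relative_efficiency(c_std), 3))
```

With C° = 1 M compound A looks more efficient than B, with C° = 1 mM the order is reversed, and the relative measure is the same in both cases.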
LE was introduced as a means to normalize affinity with respect to molecular size and this raises the question of whether or not meaningful normalization can be achieved without having to assume a particular value of C°. Although GE does not vary with C°, this metric is associated with structural transformations, rather than compounds, and so cannot be used to normalize affinity of compounds. To describe data as normalized would generally imply that some preliminary analysis has been performed on the data. For example, one might subtract the mean molecular weight for the fragments in a screening library from the molecular weight of each fragment. Mean-centering data in this manner makes it possible to determine at a glance whether or not a fragment in the library is larger than average.
Table: Results of fitting the linear model −ΔG°/(kcal/mol) = A0 + (A1 × NnH) to literature data; columns include the standard errors of A0 and A1 (SE A0, SE A1).
Once it has been established that the pyrazole substructure is important for affinity, the non-pyrazole 7 can be excluded from the dataset to enable affinity to be normalized for the pyrazoles (Fig. 5b). The results in Table 3 show a very strong relationship (R² = 0.98; RMSE = 0.42 kcal/mol) between ΔG° and NnH and practically all the variation in ΔG° for these compounds can be explained by differences in molecular size. The two residuals of greatest magnitude correspond to 5 (−0.6 kcal/mol) and 6 (+0.5 kcal/mol) and these values reflect the large GE value of 1.6 kcal/mol per non-hydrogen atom reported for the [5 → 6] transformation. A significant portion (≈ −0.4 kcal/mol) of the residual for 5 can probably be explained by its racemic nature. Had the more active enantiomer of 5 been used in the analysis, the GE value for the chloro-substitution might have been 1.2 kcal/mol per non-hydrogen atom rather than 1.6 kcal/mol per non-hydrogen atom as reported in Saxty et al. Both the residual for 6 and the GE value for the [5 → 6] transformation highlight the importance of the chloro substituent as a molecular recognition element in this system. Given that the pyrazole ring is present in all structures, it is not possible to draw any inference about the contribution of this molecular recognition element to affinity, although the excellent linear fit to the data shown in Fig. 5b is consistent with a view that structural elaboration did not compromise the hydrogen bonding between pyrazole and the hinge region of PKB.
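The arithmetic behind the racemate correction can be sketched as follows. The sketch assumes that only one enantiomer of 5 binds appreciably, so that its KD is half the apparent KD of the racemate, and it assumes a temperature of 298.15 K, which is not stated in the text. The GE value of 1.6 kcal/mol per non-hydrogen atom is the value quoted above, and the chloro substitution is taken to add a single non-hydrogen atom.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)
T = 298.15            # assumed temperature in K (not stated in the text)

# If only one enantiomer binds, the racemate appears weaker by a factor of 2
# in KD, which corresponds to RT ln 2 in free energy.
racemate_penalty = R_KCAL * T * math.log(2)

GE_REPORTED = 1.6  # kcal/mol per non-hydrogen atom for [5 -> 6], as quoted in the text
DELTA_NNH = 1      # assumption: chloro substitution adds one non-hydrogen atom

# Using the more active enantiomer of 5 would shrink the affinity gain for
# [5 -> 6] by the racemate penalty.
ge_corrected = GE_REPORTED - racemate_penalty / DELTA_NNH
print(f"RT ln 2 = {racemate_penalty:.2f} kcal/mol; corrected GE = {ge_corrected:.1f}")
```

Under these assumptions RT ln 2 is close to the ≈ 0.4 kcal/mol attributed to the racemic nature of 5, and the corrected GE rounds to the 1.2 kcal/mol per non-hydrogen atom suggested in the text.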
Analyzing affinity data in this manner effectively partitions MSE for a compound into a term that characterizes the steepness of response of affinity to molecular size for the particular selection of compounds and a residual term that quantifies the extent to which the affinity of a compound beats (or is beaten by) the trend in the data. The residuals are invariant with respect to change in C° so there is no change in perception if affinity is expressed using a different concentration unit. Although residuals cannot be used to define efficiency in an absolute sense, compounds can still be ranked and there is no requirement, as is the case for analysis based on GE [34, 86], that the compounds be structurally related. Affinity can be normalized with respect to other risk factors (e.g., lipophilicity) using residuals and other properties (e.g., aqueous solubility) can be normalized in an analogous manner. When using residuals for normalization of affinity, there is no requirement that the model be either linear or univariate. This means that affinity can be normalized with respect to more than one risk factor in a single analysis.
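The invariance of residuals to the choice of C° can be checked with a short sketch. The five (NnH, pKD) pairs are hypothetical and the helper names are illustrative; the fit is an ordinary least-squares line, which is only one of the model forms the approach allows.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a0 + a1 * x; returns (a0, a1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a1 = sxy / sxx
    return my - a1 * mx, a1

def residuals(xs, ys):
    """Observed minus fitted values for the least-squares line."""
    a0, a1 = fit_line(xs, ys)
    return [y - (a0 + a1 * x) for x, y in zip(xs, ys)]

def ranking(values):
    """Indices of compounds ordered from highest to lowest value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)

# Hypothetical series: NnH and pKD expressed with C° = 1 M.
nnh = [12, 15, 18, 21, 24]
pkd_1M = [4.0, 5.2, 5.6, 7.1, 7.6]
# The same affinities with C° = 1 mM: every pKD is shifted by -3.
pkd_1mM = [p - 3.0 for p in pkd_1M]

res_1M = residuals(nnh, pkd_1M)
res_1mM = residuals(nnh, pkd_1mM)
le_1M = [p / n for p, n in zip(pkd_1M, nnh)]
le_1mM = [p / n for p, n in zip(pkd_1mM, nnh)]

print("residual ranking:", ranking(res_1M), ranking(res_1mM))
print("pKD/NnH ranking: ", ranking(le_1M), ranking(le_1mM))
```

The change of unit is absorbed entirely by the intercept, so the residuals and their ranking are unchanged, whereas the ranking by pKD/NnH differs between the two units for this example.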
Drug discovery scientists typically need to be able to address a range of questions when interrogating project data. For example, it may be useful to focus analysis on the most active compounds in an optimization project. It is important to stress that residuals are not generated in isolation and they result from analysis that, arguably, should be performed anyway. The line fit to a plot of affinity against molecular size is likely to be a better predictor of outcome than a line that has been artificially forced to intercept the affinity axis at a point corresponding to a KD value of 1 M. The strength of the trend also provides an indication of how useful normalization of the data is likely to be. For example, the observation of a very weak correlation between affinity and molecular size for hits from a fragment screen may suggest that molecular size need not be accounted for when assessing the fragment hits in question. In an optimization project, a relatively weak correlation between affinity and molecular size may point to SAR that is specific to the extent that it cannot be adequately explained by molecular size alone.
LE has been discussed in depth from a physicochemical perspective in this study and the difficulty of interpreting affinity in terms of molecular interactions was highlighted. The nontrivial dependency of LE on the concentration unit in which affinity is expressed means that LE has no physical significance and, strictly, should not even be considered to be a metric. As such, LE is unsuitable for ranking compounds, setting acceptability thresholds for affinity and modeling relationships between affinity and molecular size. While it does not appear to be possible to quantify absolute efficiency of binding for compounds in an objective manner, efficiency can still be defined in a relative manner by scaling affinity differences by the corresponding molecular size differences.
PK designed the study, performed the analysis and prepared the manuscript. The author read and approved the manuscript.
I thank Michael Gilson for helpful comments on the manuscript and the two anonymous reviewers for their comments.
Availability of data and materials
All data used in this study have been taken from the published literature.
The author declares that he has no competing interests.
There is no funding to report for this study.
2. Abad-Zapatero C, Perisic O, Wass J, Bento PA, Overington J, Al-Lazikani B, Johnson ME (2010) Ligand efficiency indices for an effective mapping of chemico-biological space: the concept of an atlas-like representation. Drug Discov Today 15:804–811. https://doi.org/10.1016/j.drudis.2010.08.004
16. Gordon LJ, Allen M, Artursson P, Hann MM, Leavens BJ, Mateus A, Readshaw S, Valko K, Wayne GJ, West A (2016) Direct measurement of intracellular compound concentration by RapidFire mass spectrometry offers insights into cell permeability. J Biomol Screen 21:156–164. https://doi.org/10.1177/1087057115604141
19. Teague SJ, Davis AM, Leeson PD, Oprea T (1999) The design of leadlike combinatorial libraries. Angew Chem Int Ed Engl 38:3743–3748. https://doi.org/10.1002/(SICI)1521-3773(19991216)38:24%3c3743:AID-ANIE3743%3e3.0.CO;2-U
21. Boehm HJ, Boehringer M, Bur D, Gmuender H, Huber W, Klaus W, Kostrewa D, Kuehne H, Luebbers T, Meunier-Keller N, Mueller F (2000) Novel inhibitors of DNA gyrase: 3D structure based biased needle screening, hit validation by biophysical methods, and 3D guided optimization. A promising alternative to random screening. J Med Chem 43:2664–2674. https://doi.org/10.1021/jm000017s
23. Nicholls A, McGaughey GB, Sheridan RP, Good AC, Warren G, Mathieu M, Muchmore SW, Brown SP, Grant JA, Haigh JA, Nevins N, Jain AN, Kelley B (2010) Molecular shape and medicinal chemistry: a perspective. J Med Chem 53:3862–3886. https://doi.org/10.1021/jm900818s
24. Richards FM (1977) Areas, volumes, packing, and protein structure. Ann Rev Biophys Bioeng 6:151–176. https://doi.org/10.1146/annurev.bb.06.060177.001055
31. Keserű GM, Erlanson DA, Ferenczy GG, Hann MM, Murray CW, Pickett SD (2016) Design principles for fragment libraries: maximizing the value of learnings from pharma fragment based drug discovery (FBDD) programs for use in academia. J Med Chem 59:8189–8206. https://doi.org/10.1021/acs.jmedchem.6b00197
42. Ryckmans T, Edwards MP, Horne VA, Correia AM, Owen DR, Thompson LR, Tran I, Tutt MF, Young T (2009) Rapid assessment of a novel series of selective CB2 antagonists using parallel synthesis protocols: a lipophilic efficiency analysis. Bioorg Med Chem Lett 19:4406–4409. https://doi.org/10.1016/j.bmcl.2009.05.062
48. Oprea TI, Hasselgren C (2017) Predicting target and chemical druggability. In: Chackalamannil S, Rotella D, Ward S (eds) Comprehensive medicinal chemistry III. Elsevier, Amsterdam, pp 429–439. https://doi.org/10.1016/B978-0-12-409547-2.12342-X
55. Ladbury JE (2007) Enthalpic efficiency and the role of thermodynamic data in drug development: possibility or a pipeline dream. Eur Pharm Rev 12:59–62
61. May PC, Dean RA, Lowe SL, Martenyi F, Sheehan SM, Boggs LN, Monk SA, Mathes BM, Mergott DJ, Watson BM, Stout SL, Timm DE, LaBell ES, Gonzales CR, Nakano M, Jhee SS, Yen M, Ereshefsky L, Lindstrom TD, Calligaro DO, Cocke PJ, Hall DG, Friedrich S, Citron M, Audia JE (2011) Robust central reduction of amyloid-β in humans with an orally available, non-peptidic β-secretase inhibitor. J Neurosci 31:16507–16516. https://doi.org/10.1523/JNEUROSCI.3647-11.2011
62. Czaplewski LG, Collins I, Boyd EA, Brown D, East SP, Gardiner M, Fletcher R, Haydon DJ, Henstock V, Ingram P, Jones C, Noula C, Kennison L, Rockley C, Rose V, Thomaides-Brears HB, Ure R, Whittaker M, Stokes NR (2009) Antibacterial alkoxybenzamide inhibitors of the essential bacterial cell division protein FtsZ. Bioorg Med Chem Lett 19:524–527. https://doi.org/10.1016/j.bmcl.2008.11.021
66. Albert JS, Blomberg N, Breeze AL, Brown AJ, Burrows JN, Edwards PD, Folmer RH, Geschwindner S, Griffen EJ, Kenny PW, Nowak T, Olsson LL, Sanganee H, Shapiro AB (2007) An integrated approach to fragment-based lead generation: philosophy, strategy and case studies from AstraZeneca’s drug discovery programmes. Curr Top Med Chem 7:1600–1629. https://doi.org/10.2174/156802607782341091
67. Hubbard RE, Murray JB (2011) Experiences in fragment-based lead discovery. Methods Enzymol 493:509–531. https://doi.org/10.1016/B978-0-12-381274-2.00020-0
79. Köster H, Craan T, Brass S, Herhaus C, Zentgraf M, Neumann L, Heine A, Klebe G (2011) A small nonrule of 3 compatible fragment library provides high hit rate of endothiapepsin crystal structures with various fragment chemotypes. J Med Chem 54:7784–7796. https://doi.org/10.1021/jm200642w
84. Edwards PD, Albert JS, Sylvester M, Aharony D, Andisik D, Callaghan O, Campbell JB, Carr RA, Chessari G, Congreve M, Frederickson M, Folmer RH, Geschwindner S, Koether G, Kolmodin K, Krumrine J, Mauger RC, Murray CW, Olsson LL, Patel S, Spear N, Tian G (2007) Application of fragment-based lead generation to the discovery of novel, cyclic amidine beta-secretase inhibitors with nanomolar potency, cellular activity, and high ligand efficiency. J Med Chem 50:5912–5925. https://doi.org/10.1021/jm070829p
85. Silva DG, Ribeiro JFR, De Vita D, Cianni L, Franco CH, Freitas-Junior LH, Moraes CB, Rocha JR, Burtoloso ACB, Kenny PW, Leitão A, Montanari CA (2017) A comparative study of warheads for design of cysteine protease inhibitors. Bioorg Med Chem Lett 27:5031–5035. https://doi.org/10.1016/j.bmcl.2017.10.002
87. Bridges AJ, Zhou H, Cody DR, Rewcastle GW, McMichael A, Showalter HD, Fry DW, Kraker AJ, Denny WA (1996) Tyrosine kinase inhibitors. 8. An unusually steep structure-activity relationship for analogues of 4-(3-bromoanilino)-6,7-dimethoxyquinazoline (PD 153035), a potent inhibitor of the epidermal growth factor receptor. J Med Chem 39:267–276. https://doi.org/10.1021/jm9503613
89. Baum B, Muley L, Smolinski M, Heine A, Hangauer D, Klebe G (2010) Non-additivity of functional group contributions in protein-ligand binding: a comprehensive study by crystallography and isothermal titration calorimetry. J Mol Biol 397:1042–1054. https://doi.org/10.1016/j.jmb.2010.02.007
93. Murray CW, Carr MG, Callaghan O, Chessari G, Congreve M, Cowan S, Coyle JE, Downham R, Figueroa E, Frederickson M, Graham B, McMenamin R, O’Brien MA, Patel S, Phillips TR, Williams G, Woodhead AJ, Woolford A (2010) Fragment-based drug discovery applied to Hsp90. Discovery of two lead series with high ligand efficiency. J Med Chem 53:5942–5955. https://doi.org/10.1021/jm100059d
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.