1 Introduction

Drug discovery research is a quintessential team activity, given the complexity and breadth of the scientific areas involved. The integrity, quality, and impact of new knowledge emerging from scientific research rest on individual and collective adherence to the core values of objectivity, accountability, and stewardship (The National Academies of Sciences, Engineering, and Medicine 2017). This is of particular importance in life sciences and drug discovery, where we strive to develop new knowledge that enables discoveries and leads project teams to invent efficacious therapies with a life-altering impact on patients suffering from devastating diseases. Not a small task!

Drug discovery may become a very costly undertaking, and investments in an ill-conceived strategy may be devastating for an organization, regardless of whether it is industrial or academic, for-profit or nonprofit, well-established or starting up. Finding a causative relationship between a single drug target and a disease state remains a formidable exercise in understanding human biology and pathophysiology. As is broadly acknowledged, significant knowledge gaps in these areas exist today.

Seeking to maximize the chances of success, every project is linked at its conception to a biological hypothesis of disease: the precise statement of the scientific basis for how a concept is thought to lead to disease treatment. During project execution, teams aim either to disprove or to add support to such hypotheses. It may be argued that every nonredundant, reasonable study designed to deliver supporting evidence for, or rule out, the central biological hypothesis must be conducted. However, reality dictates that resources are always finite. As a consequence, teams reduce the biological hypothesis to practice and define the critical path: the smallest set of studies required to provide an appropriate level of de-risking (which varies with every one of us) and to support the biological hypothesis reasonably well. Therefore, establishing a sharply defined biological hypothesis linking the target to the disease is key to framing the scope of the project team (Fig. 1).

Fig. 1

Example of a biological hypothesis and its relationship to the target-disease link

The choice of the biological target linked to the disease to be treated is a necessary – but not sufficient – condition for a successful project. Work on an ill-conceived hypothesis or a poorly executed strategy will inevitably lead to failure due to lack of efficacy – barring serendipity. Thus, the most effective project teams will find a way to deliver consensus-derived, logical, and sensible project milestones that either establish evidence against the biological hypothesis and recommend termination of the project or succeed in minimizing the risks of moving forward into clinical validation. The reader will realize that assuring clinical efficacy is not within the realm of possible outcomes at this time.

Chemicals play a role in all biology studies conducted in the context of drug discovery, whether they are endogenous or exogenously added to the system under study. During the execution of the project’s strategy, multiple compounds will be acquired, by either purchase or chemical synthesis, and studied experimentally to illuminate the decisions made by project teams. These will be chemical probes, radioligands, imaging agents, and drug candidates. Thus, while it would seem natural to enlist a chemist as a team member, unfortunately that is not always the case, for a number of reasons. The chemist’s contributions are often incorrectly perceived as adding no value beyond providing the test article. While apparently adding efficiencies, this separation of tasks limits the potential synergies across scientific disciplines, which are key to developing new knowledge.

Be that as it may, this chapter aims to discuss some aspects of the qualification that different compounds, as well as other research reagents and tools, should meet in order to maximize the quality of the science and minimize the chances of misinterpreting experiments. We do not intend to provide a foolproof guide to conducting biological experimentation, since the diversity of potential chemical/biological system interactions precludes such a goal.

2 Drugs in the Twenty-First Century

The concept of “druggability” was introduced around the turn of the twenty-first century as a way to qualify the perceived likelihood that a ligand could be found for a given binding site in a biological target. It was initially conceived with small molecules as drugs, generally compliant with Lipinski’s rule of 5 (Ro5), and proteins as biological targets (Hopkins and Groom 2003; Workman 2003). Today, the number and nature of chemical modalities that have led to the appropriate modulation of pathophysiology for therapeutic use have increased remarkably. These range, on one end of the spectrum, from small molecules designed with parameters increasingly beyond those set by the Ro5 [e.g., oxysterols (Blanco et al. 2018), cyclic peptides, millamolecular chemistry (macrocycles with a molecular weight between 500 and 1,000 daltons), PROTACs (Churcher 2017)] to molecular entities grouped under the name “biologics,” such as antibodies, antibody-drug conjugates, peptides, and oligonucleotides.
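As a point of reference, the Ro5 flags compounds with a molecular weight above 500 Da, a calculated logP above 5, more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors. The following Python sketch counts Ro5 violations from precomputed descriptors. The `Descriptors` container, the function name, and the descriptor values used in the usage note are illustrative assumptions, not data from any of the cited studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed physicochemical descriptors for one compound (hypothetical container)."""
    mol_weight: float  # molecular weight in daltons
    clogp: float       # calculated octanol/water partition coefficient (log P)
    hbd: int           # number of hydrogen-bond donors
    hba: int           # number of hydrogen-bond acceptors


def ro5_violations(d: Descriptors) -> list[str]:
    """Return the Lipinski rule-of-5 criteria that the compound exceeds."""
    checks = {
        "MW > 500": d.mol_weight > 500,
        "cLogP > 5": d.clogp > 5,
        "HBD > 5": d.hbd > 5,
        "HBA > 10": d.hba > 10,
    }
    return [name for name, violated in checks.items() if violated]
```

With made-up values, `ro5_violations(Descriptors(350.0, 2.5, 2, 5))` returns an empty list, whereas a hypothetical beyond-Ro5 compound such as `Descriptors(900.0, 6.2, 4, 14)` is flagged for molecular weight, cLogP, and acceptor count. A violation count is a coarse filter for oral small molecules and says nothing about the suitability of the newer modalities listed above.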

With so many different chemical modalities now in the arsenal of scientists, it behooves drug discovery project teams to secure experimental support, in the form of a preponderance of evidence, for the concept that the drug is acting according to the mechanism of action at the core of the biological hypothesis. In other words, they must secure mechanistic evidence that the drug is doing its job “as advertised.” This task includes a thorough characterization and qualification of the chemical probes used during biology studies.

2.1 Chemical Tools Versus Drugs

Drug repurposing consists of the use of advanced clinical compounds or approved drugs to treat different diseases. It takes advantage of work previously done on the active pharmaceutical ingredient, leading to reduced development costs and timelines, and has recently become an area of active interest (Doan et al. 2011). Often drugs already in clinical use are used as chemical probes based on their mode of action. It is important to note that approval by a regulatory agency for clinical or veterinary treatment does not necessarily qualify a compound as a high-quality chemical probe. For example, a chemical probe must meet very high standards in terms of its selectivity toward a biological target. On the other hand, for a drug intended for clinical use, lack of selectivity for a molecular target (known as polypharmacology) may not only be acceptable but may actually provide the basis for its efficacy and differentiation from similar drugs.

3 First Things First: Identity and Purity

Where do the samples of chemical probes and drug candidates used for biological testing come from? Historically, these compounds were synthesized by medicinal chemists at intramural laboratories. The final step before submitting newly made compounds for biological testing was to determine that their identity and purity were within expected specifications.

Compound management departments collected these samples and were responsible for managing corporate compound collections that grew in size and structural diversity over the years. They would also deliver compounds for testing to the corresponding laboratory. This process ensured the right compound was delivered in the right amount to the right destination.

However, things have changed. Most drug research organizations today rely to some extent on external, independent contract research organizations (CROs) to synthesize the small molecules for their research activities or on commercial suppliers of compound libraries for their hit-finding activities. Indeed, a fair number of start-up biotechnology organizations lack laboratories and conduct 100% of their experimentation at off-site laboratories. Often, compounds travel thousands of kilometers to reach their testing destinations after being synthesized. How does one assure that the identity of the compound used for testing is correct? How do all the travel and manipulation affect the purity of the test article and its potential degradation?

An obvious yet often overlooked step when requesting a chemical tool for a biology study without the help of a trained chemist – who would typically communicate using unambiguous chemical structures – is to make sure the compound ordered is the actual chemical intended to be used in the research. In other words, unequivocal identification of the research tools, including their sources (if commercial suppliers, include catalog number and batch number) and assays used in the characterization of the compound, has a major impact on the reproducibility of biological studies by providing a well-defined starting point.

Names given to chemicals may be ambiguous. Some drugs are given an official generic and nonproprietary name, known as an international nonproprietary name (INN), with the exact goal of making communication more precise by providing a unique standard name for each active ingredient, to avoid prescribing errors. However, most compounds used in research never receive an INN. For example, a SciFinder search for the chemical “cholesterol” shows a number of “other names” for it, while the drug known as Prozac™ (fluoxetine) has more than 50 different names, some of them included in Table 1.

Table 1 Different names for cholesterol (left) or Prozac™ (right) found in SciFinder

Using a compound’s Chemical Abstracts Service Registry Number (CAS RN) is likely the best way to avoid errors when communicating the name of a compound (https://www.cas.org/support/documentation/chemical-substances/faqs). The CAS Registry is considered the most authoritative collection of disclosed chemical substance information, covering substances identified from the scientific literature from 1957 to the present, with additional substances going back to the early 1900s. This database is updated daily with thousands of new substances. Essentially, a CAS RN is a unique numeric identifier that may contain up to ten digits, divided by hyphens into three parts. It designates only one substance. The numerical sequence has no chemical significance by itself. For example, the CAS RN of cholesterol is 57-88-5. However, CAS RNs are not foolproof either. For example, fluoxetine free base and its hydrochloride salt have different CAS RNs (54910-89-3 and 56296-78-7, respectively), so a number copied from the wrong record designates a different substance. Thus, it is recommended to consult with a chemist to avoid costly mistakes.
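One inexpensive safeguard when CAS RNs are transcribed into orders or notebooks is to verify the built-in check digit: the final digit equals the weighted sum of the preceding digits (weights 1, 2, 3, ... counted from the right) modulo 10. The Python sketch below illustrates that check; the function name is our own, and passing the check only rules out many typing errors. It does not confirm that the number belongs to the intended substance or salt form.

```python
import re

# Two to seven digits, two digits, and one check digit (at most ten digits in total).
_CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas_rn(cas_rn: str) -> bool:
    """Check the format and check digit of a CAS Registry Number."""
    match = _CAS_PATTERN.match(cas_rn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost digit of the body.
    weighted_sum = sum(
        position * int(digit)
        for position, digit in enumerate(reversed(body), start=1)
    )
    return weighted_sum % 10 == check_digit
```

The three numbers quoted above (57-88-5, 54910-89-3, and 56296-78-7) all pass, while a transposition such as 75-88-5 fails. Note that both fluoxetine entries are valid, which is exactly why a chemist should still confirm which form is being ordered.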

In terms of assessing the purity of a chemical sample, a growing number of organizations manage the on-demand synthesis of compounds at CROs through designated medicinal chemists. These two teams must collaborate closely to support project activities. Analogs are not accepted for delivery unless they come with extensive characterization, that is, unambiguous analytical data consistent with the correct chemical structure and purity (often compiled in a Certificate of Analysis, or CoA). Typically, an elemental analysis is provided, which should be consistent with the compound’s molecular formula within experimental error, as well as a set of ultraviolet (UV), infrared, and 1H or 13C nuclear magnetic resonance spectra, and a chromatographic trace with different detection methods (e.g., total ion current, UV absorption). Appearance (oil, powder, crystal, color) and expected purity (as a %) are also stated.
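To illustrate what “consistent with the molecular formula within experimental error” means in practice, the Python sketch below computes the theoretical elemental composition of a simple formula and reports elements whose found values deviate by more than a tolerance. The ±0.4 percentage-point default is a commonly used convention that we adopt here as an assumption, the atomic weights are rounded standard values, and the parser handles only plain formulas without parentheses, hydrates, or charges.

```python
import re

# Rounded standard atomic weights in g/mol; extend as needed.
ATOMIC_WEIGHTS = {
    "C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
    "F": 18.998, "S": 32.06, "Cl": 35.45, "Br": 79.904,
}


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a plain molecular formula such as 'C27H46O' into element counts."""
    counts: dict[str, int] = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def theoretical_percent(formula: str) -> dict[str, float]:
    """Return the theoretical mass percentage of each element in the formula."""
    counts = parse_formula(formula)
    masses = {el: ATOMIC_WEIGHTS[el] * n for el, n in counts.items()}
    total = sum(masses.values())
    return {el: 100.0 * mass / total for el, mass in masses.items()}


def elements_out_of_tolerance(
    formula: str, found: dict[str, float], tolerance: float = 0.4
) -> dict[str, float]:
    """Return found-minus-theoretical deviations (in % points) exceeding the tolerance."""
    expected = theoretical_percent(formula)
    return {
        el: round(found[el] - expected[el], 2)
        for el in found
        if abs(found[el] - expected[el]) > tolerance
    }
```

For cholesterol (C27H46O), the theoretical values are approximately 83.87% C and 11.99% H. A hypothetical found result of 83.80% C and 12.05% H returns an empty dictionary, whereas 82.9% C would be reported as a deviation of about -0.97 percentage points.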

Catalog chemicals, acquired either as singletons or as part of compound libraries, are previously made and are not always accompanied by a current CoA or set of spectroscopic data. The reasons for this vary. Rather than dwell on them, we focus on ways to make sure the compound being tested is the right one.

Furthermore, compound management is also available from CROs, either at the synthesis site or not. Compounds bought are received, barcoded, stored, weighed, cherry-picked, and shipped for testing. When compound management occurs at extramural organizations requiring shipping or when the last chemical analysis was done some time in the past (e.g., a year), it is strongly recommended that before testing quality control be conducted to confirm that the compound’s identity is correct and its purity is within acceptable levels, ideally similar to those obtained when the sample was originally synthesized.

3.1 The Case of Evans Blue

Evans blue (1) is a chemical used to assess permeability of the blood–brain barrier (BBB) to macromolecules. First reported in 1914 (Evans et al. 1914; Saunders et al. 2015), this compound is extensively used for mechanistic studies seeking to interrogate the structural and functional integrity of the BBB. The basic principle is that serum albumin cannot cross a healthy BBB, and virtually all Evans blue is bound to albumin, leaving the brain unstained. When the BBB has been compromised, albumin-bound Evans blue enters the central nervous system (CNS), generating a typical blue color (Fig. 2). However, a reappraisal of the compound properties (as available from commercial sources, Fig. 3) under the scrutiny of modern techniques reveals caveats. For example, one source lists the chemical purity as 75%, with an elemental analysis differing significantly from the theoretical one, and with a variable solid state nature (crystal, amorphous) (www.sigmaaldrich.com/life-science.html). Differences between crystalline states may impact drug solubility, and the presence of up to 25% of unknown impurities introduces random factors in experiments conducted with this material. These factors are not aligned with experimental best practice and may be easily corrected by a skillful chemist in a research team.

Fig. 2

A rodent brain showing the effects of Evans blue staining

Fig. 3

Example of technical sheet for a commercial sample of Evans blue

The purity of the reagents used in biology research is closely linked to the quality and reproducibility of the results. For example, lipopolysaccharides (LPS) are large molecules found in the outer membrane of Gram-negative bacteria. Chemically, they are a mixture formed by a diversity of lipids and polysaccharides joined by a covalent bond (Moran 2001). LPS is a microbe-associated molecular pattern that potently activates innate immune cells. Peripheral administration of LPS is used as an immunological challenge in animal models of inflammation. However, commercially available samples of LPS are subject to significant variability and typically display a range of potencies, in part due to the wide range of possible methods used to purify LPS (Darveau and Hancock 1983; Johnson and Perry 1976; Apicella 2008). Indeed, some preparations may contain large amounts of nucleic acid contaminants, which activate different innate immunity receptors, resulting in different biological effects. This causes variability in the results from biological studies using LPS, limiting their utility (Ray et al. 1991; Lee et al. 2013). A superior form of LPS for biological research is Control Standard Endotoxin (CSE), which is an LPS preparation whose activity is standardized using coagulation of horseshoe crab blood cells as a functional assay. Indeed, analysis of mouse brain and plasma cytokines and kynurenine pathway metabolites isolated upon CSE treatment suggests improvements in the quality of results over LPS in an in vivo neuroinflammation model (Lee et al. 2013).

Even when the purity and identity of compounds used in testing are assessed right after synthesis, some may deteriorate over time. This is true for solid samples, as well as those stored as solutions in dimethyl sulfoxide (DMSO). Cycles of freezing and thawing exacerbate deterioration, especially if the sample is exposed to air. For example, the arylpiperazine 2 was thought to be a hit in a high-throughput screen conducted to find melanin-concentrating hormone receptor 1 (MCHR1) antagonists. However, upon confirmation of the hit via resynthesis, it was discovered that the actual structure of the compound was 3 (Tarrant et al. 2017). The reason for this was that at the end of the preparation, compounds were isolated by precipitating them as their hydrobromide salts and then stored in DMSO at a 10 μM concentration. However, in the presence of oxygen (from the air), the thawed solutions reacted according to the chemical reaction shown in Fig. 4.

Fig. 4

Transformation of arylpiperazine 2 as an HBr salt into bromopiperazine 3 during storage (Tarrant et al. 2017)

In summary, confirming the identity and the purity of all compounds used in any experiment is a simple and necessary step to get the most out of the efforts, conduct rigorous scientific experimentation, and avoid costly mistakes.

3.2 Identity and Purity of Research Reagents

Not only is it important to unequivocally confirm the identity and purity of compounds, but it is essential to do the same for all the biological tools used in studies. For in vitro work, cell lines should be authenticated for their identity and for the absence of mycoplasma infection. It is surprisingly common in research for scientists to work on cell lines that are not what they think they are, resulting in spurious published reports, irreproducible data, and wasted resources (Chatterjee 2007; Drexler et al. 2002; Masters 2000). In a notorious set of studies, it was found that a large proportion of cell lines being used worldwide were HeLa cells (Nelson-Rees et al. 1974, 1981), the first human cancer cell line developed. In other studies, of cell lines submitted for banking, 17–36% were found to be different from what was claimed, and some were even of a different species (Markovic and Markovic 1998; MacLeod et al. 1999). The identity of cell lines can be evaluated by microscopic evaluation, growth curve analysis, and karyotyping. However, the most definitive test to confirm a cell line’s identity is DNA fingerprinting by short tandem repeat profiling (Masters 2000; MacLeod et al. 1997). This service is provided by a number of CROs.
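When a short tandem repeat (STR) report comes back, the query profile is usually compared with a reference profile by counting shared alleles. The Python sketch below uses one common scoring scheme, often called the Tanabe algorithm (twice the shared alleles divided by the total alleles in both profiles); the choice of algorithm, the function name, and the locus data in the usage note are assumptions for illustration only. A match of roughly 80% or more is commonly taken to indicate a shared origin, but the cutoff should be confirmed against the standard used by the testing laboratory.

```python
def tanabe_percent_match(
    query: dict[str, set[str]], reference: dict[str, set[str]]
) -> float:
    """Percent match between two STR profiles over the loci typed in both.

    Each profile maps a locus name to the set of alleles observed at that locus.
    """
    common_loci = query.keys() & reference.keys()
    shared = sum(len(query[locus] & reference[locus]) for locus in common_loci)
    total = sum(len(query[locus]) + len(reference[locus]) for locus in common_loci)
    if total == 0:
        return 0.0
    return 100.0 * 2 * shared / total
```

For example, with a hypothetical reference of `{"TH01": {"7"}, "D5S818": {"11", "12"}, "TPOX": {"8", "12"}, "vWA": {"16", "18"}}` and a query that differs only at vWA (`{"16", "17"}`), 6 of the alleles are shared out of 14 in total, giving a match of about 85.7%.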

In order to ensure reproducibility of published work, it is important to include all details of the cell culture experiments. This includes the source of any cell line used, including suppliers and catalog numbers, where applicable. It may also be relevant to specify the range of passage numbers used in experiments, as this may affect functional outcomes (Briske-Anderson et al. 1997; Esquenet et al. 1997; Yu et al. 1997; Wenger et al. 2004). The same applies to all culture conditions, including seeding density, time of culture, media composition, and whether antibiotics are used.

Another common pitfall in cell biology research is infection of cells by mycoplasma (Drexler and Uphoff 2002). Mycoplasma are a genus of bacteria that lack a cell wall and are resistant to common antibiotics. Mycoplasma infection can affect cell behavior and metabolism in many ways and therefore confound any results (Drexler et al. 2002; Kagemann et al. 2005; Lincoln and Gabridge 1998). Mycoplasma are too small to detect by conventional microscopy and are generally screened for in laboratories by DNA staining, PCR, or mycoplasmal enzyme activity (Drexler and Uphoff 2002; Lawrence et al. 2010). This should be done routinely, and with the availability of rapid kits it is relatively painless and cost-effective.

Many people use antibiotics in cell culture. However, it is a better practice to avoid antibiotics. Antibiotics mask errors in aseptic technique and quality control and select for antibiotic-resistant bacteria, including mycoplasma (Lincoln and Gabridge 1998). Furthermore, small quantities of bacteria killed by antibiotics may still have effects on cells due to microbe-associated molecules, such as LPS, that are potent activators of innate immune cells (Witting and Moller 2011).

For in vivo work, care should be taken to ensure that the correct mouse strains and lines are being used. Full genetic nomenclature and sources should be tracked and reported. A mouse strain can differ substantially depending on the vendor from which it is obtained, due to spontaneous mutations and founder effects. For instance, it was discovered that the commonly used C57BL/6J inbred mouse strain from one vendor contained a spontaneous deletion of the gene for α-synuclein, while the same strain from another vendor did not (Specht and Schoepfer 2001). Mutations in α-synuclein are a genetic cause of Parkinson’s disease, and many researchers in the field had been inadvertently using this strain, possibly confounding their results.

4 Drug Specificity or Drug Selectivity?

First off: drugs and chemical probes do not act specifically at any target. They cannot, because even high-affinity drugs will, at some concentration, start interacting with secondary targets. So, at best, some compounds are “highly selective.”

Typical ways to establish the selectivity of a compound against a number of antitargets or subtypes are based on in vitro assays where individual test articles are tested for measures of binding affinity or functional activity at a target. The selectivity for a given target is usually expressed as the ratio between quantitative measures of drug effects at the targets being compared. A fair number of these panels are available from commercial organizations, as well as government-funded agencies and academic institutes. However large the number of such counterscreens may be, there are always targets that remain unknown or cannot be tested, or modalities for which an in vitro test does not exist. For example, depending on the magnitude of its binding cooperativity, a compound that binds to an allosteric site may or may not show a signal when competing with the orthosteric ligand in a radioligand binding assay. In this case, a functional screen might appear to be a better choice (Christopoulos and Kenakin 2002). However, ruling out the possibility that the compound acts as a silent allosteric modulator would require a combination of binding and functional screens. These may not always be available off-the-shelf, thus requiring additional research (Gregory et al. 2010).

Second, the selective actions of a compound depend on the concentration at which the test is conducted. This is a particularly significant issue when using cell-permeable chemical inhibitors to explore protein function, such as protein kinase inhibitors. In this area, determining compound selectivity is a difficult undertaking given the large number (>500) of such proteins encoded by the human genome and the highly conserved ATP binding site of protein kinases, with which most inhibitors interact. Due to the high importance of this field, a fairly large number of such inhibitors have become available from commercial suppliers and, most notably, offered online by academic organizations (http://www.kinase-screen.mrc.ac.uk/; http://www.chemicalprobes.org/; https://probeminer.icr.ac.uk/#/). Often, claims of “specificity” made for a compound toward a given kinase have been shown to be unrealistic when tested using broad panels, leading to erroneous conclusions regarding the participation of a kinase in a certain mechanism. Criteria have been proposed for publication of studies using protein kinase inhibitors in intact cells (Cohen 2009). These include screening against large protein kinase panels (preferably >100), confirming functional effects using inhibitors from at least two distinct structural chemotypes, demonstrating that the effective concentrations are commensurate with those that prevent phosphorylation of an established physiological target, meaningful rank ordering of analogs, and replication at a different laboratory. For example, let’s say compound A inhibits kinase X with an intrinsic affinity Ki = 1 nM and a 1,000-fold selectivity over the undesired antitarget kinase Z (Ki′ = 1 μM). Conducting an in vitro study in a comparable matrix at an inhibitor concentration of 30 nM will selectively occupy the binding site of target X over the antitarget Z, and the effects measured may be considered as derived from kinase X. On the other hand, conducting the same study at an inhibitor concentration of 10 μM will produce results derived from inhibiting kinases X (completely) and Z (highly) (Smyth and Collins 2009).
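The compound A example can be made quantitative with a simple occupancy calculation. The Python sketch below assumes reversible, single-site binding at equilibrium, with the free inhibitor concentration equal to the nominal concentration and no correction for competition with ATP; these simplifications, and the function name, are our assumptions rather than part of the cited work.

```python
def fractional_occupancy(conc_nM: float, ki_nM: float) -> float:
    """Equilibrium single-site occupancy: [I] / ([I] + Ki)."""
    return conc_nM / (conc_nM + ki_nM)


KI_X_NM = 1.0      # affinity of compound A for target kinase X
KI_Z_NM = 1000.0   # affinity of compound A for antitarget kinase Z (1 uM)

for conc_nM in (30.0, 10_000.0):
    occ_x = fractional_occupancy(conc_nM, KI_X_NM)
    occ_z = fractional_occupancy(conc_nM, KI_Z_NM)
    print(f"{conc_nM:>8.0f} nM: X {occ_x:.1%}, Z {occ_z:.1%}")
```

Under these assumptions, 30 nM gives about 97% occupancy of X and about 3% of Z, whereas 10 μM gives essentially full occupancy of X and about 91% of Z. The 1,000-fold selectivity ratio is unchanged between the two experiments; only the test concentration differs.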

It is important to understand that therapeutic drugs used in the clinic do not need to be selective. Indeed, most are not, and the effects derived from cross-reactivity may even be beneficial to the therapeutic properties. However, during target identification and validation efforts, where a biological hypothesis is under examination, the risk of misinterpreting observations from a study due to cross-reactivity can easily be reduced. In spite of efforts from the chemical biology community, a recent publication discusses the evidence of widespread continuing misuse of chemical probes and the challenges associated with the selection and use of tool compounds and suggests how biologists can and should be more discriminating in the probes they employ (Blagg and Workman 2017).

5 Species Selectivity

The term species selectivity refers to the observation that the effects of a compound may vary depending on the biological origin of the system where the test is conducted. The mechanistic origin of the observed differences may involve virtually every aspect of drug discovery: lack of binding to the biological target due to differences in amino acid sequence, differences in nonspecific binding to matrix components, dissimilar drug absorption, differing stability toward matrix components (e.g., plasma hydrolases), drug metabolism by homologous (or even unexpressed) proteins such as cytochrome P450 or aldehyde oxidase, or simply differences in physiology across species (e.g., Norway rats vs. aged C57BL/6J mice) or even across strains of the same species (e.g., Wistar vs. Sprague-Dawley rats).

5.1 Animal Strain and Preclinical Efficacy Using In Vivo Models

Pharmacological responses to the action of a compound, or the efficacious range of doses or exposures linked to effects observed during in vivo studies in preclinical species, may vary when different strains of the same species are used. For example, C57BL/6J mice showed greater preference for saccharin and less avoidance of a cocaine-paired saccharin cue when compared with DBA/2J mice (Freet et al. 2013a). In studies using opioid agonists in mice and rats, strain differences in nociceptive sensitivity have been reported (Freet et al. 2013b). Wistar and Sprague-Dawley are the rat strains most frequently chosen in life sciences research, yet other outbred or inbred strains are sporadically used (Freet et al. 2013b; Festing 2014). A systematic study was recently conceived to evaluate the impact of rat strain (Lewis, Fischer F344, and Wistar Kyoto), as well as other important parameters, such as investigator, vendor, and pain assay, on the effects of morphine, a broadly studied analgesic used as a prototype in this work. Three experimental protocols were studied: hot plate, complete Freund’s adjuvant (CFA)-induced inflammatory hyperalgesia, and locomotor activity. Findings revealed strain- and vendor-dependent differences in nociceptive thresholds and sensitivity to morphine – both before and after the inflammatory injury. The authors conclude that the translational value of work conducted using a specific strain or preclinical model is limited and propose ways to mitigate this risk (Hestehave et al. 2019).

5.2 Differences in Sequence of Biological Target

Modifications in the chemical nature of the building blocks of a target receptor may lead to major differences in the biological activity of chemical probes – affinity or efficacy. Best known examples come from the area of small molecules acting at protein targets. Such alterations may be due to mutations and manifest themselves as loss-of-function or gain-of-function mutations. For example, autosomal dominant inherited mutations in the gene encoding leucine-rich repeat kinase 2 (LRRK2) are the most common genetic causes of Parkinson’s disease (Rideout 2017), while some have been linked to rare diseases (Platzer et al. 2017).

Species selectivity is a significant, yet not insurmountable, challenge to drug discovery programs. For example, ligands for receptors in the purinergic family (including the adenosine (ARs), P2Y, and P2X receptors) are notorious for their proclivity to display reduced activity at rat compared with mouse or human receptors. Subtype selectivity values reported for some of the early tool compounds were revised following thorough pharmacological characterization across preclinical species. For example, tool compound 4, broadly used to study the activation of the A2A AR, has high AR subtype selectivity in rat and mouse, but its selectivity is reduced at human ARs (Jacobson and Muller 2016).


These observations are not rare, unfortunately. Best practice therefore requires alignment and consistency when testing the activity of drug candidates or tool compounds using receptors from the different relevant species. These cases also highlight the risk of extrapolating biological activity across species without proper experimental confirmation.

5.3 Metabolism

Lu AF09535 (5) is an mGluR5 negative allosteric modulator studied for the potential treatment of major depressive disorders. During early clinical development, an unanticipated low exposure of the drug was observed in humans, both by conventional bioanalytical methods and the highly sensitive microdosing of 14C-labeled drug. This observation was attributed to extensive metabolism through a human-specific metabolic pathway since a corresponding extent of metabolism had not been seen in the preclinical species used (rat and dog). A combination of in vitro and in vivo models, including chimeric mice with humanized livers compared with control animals, showed that aldehyde oxidase (AO) was involved in the biotransformation of Lu AF09535 (Jensen et al. 2017). There is no equivalent protein to AO expressed in rat or dog. Cynomolgus monkey has been recommended as a suitable surrogate to study potential human AO metabolism (Hutzler et al. 2014).


6 What We Dose Is Not Always Directly Responsible for the Effects We See

Often during research using chemical tools, especially during in vivo studies, the experimental observations are interpreted as derived from the compound being dosed. However, this is not always the case, and thorough research requires establishing this direct link between the parent compound and the biological target in agreement with the hypothesis under testing.

As an example, compound 6 is an MCHR1 antagonist with potent in vitro activity. When tested in vivo in a sub-chronic diet-induced obesity (DIO) model, it showed efficacy. However, a metabolite identification study indicated that significant amounts of primary alcohol 7 remained in circulation. Alcohol 7 was synthesized, and it demonstrated potent in vitro inhibition of MCHR1 effects. In vivo, its rat pharmacokinetics were very good and the compound crossed the BBB. As anticipated, when tested in a DIO model, it showed efficacy superior to that of its methoxy precursor 6. Given its favorable physicochemical properties, compound 7 became NGD-4715 and reached Phase 1 clinical trials before being discontinued due to the observation of mechanistic effects altering sleep architecture (Moran 2001).


Furthermore, for compounds that are “well-behaved,” the unbound concentrations measured at the hypothetical site of action should be commensurate with the affinity or efficacy obtained using in vitro binding or functional assays. It should be remembered that, at the end of the day, these drug concentrations represent a certain receptor occupancy, which is expected to be consistent across the different tests used and, ideally, to translate to the clinic.

Oftentimes these relationships are visualized through “exposuregrams,” graphics in which the efficacious unbound drug concentrations from different in vitro, ex vivo, and in vivo tests are compared (Fig. 5).

Fig. 5 Comparison of “exposuregrams” for two different compounds. The left graphic shows consistent efficacious unbound drug concentrations and a reasonable separation from unbound concentrations showing toxic effects. The graphic on the right shows inconsistent values for efficacious drug concentrations and overlapping with unbound concentrations leading to toxic effects.
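The comparison an exposuregram supports can also be expressed numerically. The following is a minimal sketch, not a method taken from the chapter: the assay names, the concentrations, and the two summary metrics (fold spread and margin to toxicity) are all illustrative assumptions.

```python
# Hypothetical unbound efficacious concentrations (nM) for one compound.
# All names and numbers are illustrative assumptions, not data from the text.
efficacious_nM = {
    "in vitro binding": 12.0,
    "in vitro functional": 20.0,
    "ex vivo occupancy": 15.0,
    "in vivo efficacy": 30.0,
}
# Assumed lowest unbound concentration (nM) at which toxic effects are seen.
toxic_unbound_nM = 900.0


def fold_spread(values):
    """Ratio of the highest to the lowest efficacious concentration."""
    return max(values) / min(values)


def safety_margin(toxic, values):
    """Ratio of the toxic concentration to the highest efficacious one."""
    return toxic / max(values)


spread = fold_spread(efficacious_nM.values())
margin = safety_margin(toxic_unbound_nM, efficacious_nM.values())
print(f"fold spread across assays: {spread:.1f}")
print(f"margin to toxicity: {margin:.0f}x")
```

A small fold spread with a large margin corresponds to the left-hand graphic in Fig. 5; a large spread or a margin close to 1 corresponds to the right-hand graphic. What counts as acceptable for either number is a project-level judgment.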

6.1 Conditions Where In Vitro Potency Measures Do Not Align

Occasionally, in vitro measurements of compound potency derived using recombinant receptor protein will not overlap perfectly with those obtained using a cellular matrix. Due to the increased chemical and structural complexity of the cellular assay matrix compared with the recombinant milieu, often lower potency measures are determined. These tend to be attributed to poor cell membrane permeability, reduction in unbound concentrations due to increased nonspecific binding, or simply differences in concentrations of relevant binding partners between the two assays (e.g., ATP concentrations too high for kinase inhibitors).
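The ATP example can be made concrete with the Cheng-Prusoff relation for a competitive inhibitor, IC50 = Ki × (1 + [ATP]/Km). The sketch below assumes a purely ATP-competitive kinase inhibitor; the Ki, Km, and ATP concentrations are hypothetical values chosen only to illustrate the size of the shift.

```python
def cheng_prusoff_ic50(ki_nM, atp_uM, km_atp_uM):
    """Apparent IC50 (nM) of an ATP-competitive inhibitor.

    Assumes simple competitive inhibition with respect to ATP.
    """
    return ki_nM * (1.0 + atp_uM / km_atp_uM)


# Hypothetical inputs (assumptions, not values from the text).
ki_nM = 1.0               # intrinsic affinity of the inhibitor
km_atp_uM = 10.0          # Km of the kinase for ATP
biochem_atp_uM = 10.0     # recombinant assay run at ATP = Km
cellular_atp_uM = 1000.0  # assumed millimolar intracellular ATP

biochem_ic50 = cheng_prusoff_ic50(ki_nM, biochem_atp_uM, km_atp_uM)
cell_ic50 = cheng_prusoff_ic50(ki_nM, cellular_atp_uM, km_atp_uM)
print(f"biochemical IC50: {biochem_ic50:.0f} nM")
print(f"cellular IC50:    {cell_ic50:.0f} nM")
print(f"right shift:      {cell_ic50 / biochem_ic50:.1f}-fold")
```

Under these assumptions the difference in ATP concentration alone accounts for a shift of roughly 50-fold, before permeability or nonspecific binding is considered.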

On the other hand, occasionally a compound’s potency increases in a cellular matrix compared with the recombinant assay. This rare effect may be explained by the formation of active drug metabolites or posttranslational modifications (cellular systems are metabolically competent), the existence of unknown protein–protein interactions (purified recombinant systems ignore cellular localization and avoid contacts with other cell components such as proteins or nucleic acids), or intracellular localization of the compound driven by transporter systems. Interpretation of these shifts is not always feasible, as target occupancy is not routinely established in cellular assays. For the case of allosteric drugs, a left shift in a cellular system suggests the presence of an endogenous component with positive cooperativity with the test article.

7 Chemical Modalities: Not All Drugs Are Created Equal

Target de-risking efforts benefit from experimentation with chemical tools of different nature. Due to advances in technology, probe compounds belonging to the group of biologics (e.g., proteins, oligonucleotides) tend to be faster to develop up to a quality good enough to conduct early testing of the biological hypothesis. On the other hand, developing a small molecule with the desired activity at the target and selectivity against antitargets usually requires a significant investment of financial and human resources. Table 2 compares some of the characteristics of these chemical modalities.

Table 2 Some characteristics of small-molecule drugs and biologics

8 Receptor Occupancy and Target Engagement

As discussed in the introduction, the ultimate test of any drug discovery project is the exploration of the clinical hypothesis in humans. Ideally, the drug will show efficacy and safety and eventually become marketed. However, most drug discovery projects fail to meet these objectives. The second best outcome is then being able to rule out the biological hypothesis. Logically, this requires, at a minimum, being able to demonstrate sufficient clinical receptor occupancy (i.e., the drug is binding to the biological target in the tissue linked to the pathophysiology) at the site of action and subsequent target engagement (i.e., the expected functional effects derived from receptor occupancy are seen).

Oftentimes target engagement is inferred from a functional observation in a biological system upon treatment with a probe compound. This may be reasonable when studying systems with extensive prior biochemical or behavioral phenotype knowledge. However, when conducting research on novel biological systems and targets, an actual determination of the degree of receptor occupancy provides a much more robust line of support to conclude that the phenotype observed is produced by a molecular interaction between the test article and the receptor. For example, GPCR antagonists or agonists may elicit their functional effects (functional inhibition or activation, respectively) at very different degrees of receptor occupation. For the former, receptor occupancy typically parallels level of inhibition of a receptor, whereas agonists may exert functional effects occupying a very small fraction of a receptor (Michel and Seifert 2015), sometimes so low that it is hard to measure practically after consideration of the experimental error (Finnema et al. 2015).
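As a rough illustration of how an unbound concentration maps to receptor occupancy, the sketch below uses the simplest one-site equilibrium binding model, occupancy = C / (C + Kd). The model choice, the Kd, and the concentrations are assumptions for illustration; real systems may deviate because of binding kinetics, multiple sites, or competition with endogenous ligands.

```python
def fractional_occupancy(unbound_nM, kd_nM):
    """Fraction of receptors occupied at equilibrium (one-site model)."""
    return unbound_nM / (unbound_nM + kd_nM)


kd_nM = 10.0  # hypothetical affinity of the test article
for conc_nM in (1.0, 10.0, 90.0):
    occ = fractional_occupancy(conc_nM, kd_nM)
    print(f"{conc_nM:5.1f} nM unbound -> {occ:.0%} occupancy")
```

In this model an antagonist needing high occupancy requires unbound concentrations several-fold above Kd, whereas an agonist acting at a few percent occupancy could be efficacious well below Kd, which is where measurement error starts to dominate.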

9 Radioligands and PET Ligands as Chemical Tools

Chemical tools labeled with radioactive atoms are often used in drug discovery projects in a number of tasks, including binding affinity measurements, drug pharmacokinetics, ex vivo or in vivo receptor occupancy, or BBB permeability, among other uses.

PET agents are generally synthesized containing carbon-11 (t ½ = 20.4 min) and/or fluorine-18 (t ½ = 109.7 min) as radioisotope. Radioligands most often contain tritium (t ½ = 12.3 years), carbon-14 (t ½ = 5,730 years), phosphorus-32 (t ½ = 14 days), sulfur-35 (t ½ = 87 days), or iodine-125 (t ½ = 60 days).
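The practical consequence of these half-lives follows from the exponential decay law: the fraction of radioactivity remaining after time t is 0.5 raised to the power t / t½. The short sketch below applies it to the two PET isotopes using the half-lives quoted above; the 60-minute elapsed time is an arbitrary example.

```python
# Half-lives in minutes, as quoted in the text.
HALF_LIFE_MIN = {"carbon-11": 20.4, "fluorine-18": 109.7}


def fraction_remaining(elapsed_min, half_life_min):
    """Fraction of the initial radioactivity left after elapsed_min."""
    return 0.5 ** (elapsed_min / half_life_min)


elapsed_min = 60.0  # arbitrary example: one hour after end of synthesis
for isotope, t_half in HALF_LIFE_MIN.items():
    left = fraction_remaining(elapsed_min, t_half)
    print(f"{isotope}: {left:.0%} of activity left after {elapsed_min:.0f} min")
```

After one hour most of the carbon-11 activity has decayed, while most of the fluorine-18 activity remains, which illustrates why isotope choice constrains synthesis, transport, and scan duration.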

As explained for “cold” chemical tools (non-radiolabeled), it is important to understand that there is no hierarchy established between different types of radioligands. Each tool is designed with a fit-for-purpose mentality. In other words, a great human PET agent may or may not be acceptable for preclinical use, and the short half-life would most likely preclude its use in, for example, a radioligand binding assay.

Criteria have been developed to aid the discovery of high-quality novel PET ligands based on the physicochemical properties, brain permeability, and nonspecific binding for 62 clinically successful PET ligands and 15 unsuccessful radioligands considered as controls for undesired properties (Zhang et al. 2013). Properties chosen in this analysis are cLogP, cLogD, MW, TPSA, HBD, and pKa. It should be taken into consideration that this is just one approach to PET ligand design developed by one research group. Different approaches have been tested with successful results too. Target localization in the brain and its specific identity are important parameters to consider (Van de Bittner et al. 2014). Non-labeled ligands have also been used to measure in vivo receptor occupancy taking advantage of the increasing sensitivity of liquid chromatography coupled to mass spectrometry methods (LC-MS/MS). In a retrospective analysis, brain penetration, binding potential, and brain exposure kinetics were established for a number of non-labeled PET ligands using in vivo LC-MS/MS and compared with PET ligand performance in nonhuman primates and humans (Joshi et al. 2014).

A key parameter in the quality of a PET ligand is its selectivity at the biological target of interest. Best practice requires that this selectivity be thoroughly established using in vitro tests first and in vivo systems afterward to minimize the risk of misinterpreting experimental observations. An interesting example was recently reported for validation studies for the compound [11C]-JNJ-42491293 (8), a PET ligand candidate to study the metabotropic glutamate 2 receptor (mGluR2), a drug target for CNS diseases (Leurquin-Sterk et al. 2017). Compound 8 has high affinity for the human mGluR2 receptor (IC50 around 9 nM). Preclinical studies conducted in Wistar rats demonstrated moderate brain penetration followed by distribution in brain regions consistent with those expected based on known expression of mGluR2. However, an additional unexpected observation indicated high retention in heart tissue. In order to explore this issue, the team conducted PET studies comparing wild-type rats with their mGluR2 knockout counterparts. These studies indicated off-target binding in vivo to a yet-unidentified molecular target and highlight the importance of comparative in vitro and in vivo studies for the rigorous validation of PET radioligands.

10 Monoclonal Antibodies as Target Validation Tools

Monoclonal antibodies (mAbs) offer a compelling alternative to small molecules as tools in support of target validation. mAbs can potentially be generated with high affinity and selectivity against targets of interest with relative speed compared to small molecules. mAbs also have an advantage over small molecules, as they have the potential to disrupt thermodynamically stable large molecule interactions. While this effect has also been accomplished by small-molecule allosteric modulators, the task of compound optimization may be complex (Watson et al. 2005). In vivo, mAbs generally exhibit much longer half-lives than small molecules and therefore may be more practical in long-term target validation experiments as they can be administered infrequently.

While mAbs are usually used as antagonists to test the target of interest, it is also possible to generate agonistic mAbs. The use and potential challenges of immune agonist antibody design and their development as potential therapies for cancer treatment have been reviewed (Mayes et al. 2018).

10.1 Targets Amenable to Validation by mAbs

Since antibodies are large molecules normally secreted by B cells into the circulation, the target repertoire of mAbs is traditionally thought to be confined to extracellular or cell surface proteins. In more recent years, it has become apparent, somewhat surprisingly, that mAbs may also possess the ability to target intracellular proteins, including cytosolic proteins. The mechanism for this is still unclear, as mAbs exist in a topologically distinct compartment from the cytosol. Many cells express receptors for the Fc constant region of antibodies, allowing their uptake into the endolysosomal compartment of the cell. Certain cytosolic antigens may be targeted to the endolysosomal compartment by autophagocytosis. For instance, this may be true for the neuronal protein, tau, which aggregates and becomes targeted for autolysosomal degradation in the pathological state (Congdon et al. 2013; Sankaranarayanan et al. 2015).

In addition, cells express a high-affinity Fc receptor in the cytosol called TRIM21, which is important in mediating intracellular immunity against viral infections (Mallery et al. 2010). TRIM21 also contains a ubiquitin ligase domain, allowing antigen/antibody complexes to be targeted to the proteasome for degradation. Antibody tools may be used to target endogenous proteins for degradation by this pathway in target validation experiments (Yanamandra et al. 2013). The mechanisms by which mAbs enter the cytosol remain poorly understood.

10.2 The Four Pillars for In Vivo Studies

The concept of the four pillars of drug action (Bunnage et al. 2013) applies to antibodies as well as small molecules. For a soluble target, it is important to determine the concentration of the target in the target tissue, as well as the soluble (“free”) fraction of the antibody achieved in that tissue. The free mAb concentration must significantly exceed the dissociation constant (KD) of the mAb for its target in order to ensure that the target is saturated by the mAb. For a cell surface target, it should be determined whether the target is released from the cell as part of the disease process. The soluble form of the target could act as a sink to sequester the mAb, and this will not necessarily be measurable by determining mAb concentration in the tissue of interest. Similarly, if the cell surface target is internalized upon mAb binding, this could rapidly deplete the mAb. However, this could be determined in an appropriate pharmacokinetic time-course dose-response study.
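How far the free mAb concentration must exceed its affinity can be estimated by inverting the one-site binding equation: the free concentration needed for a target occupancy f is KD × f / (1 − f), where KD is the dissociation constant of the mAb for its target. This is a simplified sketch that assumes equilibrium binding and a free mAb concentration that is not depleted by the target or by a soluble sink; the KD value is hypothetical.

```python
def free_conc_for_occupancy(kd_nM, occupancy):
    """Free mAb concentration (nM) needed for a given fractional occupancy.

    Assumes one-site equilibrium binding and no ligand depletion.
    """
    if not 0.0 <= occupancy < 1.0:
        raise ValueError("occupancy must be in the range [0, 1)")
    return kd_nM * occupancy / (1.0 - occupancy)


kd_nM = 0.1  # hypothetical mAb affinity (KD)
for target_occupancy in (0.50, 0.90, 0.99):
    needed = free_conc_for_occupancy(kd_nM, target_occupancy)
    print(f"{target_occupancy:.0%} occupancy needs about "
          f"{needed / kd_nM:.0f}x KD ({needed:.1f} nM free mAb)")
```

Under these assumptions, 90% occupancy requires a free concentration of about 9 times the KD and 99% about 99 times, which is why near-complete saturation is demanding when a sink or internalization lowers the free fraction.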

Another potential challenge is the development of immunogenicity to the mAb. That is, the host may raise antibodies against the test mAb, thereby neutralizing its activity and efficacy. Therefore, it is necessary to monitor for an immune response against the biologic, particularly if no efficacy is observed. To reduce the risk of immunogenicity, the host of the test mAb should be the same species as the animal model (e.g., a mouse mAb should be used in a mouse model, but not a rat model, and vice versa).

10.3 Quality Control of Antibody Preparation

There are a variety of methods by which antibodies can be generated and purified. mAbs may be produced by hybridoma clones or by recombinant expression in an immortalized cell line. Purification of mAbs from supernatants can be performed with Protein A or G affinity columns and/or by size-exclusion chromatography (SEC). mAbs purified by Protein A/G columns alone may not be completely pure, as additional serum proteins may stick to the columns or antibodies. Furthermore, if regular serum is used to grow the cells, the Protein A/G column will also bind to the endogenous antibodies in the serum. This will make it more difficult to determine the concentration of the antibody of interest. Another potential pitfall is leaching of Protein A/G into the antibody preparation. This should be addressed directly.

SEC-purified antibodies are generally purer and more reliable. Not only does SEC separate mAbs from other contaminants, but it also allows for the purification of mAb monomers away from potential mAb aggregates. Aggregated antibody can potentially lead to artifacts, perhaps due to increased avidity for low-affinity antigens. This could result in off-target reactivity in binding assays (e.g., immunohistochemistry, ELISA) or even in in vitro or in vivo functional assays.

Another very important contaminant to consider is LPS or endotoxin. As discussed above, endotoxin is a component of the Gram-negative bacterial cell wall and a potent activator of mammalian immune cells. Sterilization does not remove endotoxin, so unless equipment and reagents are expressly endotoxin-free (or “pyrogen-free”), antibody preparations may contain endotoxin, even if sterile (Witting and Moller 2011; Weinstein et al. 2008). Endotoxin will certainly interfere with any immune-based endpoints and may also affect other endpoints in vivo, such as cognitive or motor tests, if the animal is suffering from inflammation-induced sickness behavior (Weinstein et al. 2008; Remus and Dantzer 2016).

10.4 Isotype

The choice of antibody isotype may profoundly impact the efficacy of an antibody in a validation experiment. Different isotypes have different effector functions due to their differential effects on Fcγ receptors (Jonsson and Daeron 2012; Wes et al. 2016). Engaging Fcγ receptors triggers a variety of cellular responses, such as phagocytosis and proinflammatory cytokine release. In some cases, effector function is needed in order to see efficacy, such as when clearance of the antigen by immune cells is desirable. In other cases, effector function is not important, such as when the antibody effect is driven by occluding interaction of a ligand with its receptor. Effector function may also confound interpretation of results by causing an inflammatory response at the site of action. In these cases, using effector function null antibodies may be desirable.

It is important to ensure that any effect of the antibody is due to engagement of the antigen rather than to effector function. For instance, if the mAb target is expressed on an immune cell that also expresses Fcγ receptors, modulation of the cell may be via the Fcγ receptor rather than the target. Therefore, negative control antibodies should always be of the same isotype.

10.5 Selectivity

In order to correctly interpret a target validation experiment using a mAb, it is of course critical to use a mAb that is highly selective for the target. Selectivity is often determined by testing the antibody on a Western blot following SDS-PAGE and observing a band of the correct size. However, SDS-PAGE Western blots detect denatured proteins, and therefore there may be additional proteins that the antibody recognizes in the native state. Also, a band of the correct size is not definitive proof that the band is the target of interest. One approach to test for specificity is to compete the signal using peptides of the antigenic sequence. However, cross-reactivity may be due to a shared epitope, and an antigen peptide may compete this out as well. The best control is to test the antibody in a knockout animal or, if not available, ablate the target in cells using CRISPR, shRNA, or other knockdown technologies. Indeed, there have been examples of mAbs that recognized a band of the correct size on a Western blot, but when knockouts of the target became available, the band did not disappear. The best approach to be sure that you are correctly interrogating the target of interest is to use multiple independent mAbs, ideally with different epitopes, against your target.

GPCRs, ligand-gated ion channels (LGICs), and transporters are key cell membrane-bound biological targets often studied for a number of potential CNS treatments. Biological research with these receptors often uses antibodies targeting them, even though their usefulness has been questioned on the basis of their selectivity. An increasing number of reports suggest lack of selectivity for a number of GPCR and LGIC antibodies, including those from commercial sources (Michel et al. 2009; Berglund et al. 2008). Two often-applied criteria to assess antibody selectivity are the disappearance of staining upon addition of blocking peptide and the observation of distinct staining patterns in tissues when testing antibodies against different receptor subtypes. While reasonable, these may be insufficient to assert antibody selectivity. Criteria thought to be reliable enough to demonstrate selectivity of GPCR antibodies have been proposed:

  1. The staining disappears in immunohistochemical studies or immunoblots of tissues from genetically modified animals not expressing the receptor.

  2. Staining by a given antibody is markedly reduced when using animals or cell lines treated with genetic tools to knock down expression of a given receptor.

  3. When multiple subtypes of a given receptor are transfected into the same host cell line, the antibody yields positive staining for the target receptor but not for the related subtypes.

  4. Multiple antibodies against different epitopes of a GPCR (e.g., N-terminus, intracellular loop, and C-terminus) show a very similar staining pattern in immunohistochemistry or immunoblotting (Bradbury and Plückthun 2015).

The issue of antibody quality and its impact on the reproducibility of biological studies seems to be rather broad.

A report from 2008 states that less than half of ca. 5,000 commercial antibodies acted with the claimed selectivity at their specified targets. In addition, some manufacturers deliver consistently good antibodies, while others do not (Berglund et al. 2008). It is proposed that “if all antibodies were defined by their sequences and made recombinantly, researchers worldwide would be able to use the same binding reagents under the same conditions.” To execute this strategy, two steps would be required. The first is obtaining the sequences for widely used hybridoma-produced monoclonal antibodies. Indeed, it is proposed that polyclonal antibodies should be phased out of research entirely. Second, the research community should turn to methods that directly yield recombinant binding reagents that can be sequenced and expressed easily (Bradbury and Plückthun 2015). Putting this proposal into practice appears challenging in the short term.

In summary, while antibodies are broadly used as chemical tools to aid target validation, they are not free from shortcomings that may lead to reproducibility problems when they are not optimally characterized.

11 Parting Thoughts

Over the last decade, major progress has been witnessed in the life sciences area. This could not have been achieved without increased sophistication and understanding aiding the design of high-quality probe compounds, in an increasing number of chemical modalities. In turn, improved tools enabled the formulation of new questions to interrogate novel hypotheses, leading to heightened understanding of fundamental biology and pathophysiology, a key step in the discovery of new therapies for the treatment of diseases. Chemistry is part of the solution to disease treatment (no pun intended). Better chemistry understanding leads to better drugs.

Drug discovery projects require major commitments from society. Scientists dedicate decades of efforts and sacrifices. Investors risk billions of dollars. Patients are waiting and deserve the most ethical behaviors from all of us seeking to find new palliatives to ease their suffering. The right path forward is one of high-quality and rigorous scientific research.