1 Introduction

The Human Genome Project was initiated on October 1, 1990, and an initial draft sequence of the human genome was published in 2001 (http://www.nhgri.nih.gov) (1,2). Central to the drug discovery process is gene identification, followed by determination of the expression of genes in a given disease and an understanding of the function of the gene products. It is of interest that identification of the gene believed to be responsible for cystic fibrosis, a search begun in the early 1980s, took researchers approximately 9 yr, whereas the gene responsible for Parkinson’s disease was identified within a period of several weeks (3). This quantum leap in the ability to associate a specific gene with a disease can be attributed primarily to the extraordinary progress that has been made in the areas of gene sequencing and information technologies.

Selection and validation of novel molecular targets have become paramount in light of the abundance of new potential therapeutic drug targets that have emerged from human gene sequencing. The development of high-throughput methods in both biology and chemistry is therefore necessary. In addition, it has become increasingly challenging to translate basic scientific discoveries successfully into clinical experimental medicine and novel therapeutics. Consequently, a new paradigm for drug discovery has emerged, one that involves the integration of clinical, genetic, genomic, and molecular phenotype data partnered with cheminformatics. Central to this process is the use of informatics to manage, collate, and interpret the data that are generated. In this review, we address the new technologies that have arisen to deal with this new paradigm.

2 Target Validation

Several thousand molecular targets have now been cloned and are available as potential novel drug discovery targets. These targets include G protein-coupled receptors, ligand-gated ion channels, nuclear receptors, cytokines, and reuptake/transport proteins (4). The sheer volume of information being produced has shifted the emphasis from the generation of novel DNA sequences to the determination of which of these many new targets offer the greatest opportunity for drug discovery. Thus, with several thousand potential targets available, target selection and validation have become the most critical component of the drug discovery process and will continue to be so in the future.

An example of the new paradigm of target selection comes as a result of the pairing of the orphan G protein-coupled receptor GPR-14 with its cognate neuropeptide ligand urotensin II. Urotensin II is the most potent vasoconstrictor identified to date; it is approximately one order of magnitude more potent than endothelin-1 (5). Thus, GPR-14/urotensin II represents an attractive therapeutic target for the treatment of disorders related to or associated with enhanced vasoconstriction, such as hypertension, congestive heart failure, and coronary artery disease, to name but a few.

In general, most tissues express between 15,000 and 50,000 genes at different levels. In diseased tissue, gene expression levels often differ from those observed in normal tissue, with certain genes being over- or underexpressed, newly expressed, or completely absent. Localization of this differential gene expression is one of the first crucial steps in identifying an important potential molecular target for drug discovery. In addition to the traditional technique of Northern analysis, a number of newer methods are used to localize gene expression. The techniques that typically yield the highest-quality data are in situ hybridization (ISH) and immunocytochemistry, both of which are labor intensive. For example, ISH or immunohistochemical localization of a prospective molecular target to a particular tissue or subcellular region is likely to yield valuable information concerning gene function. Examples of the success of this approach include the orexin peptides and receptors, whose hypothalamic regional localization suggested an involvement in feeding (6).

Each of these localization techniques has its advantages and disadvantages. ISH can be initiated immediately following gene sequencing and cloning; however, gene detection is only at the transcriptional mRNA level. Immunocytochemistry, on the other hand, offers the ability to measure protein expression but requires the availability of antibodies having the requisite affinity and selectivity, which may often take several months to generate. With either of these techniques, target localization within the cell is possible at the microscopic level but is dependent on the availability of high-quality normal and diseased human tissues, which may often represent a limiting factor.

The localization of a gene in a particular tissue does not necessarily shed light on all of the functions of that gene. As an example, the role of orexin as a putative regulator of energy balance and feeding was initially inferred from its localization in the dorsal and lateral hypothalamic regions of the brain (6). However, this gene product was subsequently discovered to be a major sleep-modulating neurotransmitter whose gene may be responsible for narcolepsy (7).

Technologies such as microarray gridding (GeneChip™) and TaqMan® polymerase chain reaction (PCR) have emerged that appear destined to play a more prominent role in the high-throughput localization of genes and the identification of their regulation in disease (8). Microarray gridding and Spotfire® data analysis are already evolving into procedures that allow the comprehensive evaluation of differences in gene expression patterns in normal, diseased, or pharmacologically manipulated systems (8,9). For genes expressed in low abundance, more sensitive techniques may be required; reverse transcriptase (RT)-PCR-based TaqMan technology offers the ability to detect changes in gene expression with as few as two copies per cell. TaqMan technology also has the potential to be developed into a robust methodology for high-throughput tissue localization.
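
To illustrate how such real-time RT-PCR output is typically converted into an expression change, the following is a minimal sketch of the comparative threshold-cycle (ΔΔCt) calculation commonly applied to TaqMan data; the cycle-threshold values are hypothetical and assume approximately 100% amplification efficiency.

```python
# Minimal sketch of relative quantification by the comparative Ct (delta-delta Ct)
# method often used with TaqMan real-time RT-PCR data. Ct values below are
# hypothetical illustrations, not data from the cited studies.

def relative_expression(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene in a test sample vs. a control sample,
    normalized to a reference (housekeeping) gene, assuming ~100% PCR efficiency."""
    delta_ct_test = ct_target_test - ct_ref_test      # normalize test sample
    delta_ct_ctrl = ct_target_ctrl - ct_ref_ctrl      # normalize control sample
    delta_delta_ct = delta_ct_test - delta_ct_ctrl
    return 2 ** (-delta_delta_ct)

# Example: the target crosses threshold 3 cycles earlier (relative to the
# reference gene) in diseased tissue than in normal tissue -> ~8-fold increase.
fold_change = relative_expression(22.0, 18.0, 25.0, 18.0)
print(f"Estimated fold change: {fold_change:.1f}")   # ~8.0
```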

3 Functional Genomics

The term functional genomics is now used to describe the post-Human Genome Project era and encompasses the many efforts needed to elucidate gene function. Traditionally, functional genomics has relied on genetically manipulated animals, such as knockout, knockin, or transgenic mice, to study a particular gene’s function in vivo. Although these traditional methods remain valuable tools for understanding gene function, more recently developed approaches, such as RNA interference and mRNA silencing, offer relatively faster means of modifying genes and analyzing their function in vivo (10,11).

Indeed, the phenotyping of genetically manipulated animals is informative in determining the biological function of a particular gene. In reality, however, the discipline of functional genomics has its foundation in the physiological and pharmacological sciences. Although the evaluation of genetically manipulated animals requires a thorough understanding of physiology and pharmacology, the experimental approach involves many new technologies. These include in vivo imaging (i.e., magnetic resonance imaging, micro-positron emission tomography, ultrafast computed tomography, infrared spectroscopy), mass spectrometry (MS), and microarray hybridization, all of which enhance the speed and accuracy with which functional genomics is achieved.

4 Proteomics, Metabolomics, and Lipomics

High-throughput drug discovery has progressed through the genome and is now moving toward assessment of the proteome and metabolome. It is recognized that mRNA expression does not always correlate with protein expression (12); many factors, such as alternative splicing, posttranslational modification (e.g., glycosylation, phosphorylation, oxidation, reduction), and mRNA turnover, may account for this. Because modified proteins can have different biological activities, research and new technologies are now more focused on protein expression.

The term proteome refers to all the proteins produced by a species, much as the genome is the entire set of genes (13). However, unlike the genome, the proteome can vary according to cell type and the functional state of the cell (14,15). Proteomic analysis allows a point-in-time comparison of the protein profile, such as before and after therapeutic intervention. It can also be used to compare protein profiles in diseased and nondiseased tissues.

Microarrays are currently the major tool in the assessment of gene expression via cDNA and RNA analysis; however, they are also used to screen libraries of proteins and small molecules. Just as DNA microarrays allow the detection of changes in genes in various diseases, protein, peptide, tissue, and cell microarrays can be used to detect changes in proteins, phospholipids, or glycation of proteins in disease. Protein arrays are also used to examine enzyme-substrate, DNA-protein, and protein-protein interactions (16,17). The practical application of proteomics depends on the ability to identify and analyze each protein product in a cell or tissue (18). Because proteins cannot be amplified like DNA or RNA and also tend to be degraded more readily, sensitive and rapid analyses are necessary to accommodate the small sample sizes and instability of proteins. Although this field is still developing, MS and ProteinChip surface-enhanced laser desorption/ionization (SELDI) technologies, which use slides with various surface properties (e.g., ion exchange, hydrophobic interaction, metal chelation) to bind and selectively purify proteins from a complex biological sample, are being utilized (18,19). An important challenge encountered with protein microarrays is maintaining the functionality of the protein, including its posttranslational modifications, such as phosphorylation. Another important consideration is the retention of both secondary and tertiary structure. The use of immobilizing coatings, such as aluminum, gold, or hydrophilic polymers, on slides, or the imprinting of proteins on porous polyacrylamide gels, is being explored to address these issues (17).

Proteomic analysis has already been used successfully to identify serum biomarkers. For example, ProteinChip SELDI technology, in conjunction with bioinformatics tools, has been utilized to identify a proteomic pattern in serum that is diagnostic for ovarian cancer (20). It is anticipated that proteomics and bioinformatics will facilitate the discovery of new and better biomarkers of disease.
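
As a purely illustrative sketch of the pattern-recognition step that such bioinformatics tools perform, the following fragment trains a generic classifier on placeholder SELDI-type peak intensities; the data are random and the model is not the feature-selection pipeline used in the cited ovarian cancer study (20).

```python
# Illustrative sketch: train a classifier on SELDI-type peak intensities to
# distinguish case from control sera. Data here are random placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_samples, n_peaks = 200, 500            # hypothetical spectra: 500 m/z peak intensities
X = rng.lognormal(size=(n_samples, n_peaks))
y = rng.integers(0, 2, size=n_samples)   # 0 = control, 1 = cancer (placeholder labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Held-out ROC AUC: {auc:.2f}")    # ~0.5 on random data; real spectra would differ
```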

Metabolomics is the study of the metabolome, which is the entire metabolic content of the cell or organism at any given moment (21). Although metabolomics generally focuses on biofluids, such as serum and urine, investigators are now evaluating the cell as well. Metabolic profiling has long been used to characterize toxicity and disease states, such as inborn errors of metabolism, and blood and urine are screened routinely for metabolites such as cholesterol and glucose to test patients for cardiovascular disease and diabetes. A more recent advance in metabolomics, however, is the analysis of small molecules within a sample to find new markers of disease or metabolic patterns that serve as indicators of drug toxicity. Techniques such as nuclear magnetic resonance spectroscopy, MS, and chromatographic analysis of cell extracts are used in metabolomic research (22,23). These techniques have been especially useful in generating lipid metabolome data (i.e., lipomics) to study the effects of dietary fats and lipid-lowering drugs on cardiac, plasma, adipose, and liver phospholipid metabolism (22-27). Metabolic changes during tumor proliferation have also been studied using metabolomics. For example, the tumor metabolome is characterized by high glycolytic and glutaminolytic capacities and high phosphometabolite levels, which allow tumor cells to proliferate over broad ranges of oxygen and glucose supply (28). It is anticipated that metabolomics will provide insight into the metabolism of tumor cells that might be helpful in understanding and modifying tumor cell proliferation.
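
A minimal sketch, under assumed placeholder data, of one common multivariate step in analyzing such metabolomic profiles: principal component analysis of metabolite (or binned NMR) intensities to look for separation between sample groups.

```python
# Sketch of an unsupervised chemometric step often used in metabolomics:
# principal component analysis (PCA) of metabolite intensities across samples.
# Profiles below are random placeholders, not real NMR or MS data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
profiles = rng.lognormal(size=(60, 120))          # 60 samples x 120 metabolite features
scaled = StandardScaler().fit_transform(profiles) # autoscale each metabolite
scores = PCA(n_components=2).fit_transform(scaled)
print(scores[:5])                                 # PC1/PC2 scores used for scatter plots
```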

5 High-Throughput Screening, Cheminformatics, and Medicinal Chemistry

During the last decade, the pharmaceutical industry has sought to expand its collections of compounds for the purpose of high-throughput screening (HTS) against novel molecular targets (29). Many hit structures have been identified through HTS, and both the average potency and quality of these molecules continue to improve (30). Although it is possible that a sustainable chemical lead can be identified from HTS, it has been more commonly the case that “hits” emerging from HTS require substantial chemical optimization to provide molecules with the desired level of potency, selectivity, and suitable pharmacokinetic (PK) properties (31) to support a fully fledged drug discovery program. Furthermore, the data available from an HTS effort are still of limited utility from the point of view of generating structure-activity relationships (SARs) capable of directing medicinal chemistry efforts. Combinatorial chemistry in some of its earliest incarnations was seen as a means of rapidly synthesizing massive numbers of molecules for HTS. However, in recent years this has evolved into the synthesis of more focused, smaller arrays of molecules directed both at enhancement of the properties of early hits emerging from HTS and at optimization of lead molecules in the progression toward development of candidates (32,33). This change of emphasis has been enabled by significant developments in the areas of high-throughput purification and characterization.

In rising to the challenge of providing HTS data on collections of a million or more compounds, scientists involved in HTS have made increasing use of automation, as well as miniaturization, to reduce the demands on precious protein reagents and chemical supplies. Traditional radioligand-binding assays are giving way to more rapid and easily miniaturizable homogeneous fluorescence-based methods. The increased efficiency of ultra-HTS offers the potential to screen discrete collections of a million or more single compounds, at multiple concentrations, and thereby generate SAR information to “jump-start” a medicinal chemistry lead optimization effort. Historically, medicinal chemistry endeavors have involved the analysis of detailed biological data from hundreds or perhaps thousands of compounds. It is not surprising that the prospect of such an explosive growth of information from both screening- and program-directed combinatorial chemistry has driven the evolution of cheminformatics (34), in much the same way that genomic sequencing gave rise to the science of bioinformatics.
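
As a hedged illustration of one basic cheminformatics operation underlying such efforts, the following sketch computes Tanimoto similarity between binary structural fingerprints to group HTS hits into chemotypes; the fingerprints are hypothetical sets of “on” bit positions rather than the output of any particular fingerprinting package.

```python
# Sketch: group HTS hits by structural similarity using the Tanimoto coefficient
# on binary fingerprints. Bit positions below are invented for illustration.

def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Three hypothetical hits; hit_1 and hit_2 share most bits and would cluster together.
hit_1 = {3, 17, 42, 118, 256, 511}
hit_2 = {3, 17, 42, 118, 260, 511}
hit_3 = {7, 99, 140, 301}

print(tanimoto(hit_1, hit_2))  # high similarity -> likely same chemotype
print(tanimoto(hit_1, hit_3))  # low similarity -> distinct series
```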

The successful medicinal chemical drug discovery effort of the future will rely on a hybrid approach of parallel and iterative (single-molecule) synthesis. As HTS collections are built up through parallel synthesis, lead structures will be amenable to high-throughput follow-up. Iterative analog preparation will, however, continue to play a key role. In particular, as a research program nears candidate selection, a greater level of iterative synthesis will likely become necessary to “fine-tune” the properties of the lead molecules. The lead optimization phase of the drug discovery process also relies heavily on SARs developed around absorption, distribution, metabolism, and excretion data, as well as on physicochemical properties that improve the overall developability of the series (e.g., solubility, permeability, P450 activity, and human ether-a-go-go-related gene potassium channel activity). It is anticipated that the greater attention given to the evaluation of developability characteristics in candidate molecules will lead to reduced attrition, improving the likelihood that a compound will enter clinical development and reach the market in a time- and cost-effective manner.

6 Pharmacogenomics, Toxicogenomics, and Predictive Toxicology

Just as a compound can affect gene expression in a positive manner, so too can it affect gene expression in a negative, toxic manner. In addition, a drug might affect gene expression in a given subpopulation in a way that is not representative of the larger group. In an attempt to identify how genes are expressed following drug treatment, or to determine toxicity issues associated with a compound, DNA arrays can be used for what are termed pharmacogenomic and toxicogenomic studies (35). Pharmacogenomics refers to the identification of genes that are involved in determining drug responsiveness and that may cause differential drug responses in different patients. These studies include the evaluation of allelic differences in gene expression and the evaluation of genes responsible for drug resistance and sensitivity (35,36). Toxicogenomics refers to the characterization of potential genes involved in toxicity and the adverse effects of drugs (35,37); it allows the profiling of candidate gene biomarkers of toxicity and carcinogenicity. Predictive toxicology refers to the application of toxicogenomics and the evaluation of compounds in silico against a panel of genes associated with toxicity (38). SAR models are used to predict potential toxic effects based on chemical structures and their properties (39). Predictive toxicology also takes into account computer-based predictions of absorption, distribution, metabolism, and excretion and PK properties in addition to toxicology, all of which contribute to the lead optimization process.
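
The following is a minimal sketch of the in silico SAR idea behind predictive toxicology: fitting a model that maps computed molecular descriptors to an observed toxicity label and then scoring an untested structure. The descriptor names, values, and labels are hypothetical placeholders, not a validated toxicology model.

```python
# Sketch: fit a simple SAR model from molecular descriptors to a toxicity label,
# then estimate the toxicity risk of an untested structure. All values are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows: compounds; columns (hypothetical): molecular weight, logP, polar surface area.
descriptors = np.array([
    [320.0, 2.1, 65.0],
    [455.0, 4.8, 30.0],
    [210.0, 0.9, 90.0],
    [510.0, 5.5, 25.0],
])
toxic = np.array([0, 1, 0, 1])   # 1 = toxic finding in a prior assay (placeholder)

model = LogisticRegression().fit(descriptors, toxic)
candidate = np.array([[400.0, 4.2, 40.0]])    # descriptors of an untested structure
print(model.predict_proba(candidate)[0, 1])    # predicted probability of toxicity
```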

7 Experimental Medicine

With the identification of many new targets for drug development, it is increasingly important to test rapidly and accurately the effects of new chemical entities early in development in relevant in vivo models, including humans. For example, drug candidates aimed at inflammation can be tested in a human blister model (40,41). In this model, suction is applied to the forearm of healthy volunteers following preexposure to ultraviolet-B light. The resulting blister is used to evaluate secreted inflammatory mediators and changes in gene expression within 48 h of insult. Cell counts, prostaglandin (PG)E2 (measured by enzyme-linked immunosorbent assay), PGD2 (measured by gas chromatography), and gene expression of cyclooxygenase-2 and PGE and PGD synthases (measured by real-time PCR) are assessed. This model allows the rapid analysis of the effects of drugs on cellular infiltration, soluble mediator formation in blister fluid, and steady-state gene expression in cellular infiltrates. These markers (e.g., new transcription factors, cytokines, and other mediators) can be followed in both the inflammatory and resolution phases of human inflammation. In addition, coupled with DNA array technology, this model may be useful in defining new targets for the treatment of inflammation. Furthermore, toxicity and efficacy profiles of new and existing drugs can be studied. For example, a gene chip of the subset of human genes identified in blister fluid can be used to identify surrogate markers of toxicity and efficacy in modulating gene expression during drug evaluation.

Inflammation also plays an important role in the initiation and progression of atherosclerosis. To this end, identification of relevant signaling pathways that mediate plaque inflammation may provide therapeutic targets for the improvement of clinical outcomes in high-risk individuals. For example, inhibition of p38 mitogen-activated protein kinase (MAPK) attenuates inflammatory responses and matrix-degrading activities in human atherosclerotic plaque, suggesting a potential therapeutic strategy for the regression and stabilization of atherosclerosis. Signaling mechanisms involving p38 MAPK, as well as other inflammatory responses, can be characterized by TaqMan real-time RT-PCR in human carotid atherosclerotic plaques and nonatherosclerotic vessels. In addition, the biological effects of the p38 MAPK pathway can be assessed in an ex vivo organ culture system (42). In these studies, a selective p38 MAPK inhibitor is added to the organ culture system and a variety of analyses are conducted. Current analyses of human plaque specimens include a panel of markers, such as interleukin (IL)-1β, IL-6, IL-8, monocyte chemoattractant protein-1, tumor necrosis factor-α, and matrix metalloproteinases. Other analyses include the evaluation of phosphorylated p38 MAPK, extracellular signal-regulated kinase 1/2, and c-Jun NH2-terminal kinase.

These and other in vivo and ex vivo human experimental platforms are of tremendous value and can be used to validate the efficacy and assess the toxicity of drugs early in development. Because these studies are conducted in the clinical setting, the field of experimental medicine offers great potential in identifying new therapeutic targets particularly relevant to human disease.

Pharmacogenomics, also referred to as pharmacogenetics, involves the study of variability in pharmacokinetics (absorption, distribution, metabolism, or elimination) or pharmacodynamics (the relationship between drug concentrations and pharmacological effects, or the time course of such effects) owing to hereditary factors in different populations. There is evidence that genotype may influence the incidence of adverse events for a given drug. The aim is to identify genetic polymorphic variants that represent risk factors for the development of a particular clinical condition, or that predict a given response to a specific therapeutic. Importantly, the rate-limiting step in revealing biologically relevant phenotype-genotype associations is the collection of human DNA samples from carefully phenotyped individuals (43). Two general approaches may be used to investigate genetic variation in drug handling or response: (1) the hypothesis-driven method is based on a priori hypotheses and involves selecting specific sections of DNA known to encode the drug target, drug-metabolizing enzymes, or disease or genetic regions associated with mechanisms of action or adverse effects; and (2) the genome-wide scan investigates a large number of single nucleotide polymorphisms (SNPs) covering the entire genome with the aim of identifying a collection of SNPs that are associated with differential drug handling or response.
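
As an illustration of the basic statistical unit of either approach, the following sketch tests whether carrier status at a single variant is associated with drug response using a chi-square test on a hypothetical contingency table; a genome-wide scan would repeat this across many SNPs and correct for multiple testing.

```python
# Sketch: single-SNP pharmacogenetic association test. Counts are hypothetical.
from scipy.stats import chi2_contingency

#                 responders, non-responders
table = [[40, 10],   # variant carriers
         [55, 95]]   # non-carriers

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.1f}, p = {p_value:.2e}")
```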

In addition to investigating variability in efficacy, pharmacogenetics allows detection of susceptibility to relatively uncommon but severe adverse events that otherwise would not be detected until large numbers of patients had been exposed to a given drug. For example, 4% of individuals with human immunodeficiency virus treated with the antiretroviral abacavir develop a specific hypersensitivity reaction (44). Lai et al. (45) identified a 250,000-bp region of extended linkage disequilibrium that was associated with this hypersensitivity reaction. Several SNPs were predictive, in that individuals carrying these variants (e.g., human leukocyte antigen B57) who took abacavir had a 97% chance of experiencing the adverse event, although only 50% of the individuals who experienced the adverse event while taking abacavir actually carried the variants (i.e., 97% specific, 50% sensitive). Importantly, these approaches may generate very large amounts of data, and statistical methods must adjust for multiple testing while attempting to tease out gene-gene and gene-environment interactions from gene-disease or gene-treatment response associations.
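
To make the quoted figures concrete, the following worked example computes sensitivity and specificity from a hypothetical 2×2 table of marker carrier status versus hypersensitivity reaction; the counts are invented solely to illustrate how the two quantities are defined.

```python
# Worked illustration of sensitivity vs. specificity using invented counts
# (marker carrier status vs. occurrence of the hypersensitivity reaction).

true_pos = 50    # carriers who had the reaction
false_neg = 50   # non-carriers who had the reaction
true_neg = 970   # non-carriers without the reaction
false_pos = 30   # carriers without the reaction

sensitivity = true_pos / (true_pos + false_neg)   # fraction of affected patients flagged
specificity = true_neg / (true_neg + false_pos)   # fraction of unaffected patients cleared
print(f"sensitivity = {sensitivity:.0%}, specificity = {specificity:.0%}")  # 50%, 97%
```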

8 Conclusion

The tremendous impact of unraveling the mysteries of the genome is currently being felt across all areas of drug discovery, and major challenges for the pharmaceutical industry lie in the areas of drug target selection and validation. Figure 1 shows the progression of new molecular targets into novel drugs under this new paradigm for drug discovery. One can already anticipate the future availability of information on genetic makeup and susceptibility to disease at the level of the individual. With such information available early in a research program, the drug discovery scientist is faced with the unprecedented opportunity to address individual variability in response to drug therapy, and in safety, prior to advancing a compound into clinical trials. The exponential growth of attractive novel molecular targets for potential drug therapy has heavily taxed the core disciplines of drug discovery, and automated methods of compound synthesis and biological evaluation will play an even more dominant role in the future of the pharmaceutical industry.

Fig. 1. Progression of molecular targets to novel therapeutics under a new paradigm for drug discovery.