1 Introduction

Drug discovery was once an empirical process when the effect of the medicine was purely based on phenotype readout, while the mode of action of drug molecules remained unknown. Later, reductionists began to research on the molecular mechanism of the drug-target interactions, believing that the drug is like a magic bullet towards the functioning targets [1]. This means a drug takes action on the disease by interacting with one specific therapeutic target. The idea of each drug being like a key (or ligand) matching each ‘lock’ (or protein) has guided the modern drug discovery practice for the last several decades. However, in the recent years, more and more evidence has shown that many drugs exert their activities by modulating multi-targets [24]. Besides, some drugs interact with anti-targets and induce strong side effects [5, 6]. Therefore, it is inappropriate to stick to the paradigm that drug interact with only one target. How to modulate a set of targets to achieve efficacy, while avoiding others to reduce the risk of side effects remains a central challenging task for pharmaceutical industry.

The Traditional Chinese Medicine (TCM), which has been widely used in China as well as other Asian countries for a long history, is considered to be the pioneer of the “Multi-component—multi-target” pharmacology [7, 8]. Thousands of years’ clinic practices in TCM have accumulated a considerable number of formulae that exhibit reliable in vivo efficacy and safety. Based on the methodology of holism, hundreds of different components in a TCM prescription can cure the diseases or relieve the patients by modulating a serial of potential therapeutic targets [9].

In recent years, great efforts have been made on modernization of TCM, most on identification of effective ingredients and ligands in TCM formulae and functioning targets [10, 11]. Several databases of TCM formulae, ingredients and compounds with chemical structures have been established such as Traditional Chinese Medicine Database (TCMD) [12]. However, the molecular mechanisms responsible for their therapeutic effectiveness are still unclear. On one hand, experimental validation of new drug-target interactions still remains very limiting and expensive, and very few new drugs and targets are identified as clinical applications every year [13, 14]. On the other hand, the complex composition and polypharmacology of TCM make it even harder to conduct a full set of experiments between compounds and targets and elucidate the multi-target mode of action from the holistic view on the biological network level.

On the contrary, in silico methods can predict a large number of new drug-target interactions, construct the drug-target networks, and explore the functional mechanism underlying the multi-component drug combinations at the molecular level. In the present stage, there have already been successfully applications in interpreting the action mechanism of TCM from the perspective of drug-target networks, although the quantity is limited. Compared with the huge amounts of TCM formulae and components, only a small portion of drug-target pairs have been validated by the laborious and costly biochemical experiments. This motivates the needs for constructing models that could predict genuine interacting pairs between ligands and targets, based on the existing small number of known ligand-target bindings.

In this article, we firstly investigate TCM databases for in silico methods that have been greatly expanded in size and data diversity in recent years. On that basis, different screening methods and strategies for identifying active ingredients and targets of TCM are outlined based on the amount of information available, both on sides of ligand bioactivity and the protein structures. Finally, successful applications in this area have been summarized and reviewed, including experimental and computational examples. Learning from the methods in modern western medicine (WM), different computing models and strategies can be used to confirm the effective components and related targets in TCM in order to build the ligand-target networks. One of the research directions of the modernization of TCM is to clarify the mode of action of TCM based on ligand-protein networks.

2 Databases for TCM

Data availability is the first consideration before any virtual screening or data mining task could be undertaken. The TCM databases can be classified in accordance to several categories, namely formulae, herbs, and compounds. The formula of TCM is a combination of herbs for treating a disease, while compounds are the bioactive molecules within herbs. In this section, we have summarized a list of databases for TCM herbs, formulations and compounds, as shown in Table 14.1.

Table 14.1 Basic information for main TCM databases

The elementary units of TCM databases are compounds, the bioactive components that exert efficacy through binding to therapeutic targets. Most of the compounds in TCM databases have two-dimensional structure, while some of them have three-dimensional structures deduced by force filed. In most TCM databases, the information of both herbs and compounds are collected while some even have formulae information as well.

The Traditional Chinese Medicine Database (TCMD) contains 23,033 chemical constituents and over 6,760 herbs that mainly come from Yan et al. [12]. The query keywords for the database include molecular formula, substructure, botanical identity, CAS number, pharmacological activity and traditional indications. Only a small proportion of herbs in TCMD have full coverage of compounds while most have partial coverage. Chinese Herb Constituents Database (CHCD) contains information on 8,264 compounds derived from 240 commonly used herbs with both botanical and Chinese pinyin names, the part of the herbs that contain the compounds, pharmacological and toxicological information, and other useful information [15]. Qiao et al. [16] have developed 3D structural database of biochemical components which covers 10,564 constituents from 2,073 herbs with 3D structures built and optimized using the MMFF94 force field [17]. This database uses MySQL as the data engine and contains detailed information such as basic molecular properties, optimized 3D structures, herb origin and clinical effects. The TCM Database@Taiwan was reported to be the world’s largest traditional Chinese medicine database. The web-based database contains more than 20,000 pure compounds isolated from 453 TCM herbs [18]. Both simple and advanced query methods are acceptable in terms of molecular properties, substructures, TCM ingredients and TCM classifications.

In addition to herbs and compounds, Traditional Chinese Medicine Information Database (TCM-ID) [19], TCM Drugs Information System [20], and Comprehensive Herbal Medicine Information System for Cancer (CHMIS-C) [21] also collect the information of TCM formulae. TCM-ID is developed by Zhejiang University together with National University of Singapore on all aspects of TCM herbs. TCM-ID currently takes in 1197 TCM formulae, 1,313 herbs and around 9,000 compounds. It covers ~4,000 disease conditions and more than half of the compounds have valid 3D structures. The data are collected from creditable TCM books as well as Journals and the records can be retried by different sets of query keywords. TCM Drugs Information System based on networks of five large databases has also been developed [20]. It includes information of 1,712 formulae, 2,738 herbs, 16,500 compounds, 868 dietotherapy prescriptions from the integration of Chinese herb database, Chinese patent medicine database, effective components database of Chinese herbs, Chinese medical dietotherapy prescription database, and Chinese medical recipe database. Herbal Medicine Information System for Cancer (CHMIS-C) integrates the information of 203 formulae that are commonly used to treat cancer clinically as well as 900 herbs and 8,500 compounds. The compounds in this database are linked to the entries in National Cancer Institute’s database and drugs approved by the U.S. Food and Drug Administration.

The China Natural Products Database (CNPD) [22], Marine Natural Products Database (MNPD) [23], and Bioactive Plant Compounds Database (BPCD) [15] only focus on the structures of the compounds in TCM and do not contain pertinent information on formulae and herbs. CNPD is built to meet the needs for drug discovery using natural products including TCM and collects the 2D and 3D structures of more than 45,055 compounds. MNPD has a collection of 8,078 compounds from 10,000 marine natural products, of which 3,200 have bioactivity data. BPCD contains information on 2,794 active compounds against 78 molecular targets, as well as the subunits of the target structures to which the compounds bind.

There are other databases from the internet focusing only on the clinical efficacy or side effects of formulae and herbs, without details of compounds. Acupuncture.com.au collects the TCM formulae according to their clinical action and efficacy. Both the English and Chinese names of TCM herbs are recorded to facilitate studies using both traditional and modern methods. The Dictionary of Chinese Herbs contains information on both clinical usage and side effects of the TCM herbs. It also includes the samples of TCM formulae for treating diseases such as cancer, dengue fever, diabetes, and hepatitis B. Besides, the compatibility of TCM herbs and certain drugs are listed to provide biochemical explanation for drug designers. The Plants for a Future database allows querying of herbs with special medicinal usage, and also lists the potential side effects, medical usage, and physical characteristics.

3 In Silico Methods for Ligand-Protein Interactions

The computational methods for drug discovery based on ligand-protein networks have been increasingly developed and applied in the area of TCM and other drugs in recent years [7, 8]. These methods mainly fall into the territories of ligand-based approach, target-based approach, and machine learning. Of course, these methods of predicting ligand-protein interactions are not isolated, and researchers often use them jointly to achieve better computational results, which can be easily shown in the following case studies.

3.1 Ligand-Based Approach

The ligand-based approach, also known as the chemical approach, is to reorganize pharmacological characteristics and protein associations, by means of ligand similarities rather than genomic space such as sequence, structural or pathway information. The basic assumption for ligand-based approach is that regardless that similar chemical structures may interact with proteins in different ways, similar ligands tend to bind to similar targets more than not [24]. The core of ligand-based approach is the calculation of chemical similarities, with the help of chemical descriptors. Before this approach can work, one need to answer how to describe molecular structures in a way that computers can recognize. Currently there are plenty of molecular descriptors to indicate the similarity of two different ligands.

3.1.1 Chemical Descriptors

In order to predict the ligand-target interactions, prior knowledge should be acquired in terms of the ligand information for the target [25]. By comparing the chemical structures of the new ligands against the know ligand set of a targeting protein, a threshold is usually set to decide whether the new ligand and the targeting protein can interact.

The most commonly used structure representation is the topological fingerprints that encode the sub-structural information [26]. In these fingerprints, the atom-centred feature pairs have also been proven to be very successful in many applications of virtual screening. The widely used samples of fingerprints are 2D Daylight [25] and Scitegic extended connectivity fingerprints [27] with atom types and the bond connectivity among them.

Although 2D fingerprints have been proven to be extremely robust and reliable in many chemoinformatics approaches, they seem hard to credit to be informative, therefore, consistent efforts have been made to develop more comprehensive three-dimensional fingerprints [28, 29]. 3D fingerprints encode the 3D geometry or scaffolds of molecular structures. One method to encode a compound is based on geometrical configuration of molecular structures. A common methodology in descriptors Flexible is the superposition of molecules onto one or multiple conformations of a reference bioactive ligand [3032].

There are other topological descriptors based on molecular features that have been developed to compare ligand profiles. The SHED (SHannon Entropy Descriptors) is derived from distributions of atom-centered pairs and calculates the variability in a feature-pair distribution [33]. Gregori-Puigjane then used the SHED descriptor to in silico profiling of 767 drugs against 684 related targets and revealed the promiscuity of the drugs targeting aminergic G protein-coupled receptors (GPCRs) [34]. On the other hand, RED (Renyi entropy descriptors) is another topological descriptor that measures the molecular features generalized Renyi Entropy. The scaffolds can also be used to predict the bioactivity of the compounds on target sets. In this particular research, 24,000 unique scaffolds were extracted from 458 target sets and the external test shows that to the high-priority virtual scaffolds have the predictive activities [35].

Also in the area of chemicogeneric-based predictive methods to screen ligand-target interactions Weil proposed a novel fingerprint encoding both ligand and target properties. The ligand properties are represented by common descriptors, while the cavity information of the target is incorporated by a fixed length bit string. This fingerprint shows preference to support vector machine (SVM) classifiers and the resulting precision is as high as 90 % in separating true and false pairs [36].

3.1.2 Similarity Coefficient

The most common way to compare molecular fingerprints for similarity analysis is by means of Tanimoto Coefficient (T C, also known as Jaccard index) [3739], which compares the number of bits shared between the two fingerprints to all possible matched bits between them,

$$T_{\text{C}} = N_{\text{AB}} /\left( {N_{\text{A}} + N_{\text{B}} {-}N_{\text{AB}} } \right)$$

where N A is number of features (ON bits) in compound A, N B is the number of features (ON bits) in compound B, and N AB is the number of features (ON bits) common to both A and B. If the Tanimoto Coefficient of two molecules is larger than 0.85, then they are considered to have a higher structurally similarity [40].

3.1.3 Ligand-Based Predictions

One advantage of the ligand-based similarity searching approach is that it does not need alignment between multiple molecules. The ligand-based approach describes a protein by the chemicgenomic space of its ligands. With the ligand-based descriptions of a protein, one can predict which targets are likely to be hit by a ligand, given its known structure.

In the area of ligand-based virtual screening, researchers have tried to evaluate whether novel ligand-target pairs could be identified, based on the chemical knowledge of ligands and ligand-target interactions. G protein-coupled receptors (GPCRs) are a family of effective drug targets with significant therapeutic value. Many researchers have built SVM models as well as substructural analysis to describe GPCRs from the perspective of ligand chemicogenerics [41]. Especially, the de-orphanization of receptors without known ligands was employed using the ligands of the related receptors. For 93 % of the orphan receptors, the prediction results are better than random, while for 35 % the performance was good.

A powerful ligand-based prediction method based on features of protein ligands is the Similarity Ensemble Approach (SEA), which was originally used to investigate protein similarity based on chemical similarity between their ligand sets with the main idea that similar ligands might tend to share same targets [3]. SEA calculates Z-score and E-value by summing up the T C over a threshold between two ligand sets as indicators to evaluate the possible interaction between two ligand sets in a way similar to BLAST. The similarity threshold for T C is chosen in a way that the Z-score best observes the extreme value distribution (EVD). This method was then applied to predict new molecular targets for known drugs [42]. The Author investigated 3000 FDA-approved drugs against hundreds of targets and found 23 new cases of drug-target interactions. By in vitro experiments, five of them were validated to be positive with affinities less than 100 nM. Besides Keiser’s research, SEA was also used to investigate the off-target effect of the some commercial available drugs against the target protein farnesyltransferase (PFTase) [43] and two drug loratadine and miconazole were found to be able to bind to PFTase.

The pharmacophore model is perhaps the most widely used methods that make use of the 3D structure representations of molecules [44]. A pharmacophore is defined to be the molecular features pertinent to bioactivity aligned in three dimensional spaces, including hydrogen bonding, charge transfer, electrostatic and hydrophobic interactions [45]. The underlying methodology of pharmacophore model was defined by different researchers [46]. Recently, this model was successfully applied in mesangial cell proliferation inhibitor discovery and virtual screening of potential ligands for many targets such as HIV integrase and CCR5 antagonist [4750]. In 3D pharmacophore model, the molecular spatial features and volume constraints represent the intrinsic interactions of small bioactive ligands with protein receptors. Wolber tried to extract ligand pharmacophores from protein cavities based on a define set of six types of chemical structures [51], and develop the algorithms for ligand extraction and interpretation as well as pharmacophore creation for multiple targets.

Pharmacophore screening only considers those compounds who are direct mimics of the ligand from which the pharmacophore has been generated and may neglect the other positive binding modes as well. In fact, the pharmacophore model limits to only one mode of action for small molecules [52]. However, this limitation can be conquered by combining multiple pharmacophore models with different modes of action. This method is called Virtual Parallel Screening and has been successfully applied to the identification of Natural Products’ activity [52, 53]. In such work, The PDB-based pharmacophores was firstly used for target fishing for TCM constituents. Results shown 16 constituents of Ruta graveolens were screened against a database of pharmacophores and good congruity was found between the potential predictions and their corresponding IC50 values.

Quantitative structure-activity relationships (QSAR) was first established in early 1960s when computational means were used to quantitatively describe pharmacodynamics and pharmacokinetic effects in biology systems and the chemical structures of compounds [54]. Generally speaking, any mathematical model or statistical method that builds relationship between molecular structures and biological properties may be considered as QSAR. The idea of QSAR is easy while training and application of QSAR is much difficult since similar structures may interact with totally different targets due to the diversity and complexity of biology [55]. Furthermore, the intrinsic noise in data to describe both the chemical space and biological effects brings much trouble in accurate modeling [56]. Despite these difficulties, in case robust biological data is available and few outliers coexist, thousands of QSAR models have been generated and stored in related database in the past 40 years [57, 58].

3.2 Target-Based Approach

The target-based approach, predicts ligand-target interactions by the structural information of protein targets as well as ligands. The target-based approach depends highly on the availability of the structural information of targets, either from wet experiments or numerical simulations [59, 60]. On one hand, these methods aim to predict the conformation and orientation of the ligand within the protein cavity. On the other hand, the binding affinity of the ligand and protein is simulated with scoring functions. The main target-based approach is docking, which predicts the preferred orientation of one molecule to another when they bound to each other to form a stable complex [61]. Usually, docking is implemented to search appropriate ligands for known targets with the lowest fitting energy. On the contrary, inverse docking seeks to fish targets from known ligands ‘from scratch’ and also plays an important role in virtual screening.

Despite more than 20 years’ research, docking and scoring ligands with proteins are still challenging processes and the performance is highly dependent on targets [6264]. Docking cannot be applied to proteins whose 3D structures are not identified [65]. The high-resolution structure of the protein target is preferably obtained from X-ray crystallography and NMR spectroscopy. However, approximately half of the currently approved drugs bind to the membrane proteins, whose structures are extremely difficult to be acquired experimentally. Alternatively, homology modeling is usually adopted to build a putative geometry and docking cavity [66]. Besides, threading and ab initio structure prediction together with molecular dynamics (MD) and Monte Carlo simulations are utilized to predict the target structures. However, the fidelity of homology modeling, threading and ab initio structures is still questioned by many researchers. Other important challenges of docking are the dynamic behavior, the large number of degrees of freedom and the complexity of the potential energy surface. This confines docking to be a low throughput method on a very small scale, which fails to predict interactions on the level of millions of ligands and targets.

To alleviate the situation that docking depends on the nature of targets, multiple active site has been used to compensate the ligand-dependent biases and the Consensus scoring has been also suggested to reduce the false positives in virtual screening [67]. The accuracy of scoring functions still remain the main weakness of docking approach [68]. Also, docking is starting to adopt the conformation information derived from protein-bound ligands as a strategy to overcome the limitations of current scoring functions and can predict the orientation of the ligands into the protein cavity [69]. Besides, molecular dynamics-assisted docking method has been applied in virtual screening against the individual targets in HIV to search for multi-target drug-like agents and KNI-765 was identified to be potential inhibitors [70].

Regardless of the all the limitations, virtual screening based on docking and inverse docking has been successfully utilized to identify and predict novel bioactive compounds in the past 10 years. Using the combinatorial small molecule growth algorithm, Grzybowski applied the docking to the design of picomolar ligands for the human carbonic anhydrase II [71, 72]. Inverse docking was firstly developed to identify multiple proteins to which a small molecule can bind or weakly bind. In some cases, the bioactivity of the TCM compounds is well recognized, while the underlying mode of action is not very clear. In 2001, INVDOCK [73] has been developed to search for the targets for TCM constitutes, and employed a database of protein cavities derived from PDB entries. The results of inverse docking involving multiple-conformer shape-matching alignment showed that 50 % of the computer-predicted potential protein targets were implicated or experimentally validated. The same approach was used to determine potential drug toxicity and side effects in early stages of drug development and results showed that 83 % of the experimentally known toxicity and side effects were predicted [74]. Zahler tried the inverse docking method to find potential kinase targets for three indirubin derivatives and examined 84 unique protein kinases in total [75]. Recently, one indirubin compound was found to possess therapeutic effects against myelogenous leukemia [76].

Docking is usually used as the second step to further validate the ligand-target binding features after the first round of virtual screening by other ligand-based approaches [7780]. Wei applied the docking together with similarity search and molecular simulation to search for Anti-SAS drugs [81], find the binding mechanism of H5N1 Influenza Virus with ligands [82], detect possible drug leads for Alzheimer’s Disease [83, 84] and identify the binding sites for several novel amide derivatives in the nicotinic acetylcholine receptors (AChRs) [85].

3.3 Machine Learning

The ligand-based approach and target-based approach predict potential ligand-target bindings by means of chemical similarity and structural information. Machine learning is a high-throughput method of artificial intelligence that enables computers to learn from data of knowns, including ligand chemistry, structural information and ligand-protein networks and to predict unknowns, such as new drugs, targets and drug-target pairs. This method gains stability and credibility, and has strong ability for classifications among large numbers of ligand-protein pairs that otherwise would be impossible to be connected based on chemical similarity alone.

Machine learning is to exact features from data automatically by computers [86]. Basically, machine learning can be categorized into unsupervised learning and supervised learning. In unsupervised learning, the objective is to extract and conjecture patterns and interactions among a series of input variables and there is no outcome to train the input variables. The common approaches in unsupervised learning are clustering, data compression and outlier detection, such as principal component based methods [87]. In supervised learning, the objective is to predict the value of an outcome variable based on the input variables [88]. The data is commonly divided into training and validation datasets, which are used in turn to finalize a robust model. The variable the supervised model predicts is typically the binding probability of ligands and targets.

Nidhi trained a multiple-category Laplacian-modified naive Bayesian model from 964 target classes in WOMBAT and predict the top three potential targets for compounds in MDDR with or without known targets information [89]. On average, the prediction accuracy with compounds with known targets is 77 %. Bayesian classifier was usually used in early prediction, while the Winnow algorithm was reported more recently [90]. With the same training datasets, the prediction result is slightly different with the Multiple-category laplacian. This indicates that it is necessary to apply different prediction methods and make comparisons even on the same training dataset.

The Gaussian interaction profile kernels, which represented the drug-target interactions, were used in Regularized Least Squares in combined with chemical and genomic space to achieve the prediction with precision-recall curve (AUPR) up to 92.7 [91]. Based on simple physicochemical properties extracted from protein sequences, the potential drug targets were related to the existing ones by several models [92]. The supervised bipartite graph inference is used to represent the drug interaction networks and can be solely be able to predict new interactions, or together with chemical and genomic space [93, 94]. Besides, semi-supervised learning method (Laplacian regularized least square FLapRLS) was also explored to effectively predict the results by integration of genomic and chemical space [95].

The Support Vector Machine (SVM) is a powerful classification tool in which appropriate kernel functions are selected to map the data space into higher dimensional space without increasing the computational difficulties. The performance of SVM is usually stronger than other probability based models. Wale and Karypis [96] made comparisons between a Bayes Classifier together with binary SVM, cascaded SVM, a ranking-based SVM, Ranking Perception and the combination of SVM and Ranking Perception in terms of the ability to predict the targets for small compound, and found that the cascaded SVM has better performance than the Bayes models and the combination of SVM and Ranking Perceptron has the best performance of all. Zhao et al. developed a SVM model based on the chemical-protein interactions from STITCH [97] using new features from ligand chemical space and interaction networks. Four new d-amino acid oxidase inhibitors were successfully predicted by this model and validate by wet experiments, and one may have a new application in therapy of psychiatric disorders other than being an antineoplastic agent [98].

Random forest, a form of multiple decision trees, recently has been applied to screen TCM database for potential inhibitors against several therapeutically important targets [99]. With the use of binding information from another database, random forest was performed to find multiple hits out of 8,264 compounds in 240 Chinese Herbs on an unbalanced dataset. Among all the predictions, 83 herb-target predictions were proved by literature search. Three Potential inhibitors of the human, aromatase enzyme (CYP19) myricetin, liquiritigenin and gossypetin, were screened by Random Forest as well as molecular docking studies. The virtual screening results were subsequently confirmed experimentally by in vitro assay [100].

Linear regression models have also been applied to predict ligand-target pairs. Zhao developed a computational framework, drugCIPHER to infer drug-target interactions based on pharmacology and genomic space [101]. In this framework, three linear regress models were created to relate drug therapeutic similarity, chemical similarity and target similarity on the basis of a protein-protein interaction network. The drugCIPHER achieved the performance with AUC of 0.988 in the training set and 0.935 in the test set and 501 new drug-target interactions were found, implying potential novel applications or side effects.

Although machine learning has strong performance in classification of protein-ligand interactions, its shortcoming is obvious. The process of some machine learning methods is implicit, like a black box, from which we cannot have an intuitive biological or physical relevance between proteins and ligands. SVM maps the classification problem into higher space, and acquires excellent performance with high computational efficiency. The tradeoff is that it can hardly explicitly create relationship between a protein and a ligand. Therefore, even with a very strong prediction tool, we can hardly move forward with innovations in theory of protein-ligand interactions.

3.4 Case Studies

3.4.1 Inhibiting Biological Transmethylation Reaction

Wei et al. focused on the discovery of potential inhibitors against S-adenosylhomocysteine hydrolase (SAH), a key reactant in duplication of virus life cycle. A similarity search in Traditional Chinese Medicine Database was performed and 17 hits with high similarity were retrieved. Followed by docking, they proposed the potential inhibitors by comparing best docked solutions and possible modification for the best inhibitors [79].

3.4.2 New d-Amino Acid Oxidase Inhibitor Discovery

Zhao et al. have developed a support vector machine (SVM) model based on the chemical-protein interactions from STITCH using new features from ligand chemical space and interaction networks. The model is used to search for the potential d-amino acid oxidase inhibitors from STITCH database and the predicted results are finally validated by wet experiments. Out of the ten candidates obtained, seven d-amino acid oxidase inhibitors have been verified, in which four are newly found, and one may have a new application in therapy of psychiatric disorders other than being an antineoplastic agent [98].

3.4.3 Drug Discovery for AIDs

From docking experiments for more than 9,000 compounds extracted from various Chinese medicines, Gao found that the compound agaritine distinguished itself from all the others in binding to the HIV protease with the most favorable free energy. It has been observed thru an extensive docking study that some of agaritine derivatives had markedly stronger binding interaction with the HIV protease than agaritine, suggesting that these derivatives might be good candidates for developing drugs for AIDS therapy [77].

3.4.4 Treating Alzheimer’s Disease

To find new drug candidates for treating Alzheimer’s disease, Zheng used the similarity search technique and GTS-21 as a template to search the Traditional Chinese Medicines Database. Then those molecules with higher score were selected for docking studies against the alpha7 nicotinic acetylcholine receptor. Though an in-depth structural analysis, it was found Mol 7,235 might be a promising candidate and need further experimental validation before it becomes an effective drug for treating Alzheimer’s disease [78].

4 Applications of Ligand-Protein Networks in TCM Pharmacology

Network-based pharmacology explores the possibility to develop a systematic and holistic understanding of the mode of actions of multi-drugs by considering their multi-targets in the context of molecular networks. It has also been suggested that relatively weak patterns of inhibition of many targets may prove more satisfactory than the highly potent single target inhibitors routinely developed in the course of a drug discovery program [102]. In drug discovery, the use of networks incorporating multiple components and the corresponding multiple targets, is one of the driving force to propel the current development in TCM pharmacology. Several successful examples have been accumulated both in experiments and in silico analysis, as shown in Table 14.2.

Table 14.2 Summary of multi-target drugs/preparations with TCM pharmacology based on ligand-protein networks

4.1 Experimental Study

Many bioactive compounds in TCM herbs may have synergetic effort with many non-TCM drugs in markets. Tannin, a component derived from a TCM, can be combined with HIV triple cocktail therapy to yield everlasting efforts in preventing HIV virus propagation. The underlying mechanism is that Tannin suppresses the activity of HIV-1 reverse transcriptase, protease and intergrase and cut off virus fusion and virus entry into the host cells [103]. Recently, Li proposed a new idea to induce immunetolerance in T cells by using matrine, a chemical derived from the root of Sophora flavescens AIT, targeting both the PKCy pathway and the NFAT pathway in cocktail preparations for treating AIDS [104].

Lam et al. recently showed in murine colon 38 allograft model that a formula containing 4 herbs (PHY906) has synergetic effect on reducing side effects and enhancing efficacy induced by CPT-11, a power anticancer agent with strong toxicity. The reason is that PHY906 can repair the intestinal epithelium by facilitating the intestinal progenitor or stem cells and several Wnt signaling components and suppress a batch of inflammatory responses like factor kB, cyclooxygenase-2, and inducible nitric oxide synthase [105].

Multi-component and multi-target interactions are the main mode of action for TCM formula, which exerts synergetic effects as a whole preparation rather than the primary active compound in TCM alone. Xie et al. demonstrated that other components in “Qingfu Guanjieshu” (QFGJS) could effectively influence the pharmacokinetic behavior and metabolic profile of paeonol in rats, indicating the synergy of herbal components. This synergy may be the result of enhanced adsorption of paeonol in the gastrointestinal tract induced by P-glycoprotein-mediate efflux change [106]. Another similar study, showed that paeoniflorin from the root of Paeonia lactiflora were markedly enhanced when co-administrated with sinomenine, the stem of Sinomenium acutum. Sinomenine promotes intestinal transportation via inhibition of P-glycoprotei, and affect the hydrolysis of paeoniflorin via interaction with b-glycosidase [107].

Huang-Lian-Jie-Du-Tang (HLJDT) is a TCM formula with anti-inflammatory efficacy, but the action mechanism is still not very clear. Zeng et al. investigated the effects of its component herbs and pure components on eicosanoid generation and found out the active components and their precise targets on arachidonic acid (AA) cascade. Results showed that Rhizoma coptidis and Radix scutellariae were the key herbs responsible for the suppressive effect of HLJDT on eicosanoid generation. Further experiments on the pure components of HLJDT revealed that baicalein derived from Radix scutellariae has significant inhibitory effect on 5-LO and 15-LO while coptisine from Rhizoma coptidis show medium inhibitory effects on LTA(4)H. Besides, baicalein and coptisine were proved to have synergetic inhibition on LTB(4) by the rat peritoneal macrophages [108].

A TCM formula, Realgar-Indigo naturalis formula (RIF), was applied to treat Acute promyelocytic leukemia (APL) and showed a high complete remission (CR rate) [109]. In RIF, multiple agents within one formula were found to work synergistically. A small-scale combinational study using Chou and Talalay combination index method was performed and three main active components of RIF and six core proteins they targets in mediating the auti-tumor effect were identified. The main active ingredients of RIF are tetraarsenic tetrasulfide (A), indirubin (I), and tanshinone IIA (T), from Realgar, Indigo naturalis, and Salvia miltiorrhiza, respectively. A acts as the principal component of the formula, whereas T and I serve as adjuvant ingredients. ATI leads to ubiquitination/degradation of promyelocytic leukemia (PML)-retinoic acid receptor oncoprotein, reprogramming of myeloid differentiation regulators, and G1/G0 arrest in APL cells by mediating multiple targets. Using multi-omics technologies, Zhang later proved that combined use of imatinib and arsenic sulfide from toxic herbal remedy exerted better therapeutic effects in a BCR/ABL-positive mouse model of chronic myeloid leukemia (CML) than either drug as a single agent. AS targets BCR/ABL through the ubiquitination of key lysine residues, leading to its proteasomal degradation, whereas IM inhibits the PI3 K/AKT/mTOR pathway [110].

4.2 Computational Framework

To target the complex, multi-factorial diseases more effectively, the network biology incorporating ligand-protein networks has been applied in multi-target drug development as well as modernization of traditional Chinese medicine in the systematic and holistic way. Zhao reviewed the available disease-associated networks, drug-associated networks that can be used to assist the drug discovery and elaborate the network-based TCM pharmacology [119]. Klipp discussed the possibility to use networks to aid the drug discovery process and focused on networks and pathways in which the components are related by physical interactions or biochemical process [121]. Leung investigated the possibility of network-based intervention for curing system diseases by means of network-based computational models and using medicinal herbs to develop into new wave of network-based multi-target drugs. It was concluded that further integration across various ‘omics’ platform and computational tools would accelerate the drug discovery based on network [122].

Barlow et al. screened among Chinese herbs for compounds that may be active against 4 targets in inflammation, by means of pharmacophore-assisted docking. The results showed that the twelve examples of compounds from CHCD inhibit multiple targets including cyclo-oxygenases 1 & 2 (COX), p38 MAP kinase (p38), c-Jun terminal-NH(2) kinase (JNK) and type 4 cAMP-specific phosphodiesterase (PDE4).The distribution of herbs containing the predicted active inhibitors was studied in regards to 192 Chinese Formulae and it was found that these herbs were in the formulae that were traditionally used to treat fever, headache and so on [111].

Many Traditional Chinese Medicines (TCMs) are effective to relieve complicated diseases such as type II diabetes mellitus (T2DM). Gu et al. employed the molecular docking and network analysis to elucidate the action mechanism of a medical composition-Tangminling Pills which had clinical efficacy for T2DM. It was found that multiple active components in Tangminling Pills interact with multiple targets in the biological network of T2DM. The 37 targets were classified into 3 clusters, and proteins in each cluster were highly relevant to each other. 10 known compounds were selected according to their network attribute ranking in drug-target and drug-drug network [112].

XFZYD, a recipe derives from Wang Q. R. in Qing dynasty, was widely used in cardiac system disease. From similarity search and alignment, the chemical space of compounds in XFZYD was found to share a lot of similarities with that of drug/drug-like ligands set collected from cardiovascular pharmacology while the chemical pattern in XFZYD are more diverse than drug/drug-like ligands for cardiovascular pharmacology. Docking protocol between compounds in XFZYD and targets related to cardiac system disease using LigandFit show that many molecules have good binding affinity with the targeting enzymes and most have interactions with more than one single target. The active components in XFZYD mainly target rennin, ACE and ACE2 in Renin-Angiotensin System (RAS), which modulates the cardiovascular physiological function. It was proved that promiscuous drugs in TCM might be more effective for treating cardio system diseases, which tends to result from multi-target abnormalities, but not from a single defect [113].

A lot of integrative computational tools and models have been developed and widely used to optimize the combination regimen of multi-components drugs and elucidating the interactive mechanism among ligand-target networks.

Li et al. built a method called Distance-based Mutual Information Model (DMIM) to identify useful relationships among herbs in numerous herbal formulae. DMIM combines mutual information entropy and distance between herbs to score herb interactions and construct herb network. Novel anti-angiogenic herbs, Vitexicarpin and Timosaponin A-III were discovered to have synergistic effects. Based on herb network constructed by DMIM from 3,865 collateral-related herbs, the interactions between TCM drugs and disease genes in cancer pathways and neuro-endocrine-immune pathways were inferred to contribute to the action of Liu-wei-di-huang formula, one of the most well-known TCM formula as potential treatment for a variety of diseases including cancer, dysfunction of the neuro-endocrine-immune-metabolism system and cardiovascular [114].

Wang et al. adopted a new method based upon lattice experimental design and multivariate regression to model the quantitative composition-activity relationship (QCAR) of Shenmai, a Chinese medicinal formula. This new strategy for multi-component drug design was then successfully applied in searching optimal combination of three key components (PD, PT and OP) of Shenmai. Experimental outcome of infarct rate of heart in mice with different dosage combination of the three components were finally measured and the fitted relationship equation showed that the optimal values of PD, PT and OP were 21.6, 39.2 and 39.2 %, respectively [115]. The proportion of two active components of Qi-Xue-Bing-Zhi-Fang, PF and FP, was also optimized in similar way using several fitting technique like linear regression, artificial neural network and support vector regression [116]. Although the underlying mechanism of drug synergy for the two formulae was still not very clear, the interactions of multiple weak bindings among different compounds and targets might be the contributory factors.

A network-based multi-target computational scheme for the whole efficacy of a compound in a complex disease was develop for screening the anticoagulant activities of a serial of argatroban intermediates and eight natural products respectively. Aimed at the phenotypic data of drugs, this scheme predicted bioactive compounds by integrating biological network efficiency analysis with multi-target docking score, which evolves from the traditional virtual screening method that usually predicted binding affinity between single drug molecule and target. A ligand can have impact on multiple targets based on the docking scores, and those with highest target network efficiency are regarded as potential anticoagulant agents. Factor Xa and thrombin are two critical targets for anticoagulant compounds and the catalytic reactions they mediate were recognized as the most fragile biological matters in the human clotting cascade system [117].

Sun et al. [118] presented a systematic target network analysis framework to explore the mode of action of anti-Alzheimer’s disease (AD) herb ingredients based on applicable bioinformatics resources and methodologies on clinical anti-AD herbs and their corresponding target proteins. The results showed that just as many FDA-approved anti-AD drugs do the compounds of these herbs binds to targets in AD symptoms-associated pathway. Besides, they also interact closely with many successful therapeutic targets related to diseases such as inflammation, cancer and diameters. This suggests that the possible cross-talks between these complicated diseases are prevalent in TCM anti-AD herbs [123]. Moreover, pathways of Ca(2+) equilibrium maintaining, upstream of cell proliferation and inflammation were found to be were intensively hit by the anti-AD herbal ingredients.

Based on the available experimental results, Zhao analyzed the molecular mechanism with the aid of pathways and networks and theoretically proved the multi-target effect of St. John’s Wort [119]. A comprehensive literature search was conducted and the neurotransmitter receptors, transporter proteins, and ion channels on which the SJW active compounds show effects were collected. Three main pathways that SJW intervenes were found by mapping these proteins onto KEGG pathways. Active components in SJW mainly intervene with neuroactive ligand-receptor interaction, the calcium signaling pathway, and the gap junction related pathway, pertinent to targets including NMDA-receptor, CRF1 receptor, 5-hydroxytryptamine receptor 1D, dopamine receptor D1. The networks show that the effect of SJW is similar to that of combinations of different types of antidepressants. However, the inhibitory effects of the SJW on each of the pathway are lower than other individual agents. Accordingly, the significant antidepressant efficacy and lower side effects are due to the synergetic effect of low-dose multi-target actions.

Zhang et al. established an integrative platform of TCM network pharmacology to discover herbal formulae on basis of systematic network. This platform incorporates a set of state-of-the-art network-based methods to explore the action mechanism, identify activate ingredients, and create new synergetic combinations of components. The Qing-Luo-Yin (QLY), an antirheumatoid arthritis (RA) formula was studied comprehensively using the new platform. It is found the target network of QLY is involved on RA-related key processes including angiogenesis, inflammatory response, and immune response. The four herbs in QLY work in concert to promote efficiency and reduce toxicity, as the jun, chen, zuo, shi in Chinese, respectively. Specifically, the synergetic effect of Ku-Shen (jun herb) and Qing-Feng-Teng (chen herb) may come from the feedback loop and compensatory mechanisms [120].

5 Discussion and Conclusion

In recent years, the bottleneck in western medicine has brought unprecedented opportunities in TCM research and development. For decades, the fundamental research has achieved great success, and laid the foundation of modern western medicine and the philosophical idea of “reductionism” was considered to own the credit.

The counterparty of “reductionism” in Chinese medicine is the philosophical idea of holism, which has thousands years’ history of practice in China as well as other Asian countries. Using this methodology, the effectiveness of TCM can only be verified from a large number of clinical trials given the unclear composition and unknown relationship among various components. This implicit effect without clear clarification at the molecular level has been hindering the modernization of TCM. How to learn from the accumulative knowledge of western medicine, in order to identify the effective compositions and explore the molecular mechanism of the efficacy is an urgent problem that needs to be solved in TCM.

The hypothesis of “multi-drug, multi-target and multi-gene” in fact bridges the gap between TCM and western medicine and is also a manifestation of unity of opposites on “reductionism” and “holism”. TCM uses the holistic method to investigate the effects of multi-component formula across the whole organism, such as the use of a variety of “ZHENG” in TCM theory [124]. However, the only option we have to uncover the underlying mechanism of TCM at the molecular level is to make use of the theory of reductionism. Of course, for complex systems, the reduction method can only reach to a certain depth since it becomes more troublesome as we get deeper. Therefore, some researchers tend to reduce the mechanism of TCM to the level of “multi-drug, multi-target, multi-gene” at present, and for further reduction to the level of “single-drug, single-target, single gene”, the problem of emergentism [125] in philosophy needs to be addressed properly. The theory of emergentism believes that some unique features or “ultimate features” of a system can never be reduced to properties at lower levels, nor the former can be predicted or explained by the latter, as shown in Fig. 14.1.

Fig. 14.1
figure 1

Unity of opposites on holism in traditional Chinese medicine and reductionism in Western medicine. Emergentism constructs the framework of the understanding of holism in TCM via accumulative practice of reductionism in WM

So far, ligand-protein network or “multi-drug, multi-target, multi-gene” is one of the few basic modules that can clearly reveal the pharmacology of TCM and is expected to be the future direction of the modernization of TCM. But just relying on experimental scientists to build ligand-protein interactions non-exhaustively will slow down both the modernization of TCM and the development of its industry. Therefore, the use of cross-platform database (TCM compounds and recipe database, see Sect. 14.2 in this paper) and the improvement on modeling technique (computational method of ligand-protein interactions, see Sect. 14.3 in this paper), will afford the basis of in silico research for future modernization and development of TCM. It can be foreseen that, one future direction is to use these TCM databases and predictive models to reveal the pharmacological effect of TCM, through the establishment of ligand-protein networks or, “multi-drug, multi-target, multi-gene” relationships. Nevertheless, the pharmacological mechanism of TCM can be very complex and may not be well explained only with the known ligand-protein network. After all, this is a process of reeling silk from cocoons and also one of the best choices we have right now.

The increasing availability of ligand-protein networks is a unique chance to boost success in the modernization of TCM based on the accumulative knowledge of TCM formulae and practices based on the assumption that TCM exerts the pharmacological efficacy in multi-drug-multi-target way. Although preliminary research has been initiated in this area, there is still a long way to go to further leverage these networks and modeling techniques. Virtual screening and informatics in the drug discovery area have already been proven to be quite useful either to predict potential new drug and target candidates for experimentalists or explore the functional mechanism at the molecular level. A large number of drug-target interactions have thus been gained and the resulted drug-target networks will also be quite beneficial to investigate the underlying mechanism of multi-component drugs, such as the TCM. With further applications of these methods in TCM area, we are expecting to reveal the mode of action underlying polypharmacology of TCM. This grants us the possibility to discover novel effective drug leads, understand the synergistic mechanism of drug combinations, and more importantly, develop drug portfolios against epidemic, chronic disease, cancer and other complex disease that are almost incurable by western medicine.