1 Introduction

Prediction of drug–target interactions is one of the most prominent machine learning applications in the pharmaceutical industry, the importance of which is underlined by the fact that both time and expenditure related to drug development are enormous: on average, it costs \(\approx \)$1.8 billion and takes more than 10 years to bring a new drug to the market [17]. Drug–target interaction prediction (DTI) techniques promise to reduce the aforementioned cost and time, and to support drug repositioning [40], i.e., the use of an existing medicine to treat a disease that has not been treated with that drug yet.

Computational methods for DTI include approaches based on molecular docking simulations [9, 15] and ligand chemistry [21, 25]. Furthermore, text mining techniques have been proposed to identify biomedical entities and relations between them [7, 13, 28, 42]. However, a serious limitation of docking-based approaches is that they require information about the three-dimensional structure of candidate drugs and targets, which is often not available, especially for G-protein coupled receptors (GPCRs) and ion channels. Additionally, the performance of ligand-based approaches is known to decrease when only a few ligands are known. Therefore, machine learning techniques have been proposed for DTI [11, 19, 39]. Recent approaches are based on matrix factorization [5, 14, 41], support vector regression [34, 35], restricted Boltzmann machines [37], network-based inference [8, 10], decision lists [30] and bipartite local models (BLM) [4] with semi-supervised prediction [38], improved kernels [22] and the incorporation of neighbor-based interaction-profile inferring [23].

Real-world datasets in biology, chemistry and medicine [1], including drug–target interaction networks, have been shown to contain hubs, i.e., vertices that are connected to surprisingly many other vertices. For example, in the Enzyme dataset (described in Sect. 5.1), the vast majority of targets have fewer than 5 interactions, while some targets are very popular: each of the 30 most popular targets interacts with at least 20 drugs. Despite such observations, none of the aforementioned variants of BLM took the presence of hubs into account. Furthermore, the presence of hubs has been observed in nearest neighbor graphs [29], which led to the development of hubness-aware classifiers [33] and regression techniques [6]. Although hubness-aware techniques are among the most promising recent machine learning approaches, their potential to enhance drug–target interaction prediction methods has not been exploited yet.

In this paper, we extend BLM by the incorporation of a hubness-aware regression approach. Additionally, we propose an enhanced representation of drugs and targets in a multi-modal similarity space and build a projection-based ensemble. We call the resulting approach Advanced Local Drug-Target Interaction Prediction, or ALADIN for short. In order to assist reproducibility of our work, we perform experiments on publicly available real-world drug–target interaction datasets. The results show that our approach outperforms BLM-NII [23], a recent version of BLM, and two other drug–target prediction techniques.

The rest of this paper is organized as follows. In Sect. 2, we define the drug–target interaction prediction problem. This is followed by a review of BLM and hubness-aware regression in Sect. 3. We describe our approach, ALADIN, in Sect. 4 and present the results of the experimental evaluation in Sect. 5. Finally, we conclude in Sect. 6.

2 Basic Notation and Problem Formulation

First, we define the Drug–Target Interaction Prediction problem. We are given a set \(\mathcal {D} = \{d_1, \ldots , d_n\}\) of n drugs, a set \(\mathcal {T}=\{t_1, \ldots , t_m\}\) of m pharmaceutical targets, an \(n \times n\) drug similarity matrix \(\mathcal {S}^D\), an \(m \times m\) target similarity matrix \(\mathcal {S}^T\) and an \(n \times m\) interaction matrix \(\mathcal {M}\). Each entry \(s^{D}_{i,j}\) of \(\mathcal {S}^D\) (and \(s^{T}_{i,j}\) of \(\mathcal {S}^T\), resp.) describes the similarity between drugs \(d_i\) and \(d_j\) (targets \(t_i\) and \(t_j\)). Each entry \(m_{i,j}\) of \(\mathcal {M}\) indicates whether drug \(d_i\) and target \(t_j\) are known to interact:

$$ m_{i,j} = {\left\{ \begin{array}{ll} 1 &{}\text {if there is a known interaction between } d_i \text { and } t_j \\ 0 &{} \text {otherwise.} \end{array}\right. } $$

This formulation is in accordance with the usual setting in which only positive information is available: if \(m_{i,j}=0\), the corresponding drug \(d_i\) and target \(t_j\) may or may not interact. Therefore, we call \(u_{i,j}=(d_i, t_j)\) an unknown pair. The task is to predict the likelihood of interaction for each unknown pair.

At first glance, the above DTI problem seems to be similar to the problems considered in the recommender systems community. Note, however, that most recommender techniques consider only the interactions (“ratings”) because even a few ratings are thought to be more informative than metadata, such as users’ similarity based on their demographic information [27]. In contrast, drug–drug and target–target similarities play an essential role in DTI.

3 Background

In this section, we review the BLM approach and hubness-aware error correction for nearest neighbor regression.

3.1 Bipartite Local Model

BLM considers DTI as a link prediction problem in bipartite graphs [4]. The vertices in one of the vertex classes correspond to drugs, whereas the vertices in the other vertex class correspond to targets. There is an edge between drug \(d_i\) and target \(t_j\) if and only if \(m_{i,j}=1\).

Fig. 1. Two independent predictions of Bipartite Local Models.

The likelihood of unknown interactions is predicted as follows: we consider an unknown pair \(u_{i,j}=(d_i, t_j)\) and calculate the likelihood of interaction as the aggregate of two independent predictions.

The first prediction (Fig. 1, left panel) is based on the relations between \(d_i\) and the targets. Each target \(t_k\) (except \(t_j\)) is labeled as “1” or “0” depending on \(m_{i,k}\). Then a model is trained to distinguish “1”-labeled and “0”-labeled targets. Subsequently, this model is applied to predict the likelihood of interaction for the unknown pair \(u_{i,j}\). This first prediction is denoted by \(\hat{y}'_{i,j}\).

The second prediction, \(\hat{y}''_{i,j}\), is obtained in a similar fashion, but instead of considering the interactions of drug \(d_i\) and labeling the targets, the interactions of target \(t_j\) are considered and drugs are labeled (Fig. 1, right panel). The models that make the first and second predictions are called local models.

In order to obtain the final prediction of the BLM, we average the predictions of the aforementioned local models:

$$\begin{aligned} \hat{y}_{i,j} = \frac{ \hat{y}'_{i,j} + \hat{y}''_{i,j} }{2} \end{aligned}$$
(1)

Note that, instead of averaging, other aggregation functions such as the minimum or maximum can be used as well.
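The two-sided prediction and Eq. (1) can be sketched as follows. This is a minimal illustration, not the original BLM implementation: the `LocalModel` signature, the vector representations `drug_repr` and `target_repr`, and the toy `nearest_label` model are assumptions introduced only for this example.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
# Hypothetical local-model signature: (training vectors, training labels, query vector) -> score
LocalModel = Callable[[List[Vector], List[float], Vector], float]


def blm_predict(i: int, j: int, M: List[List[int]],
                drug_repr: List[Vector], target_repr: List[Vector],
                local_model: LocalModel) -> float:
    """Average of the two independent BLM predictions for the pair (d_i, t_j)."""
    n, m = len(M), len(M[0])

    # First prediction: label every target t_k (k != j) by m_{i,k}, then score t_j.
    ks = [k for k in range(m) if k != j]
    y1 = local_model([target_repr[k] for k in ks],
                     [float(M[i][k]) for k in ks], target_repr[j])

    # Second prediction: label every drug d_k (k != i) by m_{k,j}, then score d_i.
    ks = [k for k in range(n) if k != i]
    y2 = local_model([drug_repr[k] for k in ks],
                     [float(M[k][j]) for k in ks], drug_repr[i])

    return (y1 + y2) / 2.0  # Eq. (1)


def nearest_label(train_x: List[Vector], train_y: List[float], query: Vector) -> float:
    """Toy stand-in local model: label of the closest training vector (squared Euclidean)."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[dists.index(min(dists))]
```

Replacing the last line of `blm_predict` by `min(y1, y2)` or `max(y1, y2)` gives the alternative aggregation functions mentioned above.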

BLM is a generic framework in which various regressors or classifiers can be used as local models. Bleakley and Yamanishi [4] used support vector machines with a domain-specific kernel. In contrast, we propose to use a hubness-aware regression technique, ECkNN, which is described next.

3.2 ECkNN: k-Nearest Neighbor Regression with Error Correction

In the last decades, various regression schemes have been introduced, such as linear and polynomial regression, support vector regression, neural networks, etc. One of the most popular regression techniques is based on k-nearest neighbors: when predicting the numeric label of an instance x with k-nearest neighbor regression, the k-nearest neighbors of x (i.e., the k instances that are most similar to x) are determined and the average of their labels is calculated as the predicted label of x. In our case, instances may correspond either to drugs or to targets, depending on whether the first or the second BLM-prediction is calculated.

While being intuitive and simple to implement, k-nearest neighbor regression is also well understood from a theoretical point of view; see, e.g., [3] and the references therein for an overview of the most important theoretical results. The theoretical results are also supported by empirical studies: for example, Stensbo-Smidt et al. found that nearest neighbor regression outperforms model-based prediction of star formation rates [31], while Hu et al. showed that a model based on k-nearest neighbor regression is able to estimate the capacity of lithium-ion batteries [18].

Despite all of the aforementioned advantages of k-nearest neighbor regression, one of its recently explored shortcomings is its suboptimal performance in the presence of bad hubs. Intuitively, bad hubs are instances that appear as nearest neighbors of many other instances, but have substantially different labels from those instances. The presence of bad hubs has been shown to be related to the intrinsic dimensionality of the data. This means, roughly speaking, that bad hubs are expected in complex data, such as drug–target interaction data. For a more detailed discussion, we refer to [6].

In order to alleviate the detrimental effect of bad hubs, in [6] we proposed an error correction technique which is reviewed next. We define the corrected label \(y_c(x)\) of a training instance x as

$$\begin{aligned} y_c(x) = {\left\{ \begin{array}{ll} \frac{1}{|\mathcal {R}_x|} \sum \limits _{x_i \in \mathcal {R}_x} y(x_i) &{} \text {if } |\mathcal {R}_x| \ge 1\\ y(x), &{} \text {otherwise} \end{array}\right. }, \end{aligned}$$
(2)

where \(y(x_i)\) denotes the original (i.e., uncorrected) label of instance \(x_i\), and \(\mathcal {R}_x\) is the set of “reverse neighbors”, i.e. the set of training instances that have x as one of their k-nearest neighbors:

$$\begin{aligned} \mathcal {R}_x = \{ x_i \mid x \in \mathcal {N}(x_i) \} \end{aligned}$$
(3)

where \(\mathcal {N}(x_i)\) denotes the set of k-nearest neighbors of \(x_i\).

In order to make predictions, k-nearest neighbor regression with error correction (ECkNN) uses the corrected labels. Given a “new” (unlabeled) instance \(x'\), its predicted label \(\hat{y}(x')\) is calculated as follows:

$$\begin{aligned} \hat{y}(x') = \frac{1}{k} \sum \limits _{ x_i \in \mathcal {N}(x') } y_c(x_i). \end{aligned}$$
(4)
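Equations (2) to (4) can be illustrated with the short sketch below. It is only an assumption-laden illustration: instances are taken to be numeric vectors compared by squared Euclidean distance, ties are broken by index order, and all function names are hypothetical. It is not the PyHubs implementation.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def knn_indices(query: Vector, X: List[Vector], k: int, exclude: int = -1) -> List[int]:
    """Indices of the k instances in X closest to query (optionally skipping one index)."""
    candidates = [idx for idx in range(len(X)) if idx != exclude]
    candidates.sort(key=lambda idx: sq_dist(query, X[idx]))
    return candidates[:k]


def corrected_labels(X: List[Vector], y: List[float], k: int) -> List[float]:
    """Eq. (2): replace each label by the mean label of the instance's reverse neighbors."""
    # reverse[a] collects the training instances that have a among their k-NN, Eq. (3)
    reverse: List[List[int]] = [[] for _ in X]
    for i in range(len(X)):
        for nb in knn_indices(X[i], X, k, exclude=i):
            reverse[nb].append(i)
    # Instances without reverse neighbors keep their original label.
    return [sum(y[i] for i in r) / len(r) if r else y[a]
            for a, r in enumerate(reverse)]


def ecknn_predict(query: Vector, X: List[Vector], y_c: List[float], k: int) -> float:
    """Eq. (4): average corrected label over the k nearest training instances."""
    nbs = knn_indices(query, X, k)
    return sum(y_c[i] for i in nbs) / k
```

The corrected labels depend only on the training data, so they can be computed once and reused for every query instance.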

4 Our Approach

Next, we present ALADIN, our Advanced Local Drug-Target Interaction Prediction approach. The following subsections describe the components of ALADIN.

4.1 Similarity-Based Representation

The given drug–drug similarities allow us to represent drugs in the similarity space: in particular, drug \(d_i\) is represented by the vector \( (s^D_{i,1}, \ldots , s^D_{i,n})\). Given the target similarity matrix, targets may be represented in an analogous way, i.e., using their similarities to all the targets.

Fig. 2. Illustration of enhanced similarity-based representation of drugs and targets.

In addition to the given drug–drug and target–target similarities, we propose to compute drug–drug and target–target similarities based on the known interactions (i.e., interactions in the training set). In particular, using the interaction matrix, we calculate the Jaccard similarity between drugs as well as between targets. Thus, the enhanced similarity-based representation of a drug (or target, respectively) consists of its chemical (genetic) similarity to all the drugs (targets) and its interaction-based similarity to all the drugs (targets). This is illustrated in Fig. 2.
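As a rough sketch of the enhanced representation for drugs, the following code concatenates the given similarities with Jaccard similarities of the interaction profiles. The function names are hypothetical, and returning 0.0 for two empty profiles is an assumed convention that the text does not specify.

```python
from typing import List


def jaccard(a: List[int], b: List[int]) -> float:
    """Jaccard similarity of two binary interaction profiles (0.0 if both are empty)."""
    both = sum(1 for p, q in zip(a, b) if p and q)
    either = sum(1 for p, q in zip(a, b) if p or q)
    return both / either if either else 0.0


def enhanced_drug_repr(S_D: List[List[float]], M_train: List[List[int]]) -> List[List[float]]:
    """Concatenate each drug's given similarities with its interaction-based similarities."""
    n = len(M_train)
    return [list(S_D[i]) + [jaccard(M_train[i], M_train[k]) for k in range(n)]
            for i in range(n)]
```

Targets can be handled in the same way by passing the target similarity matrix together with the transposed training interaction matrix, so that each row is the interaction profile of one target.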

4.2 Projection-Based Ensemble

We propose to build a projection-based ensemble of BLMs as follows. Given the enhanced similarity-based representation of drugs and targets, we select a random subset of features and use only the selected features when training the local models (ECkNN) and making predictions. Denoting the size of the set of selected features by \(F_D\) and \(F_T\) (for drugs and targets, respectively), the above procedure first projects drugs into \(F_D\)-dimensional, and targets into an \(F_T\)-dimensional subspace. Subsequently, these lower dimensional representations are used with the prediction models.

The above process of randomly selecting features and making predictions using the resulting lower-dimensional representation is repeated N times. This results in an ensemble of N prediction models. As each member of the ensemble is constructed in the same way, their expected prediction accuracies will be similar. Therefore, we propose to average the predictions of the members of the ensemble. Thus, the final output of the ensemble is:

$$\begin{aligned} \hat{y}_{i,j} = \frac{1}{N}\sum \limits _{l = 1}^{N}\hat{y}^{(l)}_{i,j} \end{aligned}$$
(5)

where \(\hat{y}^{(l)}_{i,j}\) is the prediction of the l-th BLM for the unknown pair \(u_{i,j}\).

The projection-based ensemble is illustrated in Fig. 3 for \(N=2\) base prediction models with \(F_D=F_T=3\) features selected from the enhanced similarity-based representation.
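The ensemble of Eq. (5) can be sketched as below. The callable `base_predict` is a hypothetical placeholder that is assumed to train and apply one BLM on the projected representations for a fixed unknown pair; the `seed` parameter and all function names are likewise assumptions made for this example.

```python
import random
from typing import Callable, List, Sequence


def random_projection(num_features: int, size: int, rng: random.Random) -> List[int]:
    """Pick `size` distinct feature indices uniformly at random."""
    return sorted(rng.sample(range(num_features), size))


def project(vectors: List[Sequence[float]], idx: List[int]) -> List[List[float]]:
    """Keep only the selected features of every vector."""
    return [[v[f] for f in idx] for v in vectors]


def ensemble_predict(drug_repr: List[Sequence[float]],
                     target_repr: List[Sequence[float]],
                     F_D: int, F_T: int, N: int,
                     base_predict: Callable[[List[List[float]], List[List[float]]], float],
                     seed: int = 0) -> float:
    """Eq. (5): average of N base predictions, each using its own random feature subset."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(N):
        d_idx = random_projection(len(drug_repr[0]), F_D, rng)
        t_idx = random_projection(len(target_repr[0]), F_T, rng)
        total += base_predict(project(drug_repr, d_idx), project(target_repr, t_idx))
    return total / N
```

Drawing independent index sets for drugs and targets in every iteration mirrors the description above, in which drugs are projected into an \(F_D\)-dimensional and targets into an \(F_T\)-dimensional subspace.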

Fig. 3. Projection-based ensemble of BLMs using the enhanced similarity-based representation of drugs and targets.

4.3 Prediction for New Drugs and Targets

One of the shortcomings of the BLM approach is that it does not handle the case of new drugs/targets. By a new drug (target, resp.), we mean a drug d (target t) that does not have any known interaction in the training data. In such cases, BLM labels all targets (drugs) as “0”; consequently, no reasonable local model can be learned. In order to alleviate this problem, we use the weighted profile approach [39] to obtain predictions for new drugs/targets.

Given a new drug \(d_i\), and a target \(t_j\), we predict the likelihood of the interaction between \(d_i\) and \(t_j\) as follows:

$$\begin{aligned} \hat{y}'_{i,j} = \frac{\sum \limits _{d_k \in \mathcal {D} \setminus \{ d_i \} } m_{k,j} \, s^{D}_{i,k}}{\sum \limits _{d_k \in \mathcal {D} \setminus \{ d_i \} } s^{D}_{i,k}}. \end{aligned}$$
(6)

The intuition behind Eq. (6) is that similar drugs are likely to behave similarly in terms of their interaction with a given target. Therefore, drugs are weighted according to their similarity to the new drug \(d_i\) and we calculate the weighted average of the known interactions of other drugs with the same target.

Algorithm 1.

The case of new targets is analogous. Given a new target \(t_j\) and a drug \(d_i\), the weighted profile approach can be used to calculate the prediction for the likelihood of the interaction between \(d_i\) and \(t_j\) as follows:

$$\begin{aligned} \hat{y}''_{i,j} = \frac{\sum \limits _{t_k \in \mathcal {T} \setminus \{ t_j \} } m_{i,k} \, s^{T}_{j,k}}{\sum \limits _{t_k \in \mathcal {T} \setminus \{ t_j \} } s^{T}_{j,k}}. \end{aligned}$$
(7)
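A minimal sketch of Eqs. (6) and (7) is given below. The function names are hypothetical, and returning 0.0 when all similarities to the other drugs (targets) are zero is an assumption, since the equations are undefined in that case.

```python
from typing import List


def weighted_profile_new_drug(i: int, j: int, M: List[List[int]],
                              S_D: List[List[float]]) -> float:
    """Eq. (6): similarity-weighted average of other drugs' interactions with target t_j."""
    others = [k for k in range(len(M)) if k != i]
    denom = sum(S_D[i][k] for k in others)
    if denom == 0:
        return 0.0  # assumption: no similar drug means no evidence of interaction
    return sum(M[k][j] * S_D[i][k] for k in others) / denom


def weighted_profile_new_target(i: int, j: int, M: List[List[int]],
                                S_T: List[List[float]]) -> float:
    """Eq. (7): similarity-weighted average of drug d_i's interactions with other targets."""
    others = [k for k in range(len(M[0])) if k != j]
    denom = sum(S_T[j][k] for k in others)
    if denom == 0:
        return 0.0  # same assumption as above
    return sum(M[i][k] * S_T[j][k] for k in others) / denom
```

For instance, if a new drug has similarities 0.75 and 0.25 to two other drugs and only the first of them interacts with the target, the prediction is 0.75 / (0.75 + 0.25) = 0.75.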

Although the weighted profile approach is more general than BLM, in the sense that it can be used for new drugs/targets as well, the predictions of the weighted profile approach are less accurate than the predictions of BLM. Therefore, we use the weighted profile approach instead of BLM only in case of new drugs and targets. We summarize the proposed approach in Algorithm 1.

5 Experimental Evaluation

In order to assist reproducibility of our work, we evaluated our approach on publicly available real-world drug–target interaction data. Next, we describe the data and the experimental protocol in detail. This is followed by the discussion of our experimental results.

5.1 Experimental Settings

Datasets. We performed experiments on five drug–target interaction datasets (Table 1), namely Enzyme, Ion Channel, G-protein coupled receptors (GPCR), Nuclear Receptors (NR), and Kinase. These datasets have been used in various studies previously, see e.g. [4, 12, 14, 24, 38, 39].

The first four datasets contain binary interaction matrices between drugs and targets, each entry of which indicates whether the interaction between the corresponding drug and target is known. In contrast, Kinase contains continuous values of binding affinity for all drug–target pairs of the dataset. In order to produce a binary interaction matrix, we used the same cutoff threshold as Pahikkala et al. [24].

Table 1. Number of drugs, targets and interactions in the datasets used in our study.

Additionally, each dataset contains a drug–drug similarity matrix and a target–target similarity matrix. In case of the Enzyme, Ion Channel, GPCR and NR datasets, chemical structure similarities between drugs were computed using the SIMCOMP algorithm [16], while the Kinase dataset contains 2D Tanimoto coefficients. Similarities between targets were determined by the Smith-Waterman algorithm, see [12, 39] for details.

Evaluation Protocol. Although leave-one-out cross-validation is popular in the DTI literature [4, 22, 23], in their recent study, Pahikkala et al. [24] argue that it may lead to overoptimistic results. Thus, we performed experiments according to the interaction-based \(5 \times 5\)-fold cross-validation protocol (in each round of the cross-validation, the test set contains one fifth of all the drug–target pairs).
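One possible reading of this protocol is sketched below. It assumes that \(5 \times 5\)-fold means five repetitions of 5-fold cross-validation over all drug–target pairs, and that test pairs are hidden by setting their entries to 0 in the training matrix; both points, as well as the function names, are assumptions rather than details stated in the text.

```python
import random
from typing import Iterator, List, Tuple

Pair = Tuple[int, int]


def interaction_cv_folds(n: int, m: int, folds: int = 5, repeats: int = 5,
                         seed: int = 0) -> Iterator[List[Pair]]:
    """Yield the test set of drug-target pairs for every round of repeated k-fold CV."""
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(n) for j in range(m)]
    for _ in range(repeats):
        shuffled = pairs[:]
        rng.shuffle(shuffled)
        for f in range(folds):
            # Every folds-th pair, so fold sizes differ by at most one.
            yield shuffled[f::folds]


def mask_test_pairs(M: List[List[int]], test: List[Pair]) -> List[List[int]]:
    """Training matrix: a copy of M in which all test pairs are set to 0 (unknown)."""
    M_train = [row[:] for row in M]
    for i, j in test:
        M_train[i][j] = 0
    return M_train
```

With the default arguments the generator yields 25 test sets, each containing about one fifth of all drug–target pairs.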

Evaluation Metrics. We evaluated the predictions both in terms of Area Under ROC Curve (AUC) and Area Under Precision-Recall Curve (AUPR). AUC and AUPR values were calculated in each round of the cross-validation, and we report their averages. Additionally, we performed a paired t-test at significance level \(p=0.01\) in order to judge whether the observed differences are statistically significant.
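For illustration, both metrics can be computed in a few lines of dependency-free Python. The sketch assumes binary labels with at least one positive and one negative test pair. AUC is computed as the fraction of correctly ordered positive–negative pairs, and AUPR is approximated by average precision, which is one common estimator and not necessarily the one used in our experiments; tied scores are ordered by index in `average_precision`.

```python
from typing import List


def auc(scores: List[float], labels: List[int]) -> float:
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores: List[float], labels: List[int]) -> float:
    """Mean of the precision values measured at the rank of each positive instance."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits
```

The pairwise AUC computation is quadratic in the number of test pairs, so a rank-based implementation would be preferable for the larger datasets.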

Baselines. We compared our approach, ALADIN, with other drug–target interaction prediction techniques, such as BLM-NII, NetLapRLS and WNN-GIP. BLM-NII is a recent version of BLM that extends BLM with “neighbor-based interaction-profile inferring” [23]. NetLapRLS stands for “net Laplacian regularized least squares” [38], while WNN-GIP is a combination of weighted nearest neighbor and Gaussian interaction profile kernels [36].

Parameter Settings. We set the number of base prediction models (N) to 25 for ALADIN. Unless indicated otherwise, the other hyperparameters of ALADIN were learned via grid search in internal 5-fold cross-validation on the training data. In particular, the number of nearest neighbors for the local model, ECkNN, and the number of selected features were chosen from {3, 5, 7} and {10, 20, 50}, respectively.

Hyperparameters of the baselines were learned similarly. In particular, for BLM-NII, the max function was used to generate final predictions and the weight \(\alpha \) for the combination of structural and collaborative similarities was chosen from {0.0, 0.1, ..., 1.0}. In WNN-GIP, the decay hyperparameter T was chosen from {0.1, 0.2, ..., 1.0} and the weight \(\alpha \) for the combination of structural and collaborative similarities was chosen from {0.0, 0.1, ..., 1.0}. The hyperparameters of NetLapRLS were chosen from \(\{10^{-6},10^{-5}, \ldots ,10^2\}\).

Implementation. We implemented our approach, ALADIN, in Python. We used the ECkNN implementation from the publicly available PyHubs library and methods from the scikit-learn machine learning library for the calculation of AUC and AUPR. We used implementations of NetLapRLS, BLM-NII and WNN-GIP from the publicly available PyDTI software library.

5.2 Experimental Results

Our results are shown in Figs. 4 and 5. The symbols \(+/-\) denote if the differences between the best-performing approach and other methods are statistically significant (\(+\)) or not (−).

Fig. 4. Experimental results: the performance of ALADIN and its competitors in terms of AUC (left) and AUPR (right) on the Enzyme, Ion Channel, GPCR and NR datasets. The best-performing method is underlined. The symbols \(+/-\) denote if the differences between the best-performing approach and other methods are statistically significant (\(+\)) or not (−).

Fig. 5. Experimental results: the performance of ALADIN and its competitors in terms of AUC (left) and AUPR (right) on the Kinase dataset. The best-performing method is underlined. The symbols \(+/-\) denote if the differences between the best-performing approach and other methods are statistically significant (\(+\)) or not (−).

As one can see, our approach, ALADIN, outperformed its competitors, NetLapRLS, BLM-NII and WNN-GIP, on the Enzyme, Ion Channel, GPCR and Kinase datasets both in terms of AUC and AUPR. In the vast majority of the cases, the difference is statistically significant. In case of the NR dataset, the difference between ALADIN, BLM-NII and WNN-GIP is not significant. Note, however, that NR is an exceptionally small dataset; therefore, the results obtained on NR are likely to be less stable than those on the other datasets.

Additionally, we examined the contribution of hubness-aware error correction: in particular, we ran ALADIN with simple kNN regression instead of ECkNN. We found that ALADIN with ECkNN systematically outperformed ALADIN with kNN on all the examined datasets. The difference was statistically significant in most of the cases. In terms of AUC, we observed the largest difference on the Kinase dataset (0.93 versus 0.90), whereas in terms of AUPR, the largest difference was observed on the Enzyme dataset (0.83 versus 0.73). These results indicate that error correction is essential for accurate predictions.

Furthermore, we examined how ALADIN’s performance depends on k, the number of nearest neighbors in ECkNN. As one can see in Fig. 6, high performance is maintained for various k values, and \(k=3\) appears to yield good results both in terms of AUC and AUPR.

Fig. 6. ALADIN’s performance in case of various k values in ECkNN.

5.3 Application for the Prediction of New Interactions

Next, we illustrate that, besides achieving high accuracy in terms of AUC and AUPR, the predictions of ALADIN may be relevant for pharmaceutical applications as well. We begin this discussion by noting that the drug–target interactions contained in the Enzyme, Ion Channel, GPCR and NR datasets were extracted from the Kyoto Encyclopedia of Genes and Genomes (KEGG) several years ago and, in order to allow for comparison of prediction techniques, they have been kept unchanged. However, in the meantime, additional drug–target interactions have been validated chemically and the results have been uploaded to databases such as KEGG, DrugBank or Matador.

Therefore, in order to demonstrate that our approach is able to predict new interactions, we trained ALADIN and its competitors, BLM-NII, NetLapRLS and WNN-GIP using all the interactions of the original datasets, and ranked the non-interacting drug–target pairs of the original datasets according to their predicted interaction scores. For simplicity, we use the term predicted new interactions for the top-ranked 20 drug–target pairs. We say that a predicted new interaction is validated if it is included in the current version of KEGG, DrugBank or Matador.
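The ranking step can be sketched as follows, assuming a hypothetical matrix `scores` of predicted interaction scores with the same shape as the interaction matrix and a hypothetical set `validated` of pairs found in the current databases.

```python
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def top_unknown_pairs(scores: List[List[float]], M: List[List[int]],
                      top: int = 20) -> List[Pair]:
    """Return the `top` highest-scored pairs among those with m_{i,j} = 0."""
    unknown = [(scores[i][j], i, j)
               for i in range(len(M)) for j in range(len(M[0])) if M[i][j] == 0]
    unknown.sort(key=lambda t: -t[0])
    return [(i, j) for _, i, j in unknown[:top]]


def count_validated(predicted: List[Pair], validated: Set[Pair]) -> int:
    """Number of predicted new interactions that appear in the validated set."""
    return sum(1 for pair in predicted if pair in validated)
```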

In terms of the number of validated interactions, ALADIN had the best overall performance. For example, on the Ion Channel and NR datasets, ALADIN was able to predict 12 and 8 validated interactions, respectively, whereas none of its competitors was able to predict more than 6 validated interactions on these datasets.

Most notably, numerous validated interactions were predicted only by our approach. On the Enzyme dataset, for example, these include the interactions between Ibuprofen (D00126) and arachidonate 15-lipoxygenase (hsa:246) and its second type (hsa:247), the interaction between Phentermine (D05458) and monoamine oxidase A (hsa:4128), and the interaction between Dyphylline (D00691) and phosphodiesterase 7A (hsa:5150). On the GPCR dataset, only ALADIN was able to predict the validated interaction between Theophylline sodium acetate (D01712) and adenosine A2b receptor (hsa:136), as well as the interaction between Loxapine (D02340) and dopamine receptor D1 (hsa:1812).

6 Conclusions and Outlook

In this paper, we considered the drug–target interaction prediction problem, which has important applications in understanding the mechanisms of drug action, drug repositioning and the prediction of adverse effects. We proposed an extension of BLM, one of the most prominent DTI models. In particular, we proposed the ALADIN approach, which represents drugs and targets in a multi-modal similarity space, uses ECkNN, a hubness-aware regression approach, as the local model in BLM, and builds a projection-based ensemble.

We performed experiments on widely used, publicly available datasets, the results of which show that our approach outperforms BLM-NII, NetLapRLS and WNN-GIP. We also demonstrated that our approach is able to predict chemically validated new drug–target interactions.

While DTI is an essential task, we point out that ALADIN may be adapted for the prediction of interactions between other biomedical entities, such as protein–RNA interactions [32] or protein–protein interactions [2].

Furthermore, we believe that our approach may motivate new recommender systems techniques as well. Although it was shown that only a few ratings per user may be more relevant than content-based metadata [27], we argue that the continuous flow of new users causes an ongoing cold-start problem [20, 26] in many cases, such as small e-commerce enterprises. This indicates that hybrid prediction models incorporating both relevance feedback and metadata may be desirable. Methods like ALADIN can be applied in such domains, e.g., as a part of an alternating hybrid approach, where users with sufficient feedback receive purely collaborative recommendations.