# FKL-Spa-LapRLS: an accurate method for identifying human microRNA-disease association

## Abstract

### Background

In the process of post-transcription, microRNAs (miRNAs) are closely related to various complex human diseases. Traditional verification methods for miRNA-disease associations are time-consuming and expensive, so it is especially important to design computational methods for detecting potential associations. Considering the restrictions of previous computational methods for predicting potential miRNA-disease associations, we develop the FKL-Spa-LapRLS model (Fast Kernel Learning Sparse kernel Laplacian Regularized Least Squares) to overcome these limitations.

### Result

First, we extract three miRNA similarity kernels and three disease similarity kernels. Then, we combine these kernels into a single kernel through the Fast Kernel Learning (FKL) model, and use a sparse kernel (Spa) to eliminate noise in the integrated similarity kernel. Finally, we find the associations via Laplacian Regularized Least Squares (LapRLS). Under three evaluation methods, global leave-one-out cross validation (LOOCV), local LOOCV and 5-fold cross validation, our method achieves AUCs of 0.9563, 0.8398 and 0.9535, respectively, which indicates that it is reliable. We then use case studies of eight neoplasms to further analyze the performance of our method. We find that most of the predicted miRNA-disease associations are confirmed by previous traditional experiments, and that some important miRNAs, which uncover more associations with various neoplasms than other miRNAs, deserve more attention.

### Conclusions

Our proposed model can reveal miRNA-disease associations and improve the accuracy of association prediction for various diseases. Our method can also be easily extended with more similarity kernels.

## Keywords

MiRNA-disease association; Similarity kernel; Fast kernel learning; Sparse kernel; Laplacian regularized least squares

## Abbreviations

- CMF: Collaborative matrix factorization
- CV: Cross validation
- FKL: Fast kernel learning
- GIP: Gaussian interaction profile
- GRMF: Graph regularized matrix factorization
- HMDD: Human microRNA disease database
- KBMF: Kernelized Bayesian matrix factorization
- KRLS: Kronecker regularized least squares
- LapRLS: Laplacian regularized least squares
- LLS: Log likelihood score
- LOOCV: Leave-one-out cross validation
- NRLMF: Neighborhood regularized logistic matrix factorization
- SRMF: Similarity-regularized matrix factorization

## Background

MicroRNAs (miRNAs) are a class of non-coding RNAs of 20∼25 nucleotides [1]. In the process of post-transcription, miRNAs bind to messenger RNA (mRNA) sequences and affect protein synthesis [2, 3, 4]. Previous studies have shown that miRNAs are related to various diseases, including cancers. For example, the expression level of *hsa*-*mir*-21 is associated with more than 125 diseases, such as Alzheimer Disease, Diabetes Mellitus and Lymphoma. Thus, research on miRNAs is helpful for the diagnosis and treatment of diseases [5]. Traditional experiments for detecting associations between miRNAs and diseases are time-consuming and expensive [6]. Therefore, it is especially important to find potential miRNA-disease associations by computational methods [7]. Previous studies have identified a large number of miRNA-disease associations through traditional experiments, and several databases of these associations have been constructed. The Human MicroRNA Disease Database (HMDD) [8] collects 572 miRNAs, 378 diseases and 10368 miRNA-disease associations. The miR2Disease database [9] includes 349 miRNAs, 163 diseases and 3273 miRNA-disease associations. The dbDEMC database contains 2224 miRNAs, 36 cancer types and 20037 miRNA-disease associations obtained through high-throughput methods. These associations promote the development of computational methods.

Computational methods for finding potential disease-miRNA associations have so far achieved excellent performance [10, 11, 12, 13, 14]. Most of these methods are based on the assumption that miRNAs with high similarity tend to be related to similar diseases, and vice versa [15, 16]. Xuan et al. [17] proposed HDMP, which obtains a score for each miRNA by weighting its *k* most similar neighbors; a miRNA with a larger score is more likely to be associated with a specific disease. However, HDMP cannot work for a new disease without known related miRNAs. Jiang et al. [18] devised a hypergeometric distribution-based model to calculate the score of each miRNA for a specific disease, where a miRNA with a larger score is more likely to be associated with this disease. The scores of these two methods are based on miRNA neighbor information, which ignores the global information of the miRNA similarity network. Many models find miRNA-disease associations based on similarity networks [19, 20, 21, 22, 23]. Chen et al. developed the RWRMDA model [24], which uses the miRNA functional similarity network and the known miRNA-disease association network, and applies a random walk model to find potential miRNA-disease associations. However, RWRMDA faces the same problem as HDMP, because the random walk needs a nonzero initial vector. Therefore, Chen et al. [25] proposed WBSMDA to find potential associations by integrating the miRNA functional similarity network, the disease semantic similarity and the known miRNA-disease association network. For the similarity between two miRNAs or two diseases, WBSMDA integrates the Gaussian Interaction Profile (GIP) kernel similarity for miRNAs and diseases, and calculates the association probability of a miRNA-disease pair using the Within-Score and Between-Score of the disease and the miRNA. Gu et al. [26] developed NCPMDA, which constructs novel similarity kernels for miRNAs and diseases via matrix operations and calculates the space projection scores of miRNAs and diseases. The final score between a miRNA and a disease is calculated by combining the two space projection scores. The predictive performance of NCPMDA is superior to that of previous methods for a disease without any known related miRNAs [13].

Many previous models define a cost function and minimize it. Chen et al. [27] developed RLSMDA, a semi-supervised method that minimizes a Regularized Least Squares cost function and uncovers potential miRNAs associated with various diseases. After that, Chen et al. [28] proposed LRSSLMDA to reveal potential associations between miRNAs and diseases. LRSSLMDA constructs comprehensive statistical features and graph theoretic features by combining the miRNA and disease similarity kernels. Then, a Laplacian regularization term is added to the objective function. Experimental results demonstrate that LRSSLMDA is a valuable computational model. In addition, many previous methods are based on machine learning algorithms [29, 30], matrix completion [31, 32, 33] and graph theory [34]. For example, Shen et al. [35] proposed CMFMDA, which uses WKNKN to estimate the association probability for unknown miRNA-disease pairs and uses Collaborative Matrix Factorization to uncover potential associations. You et al. [36] developed PBMDA, which constructs a heterogeneous graph by integrating five networks, obtains the scores of all paths for a miRNA-disease pair, and calculates the association possibility as the sum of all path scores. PBMDA achieves remarkable performance in finding potential miRNA-disease associations.

All of the above methods have achieved remarkable results, but they still have limitations. For example, most of the existing methods are based on the assumption that miRNAs with high similarity tend to be related to similar diseases. When constructing miRNA and disease similarity kernels, most studies use the functional similarity and GIP kernel similarity for miRNAs, and the semantic similarity and GIP kernel similarity for diseases. To integrate two similarity kernels, many works simply sum or average them [29, 37, 38]. Therefore, there is an urgent need for an effective method of integrating multiple miRNA and disease similarity kernels [39].

In this paper, we first extract the miRNA functional similarity, the miRNA sequence similarity and the GIP kernel similarity for miRNAs, and the disease semantic similarity, the disease functional similarity and the GIP kernel similarity for diseases. Then, we use the Fast Kernel Learning method to construct one miRNA similarity kernel and one disease similarity kernel. Finally, we propose a novel Sparse Laplacian Regularized Least Squares method to uncover miRNA-disease associations. Three evaluation methods are used to assess performance: global Leave-One-Out Cross Validation (global LOOCV), local Leave-One-Out Cross Validation (local LOOCV) and 5-fold cross validation (5-fold CV). Under these three evaluation methods, our method obtains remarkable performance (AUCs of 0.9563, 0.8398 and 0.9535, respectively) compared with nine other models. We also use case studies of eight neoplasms to further analyze the performance of our method. We find that 47 of the top 50 candidates are confirmed to be associated with Lymphoma in global verification, and all top 50 candidates are confirmed to be associated with Breast and Colorectal Neoplasms in local verification. Moreover, we find that some miRNAs deserve more attention because they uncover more associations with various neoplasms, including hsa-mir-106b, hsa-mir-19b, hsa-mir-29c, hsa-mir-1 and hsa-mir-29a.

## Methods

### Human miRNA-disease associations dataset

In this paper, the set of miRNAs is denoted by \(M=\left \{m_{i}\right \}_{i=1}^{m}\), and the set of diseases is denoted by \(D=\left \{d_{j}\right \}_{j=1}^{n}\), where *m* and *n* are the numbers of miRNAs and diseases, respectively. The associations between miRNAs and diseases are downloaded from the HMDD database, which includes 5430 associations between 495 miRNAs and 383 diseases. The associations are represented by a binary matrix *Y*∈*R*^{m×n}, where *y*_{i,j}∈{0,1}. If a miRNA *m*_{i} is associated with a disease *d*_{j}, *y*_{i,j} is set to 1; otherwise, *y*_{i,j} is set to 0.

### MiRNA similarity

Based on the assumption that miRNAs with high similarity tend to be associated with the same disease, we extract three classes of miRNA similarity: functional similarity, sequence similarity and Gaussian Interaction Profile (GIP) kernel similarity.

#### MiRNA functional similarity

In previous work, the MISIM method [40] proposed by Cui et al. calculated miRNA functional similarity scores. We extract the functional similarity scores of the 495 miRNAs through MISIM and construct the kernel \(K_{1}^{m} \in R^{m\times m}\) to represent the miRNA functional similarity network, in which \(K_{1}^{m}(m_{i},m_{j})\) is the functional similarity score between miRNAs *m*_{i} and *m*_{j}.

#### MiRNA sequence similarity

All 495 miRNA sequences are downloaded from the miRBase database [41]. We extract the miRNA sequence similarity using the Needleman-Wunsch algorithm and obtain the kernel \(K_{2}^{m}\in R^{m\times m}\) to represent the miRNA sequence similarity network, in which \(K_{2}^{m}(m_{i},m_{j})\) is the sequence similarity score between miRNAs *m*_{i} and *m*_{j}.
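The sequence kernel can be illustrated with a minimal sketch of Needleman-Wunsch global alignment. The scoring scheme (match +1, mismatch −1, linear gap −1), the min-max rescaling of raw scores to [0, 1], and the function names are assumptions made for illustration; the paper does not state which scoring parameters or normalization were used.

```python
import numpy as np

def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of strings a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = np.zeros((rows, cols), dtype=int)
    score[:, 0] = gap * np.arange(rows)   # aligning a prefix of a against gaps
    score[0, :] = gap * np.arange(cols)   # aligning a prefix of b against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1, j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i, j] = max(diag, score[i - 1, j] + gap, score[i, j - 1] + gap)
    return int(score[-1, -1])

def sequence_similarity_kernel(seqs):
    """Pairwise alignment scores, min-max rescaled to [0, 1] (assumed normalization)."""
    n = len(seqs)
    raw = np.array([[needleman_wunsch_score(seqs[i], seqs[j]) for j in range(n)]
                    for i in range(n)], dtype=float)
    lo, hi = raw.min(), raw.max()
    return (raw - lo) / (hi - lo) if hi > lo else np.ones_like(raw)

# Toy RNA sequences (hypothetical, not taken from miRBase).
K2_m = sequence_similarity_kernel(["ACGU", "ACG", "UUUU"])
```

Because the scoring scheme is symmetric, the resulting matrix is symmetric, which is what a similarity kernel requires.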

#### GIP kernel similarity for miRNAs

The GIP kernel similarity between miRNAs *m*_{i} and *m*_{j} is denoted as \(K_{3}^{m}\in R^{m\times m}\) and the calculation method is as Eq. (1)

where *IP*(*m*_{i})∈*R*^{1×n} denotes the interaction profile of miRNA *m*_{i}, obtained by observing whether miRNA *m*_{i} is associated with each disease or not, that is to say, the *i*-th row of the association matrix *Y*; *γ*_{m} is used for kernel bandwidth control, which is set to −1 in this paper.
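A minimal sketch of a GIP kernel follows, assuming the common form exp(−γ‖IP(a) − IP(b)‖²). This form is an assumption; it is chosen because it gives an all-ones matrix when γ = 0 and entries of at least 1 when γ = −1, which matches the behaviour reported in the parameter selection section. The function name `gip_kernel` and the toy association matrix are illustrative only.

```python
import numpy as np

def gip_kernel(profiles, gamma):
    """Return exp(-gamma * ||p_i - p_j||^2) for every pair of rows of `profiles`."""
    sq_norms = np.sum(profiles ** 2, axis=1)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dist)

# Toy association matrix: 4 miRNAs (rows) x 3 diseases (columns).
Y = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1],
              [0, 0, 1]], dtype=float)

K3_m = gip_kernel(Y, gamma=-1.0)    # miRNA profiles are the rows of Y
K3_d = gip_kernel(Y.T, gamma=-1.0)  # disease profiles are the columns of Y
```

The same function serves both subspaces: rows of *Y* give the miRNA kernel and columns of *Y* give the disease kernel.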

### Disease similarity

We extract three classes of disease similarity, including semantic similarity, functional similarity and GIP kernel similarity.

#### Disease semantic similarity

Disease *d*(*i*) can be described as a node in a Directed Acyclic Graph (*DAG*) based on the MeSH [43] database (https://www.nlm.nih.gov/bsd/disted/meshtutorial/themeshdatabase/), and denoted as \({DAG}_{d_{i}}=(d_{i},T_{d_{i}},E_{d_{i}})\), in which \(T_{d_{i}}\) is the set of all ancestor nodes of *d*_{i} including node *d*_{i} itself and \(E_{d_{i}}\) is the set of corresponding links. A semantic score of each disease \(t \in T_{d_{i}}\) can be calculated by Eq. (2).

where *Δ* is the semantic contribution factor, which is set to 0.5 in this paper.

The semantic value of disease *d*_{i} is then obtained by Eq. (3).

The semantic similarity between diseases *d*_{i} and *d*_{j} is calculated by Eq. (4).
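The DAG-based computation can be sketched as follows. The sketch assumes the widely used formulation in which a disease contributes 1 to itself, each ancestor receives *Δ* times the largest contribution among its children, the semantic value is the sum of all contributions, and the similarity is the shared contributions divided by the sum of the two semantic values. These formulas, the function names and the toy DAG are assumptions for illustration and are not copied from the paper's Eqs. (2) to (4).

```python
def semantic_contributions(disease, parents, delta=0.5):
    """Contribution of every ancestor of `disease` (and itself) to its semantics."""
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        next_frontier = []
        for node in frontier:
            for parent in parents.get(node, ()):
                value = delta * contrib[node]
                # Keep the largest contribution reaching this ancestor.
                if value > contrib.get(parent, 0.0):
                    contrib[parent] = value
                    next_frontier.append(parent)
        frontier = next_frontier
    return contrib

def semantic_similarity(d_i, d_j, parents, delta=0.5):
    """Shared ancestor contributions divided by the sum of both semantic values."""
    c_i = semantic_contributions(d_i, parents, delta)
    c_j = semantic_contributions(d_j, parents, delta)
    shared = set(c_i) & set(c_j)
    numerator = sum(c_i[t] + c_j[t] for t in shared)
    return numerator / (sum(c_i.values()) + sum(c_j.values()))

# Hypothetical toy DAG: A and C are both children of B, and B is a child of root.
toy_parents = {"A": ["B"], "C": ["B"], "B": ["root"]}
sim_AC = semantic_similarity("A", "C", toy_parents)
```

In the toy DAG, A and C share the ancestors B and root, so the similarity is (0.5 + 0.5 + 0.25 + 0.25) / (1.75 + 1.75) = 3/7.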

#### Disease functional similarity

where *LLS*(*g*_{i},*g*_{j}) represents the LLS between the *i*-th and *j*-th genes; *LLS*^{∗}(*g*_{i},*g*_{j}) represents the LLS score after normalization; *LLS*_{min} and *LLS*_{max} indicate the minimum and maximum LLS scores in HumanNet, respectively.

where *S*_{HumanNET} indicates the gene-gene associations in the HumanNet database; *e*(*i*,*j*) indicates the association between the *i*-th and *j*-th genes.

The similarity between a gene *g* and a gene set *G* is defined as Eq. (7).

A disease *d*_{i} is related to many genes, which are defined as the gene set *G*_{i}; the associations between diseases and genes are downloaded from SIDD [46]. The disease functional similarity score is defined as Eq. (8)
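The pipeline described above can be sketched under stated assumptions: LLS scores are min-max normalized, two genes have similarity 1 if identical, the normalized LLS if they are linked in HumanNet and 0 otherwise, a gene's similarity to a gene set is its maximum similarity to any member, and the disease score averages these gene-to-set similarities over both gene sets. All of these formulas, the function names and the toy data are assumptions for illustration; they follow a common construction rather than the paper's exact Eqs. (5) to (8).

```python
def normalize_lls(lls):
    """Min-max normalize a dict {frozenset({gene_a, gene_b}): LLS score}."""
    lo, hi = min(lls.values()), max(lls.values())
    if hi == lo:
        return {pair: 1.0 for pair in lls}
    return {pair: (score - lo) / (hi - lo) for pair, score in lls.items()}

def gene_similarity(g_a, g_b, lls_norm):
    """1 for identical genes, normalized LLS for linked genes, 0 otherwise."""
    if g_a == g_b:
        return 1.0
    return lls_norm.get(frozenset((g_a, g_b)), 0.0)

def gene_to_set_similarity(gene, gene_set, lls_norm):
    """Largest similarity between `gene` and any member of `gene_set`."""
    return max(gene_similarity(gene, g, lls_norm) for g in gene_set)

def disease_functional_similarity(G_i, G_j, lls_norm):
    """Average gene-to-set similarity taken over both gene sets."""
    total = sum(gene_to_set_similarity(g, G_j, lls_norm) for g in G_i)
    total += sum(gene_to_set_similarity(g, G_i, lls_norm) for g in G_j)
    return total / (len(G_i) + len(G_j))

# Hypothetical toy network with three genes.
toy_lls = {frozenset(("g1", "g2")): 1.0,
           frozenset(("g2", "g3")): 3.0,
           frozenset(("g1", "g3")): 2.0}
toy_norm = normalize_lls(toy_lls)
toy_score = disease_functional_similarity({"g1", "g2"}, {"g3"}, toy_norm)
```

In the toy data the normalized scores are 0, 1 and 0.5, so the disease score is (0.5 + 1.0 + 1.0) / 3.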

#### GIP kernel similarity for diseases

where *IP*(*d*_{i})∈*R*^{m×1} denotes the interaction profile of disease *d*_{i}, obtained by observing whether disease *d*_{i} is associated with each miRNA or not, that is to say, the *i*-th column of the association matrix *Y*; *γ*_{d} is used for kernel bandwidth control, which is set to −1 in this paper.

### Fast kernel learning

We combine the three miRNA similarity kernels into a single kernel *K*^{m}∈*R*^{m×m} using the method of Fast Kernel Learning (FKL) [47]. We define *K*^{m} as Eq. (10).

The integrated kernel *K*^{m} should be close to the association matrix *Y*. We define the miRNA association similarity as Eq. (11).

We obtain the weight vector *μ*^{m}∈*R*^{3×1} using the following Eq. (12) to minimize the distance between *K*^{m} and *Y*^{m}.

where \(||K^{m}-Y^{m}||_{F}^{2} = \sum _{i}\sum _{j}\left (K_{i,j}^{m}-Y_{i,j}^{m}\right)^{2}\).

where *λ*^{m} is set to 200 in this paper.
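The weight learning step can be sketched as a small constrained least-squares problem. The sketch assumes that the combined kernel is a weighted sum of the base kernels, that the weights are non-negative and sum to 1 (consistent with the weights reported later in the paper, which sum to 1), and that the regularizer is *λ*‖*μ*‖². The projected-gradient solver, the function names and the choice of target (for example *Y* *Y*^{T}) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {mu : mu >= 0, sum(mu) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def fast_kernel_weights(kernels, target, lam=200.0, n_iter=5000):
    """Minimize ||sum_i mu_i K_i - target||_F^2 + lam * ||mu||^2 over the simplex."""
    flat = np.stack([K.ravel() for K in kernels])
    A = flat @ flat.T              # A[i, j] = Frobenius inner product <K_i, K_j>
    b = flat @ target.ravel()      # b[i] = Frobenius inner product <K_i, target>
    H = A + lam * np.eye(len(kernels))
    step = 1.0 / (2.0 * np.linalg.eigvalsh(H)[-1])  # 1 / Lipschitz constant of the gradient
    mu = np.full(len(kernels), 1.0 / len(kernels))
    for _ in range(n_iter):
        grad = 2.0 * (H @ mu - b)
        mu = project_to_simplex(mu - step * grad)
    return mu

# Toy check: when the target equals the first kernel and lam = 0,
# all weight should move to that kernel.
K1 = np.array([[1.0, 0.5], [0.5, 1.0]])
K2 = np.eye(2)
K3 = np.array([[1.0, 0.2], [0.2, 0.5]])
mu_toy = fast_kernel_weights([K1, K2, K3], target=K1, lam=0.0)
```

A larger *λ* pulls the weights toward the uniform vector, so the regularizer trades fit to the target against spreading weight across kernels.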

### Laplacian regularized least squares

Given the similarity kernels of miRNAs and diseases, we use Sparse Laplacian Regularized Least Squares (Spa-LapRLS) to obtain a new association matrix and find potential miRNA-disease associations. It consists of a sparse kernel model and a LapRLS model.

#### Sparse kernel model

We use the Top-*k* Neighbor model to reduce noise in the integrated similarity kernel. For the miRNA subspace, we construct a weight matrix *w*_{m}∈*R*^{m×m} for *K*^{m}, whose elements are defined as Eq. (16), by the Top-*k* Neighbor method.

where *k* satisfies condition 0<*k*<*m*; *T*(*k*,*i*) represents the *k*-th largest element of the *i*-th row in *K*^{m} and *T*(*k*,*j*) represents the *k*-th largest element of the *j*-th column in *K*^{m}.

Similarly, we also calculate the denoised disease similarity kernel as \(K_{d}^{*} \in R^{n \times n}\).
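A minimal sketch of top-*k* sparsification is given below. It assumes that an entry gets weight 1 when it reaches both the *k*-th largest value of its row and the *k*-th largest value of its column, weight 0.5 when it reaches only one of them, and weight 0 otherwise, and that the denoised kernel is the element-wise product of this weight matrix with the kernel. This weighting rule and the function name are assumptions for illustration, not the paper's exact Eq. (16).

```python
import numpy as np

def sparse_kernel(K, k):
    """Down-weight entries that fall outside the top-k of their row and/or column."""
    row_thr = np.sort(K, axis=1)[:, -k][:, None]  # k-th largest value of each row
    col_thr = np.sort(K, axis=0)[-k, :][None, :]  # k-th largest value of each column
    in_row = K >= row_thr
    in_col = K >= col_thr
    weights = 0.5 * (in_row.astype(float) + in_col.astype(float))  # 1, 0.5 or 0
    return weights * K

K_toy = np.array([[1.0, 0.8, 0.1],
                  [0.8, 1.0, 0.3],
                  [0.1, 0.3, 1.0]])
K_star = sparse_kernel(K_toy, k=2)
```

For a symmetric kernel the result stays symmetric, because the column test is the transpose of the row test.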

#### LapRLS for miRNA-disease interaction prediction

*D*_{m} is the diagonal matrix of \(K_{m}^{*}\) in the form of \(D_{m}(i,i)=\sum _{j=1}^{m}K_{m}^{*}(i,j)\); *β*_{m} is the regularization coefficient, which is set to 2^{−5} in this paper; *α*_{m} is renewed by the function Eq. (19) in [48].

The derivation of the optimization algorithm is presented in [48].

where \(F_{d}=K_{d}^{*} \alpha _{d} \in R^{n \times m}\); *β*_{d} is the regularization coefficient, which is set to 2^{−5} in this paper.

where *F*^{∗}∈*R*^{m×n}.
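The prediction step can be sketched with one commonly used closed form of LapRLS. The sketch assumes a normalized graph Laplacian built from the degree matrix of the denoised kernel, the closed-form solution *F* = *K*(*K* + *β* *L* *K*)^{−1}*Y* in each subspace, and a simple average of the miRNA-side and disease-side predictions. These choices and the function names are assumptions for illustration; the paper refers to [48] for its own optimization procedure.

```python
import numpy as np

def normalized_laplacian(K):
    """I - D^{-1/2} K D^{-1/2}, where D holds the row sums of K on its diagonal."""
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    return np.eye(len(K)) - d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]

def laprls_predict(K_m, K_d, Y, beta_m=2.0 ** -5, beta_d=2.0 ** -5):
    """Average of the miRNA-space and disease-space LapRLS predictions."""
    L_m = normalized_laplacian(K_m)
    L_d = normalized_laplacian(K_d)
    # miRNA side: F_m = K_m (K_m + beta_m L_m K_m)^{-1} Y, shape (m, n)
    F_m = K_m @ np.linalg.solve(K_m + beta_m * L_m @ K_m, Y)
    # disease side: F_d = K_d (K_d + beta_d L_d K_d)^{-1} Y^T, shape (n, m)
    F_d = K_d @ np.linalg.solve(K_d + beta_d * L_d @ K_d, Y.T)
    return 0.5 * (F_m + F_d.T)

# Toy example with 2 miRNAs and 3 diseases.
K_m_toy = np.array([[1.0, 0.5], [0.5, 1.0]])
K_d_toy = 0.8 * np.eye(3) + 0.2
Y_toy = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
F_star = laprls_predict(K_m_toy, K_d_toy, Y_toy)
```

With both regularization coefficients set to 0 and invertible kernels, this closed form reproduces *Y* exactly, so the Laplacian term is what spreads scores to unobserved pairs.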

## Results and discussion

In this section, we study the performance of our method in predicting unknown miRNA-disease associations from different aspects. First, we establish three evaluation methods and two assessment indicators to evaluate the accuracy of our method. Second, we analyze the performance of our method with different parameters by using 10-fold CV and local LOOCV. Third, we employ 10-fold CV and local LOOCV to analyze the performance of the FKL model. Fourth, we compare the performance of LapRLS with multiple matrix factorization methods. Fifth, we compare the performance of FKL-Spa-LapRLS with nine outstanding methods. Finally, for further validation, we implement global and local verifications on eight neoplasms as case studies.

### Evaluation criteria

In this paper, we implement 10-fold CV, global LOOCV and local LOOCV to evaluate the prediction accuracy of our method. In the 10-fold CV, all miRNA-disease associations are randomly divided into ten disjoint groups; each group is used in turn as the test set, and the other nine groups are used as the training set. In the global LOOCV, all 5430 verified miRNA-disease associations are considered, and each association is left out in turn as a test sample while the other known associations are regarded as training samples. In the local LOOCV, only the miRNAs for a specific disease are considered: for disease *d*(*i*), each miRNA related to *d*(*i*) is left out as the test set, and the other associations are regarded as the training set. All miRNA-disease associations in the test set are reset to 0 in the association matrix *Y*.
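The fold construction can be sketched as follows. The generator name, the random seed and the toy matrix are illustrative assumptions; the only behaviour taken from the text is that known associations are split into disjoint groups and that test associations are reset to 0 in the training copy of *Y*. Setting the number of folds equal to the number of known associations gives the global LOOCV setting.

```python
import numpy as np

def kfold_masked_matrices(Y, n_folds, seed=0):
    """Yield (Y_train, test_pairs) with the test associations reset to 0."""
    rng = np.random.default_rng(seed)
    known = np.argwhere(Y == 1)            # (row, column) of every known association
    order = rng.permutation(len(known))
    for fold in np.array_split(order, n_folds):
        test_pairs = known[fold]
        Y_train = Y.copy()                 # leave the original matrix untouched
        Y_train[test_pairs[:, 0], test_pairs[:, 1]] = 0
        yield Y_train, test_pairs

# Toy association matrix with 6 known associations.
Y_cv = np.array([[1, 0, 1],
                 [0, 1, 0],
                 [1, 0, 1],
                 [0, 0, 1]])
folds = list(kfold_masked_matrices(Y_cv, n_folds=3))
```

Each training matrix would then be passed through the whole pipeline, and the scores of the held-out pairs would be ranked against the scores of the unknown pairs.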

In our study, we use the Area Under the Curve (AUC) and the Area Under the Precision-Recall curve (AUPR) as the assessment criteria for prediction. AUC is the area under the receiver operating characteristic (ROC) curve, which is created by plotting the true positive rate against the false positive rate at various threshold settings. An AUC value of 1 indicates perfect performance and an AUC of 0.5 indicates random performance. AUPR is the area under the curve created by plotting precision against recall at various threshold settings. The greater the value of AUPR, the better the performance of the model.
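Both criteria can be computed from a list of labels and scores, as in the sketch below. The AUC is computed through its equivalent pairwise interpretation (the probability that a random positive outscores a random negative, with ties counted as one half), and the AUPR is approximated by average precision, one common estimator of the area under the precision-recall curve. The function names are illustrative, and the pairwise AUC uses memory proportional to positives times negatives, so it is suited to small examples only.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()  # ties count one half
    return float(wins / (len(pos) * len(neg)))

def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    labels = np.asarray(labels)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = labels[order] == 1
    precision_at_rank = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_rank[hits].mean())

toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)             # 3 of 4 pairs ranked correctly
toy_ap = average_precision(toy_labels, toy_scores)    # mean of 1/1 and 2/3
```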

### Parameter selection

In this section, we use 10-fold CV and local LOOCV to analyze several parameters, including *γ*_{m}, *γ*_{d}, *λ*_{m}, *λ*_{d}, *β*_{m}, *β*_{d} and *k* value.

*γ*_{m} and *γ*_{d} are the parameters used in constructing the GIP kernel similarity for miRNAs and diseases, respectively. We use only the GIP kernel similarity to predict potential miRNA-disease associations and use 10-fold CV to evaluate the performance of the GIP kernel with different parameters. We vary *γ*_{m} and *γ*_{d} from −10 to 10 with step 1 and calculate the AUCs, respectively. The results are shown in Fig. 2a. They show that the performance of the GIP similarity kernel is sensitive to *γ*_{m} and *γ*_{d}, and that the optimal AUC is obtained when *γ*_{m} and *γ*_{d} equal 0. However, *K*_{m,3} and *K*_{d,3} are matrices with ones in all elements according to Eqs. (1) and (9) when the two parameters equal 0. Therefore, we adopt the suboptimal values *γ*_{m}=−1 and *γ*_{d}=−1 in this paper. Since most of the elements in the GIP similarity kernel are greater than 1, we need to normalize the GIP similarity kernel before integrating multiple kernels.

The *λ*_{m} and *λ*_{d} are the regularization coefficients of FKL. We use different *λ*_{m} and *λ*_{d} to integrate the three miRNA similarity kernels and the three disease similarity kernels, respectively. Then we use the integrated similarity kernels and LapRLS to uncover potential associations, and use 10-fold CV to evaluate the performance of FKL with different parameters. The *λ*_{m} and *λ*_{d} are varied from 0 to 15000 with step 100 in order to find the best value. The results are shown in Fig. 2b. The AUC fluctuates only slightly over the range from 0 to 15000, which demonstrates that FKL is insensitive to the regularization coefficient. Therefore, *λ*_{m} and *λ*_{d} are set to 200 in this paper.

The *β*_{m} and *β*_{d} are the regularization coefficients of LapRLS. We vary *β*_{m} and *β*_{d} from 2^{−10} to 2^{10}, respectively, and adopt 10-fold CV to evaluate the performance of LapRLS with different parameters. The results are shown in Fig. 2c. The AUC fluctuates only slightly in the range from 2^{−10} to 2^{−2}, and changes obviously when *β*_{m} and *β*_{d} are greater than 2^{−2}. We select *β*_{m} and *β*_{d} by the highest AUC value and set both to 2^{−5} in this paper.

The *k* value in the sparse kernel process is an important parameter in this paper. We use 10-fold CV and local LOOCV to analyze the *k* value. The value of *k* is taken from 20 to 250 with step 5, and the results are shown in Fig. 3. It can be clearly seen that the sparse kernel process has a positive effect on the discovery of potential miRNA-disease associations. In this study, the *k* value is set to 20 in the 10-fold CV and global LOOCV, and is set to 40 in the local LOOCV.

### FKL performance analysis

In this section, we analyze the performance of FKL. First, we compare FKL with the single kernels and the average kernel by 10-fold CV and local LOOCV. Then, we compare FKL with two multiple kernel learning methods by 10-fold CV and local LOOCV.

#### Comparison with single kernel and average kernel

In the 10-fold CV, the AUC of FKL is the highest among the five curves; the AUC difference between the FKL model and *K*_{1} is slight, but the difference in AUPR is obvious. Local LOOCV is a measure that expresses model performance well when we handle a new disease that has no known associations with miRNAs. In Fig. 4, the *AUC* of the average kernel is greater than that of the FKL kernel. In the process of FKL, we need to find an optimized *μ* to weight the kernels. Here, we get \(\mathcal {\mu }^{m}=\left (0.6610,0.3390,1.1562\times 10^{-9}\right)\) and \(\mathcal {\mu }^{d}=\left (1,9.1453\times 10^{-10},7.3854\times 10^{-10}\right)\), that is to say, the miRNA functional similarity kernel and the miRNA sequence similarity kernel are more important than the GIP kernel similarity, and the disease semantic similarity kernel is the most important of the three disease kernels. The model loses part of the information in the weighting process. However, a new disease without any known association with miRNAs needs more detailed information from different aspects, and the average kernel method satisfies this requirement. That is why the AUC of the FKL model is lower than that of the average kernel, while the AUPR of the FKL model is higher than that of the average kernel method. Moreover, AUPR evaluates classifier performance better when dealing with an unbalanced dataset. Therefore, we consider the FKL model the most effective among the compared kernels.

#### Comparison with other multiple kernel learning methods

### Comparison with matrix factorization

### Comparison with other methods

The comparison results between our method and nine other computational models

Methods | Global LOOCV | Local LOOCV | 5-fold CV |
---|---|---|---|
FKL-Spa-LapRLS | 0.9563 | 0.8398 | 0.9535 |
PBMDA | 0.9169 | 0.8341 | 0.9172 |
MCMDA | 0.8749 | 0.7718 | 0.8767 |
MaxFlow | 0.8624 | 0.7774 | 0.8579 |
NCPMDA | 0.9073 | | 0.8763 |
WBSMDA | 0.8030 | 0.8031 | 0.8185 |
HDMP | 0.8366 | 0.7702 | 0.8342 |
RLSMDA | 0.8426 | 0.6953 | 0.8569 |
LRSSLMDA | 0.9178 | 0.8418 | 0.9181 |
HGIMDA | 0.8781 | 0.8077 | - |

### Case studies

In this section, we study several important diseases to further validate the predictive power of our method. We use the known miRNA-disease associations included in HMDD to find potential miRNA-disease associations not included in HMDD, and verify the predicted results through two independent databases (dbDEMC [56] and miR2Disease [9]). In fact, dbDEMC and miR2Disease are commonly used as benchmark datasets for many models, such as PBMDA and LRSSLMDA. The dbDEMC database includes 2224 miRNAs, 36 cancer types and 20037 miRNA-disease associations obtained by high-throughput methods, and our model predicts the top five diseases, including Colon Neoplasms, Gastric Neoplasms, Pancreatic Neoplasms, Colorectal Neoplasms and Esophageal Neoplasms. Furthermore, in previous work, Kidney Neoplasms, Breast Neoplasms and Lymphoma were used to infer their underlying associated miRNAs. Therefore, we use case studies of these eight diseases to analyze the performance of FKL-Spa-LapRLS in this section.

We implement two methods, global verification and local verification, to evaluate the predictive performance of our method in the case studies. In global verification, the 5430 known miRNA-disease associations in HMDD are used as the training set to discover potential associations. For each disease, we extract the top 50 candidate associations that are not covered by the training set. This gives 400 candidate associations in total, which are checked against the dbDEMC and miR2Disease databases. In local verification, all known associations related to a specific disease are reset to unknown. We use the other known associations as the training set to discover potential associations, and again extract the top 50 candidate associations for this specific disease. This gives 400 candidate associations in total, which are checked against the HMDD, miR2Disease and dbDEMC databases.
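The candidate extraction in global verification can be sketched as a ranking of one column of the predicted score matrix after excluding the pairs already known in the training matrix. The function name `top_candidates`, the tie-breaking by index and the toy data are assumptions for illustration.

```python
import numpy as np

def top_candidates(F, Y, disease_index, top=50):
    """Indices of the highest-scoring miRNAs not already known for this disease."""
    scores = F[:, disease_index].astype(float).copy()
    known = Y[:, disease_index] == 1
    scores[known] = -np.inf                        # exclude training associations
    order = np.argsort(-scores, kind="stable")     # descending score, stable on ties
    n_unknown = int((~known).sum())
    return order[:min(top, n_unknown)]

# Toy example: 4 miRNAs, 1 disease; miRNA 0 is already known.
F_toy = np.array([[0.9], [0.8], [0.1], [0.5]])
Y_known = np.array([[1], [0], [0], [0]])
candidates = top_candidates(F_toy, Y_known, disease_index=0, top=2)
```

The returned indices would then be mapped back to miRNA names and looked up in the external databases.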

The verification results for eight neoplasm types

Disease name | Global verification | Local verification |
---|---|---|
Colon Neoplasms | 44 | 48 |
Gastric Neoplasms | 42 | 40 |
Pancreatic Neoplasms | 45 | 50 |
Colorectal Neoplasms | 45 | 50 |
Esophageal Neoplasms | 39 | 46 |
Kidney Neoplasms | 43 | 43 |
Breast Neoplasms | 39 | 50 |
Lymphoma | 47 | 48 |

## Conclusions

In this paper, we propose the FKL-Spa-LapRLS model to uncover potential miRNA-disease associations. We demonstrate that the FKL model is more effective than the average kernel method using 10-fold CV and local LOOCV, and that the sparse kernel process has a positive effect on noise elimination in the similarity network. The LapRLS method contributes to the accuracy of finding potential miRNA-disease associations.

FKL-Spa-LapRLS has been compared with nine prediction methods that have achieved excellent performance in predicting miRNA-disease associations, including PBMDA, MCMDA, MaxFlow, NCPMDA, WBSMDA, HDMP, RLSMDA, LRSSLMDA and HGIMDA. FKL-Spa-LapRLS achieves the highest accuracy in 5-fold CV and global LOOCV, although it is slightly lower than NCPMDA and LRSSLMDA in local LOOCV. To further analyze the performance of FKL-Spa-LapRLS, we implement case studies of eight neoplasms. We find that 47 of the top 50 candidates are confirmed to be associated with Lymphoma in global verification, that all top 50 candidates are confirmed to be associated with Breast and Colorectal Neoplasms in local verification, and that some miRNAs deserve more attention.

Of course, FKL-Spa-LapRLS also has some limitations that need to be addressed in the future. For example, our method needs more similarity kernels constructed from additional information about gene-disease, disease-disease and miRNA-miRNA relationships, and it loses some detailed information in the FKL process when handling a new disease without known associations with miRNAs.

## Notes

### Acknowledgements

Authors would like to thank the reviewers for their helpful comments on the original manuscript. Authors are grateful to the conference committee of The 29th International Conference on Genome Informatics (GIW 2018).

### Funding

This work is supported by a grant from the National Science Foundation of China (NSFC 61772362) and the Tianjin Research Program of Application Foundation and Advanced Technology (16JCQNJC00200). Publication costs are funded by the NSFC 61772362.

### Availability of data and materials

The code and all supporting data files are available from https://github.com/guofei-tju/FKL-Spa-LapRLS.

### About this supplement

This article has been published as part of *BMC Genomics Volume 19 Supplement 10, 2018: Proceedings of the 29th International Conference on Genome Informatics (GIW 2018): genomics*. The full contents of the supplement are available online at https://bmcgenomics.biomedcentral.com/articles/supplements/volume-19-supplement-9.

### Authors’ contributions

FG, YD and LJ conceived and designed the experiments; LJ performed the experiments and analyzed the data; YX wrote the paper. FG and JT supervised the experiments and reviewed the manuscript. All authors read and approved the final manuscript.

### Ethics approval and consent to participate

Not applicable.

### Consent for publication

Not applicable.

### Competing interests

The authors declare no conflict of interest.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## References

- 1.Shi H, Zhang G, Zhou M, Cheng L, Yang H, Wang J, et al. Integration of Multiple Genomic and Phenotype Data to Infer Novel miRNA-Disease Associations. Plos ONE. 2016; 11(2):e0148521.CrossRefGoogle Scholar
- 2.Zou Q, Li J, Hong Q, Lin Z, Wu Y, Shi H, et al. Prediction of MicroRNA-Disease Associations Based on Social Network Analysis Methods. Biomed Res Int. 2015; 2015(10):810514.PubMedPubMedCentralGoogle Scholar
- 3.Yuan D, Cui X, Wang Y, Zhao Y, Li H, Hu S, et al. Enrichment Analysis Identifies Functional MicroRNA-Disease Associations in Humans. Plos ONE. 2015; 10(8):e0136285.CrossRefGoogle Scholar
- 4.Zou Q, Li J, Song L, Zeng X, Wang G. Similarity computation strategies in the microRNA-disease network: a survey. Brief Funct Genom. 2016; 15(1):55.Google Scholar
- 5.Zeng X, Liu L, Lu L, Zou Q. Prediction of potential disease-associated microRNAs using structural perturbation method. Bioinformatics. 2018; 34:2425–32.CrossRefGoogle Scholar
- 6.Zeng X, Zhang X, Zou Q. Integrative approaches for predicting microRNA function and prioritizing disease-related microRNA using biological interaction networks. Brief Bioinform. 2016; 17(2):193.CrossRefGoogle Scholar
- 7.Mørk S, Pletscher-Frankild S, Palleja CA, Gorodkin J, Jensen LJ. Protein-driven inference of miRNA-disease associations. Bioinformatics. 2014; 30(3):392.CrossRefGoogle Scholar
- 8.Li Y, Qiu C, Tu J, Geng B, Yang J, Jiang T, et al. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res. 2014; 42(Database issue):D1070.CrossRefGoogle Scholar
- 9.Jiang Q, Wang Y, Hao Y, Juan L, Teng M, Zhang X, et al. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res. 2009; 37(1):D98—104.PubMedGoogle Scholar
- 10.Peng L, Peng M, Liao B, Huang G, Liang W, Li K. Improved low-rank matrix recovery method for predicting miRNA-disease association. Sci Rep. 2017; 7(1):6007.CrossRefGoogle Scholar
- 11.Luo J, Ding P, Liang C, Chen X. Semi-supervised prediction of human miRNA-disease association based on graph regularization framework in heterogeneous networks. Neurocomputing. 2018; 294:29–38.CrossRefGoogle Scholar
- 12.Zhao Q, Xie D, Liu H, Wang F, Yan GY, Chen X. SSCMDA: spy and super cluster strategy for MiRNA-disease association prediction. Oncotarget. 2018; 9(2):1826–42.PubMedGoogle Scholar
- 13.Liu Y, Zeng X, He Z, Quan Z. Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM Trans Comput Biol Bioinform. 2016; PP(99):1–1.Google Scholar
- 14.Shi H, Xu J, Zhang G, Xu L, Li C, Wang L, et al. Walking the interactome to identify human miRNA-disease associations through the functional link between miRNA targets and disease genes. BMC Syst Biol. 2013; 7(1):1–12.CrossRefGoogle Scholar
- 15.Luo J, Xiao Q. A novel approach for predicting microRNA-disease associations by unbalanced bi-random walk on heterogeneous network. J Biomed Inform. 2017; 66:194–203.CrossRefGoogle Scholar
- 16.Lan W, Wang J, Li M, Liu J, Wu FX, Pan Y. Predicting microRNA-disease associations based on improved microRNA and disease similarities. IEEE/ACM Trans Comput Biol Bioinform. 2016; PP(99):1–1.Google Scholar
- 17.Xuan P, Han K, Guo M, Guo Y, Li J, Ding J, et al. Correction: Prediction of microRNAs Associated with Human Diseases Based on Weighted k Most Similar Neighbors. PLoS ONE. 2013; 8(9):e70204.
- 18.Jiang Q, Hao Y, Wang G, Juan L, Zhang T, Teng M, et al. Prioritization of disease microRNAs through a human phenome-microRNAome network. BMC Syst Biol. 2010; 4(S1):S2.
- 19.Pasquier C, Gardès J. Prediction of miRNA-disease associations with a vector space model. Sci Rep. 2016; 6:27036.
- 20.Yu Q, Zhang H, Cheng L, Xiao D. KATZMDA: Prediction of miRNA-disease associations based on KATZ model. IEEE Access. 2017; PP(99):1–1.
- 21.Nalluri JJ, Kamapantula BK, Barh D, Jain N, Bhattacharya A, Almeida SSD, et al. DISMIRA: Prioritization of disease candidates in miRNA-disease associations based on maximum weighted matching inference model and motif-based analysis. BMC Genom. 2015; 16 Suppl 5(S5):S12.
- 22.Liao B, Ding S, Chen H, Li Z, Cai L. Identifying human microRNA–disease associations by a new diffusion-based method. J Bioinform Comput Biol. 2015; 13(04):1550014.
- 23.Zeng X, Liao Y, Liu Y, Zou Q. Prediction and Validation of Disease Genes Using HeteSim Scores. IEEE/ACM Trans Comput Biol Bioinform. 2016; 99:1–1.
- 24.Chen X, Liu MX, Yan GY. RWRMDA: predicting novel human microRNA–disease associations. Mol BioSyst. 2012; 8(10):2792.
- 25.Chen X, Yan CC, Zhang X, You ZH, Deng L, Liu Y, et al. WBSMDA: Within and Between Score for MiRNA-Disease Association prediction. Sci Rep. 2016; 6:21106.
- 26.Gu C, Bo L, Li X, Li K. Network Consistency Projection for Human miRNA-Disease Associations Inference. Sci Rep. 2016; 6:36054.
- 27.Chen X, Yan GY. Semi-supervised learning for potential human microRNA-disease associations inference. Sci Rep. 2014; 4:5501.
- 28.Chen X, Huang L. LRSSLMDA: Laplacian Regularized Sparse Subspace Learning for MiRNA-Disease Association prediction. PLoS Comput Biol. 2017; 13(12):e1005912.
- 29.Fu L, Peng Q. A deep ensemble model to predict miRNA-disease association. Sci Rep. 2017; 7(1):14482.
- 30.Jiang Q, Wang G, Zhang T, Wang Y. Predicting human microRNA-disease associations based on support vector machine. Int J Data Min Bioinform. 2011; 8(3):282–93.
- 31.Li JQ, Rong ZH, Chen X, Yan GY, You ZH. MCMDA: Matrix completion for MiRNA-disease association prediction. Oncotarget. 2017; 8(13):21187.
- 32.Lan W, Wang J, Li M, Liu J, Pan Y. Predicting microRNA-disease associations by integrating multiple biological information. In: IEEE International Conference on Bioinformatics and Biomedicine. Bioinformatics and Biomedicine: 2015. p. 183–8.
- 33.Zeng X, Ding N, Rodríguez-Patón A, Quan Z. Probability-based collaborative filtering model for predicting gene–disease associations. BMC Med Genomics. 2017; 10(5):76.
- 34.Chen X, Guan NN, Li JQ, Yan GY. GIMDA: Graphlet interaction-based MiRNA-disease association prediction. J Cell Mol Med. 2018; 22(3):1548–61.
- 35.Shen Z, Zhang YH, Han K, Nandi AK, Honig B, Huang DS. miRNA-Disease Association Prediction with Collaborative Matrix Factorization. Complexity. 2017; 2017(9):1–9.
- 36.You ZH, Huang ZA, Zhu Z, Yan GY, Li ZW, Wen Z, et al. PBMDA: A novel and effective path-based computational model for miRNA-disease association prediction. PLoS Comput Biol. 2017; 13(3):e1005455.
- 37.You ZH, Wang LP, Chen X, Zhang S, Li XF, Yan GY, et al. PRMDA: personalized recommendation-based MiRNA-disease association prediction. Oncotarget. 2017; 8(49):85568–83.
- 38.Peng L, Chen Y, Ma N, Chen X. NARRMDA: negative-aware and rating-based recommendation algorithm for miRNA-disease association prediction. Mol BioSyst. 2017; 13:2650–59.
- 39.Chen X, Niu YW, Wang GH, Yan GY. MKRMDA: multiple kernel learning-based Kronecker regularized least squares for MiRNA–disease association prediction. J Transl Med. 2017; 15(1):251.
- 40.Wang D, Wang J, Lu M, Song F, Cui Q. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics. 2010; 26(13):1644–50.
- 41.Kozomara A, Griffiths-Jones S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 2014; 42(Database issue):D68.
- 42.Chen X, Niu YW, Wang GH, Yan GY. HAMDA: Hybrid Approach for MiRNA-Disease Association prediction. J Biomed Inform. 2017; 76:50–58.
- 43.Lowe HJ, Barnett GO. Understanding and using the medical subject headings (MeSH) vocabulary to perform literature searches. JAMA. 1994; 271(14):1103–8.
- 44.Luo J, Xiao Q, Liang C, Ding P. Predicting MicroRNA-Disease Associations Using Kronecker Regularized Least Squares Based on Heterogeneous Omics Data. IEEE Access. 2017; 5(99):2503–13.
- 45.Lee I, Blom UM, Wang PI, Shim JE, Marcotte EM. Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res. 2011; 21(7):1109.
- 46.Liang C, Wang G, Li J, Zhang T, Xu P, Wang Y. SIDD: A Semantically Integrated Database towards a Global View of Human Disease. PLoS ONE. 2013; 8(10):e75504.
- 47.He J, Chang SF, Xie L. Fast kernel learning for spatial pyramid matching. In: Computer Vision and Pattern Recognition; 2008. p. 1–7.
- 48.Xia Z, Zhou X, Sun Y, Wu LY. Semi-supervised Drug-Protein Interaction Prediction from Heterogeneous Spaces, Vol. 4; 2010. p. S6.
- 49.Nascimento ACA, Prudencio RBC, Costa IG. A multiple kernel learning algorithm for drug-target interaction prediction. BMC Bioinformatics. 2016; 17(1):46.
- 50.Gonen M, Kaski S. Kernelized Bayesian Matrix Factorization. IEEE Trans Pattern Anal Mach Intell. 2014; 36(10):2047–60.
- 51.Wang L, Li X, Zhang L, Gao Q. Improved anticancer drug response prediction in cell lines using matrix factorization with similarity regularization. BMC Cancer. 2017; 17(1):513.
- 52.Zheng X, Ding H, Mamitsuka H, Zhu S. Collaborative matrix factorization with multiple similarities for predicting drug-target interactions. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining: 2013. p. 1025–33.
- 53.Liu Y, Wu M, Miao C, Zhao P, Li X. Neighborhood Regularized Logistic Matrix Factorization for Drug-Target Interaction Prediction. PLoS Comput Biol. 2016; 12(2):e1004760.
- 54.Ezzat A, Zhao P, Wu M, Li X, Kwoh CK. Drug-Target Interaction Prediction with Graph Regularized Matrix Factorization. IEEE/ACM Trans Comput Biol Bioinform. 2017; 14(3):646–56.
- 55.Chen X, Yan CC, Zhang X, You ZH, Huang YA, Yan GY. HGIMDA: Heterogeneous graph inference for miRNA-disease association prediction. Oncotarget. 2016; 7(40):65257–69.
- 56.Yang Z, Ren F, Liu C, He S, Sun G, Gao Q, et al. dbDEMC: a database of differentially expressed miRNAs in human cancers. BMC Genomics. 2010; 11(Suppl 4):1–8.

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.