Guanghui Li*ᵃ,
Jiawei Luo*ᵇ,
Qiu Xiaoᵇ,
Cheng Liangᶜ and
Pingjian Dingᵇ
ᵃ School of Information Engineering, East China Jiaotong University, Nanchang, 330013, China. E-mail: ghli16@hnu.edu.cn
ᵇ College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China. E-mail: luojiawei@hnu.edu.cn; alcs417@hnu.edu.cn
ᶜ College of Information Science and Engineering, Shandong Normal University, Jinan, 250000, China
First published on 23rd January 2018
Identifying the associations between human diseases and microRNAs is key to understanding pathogenic mechanisms and to uncovering novel prognostic markers. To date, a series of computational approaches have been developed to predict disease–microRNA associations. However, these methods still perform poorly for diseases with few known associated microRNAs. This study introduces a novel computational model, the Kronecker kernel matrix dimension reduction (KMDR) model, for identifying potential microRNA–disease associations. The model combines the microRNA space and the disease space into a larger microRNA–disease space by using the Kronecker product or the Kronecker sum. Its predictive performance was evaluated and validated on known association datasets. The experimental results show that KMDR achieves reliable prediction, with an average AUC of 0.8320 for 22 complex diseases, outperforming other competitive methods. Moreover, case studies on kidney cancer, breast cancer, and esophageal cancer further demonstrate the applicability of our method to the identification of new disease–microRNA pairs. The source code of KMDR is freely available at https://github.com/ghli16/KMDR.
Based on the principle that functionally related miRNAs are likely to be associated with phenotypically similar diseases, a number of computational tools have been put forward to uncover latent links between diseases and miRNAs.9–13 For instance, Jiang et al.14 predicted disease–miRNA interactions by applying a hypergeometric distribution to an integrated human phenome–microRNAome network. However, the efficacy of this method is limited because it relies on predicted miRNA–target interactions, which may be inaccurate and incomplete. Xuan et al.15 constructed a miRNA functional similarity network from known disease–miRNA relationships, disease similarity, and miRNA cluster and family data, and then predicted potential miRNAs related to a given disease based on its weighted k most similar neighbors. Because the aforementioned methods use only local network information to rank potential links, Chen et al.16 developed a global network similarity model by running a random walk on a constructed miRNA–miRNA functional similarity network. Shi et al.17 also modeled disease–miRNA association prediction as a random walk, in this case on a protein–protein interaction network, to calculate functional associations between disease-related genes and miRNA-targeted genes. Similarly, MIDP18 inferred new disease–miRNA interactions by a random walk on the miRNA functional similarity network, assigning different transition matrices to known and unknown miRNAs in order to exploit the prior information about these miRNAs. To predict associations for new diseases, its extension MIDPE applies a random walk to a disease–miRNA bilayer network. Furthermore, researchers have recently integrated multiple similarities, including semantic similarities between diseases, functional similarities between miRNAs, and Gaussian interaction profile kernel similarities of miRNAs and diseases, to achieve better prediction performance.
For example, Chen et al.19 introduced a similarity-based method named WBSMDA, which predicts novel disease–miRNA interactions from the within-score and between-score of each candidate disease–miRNA pair. Subsequently, You et al.20 presented path-based miRNA–disease association prediction (PBMDA) to mine latent links between diseases and miRNAs from the same types of biological data. In addition, machine learning methods have proved effective in this field. Xu et al.21 extracted four topological features from a constructed miRNA target-dysregulated network and fed these features into a support vector machine (SVM) to distinguish prostate cancer-associated miRNAs from non-associated ones. However, the performance of this approach is far from satisfactory because it is currently difficult, or even impossible, to obtain reliable negative miRNA–disease association samples. To overcome this limitation, Chen et al.22 proposed RLSMDA, a semi-supervised model that does not require negative samples. This method is especially useful for diseases without any known associated miRNAs. By integrating known disease–miRNA interactions with the similarities of miRNAs and diseases, Luo et al.23 proposed a computational model named KRLSM, which performs predictions on the entire disease–miRNA space by exploiting the algebraic properties of the Kronecker product. More recently, RKNNMDA24 used the k-nearest neighbors (KNN) algorithm to find the k nearest neighbors of each miRNA and each disease according to their similarity scores, and then ranked the candidate associations with an SVM ranking model. However, the performance of these models remains unsatisfactory on sparse miRNA–disease association datasets.
Because known miRNA–disease pairs are rare in current datasets, we address the problem of association prediction on sparse miRNA–disease interaction networks. In this study, we propose a Kronecker kernel matrix dimension reduction model, which combines the cosine similarity matrices of miRNAs and diseases into one miRNA–disease similarity matrix by using the Kronecker product or the Kronecker sum, in order to identify latent relationships between diseases and miRNAs. We tested the predictive performance of this method on HMDD datasets. In terms of AUC, the experiments show reliable results for 22 diseases, each associated with at least 60 miRNAs. Additionally, we carried out case studies on kidney cancer, breast cancer, and esophageal cancer for further evaluation. For each of these three important cancers, more than 90 percent of the top 50 miRNA candidates were verified by the published biological literature or by three public databases.
$$S_d(d_i, d_j) = \frac{IP(d_i) \cdot IP(d_j)}{\lVert IP(d_i) \rVert \, \lVert IP(d_j) \rVert} \tag{1}$$
After calculating the cosine value for each disease–disease pair, the disease similarity matrix Sd was established.
Similarly, the miRNA cosine similarity matrix Sm can be calculated as follows:
$$S_m(m_i, m_j) = \frac{IP(m_i) \cdot IP(m_j)}{\lVert IP(m_i) \rVert \, \lVert IP(m_j) \rVert} \tag{2}$$
In this equation, IP(mi) is the interaction pattern of miRNA mi, which encodes the presence or absence of interaction with each disease (i.e., the i-th column of the adjacency matrix A).
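Eqns (1) and (2) can be illustrated with a short NumPy sketch. This is not the KMDR source code. It assumes that A is an nd × nm binary matrix with diseases as rows and miRNAs as columns, which is consistent with IP(mi) being a column of A. The function name `cosine_similarity_matrices` is hypothetical. A disease or miRNA with no known interactions has a zero-norm profile, and setting its similarities to 0 is our own choice.

```python
import numpy as np


def cosine_similarity_matrices(A):
    """Return (S_d, S_m) from a binary disease x miRNA matrix A.

    Assumption: rows of A are diseases and columns are miRNAs, so that
    IP(d_i) is row i of A and IP(m_j) is column j of A.
    """
    A = np.asarray(A, dtype=float)

    def _cosine(P):
        # Each row of P is one interaction profile IP(.)
        norms = np.linalg.norm(P, axis=1)
        dots = P @ P.T                        # IP(x) . IP(y) for all pairs
        denom = np.outer(norms, norms)        # ||IP(x)|| * ||IP(y)||
        # Zero-norm profiles (no known links) get similarity 0 (our choice)
        return np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)

    S_d = _cosine(A)     # eqn (1): disease-disease cosine similarity
    S_m = _cosine(A.T)   # eqn (2): miRNA-miRNA cosine similarity
    return S_d, S_m


# Toy example: 2 diseases x 4 miRNAs; the last miRNA has no known disease
A = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0]])
S_d, S_m = cosine_similarity_matrices(A)
```

In this toy case the two diseases share one of their two miRNAs, so their cosine similarity is 0.5.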
There are other ways to calculate a similarity matrix from interaction profiles. For instance, Chen et al.29,30 proposed using the Gaussian interaction profile (GIP) kernel. Our brief experiments with the GIP kernel indicate that the cosine similarity method consistently outperforms the GIP kernel-based method in terms of AUC for the 22 selected diseases. The detailed results are presented in ESI Fig. S1.†
$$\operatorname{vec}(\hat{A}) = S \cdot \operatorname{vec}(A) \tag{3}$$
Assume that the kernel matrix K is a symmetric n × n matrix. Its eigendecomposition is K = VΛVᵀ, where V = [v1, v2, …, vn] and vi is an eigenvector of K; Λ is a diagonal matrix with [Λ]ii = λi, where λi is the eigenvalue associated with vi. Equivalently, K can be written as a weighted sum of rank-one matrices:
$$K = V \Lambda V^{T} = \sum_{i=1}^{n} \lambda_i v_i v_i^{T} \tag{4}$$
For simplicity, we assume that the eigenvalues of K are sorted in non-increasing order (i.e., λ1 ≥ λ2 ≥ … ≥ λn). Because larger eigenvalues are generally more important than smaller ones, we keep only the p largest eigenvalues and construct the link similarity matrix S as follows:
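$$S = \sum_{i=1}^{p} \lambda_i v_i v_i^{T} = V \Lambda^{*} V^{T}, \qquad \Lambda^{*} = \operatorname{diag}(\lambda_1, \ldots, \lambda_p, 0, \ldots, 0) \tag{5}$$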
Note that as long as K has at least p positive eigenvalues, λp > 0; thus, the link similarity matrix S has rank p, whereas the kernel matrix K has rank at least p. Because S is a reduced-rank version of K, we call this method kernel matrix dimension reduction (KMDR). Finally, substituting eqn (5) into eqn (3), we obtain the general formula of KMDR:
$$\operatorname{vec}(\hat{A}) = V \Lambda^{*} V^{T} \cdot \operatorname{vec}(A) \tag{6}$$
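As a minimal sketch of eqns (3)–(6), the hypothetical function `kmdr_explicit` below forms the link similarity matrix S from an explicitly given kernel. This is feasible only for small n. It sorts the eigenpairs of K in non-increasing order, keeps the top p, and applies S to vec(A). Setting p = n should return K itself, which gives a quick check of eqn (4). The toy kernel and vector are arbitrary.

```python
import numpy as np


def kmdr_explicit(K, a, p):
    """Apply eqns (3)-(6) with an explicitly formed symmetric kernel K.

    K : (n, n) symmetric kernel matrix over disease-miRNA pairs
    a : length-n vector vec(A) of known associations
    p : number of leading eigenpairs to keep
    Returns the score vector vec(A_hat) and the link similarity matrix S.
    """
    K = np.asarray(K, dtype=float)
    lam, V = np.linalg.eigh(K)               # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]            # non-increasing order, as in eqn (5)
    lam, V = lam[order], V[:, order]
    lam_star = np.zeros_like(lam)
    lam_star[:p] = lam[:p]                   # Lambda*: top-p eigenvalues, rest zero
    S = (V * lam_star) @ V.T                 # S = V Lambda* V^T
    return S @ np.asarray(a, dtype=float), S


rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
K = B @ B.T                                  # toy positive semidefinite kernel
a = rng.integers(0, 2, size=6)               # toy vec(A)
scores, S = kmdr_explicit(K, a, p=2)         # S has rank 2
```

Forming K explicitly costs O(n²) memory with n = nd·nm. This is the motivation for the eigendecomposition shortcut used for the Kronecker kernels.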
Different kernel matrices yield different prediction score matrices. Using the Kronecker product kernel or the Kronecker sum kernel, KMDR gives rise to two sub-algorithms, KMDR-KP and KMDR-KS, where KP and KS stand for Kronecker product and Kronecker sum, respectively. Fig. 1 illustrates the overall flowchart of the KMDR method.
Note that this model differs slightly from the method described by Kuang et al.31 We weight each rank-one matrix vᵢvᵢᵀ (i ∈ {1, 2, …, p}) by its eigenvalue λi, whereas the method described by Kuang et al. weights all of them by a single constant and therefore may not be able to distinguish between the importance of different eigenvalues.
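The difference between the two weightings can be seen on a toy diagonal kernel. In this sketch, `truncated_kernels` and the constant `c` are hypothetical. The constant-weight variant follows the one-sentence description above, not Kuang et al.'s code. With a single constant, the result is c times an orthogonal projector onto the top-p eigenvectors, so every retained direction counts equally.

```python
import numpy as np


def truncated_kernels(K, p, c=1.0):
    """Contrast eigenvalue weighting (eqn (5)) with a single constant weight c.

    Both matrices use the same top-p eigenvectors of the symmetric kernel K;
    c is a hypothetical constant standing in for the uniform weighting.
    """
    lam, V = np.linalg.eigh(np.asarray(K, dtype=float))
    top = np.argsort(lam)[::-1][:p]          # indices of the p largest eigenvalues
    V_p, lam_p = V[:, top], lam[top]
    S_weighted = (V_p * lam_p) @ V_p.T       # sum_i lambda_i v_i v_i^T  (KMDR)
    S_constant = c * (V_p @ V_p.T)           # c * sum_i v_i v_i^T       (uniform)
    return S_weighted, S_constant


K = np.diag([5.0, 3.0, 1.0, 0.5])
S_weighted, S_constant = truncated_kernels(K, p=2)
```

Here the eigenvalue-weighted matrix keeps the first direction 5/3 times as influential as the second, while the constant-weight matrix treats them identically.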
$$K\big((d_i, m_j), (d_k, m_l)\big) = K_d(d_i, d_k) \, K_m(m_j, m_l) \tag{7}$$
Hence, the kernel matrix K has size (nd·nm) × (nd·nm), where nd and nm are the numbers of diseases and miRNAs, respectively. Storing K would require a large amount of memory even for moderate numbers of diseases and miRNAs. To reduce the computational cost, we exploit eigendecompositions, as in ref. 32. Let Kd = VdΛdVdᵀ and Km = VmΛmVmᵀ be the eigendecompositions of the kernel matrices Kd and Km. Because the eigenvectors (eigenvalues) of a Kronecker product are the Kronecker products of the eigenvectors (eigenvalues) of its factors, the Kronecker product kernel can be rewritten as K = Kd ⊗ Km = VΛVᵀ, where V = Vd ⊗ Vm and Λ = Λd ⊗ Λm. To multiply this kernel matrix with vec(Aᵀ) efficiently without forming it explicitly, we use an algebraic property of the Kronecker product, namely (B ⊗ C)vec(X) = vec(CXBᵀ). As a result, the final prediction score matrix can be written as follows:
$$\hat{A} = V_d Z^{T} V_m^{T} \tag{8}$$
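The Kronecker product kernel of eqn (7) corresponds to NumPy's `np.kron` when the pair (di, mj) is indexed as i·nm + j. This indexing is an assumption. For the Kronecker sum kernel we assume the standard definition Kd ⊕ Km = Kd ⊗ I + I ⊗ Km, because its exact form is not restated in this section. The sketch also checks that both kernels share the eigenvectors Vd ⊗ Vm. Their eigenvalues are the products (KP) or sums (KS) of the factor eigenvalues, which is the property the efficient computation relies on.

```python
import numpy as np


def kronecker_product_kernel(K_d, K_m):
    """Eqn (7): entry [i*n_m + j, k*n_m + l] equals K_d[i, k] * K_m[j, l]."""
    return np.kron(K_d, K_m)


def kronecker_sum_kernel(K_d, K_m):
    """Standard Kronecker sum K_d (+) K_m = K_d (x) I_m + I_d (x) K_m (assumed form)."""
    n_d, n_m = K_d.shape[0], K_m.shape[0]
    return np.kron(K_d, np.eye(n_m)) + np.kron(np.eye(n_d), K_m)


rng = np.random.default_rng(1)
B_d, B_m = rng.standard_normal((2, 2)), rng.standard_normal((3, 3))
K_d, K_m = B_d @ B_d.T, B_m @ B_m.T          # toy symmetric kernels
K_kp = kronecker_product_kernel(K_d, K_m)
K_ks = kronecker_sum_kernel(K_d, K_m)

# Eigenvalues of K_kp are lam_d[i] * lam_m[j]; those of K_ks are lam_d[i] + lam_m[j]
lam_d, lam_m = np.linalg.eigvalsh(K_d), np.linalg.eigvalsh(K_m)
kp_ok = np.allclose(np.linalg.eigvalsh(K_kp), np.sort(np.outer(lam_d, lam_m).ravel()))
ks_ok = np.allclose(np.linalg.eigvalsh(K_ks), np.sort(np.add.outer(lam_d, lam_m).ravel()))
```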
The construction of the link similarity matrix S involves a parameter p. Here, we choose p = ⌊n × q⌋, where n is the size of the kernel matrix K, q ∈ [0, 1] is a proportion coefficient, and ⌊·⌋ denotes the Gauss rounding (floor) function. q was set to 0.25 in all experiments; 0.25 was also the optimal value of q chosen in the method described by Kuang et al.31 This amounts to restricting the predictions to the subspace spanned by the leading 25% of the eigenvectors of K.
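The following sketch computes eqn (8), including the choice p = ⌊n × q⌋, without forming the (nd·nm) × (nd·nm) kernel. It assumes that diseases index the rows of A and that pairs are ordered disease-major. Under that ordering, the row-major flattening of A equals vec(Aᵀ) under column stacking. The intermediate matrix Z is our own derivation, chosen so that Â = Vd Zᵀ Vmᵀ, because its definition is not restated in this section. The function name `kmdr_scores` is hypothetical, and the Kronecker sum branch assumes the standard Kd ⊗ I + I ⊗ Km form.

```python
import numpy as np


def kmdr_scores(A, K_d, K_m, q=0.25, kernel="sum"):
    """Sketch of KMDR-KP / KMDR-KS without forming the full pair kernel.

    A      : (n_d, n_m) known-association matrix, rows = diseases (assumed)
    K_d    : (n_d, n_d) symmetric disease kernel, e.g. S_d from eqn (1)
    K_m    : (n_m, n_m) symmetric miRNA kernel, e.g. S_m from eqn (2)
    q      : proportion coefficient, p = floor(n_d * n_m * q)
    kernel : "product" (K_d (x) K_m) or "sum" (K_d (x) I + I (x) K_m, assumed)
    """
    A = np.asarray(A, dtype=float)
    lam_d, V_d = np.linalg.eigh(K_d)
    lam_m, V_m = np.linalg.eigh(K_m)
    # L[i, j] is the eigenvalue of K for the eigenvector V_d[:, i] (x) V_m[:, j]
    if kernel == "product":
        L = np.outer(lam_d, lam_m)
    elif kernel == "sum":
        L = np.add.outer(lam_d, lam_m)
    else:
        raise ValueError("kernel must be 'product' or 'sum'")
    p = int(np.floor(L.size * q))            # p = floor(n * q), n = n_d * n_m
    flat = L.ravel()
    keep = np.argsort(flat)[::-1][:p]        # the p largest eigenvalues of K
    L_star = np.zeros_like(flat)
    L_star[keep] = flat[keep]                # Lambda*: all other eigenvalues set to 0
    L_star = L_star.reshape(L.shape)
    # Row-major Kronecker identity: (B (x) C) @ X.ravel() == (B @ X @ C.T).ravel(),
    # so V^T vec(A) becomes V_d^T A V_m and V vec(Y) becomes V_d Y V_m^T.
    Z = (L_star * (V_d.T @ A @ V_m)).T       # (n_m, n_d) intermediate matrix
    return V_d @ Z.T @ V_m.T                 # eqn (8): A_hat = V_d Z^T V_m^T


rng = np.random.default_rng(2)
A = (rng.random((4, 5)) < 0.4).astype(float)     # toy associations
B_d, B_m = rng.standard_normal((4, 4)), rng.standard_normal((5, 5))
K_d, K_m = B_d @ B_d.T, B_m @ B_m.T              # toy kernels
A_hat = kmdr_scores(A, K_d, K_m, q=0.25, kernel="sum")
```

Only the two small eigendecompositions are needed, so memory grows as O(nd² + nm²) instead of O(nd²·nm²) for the explicit kernel. On small inputs the result can be checked against the explicit construction in eqns (3)–(6).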
To demonstrate the effectiveness of the KMDR model, we compared its two sub-algorithms with six state-of-the-art models, namely, MIDP,18 MIDPE,18 RLSMDA,22 WBSMDA,19 KRLSM,23 and RKNNMDA.24 The parameters of MIDP, MIDPE, RLSMDA, KRLSM, and RKNNMDA were all set as recommended by their authors.
Table 1 lists the AUC values of each compared method on the 22 diseases. As the table shows, KMDR-KP and KMDR-KS outperform the other six computational approaches for most of the selected diseases. In particular, KMDR built with the Kronecker sum kernel consistently performed better than KMDR built with the Kronecker product kernel. KMDR-KS has the highest average AUC, 0.8320, whereas the average AUCs of KMDR-KP, MIDP, MIDPE, RLSMDA, WBSMDA, KRLSM, and RKNNMDA were 0.8142, 0.7704, 0.7904, 0.6208, 0.7833, 0.7129, and 0.6650, respectively. The average AUC of KMDR-KS was thus 1.78, 6.16, 4.16, 21.12, 4.87, 11.91, and 16.70 percentage points higher than those of the other seven methods. Meanwhile, Fig. 2 compares the ROC curves of all methods.
Disease name | #miRNAs | KMDR-KP | KMDR-KS | MIDP | MIDPE | RLSMDA | WBSMDA | KRLSM | RKNNMDA |
---|---|---|---|---|---|---|---|---|---|
Breast neoplasms | 202 | 0.8168 | 0.8169 | 0.7250 | 0.7511 | 0.5418 | 0.7246 | 0.7541 | 0.7089 |
Hepatocellular carcinoma | 214 | 0.7415 | 0.7571 | 0.6811 | 0.7188 | 0.5868 | 0.7184 | 0.6377 | 0.6635 |
Non-small-cell lung carcinoma | 95 | 0.8454 | 0.8573 | 0.7380 | 0.7753 | 0.5742 | 0.8129 | 0.7279 | 0.7152 |
Renal cell carcinoma | 107 | 0.7826 | 0.7991 | 0.6924 | 0.7331 | 0.5803 | 0.7553 | 0.6870 | 0.6579 |
Squamous cell carcinoma | 80 | 0.8504 | 0.8726 | 0.7784 | 0.7911 | 0.6375 | 0.8230 | 0.6798 | 0.6750 |
Colonic neoplasms | 78 | 0.8414 | 0.8585 | 0.8086 | 0.8289 | 0.6140 | 0.7750 | 0.6502 | 0.7036 |
Colorectal neoplasms | 147 | 0.8082 | 0.8248 | 0.7191 | 0.7499 | 0.6395 | 0.6985 | 0.6558 | 0.6426 |
Endometriosis | 62 | 0.7974 | 0.8130 | 0.7746 | 0.7840 | 0.5834 | 0.7739 | 0.6825 | 0.6063 |
Esophageal neoplasms | 74 | 0.7665 | 0.7836 | 0.7298 | 0.7298 | 0.6898 | 0.7141 | 0.7001 | 0.6244 |
Glioblastoma | 96 | 0.7777 | 0.8001 | 0.7178 | 0.7394 | 0.5604 | 0.7912 | 0.5966 | 0.6769 |
Glioma | 71 | 0.8501 | 0.8665 | 0.7513 | 0.7798 | 0.5025 | 0.8265 | 0.7940 | 0.7573 |
Head and neck neoplasms | 64 | 0.8317 | 0.8597 | 0.7994 | 0.8122 | 0.4387 | 0.8155 | 0.8269 | 0.6323 |
Heart failure | 120 | 0.7690 | 0.7854 | 0.9116 | 0.9267 | 0.8493 | 0.7950 | 0.6574 | 0.6725 |
Leukemia, myeloid, acute | 64 | 0.7983 | 0.8400 | 0.8430 | 0.8443 | 0.6798 | 0.8705 | 0.6568 | 0.6761 |
Lung neoplasms | 132 | 0.8836 | 0.9027 | 0.8305 | 0.8595 | 0.7434 | 0.8509 | 0.7939 | 0.7460 |
Medulloblastoma | 62 | 0.7875 | 0.7900 | 0.7704 | 0.7832 | 0.6367 | 0.7585 | 0.6443 | 0.6586 |
Melanoma | 141 | 0.8199 | 0.8296 | 0.7764 | 0.7850 | 0.5479 | 0.7758 | 0.7364 | 0.6242 |
Ovarian neoplasms | 114 | 0.8889 | 0.8949 | 0.8552 | 0.8793 | 0.5993 | 0.8503 | 0.8114 | 0.6362 |
Pancreatic neoplasms | 99 | 0.8807 | 0.8961 | 0.8209 | 0.8406 | 0.7866 | 0.8436 | 0.7923 | 0.6617 |
Prostatic neoplasms | 118 | 0.8093 | 0.8353 | 0.7576 | 0.7864 | 0.6535 | 0.7747 | 0.7423 | 0.5936 |
Stomach neoplasms | 174 | 0.7608 | 0.7767 | 0.7425 | 0.7288 | 0.5318 | 0.6807 | 0.6763 | 0.6869 |
Urinary bladder neoplasms | 92 | 0.8039 | 0.8440 | 0.7261 | 0.7606 | 0.6797 | 0.8028 | 0.7810 | 0.6104 |
Average AUC |  | 0.8142 | 0.8320 | 0.7704 | 0.7904 | 0.6208 | 0.7833 | 0.7129 | 0.6650 |
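The average-AUC gaps quoted for Table 1 can be recomputed directly from the table. This short check reproduces the KMDR-KS average from its column. It shows that the quoted gaps are absolute differences, i.e., percentage points of AUC. Relative improvements are larger, for example about 34% over RLSMDA.

```python
# KMDR-KS column of Table 1 (22 diseases, in table order)
ks_auc = [0.8169, 0.7571, 0.8573, 0.7991, 0.8726, 0.8585, 0.8248, 0.8130,
          0.7836, 0.8001, 0.8665, 0.8597, 0.7854, 0.8400, 0.9027, 0.7900,
          0.8296, 0.8949, 0.8961, 0.8353, 0.7767, 0.8440]
avg_ks = round(sum(ks_auc) / len(ks_auc), 4)

# Average AUCs of the other seven methods (last row of Table 1)
others = {"KMDR-KP": 0.8142, "MIDP": 0.7704, "MIDPE": 0.7904, "RLSMDA": 0.6208,
          "WBSMDA": 0.7833, "KRLSM": 0.7129, "RKNNMDA": 0.6650}

# Absolute gaps in percentage points, as quoted in the text
gap_points = {name: round(100 * (avg_ks - auc), 2) for name, auc in others.items()}
# Relative improvements, for comparison
gap_relative = {name: round(100 * (avg_ks - auc) / auc, 1) for name, auc in others.items()}

for name in others:
    print(f"{name:8s} +{gap_points[name]:.2f} points (+{gap_relative[name]:.1f}% relative)")
```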
Fig. 3 displays the PR curves and the average AUPR scores of the above eight methods. The PR curves of KMDR-KP and KMDR-KS lie above those of MIDP, MIDPE, RLSMDA, WBSMDA, KRLSM, and RKNNMDA. The average AUPR values achieved by KMDR-KS were 6.32%, 11.53%, 11.73%, 19.37%, 10.07%, 11.46%, and 12.65% higher than those of the other seven methods. These results suggest that the KMDR model performs well for diseases associated with only a few known miRNAs. This may be because KMDR combines the disease and miRNA spaces into a single disease–miRNA space by using the Kronecker sum. However, for two diseases, “Heart Failure” and “Leukemia, Myeloid, Acute”, KMDR-KS is outperformed by several methods (MIDP, MIDPE, RLSMDA, and WBSMDA for heart failure; MIDP, MIDPE, and WBSMDA for acute myeloid leukemia). This could be because our method uses only the topological structure of the disease–miRNA bipartite network.
To further confirm the ability of KMDR to discover new miRNA–disease interactions, we present case studies of three important diseases (kidney neoplasms, breast neoplasms, and esophageal neoplasms). All known interactions in the HMDD database are used as the training set, and the unobserved miRNA candidates for each disease are ranked according to their prediction scores. The predictions were validated against experimental literature and three recently updated disease–miRNA databases, namely, dbDEMC,26 miRCancer,27 and PhenomiR2.0.28
Kidney cancer is a common urologic malignancy whose incidence and death rates have been rising gradually. The American Cancer Society estimated that in 2016 there would be approximately 62700 new cases of kidney cancer and 14240 deaths from it in the United States.33 Recent biological experiments have shown that many miRNAs are related to kidney cancer. Here, we applied KMDR-KS to identify candidate miRNAs associated with kidney neoplasms. All of the top 50 miRNA candidates were confirmed as kidney cancer-associated by the dbDEMC or miRCancer database (see Table 2). Among the top 5 candidates, hsa-mir-155 and hsa-mir-126 were found to be up-regulated in renal cell carcinoma,34,35 while hsa-mir-145, hsa-mir-200b, and hsa-mir-146a were found to be down-regulated.36,37 Notably, only 7 known miRNAs were associated with kidney neoplasms in our gold standard dataset. Hence, this case study further suggests that the KMDR model is effective in predicting new associations for diseases with only a few known associated miRNAs.
Rank | miRNAs | Evidences | Rank | miRNAs | Evidences |
---|---|---|---|---|---|
1 | hsa-mir-155 | dbDEMC | 26 | hsa-mir-1 | miRCancer |
2 | hsa-mir-145 | dbDEMC, miRCancer | 27 | hsa-mir-203 | dbDEMC, miRCancer |
3 | hsa-mir-200b | dbDEMC | 28 | hsa-mir-19b | dbDEMC |
4 | hsa-mir-146a | dbDEMC | 29 | hsa-mir-375 | dbDEMC |
5 | hsa-mir-126 | dbDEMC | 30 | hsa-mir-9 | dbDEMC |
6 | hsa-mir-200a | dbDEMC | 31 | hsa-mir-222 | dbDEMC |
7 | hsa-mir-16 | dbDEMC | 32 | hsa-let-7b | dbDEMC |
8 | hsa-mir-125b | dbDEMC | 33 | hsa-mir-210 | dbDEMC, miRCancer |
9 | hsa-mir-34a | dbDEMC | 34 | hsa-mir-10b | dbDEMC |
10 | hsa-mir-20a | dbDEMC | 35 | hsa-mir-214 | dbDEMC |
11 | hsa-let-7a | dbDEMC | 36 | hsa-let-7c | dbDEMC |
12 | hsa-mir-17 | dbDEMC | 37 | hsa-mir-195 | dbDEMC |
13 | hsa-mir-143 | dbDEMC | 38 | hsa-mir-29c | dbDEMC |
14 | hsa-mir-221 | dbDEMC | 39 | hsa-mir-218 | dbDEMC |
15 | hsa-mir-31 | dbDEMC | 40 | hsa-mir-182 | dbDEMC |
16 | hsa-mir-92a | dbDEMC | 41 | hsa-mir-486 | dbDEMC |
17 | hsa-mir-29b | dbDEMC | 42 | hsa-mir-150 | dbDEMC |
18 | hsa-mir-29a | dbDEMC | 43 | hsa-mir-27a | dbDEMC |
19 | hsa-mir-205 | miRCancer | 44 | hsa-mir-146b | dbDEMC |
20 | hsa-mir-223 | dbDEMC, miRCancer | 45 | hsa-mir-183 | dbDEMC, miRCancer |
21 | hsa-mir-18a | dbDEMC | 46 | hsa-mir-181b | dbDEMC |
22 | hsa-mir-19a | dbDEMC | 47 | hsa-mir-101 | dbDEMC |
23 | hsa-mir-199a | dbDEMC, miRCancer | 48 | hsa-mir-196a | dbDEMC |
24 | hsa-mir-181a | dbDEMC | 49 | hsa-mir-24 | dbDEMC |
25 | hsa-mir-429 | dbDEMC | 50 | hsa-mir-15b | dbDEMC |
Breast cancer is the most commonly diagnosed cancer in women, especially in developed countries. The American Cancer Society estimated that in 2016 breast cancer would account for approximately 246600 new cases and 40450 deaths among women in the United States.33 Previous studies have shown that multiple miRNAs are linked to the progression of breast neoplasms. After applying KMDR-KS to predict novel miRNA candidates associated with breast neoplasms, we found that 45 of the top 50 predicted miRNAs are supported by at least one of dbDEMC, miRCancer, and PhenomiR2.0 (see Table 3). Furthermore, three other candidates were supported by literature retrieved from PubMed. Specifically, the expression of hsa-mir-378a (ranked 8th) increases during breast cancer formation.38 Hsa-mir-542 (ranked 13th) has been identified as significantly down-regulated in breast cancer cells.39 In addition, hsa-mir-532 (ranked 26th) is markedly up-regulated in breast cancer tissues relative to normal tissues.40
Rank | miRNAs | Evidences | Rank | miRNAs | Evidences |
---|---|---|---|---|---|
1 | hsa-mir-106a | dbDEMC, PhenomiR2.0 | 26 | hsa-mir-532 | PMID: 24866763 |
2 | hsa-mir-142 | miRCancer, PhenomiR2.0 | 27 | hsa-mir-95 | dbDEMC, PhenomiR2.0 |
3 | hsa-mir-99a | dbDEMC, miRCancer, PhenomiR2.0 | 28 | hsa-mir-517a | dbDEMC, miRCancer, PhenomiR2.0 |
4 | hsa-mir-130a | dbDEMC, miRCancer, PhenomiR2.0 | 29 | hsa-mir-30e | miRCancer, PhenomiR2.0 |
5 | hsa-mir-138 | dbDEMC | 30 | hsa-mir-372 | dbDEMC, PhenomiR2.0 |
6 | hsa-mir-330 | dbDEMC, PhenomiR2.0 | 31 | hsa-mir-32 | dbDEMC, miRCancer, PhenomiR2.0 |
7 | hsa-mir-150 | dbDEMC, miRCancer, PhenomiR2.0 | 32 | hsa-mir-211 | dbDEMC, miRCancer, PhenomiR2.0 |
8 | hsa-mir-378a | PMID: 20889127 | 33 | hsa-mir-381 | dbDEMC, miRCancer, PhenomiR2.0 |
9 | hsa-mir-186 | dbDEMC, PhenomiR2.0 | 34 | hsa-mir-370 | dbDEMC, miRCancer, PhenomiR2.0 |
10 | hsa-mir-185 | dbDEMC, miRCancer, PhenomiR2.0 | 35 | hsa-mir-181c | dbDEMC, PhenomiR2.0 |
11 | hsa-mir-15b | dbDEMC, PhenomiR2.0 | 36 | hsa-mir-181d | dbDEMC, PhenomiR2.0 |
12 | hsa-mir-192 | dbDEMC, PhenomiR2.0 | 37 | hsa-mir-361 | PhenomiR2.0 |
13 | hsa-mir-542 | PMID: 22051041 | 38 | hsa-mir-2110 | dbDEMC |
14 | hsa-mir-650 | dbDEMC | 39 | hsa-mir-1303 | dbDEMC |
15 | hsa-mir-98 | dbDEMC, miRCancer, PhenomiR2.0 | 40 | hsa-mir-744 | dbDEMC |
16 | hsa-mir-130b | dbDEMC, PhenomiR2.0 | 41 | hsa-mir-1249 | Unconfirmed |
17 | hsa-mir-92b | dbDEMC | 42 | hsa-mir-376a | dbDEMC |
18 | hsa-mir-196b | dbDEMC, PhenomiR2.0 | 43 | hsa-mir-520e | dbDEMC, miRCancer, PhenomiR2.0 |
19 | hsa-mir-216a | dbDEMC, PhenomiR2.0 | 44 | hsa-mir-134 | dbDEMC, PhenomiR2.0 |
20 | hsa-mir-508 | Unconfirmed | 45 | hsa-mir-144 | dbDEMC, miRCancer |
21 | hsa-mir-574 | miRCancer | 46 | hsa-mir-190a | dbDEMC |
22 | hsa-mir-449b | dbDEMC | 47 | hsa-mir-421 | dbDEMC, miRCancer |
23 | hsa-mir-212 | dbDEMC, miRCancer, PhenomiR2.0 | 48 | hsa-mir-526b | dbDEMC, miRCancer, PhenomiR2.0 |
24 | hsa-mir-99b | dbDEMC, PhenomiR2.0 | 49 | hsa-mir-208a | dbDEMC, PhenomiR2.0 |
25 | hsa-mir-449a | dbDEMC, miRCancer, PhenomiR2.0 | 50 | hsa-mir-362 | miRCancer |
Esophageal cancer is the eighth most frequently diagnosed cancer worldwide and, owing to its poor prognosis, the sixth leading cause of cancer-related death. Early detection and timely treatment are crucial for improving a patient's chance of survival. In our gold standard dataset, 74 known miRNAs are associated with esophageal cancer. Of the top 50 candidates ranked by KMDR-KS, 47 are corroborated by at least one of the three aforementioned databases (see Table 4). Additionally, hsa-mir-200b (ranked 4th) is supported by experimental literature as being correlated with esophageal neoplasms.41
Rank | miRNAs | Evidences | Rank | miRNAs | Evidences |
---|---|---|---|---|---|
1 | hsa-mir-17 | dbDEMC | 26 | hsa-mir-7 | dbDEMC |
2 | hsa-mir-125b | dbDEMC, PhenomiR2.0 | 27 | hsa-mir-124 | dbDEMC, miRCancer |
3 | hsa-mir-218 | dbDEMC, miRCancer | 28 | hsa-let-7g | dbDEMC |
4 | hsa-mir-200b | PMID: 24064224 | 29 | hsa-mir-224 | dbDEMC |
5 | hsa-mir-16 | dbDEMC | 30 | hsa-mir-195 | dbDEMC |
6 | hsa-mir-18a | dbDEMC | 31 | hsa-mir-127 | dbDEMC |
7 | hsa-mir-221 | dbDEMC, miRCancer | 32 | hsa-let-7f | dbDEMC |
8 | hsa-mir-10b | dbDEMC, miRCancer | 33 | hsa-mir-125a | dbDEMC |
9 | hsa-mir-182 | dbDEMC | 34 | hsa-let-7i | dbDEMC |
10 | hsa-mir-19b | dbDEMC | 35 | hsa-mir-93 | dbDEMC, PhenomiR2.0 |
11 | hsa-mir-1 | dbDEMC | 36 | hsa-mir-429 | dbDEMC |
12 | hsa-let-7d | dbDEMC | 37 | hsa-mir-151a | Unconfirmed |
13 | hsa-mir-146b | dbDEMC, miRCancer | 38 | hsa-mir-107 | dbDEMC |
14 | hsa-mir-222 | dbDEMC | 39 | hsa-mir-135a | dbDEMC |
15 | hsa-mir-133b | dbDEMC | 40 | hsa-mir-191 | dbDEMC |
16 | hsa-mir-181a | dbDEMC | 41 | hsa-mir-24 | dbDEMC |
17 | hsa-mir-181b | dbDEMC | 42 | hsa-mir-18b | dbDEMC |
18 | hsa-let-7e | dbDEMC | 43 | hsa-mir-106a | dbDEMC |
19 | hsa-mir-142 | dbDEMC | 44 | hsa-mir-103a | dbDEMC |
20 | hsa-mir-9 | dbDEMC | 45 | hsa-mir-302b | Unconfirmed |
21 | hsa-mir-30c | dbDEMC | 46 | hsa-mir-27b | dbDEMC, PhenomiR2.0 |
22 | hsa-mir-29b | dbDEMC | 47 | hsa-mir-96 | dbDEMC, miRCancer |
23 | hsa-mir-199b | dbDEMC | 48 | hsa-mir-30d | dbDEMC |
24 | hsa-mir-29a | dbDEMC | 49 | hsa-mir-106b | dbDEMC |
25 | hsa-mir-30a | dbDEMC | 50 | hsa-mir-138 | dbDEMC |
The results of the case studies illustrate that KMDR-KS performs well in predicting potential disease-associated miRNAs. We therefore used KMDR-KS and KMDR-KP to rank the potential candidates associated with every disease in HMDD (ESI Tables S1 and S2†), in the hope that these predictions will be validated by future biological experiments.
The reliable performance of KMDR can be attributed to several factors. First, our method combines the cosine similarity matrices of miRNAs and diseases into a larger miRNA–disease similarity matrix, which directly relates disease–miRNA pairs and can effectively improve prediction performance. Second, KMDR does not require negative miRNA–disease association samples. Finally, KMDR is a global prediction model that can infer candidate miRNAs for all diseases simultaneously.
Despite its efficiency and practicality, KMDR still has some limitations that require further research. First, like some other models,42–44 KMDR depends only on the topological structure of the miRNA–disease network, which means that it cannot predict associations for a disease that does not exist in the network. To address this problem, additional biological information, such as miRNA functional similarity data and disease semantic similarity data, could be integrated to expand the application range of KMDR. Second, the similarity matrices used in KMDR might not be optimal in some scenarios. Finally, because the currently known miRNA–disease associations are insufficient, more information about diseases and miRNAs could be used to construct more reliable disease similarity and miRNA similarity matrices, which may further improve prediction results. For example, we plan to integrate disease–gene and miRNA–gene interactions in future work.
Footnote
† Electronic supplementary information (ESI) available: One supplementary figure and two supplementary tables, available as Excel files. See DOI: 10.1039/c7ra12491k
This journal is © The Royal Society of Chemistry 2018