Changlong Gu (a), Bo Liao* (a), Xiaoying Li (a), Lijun Cai (a), Haowen Chen (a), Keqin Li (b) and Jialiang Yang (c)
(a) College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China. E-mail: dragonbw@163.com
(b) Department of Computer Science, State University of New York, New Paltz, New York 12561, USA
(c) Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York 10029, USA
First published on 20th September 2017
MicroRNAs (miRNAs) play important roles in the pathogenesis and development of many complex diseases. The experimental confirmation of disease-related miRNAs is costly and time-consuming. An efficient and accurate computational model for identifying potential miRNA–disease associations is therefore a useful supplement to experimental approaches. In this study, we develop a new method, based on normalized mutual information, for measuring miRNA and disease similarities, which are key to identifying miRNA–disease associations. Subsequently, a network-based collaborative filtering recommendation model (NetCF) is proposed for predicting potential miRNA–disease associations by integrating miRNA and disease similarities with experimentally verified miRNA–disease associations. Leave-one-out cross validation is implemented to evaluate the prediction performance of NetCF. NetCF obtains a reliable AUC value of 0.8960, which is superior to those of other competitive methods. When NetCF is applied to predict lung cancer- and prostate cancer-related miRNAs, 94% of the top 50 predicted miRNAs for each cancer are confirmed by other databases.
Several computational models have been developed to infer potential miRNA–disease associations. These methods can be divided into two categories: network-based methods and machine-learning-based methods.13
The key problem for network-based methods in predicting miRNA–disease associations is the calculation of similarity among miRNAs and among diseases over the networks. Several approaches have been reported to measure miRNA and disease similarities;14–16 Zou et al.17 reviewed the main similarity computation methods. Based on the common assumption that functionally related miRNAs are normally associated with phenotypically similar diseases and vice versa, Jiang et al.18 constructed a functionally related miRNA network and a human phenome-miRNAome network to prioritize potential disease-related miRNAs. However, the main limitation of this method is the high number of false positives produced in the miRNA target prediction step. To improve prediction performance, Jiang et al.19 subsequently proposed a Naive Bayes model to infer disease-related miRNAs by integrating multiple types of data resources. Some researchers have successfully applied the random walk algorithm to predict miRNA–disease associations.20–22 Based on global network similarity measures, Chen et al.21 constructed a miRNA–miRNA functional similarity network and implemented Random Walk with Restart (RWR) from known disease-related miRNA seed nodes to prioritize potential disease-related miRNAs. By integrating disease–gene associations, miRNA–mRNA interactions, and protein–protein interactions, Shi et al.22 developed an improved RWR-based method to predict disease-related miRNAs and achieved satisfactory performance in cross validation. Liu et al.20 recently constructed a heterogeneous network by connecting disease and miRNA similarity subnetworks using known miRNA–disease associations, and extended the RWR method to infer potential miRNA–disease associations in the heterogeneous network. Although the RWR method performs well in predicting miRNA–disease associations, it cannot be applied to diseases without any known associated miRNA.
A similarity-based method called network-consistency-based inference (NetCBI) has been proposed by Chen et al.23 to predict miRNA–disease associations. NetCBI can predict disease-related miRNAs even when a disease has no known associated miRNA. However, it performed poorly in cross validation. Xuan et al.24 presented a novel method (HDMP) that considers the local information of the network and predicts disease-related miRNAs based on the weighted k most similar neighbors. Cross validation and case studies of HDMP indicate good prediction performance, but the method does not work for diseases without known related miRNAs. Furthermore, based solely on gene expression profiles, Zhao et al.25 presented a computational framework to identify cancer-related miRNAs and constructed a cancer–miRNA-pathway network, which can help explain how miRNAs are involved in cancer. Recently, Qin et al.26 proposed a method to predict disease-associated miRNAs based on protein domains. The results on real datasets demonstrate the effectiveness of the approach.
Some researchers have proposed machine-learning-based methods to predict potential miRNA–disease associations. To distinguish positive miRNA–disease associations from large-scale negative miRNA–disease associations, Jiang et al.27 extracted a set of features from each positive and negative miRNA–disease association and trained a Support Vector Machine (SVM) classifier to predict novel miRNA–disease associations. Based on a miRNA–disease heterogeneous network, Zeng et al.28 used a path-based measure named HeteSim29 to calculate the relevance between objects in the heterogeneous network, and combined HeteSim scores with a machine learning method to predict novel miRNA–disease associations. The challenge of using machine-learning-based methods for predicting novel miRNA–disease associations is the difficulty of obtaining negative samples (miRNA–disease pairs known not to be associated). Given that limited trials do not provide enough evidence to prove that a miRNA is not associated with a disease, Chen et al.30 proposed Regularized Least Squares for miRNA–disease Associations (RLSMDA) to prioritize potential miRNA–disease associations without utilizing negative samples. RLSMDA is a semi-supervised classification algorithm that can predict associations for diseases without any associated miRNA.
The analysis above shows that existing computational methods for predicting miRNA–disease associations have several limitations. First, some methods18 calculate miRNA similarities from predicted miRNA–mRNA interactions and therefore inherit the false positives of the miRNA target prediction step. Second, some approaches21,23,24,30 calculate miRNA similarity from the known miRNA–disease associations and evaluate their prediction performance through leave-one-out cross validation (LOOCV). Their performance is likely to be overestimated, because the similarity calculation still uses the miRNA–disease association that is removed when LOOCV is performed.13 Third, some methods21,24 cannot be applied to diseases without any known associated miRNA. Finally, some machine-learning-based methods27,28 require negative samples to train classifiers; however, obtaining negative samples is difficult.
To address these limitations, a network-based collaborative filtering recommendation algorithm (NetCF) is proposed to reveal potential associations between miRNAs and diseases. NetCF integrates miRNA and disease similarities with the known miRNA–disease associations. NetCF shows clear advantages over other methods in several respects, including LOOCV, case studies, global prediction for all diseases, prediction for diseases without any known related miRNA (isolated diseases), and prediction for miRNAs with no associated disease (novel miRNAs).
The main contributions of the paper are summarized as follows.
(1) miRNA similarities are calculated from experimentally verified miRNA–mRNA interactions, which avoids the false positives of miRNA target prediction.
(2) miRNA and disease similarities do not depend on the known miRNA–disease associations, so that LOOCV does not overestimate the prediction performance.
(3) NetCF integrates miRNA- and disease-similarity-based recommendations to predict potential miRNA–disease associations. Therefore, when a disease has no known related miRNA, the association can be predicted by miRNA-similarity-based recommendation. For the same reason, NetCF can also be applied to predict associations for novel miRNAs.
(4) NetCF uses similarity information and known miRNA–disease associations to infer potential miRNA–disease associations without requiring negative sample information.
(1) |
The normalized mutual information (NMI) of TmA and TmB is used to measure the functional similarity of miRNAs A and B:
(2) |
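The explicit forms of eqn (1) and (2) are not reproduced in this text, so the following is only a minimal sketch of how an NMI between two target gene sets could be computed. It assumes (this is not taken from the paper) that each set is encoded as a binary membership variable over a shared gene universe and that NMI is normalized as 2I(X;Y)/(H(X) + H(Y)). The names `nmi_of_sets` and `universe` are hypothetical.

```python
import math

def _entropy(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def nmi_of_sets(set_a, set_b, universe):
    """NMI of two gene sets encoded as binary membership variables.

    ASSUMPTION: the encoding and the 2*I/(H(X)+H(Y)) normalization are
    illustrative choices, not the paper's eqn (1)-(2).
    """
    n = len(universe)
    if n == 0:
        return 0.0
    # Joint counts over (gene in A?, gene in B?).
    joint = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for gene in universe:
        joint[(int(gene in set_a), int(gene in set_b))] += 1
    pxy = {key: count / n for key, count in joint.items()}
    px = [pxy[(0, 0)] + pxy[(0, 1)], pxy[(1, 0)] + pxy[(1, 1)]]
    py = [pxy[(0, 0)] + pxy[(1, 0)], pxy[(0, 1)] + pxy[(1, 1)]]
    hx, hy = _entropy(px), _entropy(py)
    mutual_info = hx + hy - _entropy(pxy.values())
    denom = hx + hy
    return 0.0 if denom == 0 else 2 * mutual_info / denom
```

Under this encoding, two identical target sets give an NMI of 1 and two statistically independent sets give 0. One caveat of the assumed encoding is that two perfectly complementary sets also score 1, so a definition based on shared targets only may behave differently.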
Based on the common assumption that two diseases with similar functions are normally associated with similar target genes, we also used the NMI of two disease-target gene sets to measure their functional similarity. Experimentally-verified disease–mRNA interactions are employed in this study and downloaded from the DisGeNET database. We use sets TdA = {TdA(1),TdA(2),…,TdA(da)} and TdB = {TdB(1),TdB(2),…,TdB(db)} to denote the target gene set of diseases A and B, where da and db refer to the number of target genes of diseases A and B, respectively. Similar to the miRNA function similarity calculation, the NMI of TdA and TdB is used to measure the functional similarity between diseases A and B as follows:
(3) |
An improved form of Wang's method14 for disease semantic similarity calculation is implemented in this paper. This method calculates disease semantic similarity based on the hierarchical structure of MeSH. A disease can be described as a directed acyclic graph (DAG), in which the nodes represent diseases and the links represent the relationships between nodes. Let DAGd = (d,Td,Ed) denote the DAG of disease d, where Td is the node set (all ancestor nodes of disease d, including disease d itself) and Ed is the set of connecting edges. Wang's method defines the semantic contribution of node t ∈ Td as follows:
(4) |
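As an illustration of the DAG-based idea, the sketch below follows the commonly cited form of Wang's original definition: the disease itself contributes 1, and each ancestor contributes the maximum of Δ times the contribution of its children within the DAG. The decay factor Δ = 0.5, the toy hierarchy, and the function names are assumptions made for this example; the improved variant used in this paper is not reproduced here.

```python
def semantic_contributions(disease, parents, delta=0.5):
    """Return {term: contribution} for `disease` and all of its ancestors.

    `parents` maps a term to its direct parent terms (a toy MeSH-like DAG).
    ASSUMED rule: D(d) = 1 and D(t) = max(delta * D(child)) over children of t.
    """
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        term = frontier.pop()
        for parent in parents.get(term, []):
            candidate = delta * contrib[term]
            # Keep the largest contribution reaching this ancestor and
            # re-propagate so that its own ancestors are updated too.
            if candidate > contrib.get(parent, 0.0):
                contrib[parent] = candidate
                frontier.append(parent)
    return contrib

def semantic_similarity(a, b, parents, delta=0.5):
    """Shared-ancestor contributions divided by total contributions."""
    ca = semantic_contributions(a, parents, delta)
    cb = semantic_contributions(b, parents, delta)
    shared = set(ca) & set(cb)
    numerator = sum(ca[t] + cb[t] for t in shared)
    return numerator / (sum(ca.values()) + sum(cb.values()))

# Hypothetical hierarchy, for illustration only.
toy_parents = {
    "liver neoplasms": ["digestive system neoplasms", "liver diseases"],
    "digestive system neoplasms": ["neoplasms"],
    "liver diseases": ["digestive system diseases"],
}
```

With this toy hierarchy, `semantic_contributions("liver neoplasms", toy_parents)` assigns 1.0 to the disease itself, 0.5 to each direct parent, and 0.25 to each grandparent, so more distant ancestors contribute less.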
In the improved method, the importance of the disease term itself is also considered. For example, “liver neoplasms” is a more specific disease term than “neoplasms”, so its semantic contribution value should be greater than that of “neoplasms.” We use information content (IC) to measure the importance of the disease term itself:
$$\mathrm{IC}(t) = -\log_2 p(t) \tag{5}$$
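A short worked example of eqn (5) may help. The definition of p(t) is not preserved in this text, so the sketch below assumes that p(t) is the fraction of diseases whose DAG contains term t; the function name and toy data are hypothetical.

```python
import math

def information_content(term, disease_dags):
    """IC(t) = -log2(p(t)).

    ASSUMPTION: p(t) is the fraction of diseases whose DAG node set
    contains `term`; the source does not state this definition.
    """
    count = sum(1 for terms in disease_dags.values() if term in terms)
    if count == 0:
        raise ValueError(f"term {term!r} does not occur in any DAG")
    return -math.log2(count / len(disease_dags))

# Toy node sets (each disease plus its ancestors), for illustration only.
toy_dags = {
    "liver neoplasms": {"liver neoplasms", "neoplasms"},
    "lung neoplasms": {"lung neoplasms", "neoplasms"},
    "prostatic neoplasms": {"prostatic neoplasms", "neoplasms"},
    "breast neoplasms": {"breast neoplasms", "neoplasms"},
}
```

In this toy data the general term “neoplasms” occurs in every DAG and has an IC of 0, whereas “liver neoplasms” occurs in one of four DAGs and has an IC of 2 bits, which matches the intuition that more specific terms carry more information.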
(6) |
Finally, the similarity between diseases A and B is calculated by integrating disease functional and disease semantic similarities as follows:
$$DD(A,B) = \alpha\,DF(A,B) + (1-\alpha)\,DS(A,B) \tag{7}$$
The detailed implementation procedure of NetCF for calculating the predictor score between miRNA i and disease j is as follows.
First, the miRNA-similarity-based recommendation score between miRNA i and disease j is calculated from the similarities between miRNA i and its neighbors and the associations between those neighbors and disease j. If the similarity between a neighbor and miRNA i is extremely small, the contribution of that neighbor can be ignored. The miRNA-similarity-based recommendation score is therefore calculated from the n most similar neighbors of miRNA i as follows:
(8) |
Second, the disease-similarity-based recommendation score between miRNA i and disease j is calculated from the similarities between disease j and its neighbors and the associations between those neighbors and miRNA i. For the same reason, the disease-similarity-based recommendation score is calculated from the m most similar neighbors of disease j as follows:
(9) |
Finally, miRNA- and disease-similarity-based recommendation scores are integrated as the final recommendation score of miRNA i and disease j as follows:
$$RS(i,j) = \beta\,RS_m(i,j) + (1-\beta)\,RS_d(i,j) \tag{10}$$
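The three steps above can be sketched as follows. Because eqn (8) and (9) are not reproduced in this text, the similarity-weighted average used for each partial score is an assumption; only the final linear combination follows eqn (10). `A` is a binary miRNA × disease association matrix, and `MM` and `DD` are miRNA and disease similarity matrices; all names are hypothetical.

```python
def top_neighbors(sim_row, self_index, k):
    """Indices of the k most similar items, excluding the item itself."""
    others = [idx for idx in range(len(sim_row)) if idx != self_index]
    others.sort(key=lambda idx: sim_row[idx], reverse=True)
    return others[:k]

def netcf_score(i, j, A, MM, DD, n, m, beta):
    """Recommendation score for miRNA i and disease j.

    ASSUMPTION: each partial score is a similarity-weighted average of the
    neighbors' known associations; this form is not taken from the paper.
    """
    # miRNA-similarity-based part: do miRNAs similar to i associate with j?
    mirna_nb = top_neighbors(MM[i], i, n)
    weight = sum(MM[i][k] for k in mirna_nb)
    rs_m = sum(MM[i][k] * A[k][j] for k in mirna_nb) / weight if weight > 0 else 0.0

    # Disease-similarity-based part: is i associated with diseases similar to j?
    disease_nb = top_neighbors(DD[j], j, m)
    weight = sum(DD[j][l] for l in disease_nb)
    rs_d = sum(DD[j][l] * A[i][l] for l in disease_nb) / weight if weight > 0 else 0.0

    # Linear combination in the form of eqn (10).
    return beta * rs_m + (1 - beta) * rs_d

# Toy data: 3 miRNAs x 2 diseases.
A = [[0, 1], [0, 0], [1, 1]]
MM = [[1.0, 0.8, 0.2], [0.8, 1.0, 0.5], [0.2, 0.5, 1.0]]
DD = [[1.0, 0.6], [0.6, 1.0]]
score = netcf_score(0, 0, A, MM, DD, n=2, m=1, beta=0.5)
```

Restricting each sum to the top n or m neighbors reflects the remark that neighbors with very small similarity contribute little; the normalization by total neighbor similarity simply keeps each partial score between 0 and 1 in this sketch.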
Using our proposed similarity computation method to measure miRNA and disease similarities, LOOCV is implemented on the benchmark dataset. The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) are adopted to evaluate the prediction performance of NetCF and the comparison methods. The four parameters of NetCF are set to α = 0.5, β = nm/(nm + nd), n = 47, and m = 33. Optimal parameters are selected for Liu's method, NetCBI, RLSMDA, and KATZ as described in the corresponding literature. The ROC curves of NetCF and the comparison methods are plotted in Fig. 3, and the AUC values are indicated in the legends.
Fig. 3 Performance comparisons of NetCF, Liu's method, NetCBI, RLSMDA, and KATZMDA in terms of ROC curves and AUCs based on LOOCV. |
The AUC value of NetCF is 0.8960, whereas those of Liu's method, NetCBI, RLSMDA, and KATZ are 0.7974, 0.8105, 0.8406, and 0.8315, respectively. All methods obtain a reliable AUC value when LOOCV is implemented on the benchmark dataset, which supports the rationality of our miRNA and disease similarity measure.
NetCF shows better prediction performance than Liu's method, NetCBI, RLSMDA, and KATZ.
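The evaluation protocol can be illustrated with a minimal sketch. It hides each known association in turn, scores it from the remaining data, and computes AUC as the probability that a held-out positive outscores a candidate pair. The helper names and the generic `score_fn` callback are hypothetical, and how the candidate (unlabeled) pairs are pooled is an assumption rather than the paper's exact procedure.

```python
def auc_from_scores(pos, neg):
    """AUC as the probability that a positive outscores a negative.

    Ties count as 0.5; this equals the area under the ROC curve.
    """
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))

def loocv_scores(A, score_fn):
    """Hide each known association in turn and score it from the rest."""
    held_out = []
    for i, row in enumerate(A):
        for j, known in enumerate(row):
            if known:
                A[i][j] = 0  # remove the association under test
                try:
                    held_out.append(score_fn(i, j, A))
                finally:
                    A[i][j] = 1  # always restore the matrix
    return held_out
```

For example, `auc_from_scores([0.9, 0.8], [0.1, 0.85])` gives 0.75, because three of the four positive/negative pairs are ordered correctly. The similarities passed to `score_fn` should not be derived from `A`; otherwise the hidden association leaks into the score, which is the overestimation issue noted for earlier methods.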
Rank | miRNA | Evidence |
---|---|---|
1 | hsa-mir-16 | dbDEMC, PhenomiR |
2 | hsa-mir-195 | dbDEMC, PhenomiR |
3 | hsa-mir-429 | dbDEMC |
4 | hsa-mir-15a | dbDEMC, PhenomiR |
5 | hsa-mir-451a | dbDEMC |
6 | hsa-mir-141 | dbDEMC, PhenomiR |
7 | hsa-mir-106b | dbDEMC, PhenomiR |
8 | hsa-mir-449a | PhenomiR |
9 | hsa-mir-193b | dbDEMC, PhenomiR |
10 | hsa-mir-302d | PhenomiR |
11 | hsa-mir-383 | PhenomiR |
12 | hsa-mir-20b | dbDEMC, PhenomiR |
13 | hsa-mir-194 | dbDEMC, PhenomiR |
14 | hsa-mir-130a | dbDEMC, PhenomiR |
15 | hsa-mir-151a | dbDEMC |
16 | hsa-mir-99a | dbDEMC, PhenomiR |
17 | hsa-mir-296 | dbDEMC, PhenomiR |
18 | hsa-mir-320a | dbDEMC, PhenomiR |
19 | hsa-mir-215 | PhenomiR |
20 | hsa-mir-378a | dbDEMC |
21 | hsa-mir-15b | dbDEMC, PhenomiR |
22 | hsa-mir-153 | dbDEMC, PhenomiR |
23 | hsa-mir-328 | dbDEMC, PhenomiR |
24 | hsa-mir-149 | dbDEMC, PhenomiR |
25 | hsa-mir-302c | PhenomiR |
26 | hsa-mir-130b | dbDEMC, PhenomiR |
27 | hsa-mir-122 | PhenomiR |
28 | hsa-mir-302a | PhenomiR |
29 | hsa-mir-449b | PhenomiR |
30 | hsa-mir-10a | dbDEMC, PhenomiR |
31 | hsa-mir-152 | dbDEMC, PhenomiR |
32 | hsa-mir-147 | dbDEMC |
33 | hsa-mir-302b | PhenomiR |
34 | hsa-mir-204 | dbDEMC, PhenomiR |
35 | hsa-mir-181d | dbDEMC, PhenomiR |
36 | hsa-mir-139 | dbDEMC, PhenomiR |
37 | hsa-mir-372 | PhenomiR |
38 | hsa-mir-196b | dbDEMC, PhenomiR |
39 | hsa-mir-423 | dbDEMC, PhenomiR |
40 | hsa-mir-148b | dbDEMC, PhenomiR |
41 | hsa-mir-520g | Unconfirmed |
42 | hsa-mir-615 | dbDEMC |
43 | hsa-mir-151b | dbDEMC |
44 | hsa-mir-373 | PhenomiR |
45 | hsa-mir-452 | dbDEMC, PhenomiR |
46 | hsa-mir-367 | PhenomiR |
47 | hsa-mir-630 | Unconfirmed |
48 | hsa-mir-324 | dbDEMC, PhenomiR |
49 | hsa-mir-519c | Unconfirmed |
50 | hsa-mir-625 | dbDEMC |
Rank | miRNA | Evidence |
---|---|---|
1 | hsa-mir-18a | dbDEMC, PhenomiR |
2 | hsa-mir-155 | PhenomiR |
3 | hsa-mir-429 | Unconfirmed |
4 | hsa-mir-9 | dbDEMC, PhenomiR |
5 | hsa-mir-19b | dbDEMC, PhenomiR |
6 | hsa-mir-19a | dbDEMC, PhenomiR |
7 | hsa-mir-181a | dbDEMC, PhenomiR |
8 | hsa-mir-196a | dbDEMC, PhenomiR |
9 | hsa-mir-29c | dbDEMC, PhenomiR |
10 | hsa-mir-10b | PhenomiR |
11 | hsa-mir-138 | PhenomiR |
12 | hsa-mir-24 | dbDEMC, PhenomiR |
13 | hsa-mir-7 | dbDEMC, PhenomiR |
14 | hsa-mir-210 | dbDEMC, PhenomiR |
15 | hsa-mir-150 | PhenomiR |
16 | hsa-mir-451a | dbDEMC |
17 | hsa-let-7e | dbDEMC, PhenomiR |
18 | hsa-mir-30a | dbDEMC, PhenomiR |
19 | hsa-mir-125a | dbDEMC, PhenomiR |
20 | hsa-mir-149 | dbDEMC, PhenomiR |
21 | hsa-mir-103a | dbDEMC |
22 | hsa-let-7g | dbDEMC, PhenomiR |
23 | hsa-mir-192 | dbDEMC |
24 | hsa-mir-186 | dbDEMC, PhenomiR |
25 | hsa-mir-140 | dbDEMC |
26 | hsa-mir-20b | dbDEMC |
27 | hsa-mir-302d | PhenomiR |
28 | hsa-mir-128 | dbDEMC, PhenomiR |
29 | hsa-mir-328 | dbDEMC, PhenomiR |
30 | hsa-mir-215 | dbDEMC, PhenomiR |
31 | hsa-mir-383 | dbDEMC, PhenomiR |
32 | hsa-mir-26b | dbDEMC, PhenomiR |
33 | hsa-mir-302a | PhenomiR |
34 | hsa-let-7f | dbDEMC, PhenomiR |
35 | hsa-mir-181d | dbDEMC |
36 | hsa-mir-142 | PhenomiR |
37 | hsa-mir-449b | Unconfirmed |
38 | hsa-mir-197 | dbDEMC, PhenomiR |
39 | hsa-mir-10a | dbDEMC, PhenomiR |
40 | hsa-mir-302b | PhenomiR |
41 | hsa-mir-615 | dbDEMC |
42 | hsa-mir-365a | dbDEMC |
43 | hsa-mir-92b | Unconfirmed |
44 | hsa-mir-139 | dbDEMC, PhenomiR |
45 | hsa-mir-423 | dbDEMC, PhenomiR |
46 | hsa-mir-212 | dbDEMC, PhenomiR |
47 | hsa-mir-137 | PhenomiR |
48 | hsa-mir-181c | dbDEMC, PhenomiR |
49 | hsa-mir-497 | dbDEMC, PhenomiR |
50 | hsa-mir-302c | PhenomiR |
The high mortality rate of lung cancer makes it the most common cause of cancer-related death in men and the second most common in women.37 Many researchers have demonstrated that miRNA dysregulation is associated with lung cancer, and in the benchmark dataset, 128 lung cancer-related miRNAs are verified by biological experiments. Unknown lung cancer-related miRNAs are predicted by NetCF. Among the top 50 predicted miRNAs, 47 are confirmed by the dbDEMC and PhenomiR databases, and only 3 miRNAs (hsa-mir-520g, hsa-mir-630 and hsa-mir-519c, ranked 41st, 47th and 49th, respectively) are not confirmed. Notably, all of the top 40 predictions are confirmed. Moreover, Cao et al.38 reported that hsa-mir-630 inhibits cell proliferation of lung cancer by targeting cell-cycle kinase 7 (CDC7), and Cha et al.39 identified hsa-mir-519c as a tumor suppressor involved in lung cancer progression.
Prostate cancer is the most common cancer in males in 84 countries,37 occurring more commonly in the developed world.
Biological experiments have demonstrated several important associations between prostate cancer and dysregulation of miRNAs. NetCF is implemented to predict potential prostate cancer-related miRNAs. Of the top 50 predicted miRNAs, 47 are confirmed based on the dbDEMC and PhenomiR databases; and only 3 miRNAs (hsa-mir-429, hsa-mir-449b and hsa-mir-92b, ranked third, 37th and 43rd, respectively) are not found in the two databases. Further literature search demonstrated that hsa-mir-429 inhibits cell proliferation by targeting p27Kip1 in human prostate cancer cells.40
Rank | miRNA | Evidence |
---|---|---|
1 | hsa-mir-16 | dbDEMC, PhenomiR |
2 | hsa-mir-15a | dbDEMC, PhenomiR |
3 | hsa-mir-195 | dbDEMC, PhenomiR |
4 | hsa-mir-141 | dbDEMC, PhenomiR |
5 | hsa-mir-151a | dbDEMC |
6 | hsa-mir-130a | dbDEMC, PhenomiR |
7 | hsa-mir-302b | PhenomiR |
8 | hsa-mir-106b | dbDEMC, PhenomiR |
9 | hsa-mir-429 | dbDEMC |
10 | hsa-mir-296 | dbDEMC, PhenomiR |
11 | hsa-mir-122 | PhenomiR |
12 | hsa-mir-451a | dbDEMC |
13 | hsa-mir-99a | dbDEMC, PhenomiR |
14 | hsa-mir-193b | dbDEMC, PhenomiR |
15 | hsa-mir-708 | dbDEMC |
16 | hsa-mir-378a | dbDEMC |
17 | hsa-mir-302c | PhenomiR |
18 | hsa-mir-152 | dbDEMC, PhenomiR |
19 | hsa-mir-625 | dbDEMC |
20 | hsa-mir-204 | dbDEMC, PhenomiR |
21 | hsa-mir-15b | dbDEMC, PhenomiR |
22 | hsa-mir-149 | dbDEMC, PhenomiR |
23 | hsa-mir-328 | dbDEMC, PhenomiR |
24 | hsa-mir-20b | dbDEMC, PhenomiR |
25 | hsa-mir-129 | dbDEMC, PhenomiR |
26 | hsa-mir-139 | dbDEMC, PhenomiR |
27 | hsa-mir-302a | PhenomiR |
28 | hsa-mir-194 | dbDEMC, PhenomiR |
29 | hsa-mir-10a | dbDEMC, PhenomiR |
30 | hsa-mir-320a | dbDEMC, PhenomiR |
31 | hsa-mir-449a | PhenomiR |
32 | hsa-mir-302d | PhenomiR |
33 | hsa-mir-196b | dbDEMC, PhenomiR |
34 | hsa-mir-148b | dbDEMC, PhenomiR |
35 | hsa-mir-215 | PhenomiR |
36 | hsa-mir-151b | dbDEMC |
37 | hsa-mir-99b | dbDEMC, PhenomiR |
38 | hsa-mir-452 | dbDEMC, PhenomiR |
39 | hsa-mir-367 | PhenomiR |
40 | hsa-mir-342 | dbDEMC, PhenomiR |
41 | hsa-mir-373 | PhenomiR |
42 | hsa-mir-345 | dbDEMC, PhenomiR |
43 | hsa-mir-449b | PhenomiR |
44 | hsa-mir-339 | dbDEMC, PhenomiR |
45 | hsa-mir-425 | dbDEMC, PhenomiR |
46 | hsa-mir-23b | dbDEMC, PhenomiR |
47 | hsa-mir-130b | dbDEMC, PhenomiR |
48 | hsa-mir-211 | PhenomiR |
49 | hsa-mir-92b | PhenomiR |
50 | hsa-mir-181d | dbDEMC, PhenomiR |
Rank | miRNA | Evidence |
---|---|---|
1 | hsa-mir-18a | dbDEMC, PhenomiR |
2 | hsa-mir-155 | PhenomiR |
3 | hsa-mir-19a | dbDEMC, PhenomiR |
4 | hsa-mir-9 | dbDEMC, PhenomiR |
5 | hsa-mir-10b | PhenomiR |
6 | hsa-mir-210 | dbDEMC, PhenomiR |
7 | hsa-mir-19b | dbDEMC, PhenomiR |
8 | hsa-mir-181a | dbDEMC, PhenomiR |
9 | hsa-mir-7 | dbDEMC, PhenomiR |
10 | hsa-mir-138 | PhenomiR |
11 | hsa-mir-196a | dbDEMC, PhenomiR |
12 | hsa-mir-24 | dbDEMC, PhenomiR |
13 | hsa-mir-142 | PhenomiR |
14 | hsa-mir-29c | dbDEMC, PhenomiR |
15 | hsa-mir-30a | dbDEMC, PhenomiR |
16 | hsa-mir-125a | dbDEMC, PhenomiR |
17 | hsa-mir-302b | PhenomiR |
18 | hsa-mir-199b | dbDEMC, PhenomiR |
19 | hsa-let-7i | dbDEMC, PhenomiR |
20 | hsa-let-7g | dbDEMC, PhenomiR |
21 | hsa-let-7e | dbDEMC, PhenomiR |
22 | hsa-mir-499a | Unconfirmed |
23 | hsa-mir-150 | PhenomiR |
24 | hsa-mir-429 | Unconfirmed |
25 | hsa-mir-135a | dbDEMC, PhenomiR |
26 | hsa-let-7f | dbDEMC, PhenomiR |
27 | hsa-mir-451a | dbDEMC |
28 | hsa-mir-192 | dbDEMC |
29 | hsa-mir-302c | PhenomiR |
30 | hsa-mir-18b | dbDEMC |
31 | hsa-mir-139 | dbDEMC, PhenomiR |
32 | hsa-mir-103a | dbDEMC |
33 | hsa-mir-625 | dbDEMC |
34 | hsa-mir-140 | dbDEMC |
35 | hsa-mir-20b | dbDEMC |
36 | hsa-mir-215 | dbDEMC, PhenomiR |
37 | hsa-mir-128 | dbDEMC, PhenomiR |
38 | hsa-mir-129 | dbDEMC, PhenomiR |
39 | hsa-mir-137 | PhenomiR |
40 | hsa-mir-302a | PhenomiR |
41 | hsa-mir-10a | dbDEMC, PhenomiR |
42 | hsa-mir-149 | dbDEMC, PhenomiR |
43 | hsa-mir-26b | dbDEMC, PhenomiR |
44 | hsa-mir-328 | dbDEMC, PhenomiR |
45 | hsa-mir-497 | dbDEMC, PhenomiR |
46 | hsa-mir-30b | dbDEMC, PhenomiR |
47 | hsa-mir-302d | PhenomiR |
48 | hsa-mir-542 | dbDEMC |
49 | hsa-mir-342 | dbDEMC, PhenomiR |
50 | hsa-mir-338 | dbDEMC, PhenomiR |
In this work, we develop a new method for measuring miRNA and disease similarities based on normalized mutual information. This method combines disease-associated genes and disease DAGs to calculate disease similarity, and it calculates miRNA similarity based on miRNA–mRNA interactions. Given that no known association information is used in the similarity computation, LOOCV does not overestimate the prediction performance. We then propose NetCF for predicting new miRNA–disease associations by integrating miRNA and disease similarities with known miRNA–disease associations. The reliable AUC values obtained for all compared methods indicate that our proposed similarity computation method is reasonable and feasible. The AUC value of NetCF is superior to those of the other compared methods, which indicates that NetCF has reliable prediction performance. Case studies further demonstrate the good performance of NetCF in predicting potential and isolated disease-related miRNAs.
Despite the favorable results obtained using our method, this study has certain limitations. First, miRNA pair similarity is calculated from the known common target genes; because known miRNA–mRNA interactions are scarce, the similarities of many miRNA pairs are 0. This problem should ease as more miRNA target genes are identified. In our future work, we will integrate more miRNA-related data to further improve the miRNA similarity measure. Second, future work should consider parameter optimization. For example, we set the numbers of miRNA neighbors and disease neighbors to 10% of the numbers of miRNAs and diseases, respectively, based on experiments. This selection works well on our dataset, but not necessarily on other datasets.
This journal is © The Royal Society of Chemistry 2017