Yi Zhang†a, Min Chen†*b, Xiaohui Chenga and Zheng Chenb
aSchool of Information Science and Engineering, Guilin University of Technology, 541004 Guilin, China
bSchool of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China. E-mail: chenmin@hnit.edu.cn
First published on 20th September 2019
Many studies have indicated that miRNAs (microRNAs) are involved in important biological processes and that their mutations and disorders are closely related to diseases; determining the associations between human diseases and miRNAs is therefore key to understanding pathogenic mechanisms. Existing biological experimental methods for identifying miRNA–disease associations are usually expensive and time consuming, so the development of efficient and reliable computational methods for identifying disease-related miRNAs has become an important topic in biological research in recent years. In this study, we developed a novel miRNA–disease association prediction model using a Laplacian score of the graphs and space projection federated method (LSGSP). The model integrates experimentally validated miRNA–disease associations, disease semantic similarity scores, miRNA functional scores, and miRNA family information to build a new disease similarity network and a new miRNA similarity network. It then obtains the global similarities of these networks by calculating the Laplacian score of the graphs, and combines them with the miRNA–disease Boolean network to construct the miRNA–disease weighted network. Finally, the miRNA–disease score is obtained by projecting the miRNA space and disease space onto the miRNA–disease weighted network. In leave-one-out cross validation (LOOCV) against several other state-of-the-art methods, LSGSP showed excellent predictive performance, with AUC values of 0.9221, 0.9745 and 0.9194 on the benchmark dataset, prediction dataset and compare dataset, respectively. In addition, for prostate neoplasms and lung neoplasms, the consistencies between the top 50 miRNAs predicted by LSGSP and the associations confirmed in the updated HMDD, miR2Disease, and dbDEMC databases reached 96% and 100%, respectively. Similarly, for isolated diseases (diseases not associated with any miRNAs), the consistencies between the top 50 miRNAs predicted by LSGSP and the associations confirmed in the above-mentioned three databases reached 98% and 100%, respectively. These results further indicate that LSGSP can effectively predict potential associations between miRNAs and diseases.
The biological experiments, such as qRT-PCR and microarray profiling, used for discovering the associations between miRNAs and diseases are time consuming and labor intensive.13,14 Moreover, the miRNA–disease associations discovered through biological experiments so far are only the tip of the iceberg: although many associations have been explored, our understanding of the biological functions of miRNAs still has a long way to go. Rapid and efficient computational methods for predicting disease-related miRNAs are therefore urgently needed to guide biological experiments.15,16
Based on the hypothesis that miRNAs with similar functions are often associated with diseases of similar phenotypes,17–19 Jiang et al.20 used a hypergeometric distribution to predict the associations between miRNAs and diseases. Based on the weighted-k-most-similar-neighbour method, Xuan et al.21 proposed HDMP to predict the relationship between miRNA and disease. Building on the method proposed by Xuan et al., Han et al.22 proposed DismiPred, which used topology information between nodes. Chen et al.23,24 designed two KNN-based disease association ranking algorithms (RKNNMDA and BLHARMDA). Chen et al.25 used random walks to predict disease-related miRNAs. However, these methods cannot make predictions for diseases without any known related miRNAs. To solve this problem, Chen et al.26 used disease semantic similarity, miRNA similarity, Gaussian interaction profile kernel similarity and experimentally validated miRNA–disease associations to construct a heterogeneous graph approach, named HGIMDA, for revealing potential miRNA–disease associations. Shi et al.27 further integrated miRNA–gene relationships and random walks to predict miRNA–disease associations. Liao et al.28 proposed a new prediction method for disease-related miRNAs using the Laplacian score of the graphs and a random walk method. Chen et al.29 also proposed a new computational method named WBSMDA to uncover potential miRNAs related to multiple complex diseases by integrating known miRNA–disease associations, disease semantic similarity, miRNA functional similarity, and Gaussian interaction profile kernel similarity for diseases and miRNAs to obtain final relevance scores for unconfirmed miRNA–disease associations. These methods have achieved good predictive performance and can be used for the prediction of isolated diseases.
Sun et al.30 proposed a method, named NTSMDA, using network topology to predict disease–miRNA associations. Nalluri et al.31 designed DISMIRA, a prediction method for disease-related miRNAs, from the two aspects of a maximum weighted matching model and motif-based analyses, respectively. You et al.32 proposed a path-based prediction method named PBMDA through integrating different biological data. Chen et al.33 proposed a bipartite heterogeneous network link prediction method (BHCN) based on bipartite network co-neighbours to predict miRNA–disease associations. Chen et al.34 proposed a method named NetCBI to predict disease-associated miRNAs using consistency of disease networks. Gu et al.35 and Chen et al.36 predicted potential miRNA–disease associations using bipartite network projections. Le et al.37 applied RWR, PRINCE, PRP and KSM to correlation analysis for predicting miRNA–disease associations. Chen et al.38 used network distance analysis. Yu et al.39 used global linear neighbours to predict miRNA–disease associations.
Machine learning methods have also entered the field of bioinformatics research.40–42 Support vector machines (SVMs) were used by Jiang et al.,43 Xu et al.,44 Zeng et al.45 and Wang et al.,46 a logistic model tree was used by Wang et al.,47 and a decision tree was used by Zhao et al.;48 these are excellent classification tools with global optimality and better generalization abilities to predict potential disease-related candidate miRNAs, but such methods require known negative sample information related to disease-related miRNAs that is difficult to obtain. In order to solve the problem of negative sample acquisition, Chen et al.49 used a regularized least squares approach to optimize similarity networks of miRNAs and diseases, respectively, and the final miRNA–disease associations were linear weightings of miRNA similarity scores and disease similarity scores. Restricted Boltzmann machine,50 auto-encoder,51 extreme gradient boosting machine,52 convolutional neural network,53 kernelized Bayesian matrix factorization,54,55 non-negative matrix factorization,56,57 singular value decomposition,58 Kronecker regularized least squares,59,60 Laplacian regularized sparse subspace learning,61 regularized least squares62 and semi-supervised link integrated prediction methods all were used to infer the relationships between potential diseases and miRNAs with good prediction results. Jiang et al.63 proposed a novel similarity kernel fusion (MDA-SKF) method via integrating multiple similarity kernels (three miRNA similarity kernels and three disease similarity kernels) to overcome the limitations through which some initial information may be lost in the process and some noise may exist in the integrated similarity kernel. 
Using SKF as an accurate similarity network construction method, MDA-SKF then applied the Laplacian regularized least squares method to uncover potential miRNA–disease associations, and it can serve as an accurate and efficient computational tool for guiding traditional experiments. Zou et al.64 utilized two methods of social network analysis (KATZ and CATAPULT) to predict potential disease-related candidate miRNAs. Li et al.65 utilized recommendation systems to predict associations between environmental factors, miRNAs and diseases. Peng et al.66 combined negative-aware and rating-based recommendation algorithms to predict miRNA–disease associations. Chen et al.67 constructed a similarity network and utilized ensemble learning to combine ranked results, in a method called ensemble learning and link prediction for miRNA–disease association prediction. Chen et al.68 presented the HAMDA model, which considered not only the network structure and information propagation but also field-related information to reveal miRNA–disease associations by mixing graph-based recommendation algorithms, and it obtained satisfactory prediction results.
Because experimentally verified miRNA–disease associations are scarce and negative samples of miRNA–disease associations are hard to obtain, Zeng et al.,69 Li et al.,70 Chen et al.71 and Peng et al.72 utilized matrix completion to estimate potential miRNA–disease associations. Chen et al.73 combined a sparse learning method with a heterogeneous graph inference method for miRNA–disease association predictions. Tang et al.74 transformed miRNA–disease association prediction into a matrix completion problem, exploiting miRNA functional similarity and disease semantic similarity through a dual Laplacian regularization term. This approach achieved good prediction results while needing only experimentally validated miRNA–disease associations, and it provided new ideas for handling cases in which miRNA–disease association data are insufficient.
Although existing computational methods have made outstanding contributions to the field of miRNA–disease association prediction, they still have the following defects:
(1) These prediction methods are not accurate enough;
(2) Isolated diseases and new miRNAs (miRNAs not associated with any disease) cannot be predicted; and
(3) Negative samples of miRNA–disease associations are required.
In order to overcome these defects, our proposed LSGSP model mainly consists of the following four steps to predict miRNA–disease associations:
(1) Reconstructing similarity networks for diseases and miRNAs, using known miRNA–disease associations, disease semantic similarity, miRNA family information and miRNA functional similarity, respectively;
(2) Obtaining the global similarity scores of the disease similarity networks and miRNA similarity networks through calculating the Laplacian scores of the graphs;
(3) Constructing miRNA–disease weight networks on the basis of experimentally verified miRNA–disease Boolean networks combined with global disease similarity networks and global miRNA similarity networks;
(4) Representing the miRNA–disease association scores using vector projections.
Therefore, LSGSP, as a global approach that does not require negative samples, can simultaneously predict all miRNA–disease associations, and can be used to predict isolated diseases and new miRNAs with good prediction effects in LOOCV and case analysis.
Functional similarity scores between miRNAs obtained from the ESI in ref. 19 were represented by the matrix MM. MiRNA family information obtained from the miRBase database76 was represented by the matrix MMfa. MMfa(i,j) is set to 1 if the miRNA node mi is associated with the miRNA node mj, otherwise it is set to 0. We used the matrix DD to represent the semantic similarity scores between diseases obtained from the ESI in ref. 66.
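The construction of the binary family matrix MMfa can be illustrated with a minimal sketch. It assumes that "associated" means that two miRNAs belong to the same miRBase family, and that the diagonal is set to 1; the function name `family_matrix`, the input dictionary and the three example miRNAs are hypothetical and are not taken from the original data.

```python
def family_matrix(mirnas, family_of):
    """Return MMfa as a list of lists: 1 if two miRNAs share a family, else 0."""
    n = len(mirnas)
    mmfa = [[0] * n for _ in range(n)]
    for i, mi in enumerate(mirnas):
        for j, mj in enumerate(mirnas):
            fi, fj = family_of.get(mi), family_of.get(mj)
            # Assumption: a miRNA without a family annotation matches nothing.
            if fi is not None and fi == fj:
                mmfa[i][j] = 1
    return mmfa


# Illustrative input only (not the dataset used in the paper).
mirnas = ["hsa-let-7a", "hsa-let-7b", "hsa-mir-21"]
family_of = {"hsa-let-7a": "let-7", "hsa-let-7b": "let-7", "hsa-mir-21": "mir-21"}
MMfa = family_matrix(mirnas, family_of)
```

In this toy case the two let-7 members receive a 1 with each other and a 0 with hsa-mir-21.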
Firstly, we used the known matrix MD to calculate the disease similarity information DDas, which can be represented by:
(1) |
DDfs(i,j) = μ × DD(i,j) + (1 − μ) × DDas(i,j) | (2) |
MMfs(i,j) = θ × MM(i,j) + (1 − θ) × MMfa(i,j) | (3) |
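Eqn (2) and (3) are both element-wise convex combinations of two similarity matrices. The following sketch shows this blend on small made-up matrices; the helper name `blend` and the example values are assumptions for illustration, and only the weight θ = 0.2 is taken from the parameter analysis reported for the benchmark dataset.

```python
def blend(a, b, weight):
    """Element-wise weight * a + (1 - weight) * b for equally shaped matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return [[weight * x + (1 - weight) * y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]


# Toy inputs: MM holds functional similarity, MMfa holds family membership.
MM = [[1.0, 0.4], [0.4, 1.0]]
MMfa = [[1, 1], [1, 1]]
theta = 0.2

MMfs = blend(MM, MMfa, theta)  # eqn (3); eqn (2) is blend(DD, DDas, mu)
```

With a small θ, the family information dominates: the off-diagonal entry rises from 0.4 to about 0.88 in this toy example.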
(4) |
(5) |
Similarly, the Laplacian score of the graphs between all miRNAs is represented by MMla, which is as follows:
(6) |
By integrating the global similarity matrix of disease DDla and the experimentally verified Boolean network MD of miRNA–disease associations, the weighted network MDdl of miRNA–disease associations was constructed based on the global similarity information of diseases.
(7) |
Through integrating the global similarity matrix of miRNAs MMla and the experimentally verified Boolean network MD of miRNA–disease associations, the weighted network MDml of miRNA–disease associations was constructed based on the global similarity information from miRNAs.
(8) |
(1) Spatial projection scores based on the Laplacian similarities of diseases:
We used the projected scores of the disease similarity networks in the weighted network MDml of miRNA–disease associations to represent the miRNA–disease association scores; the calculation is as follows:
(9) |
(2) Spatial projection scores based on the Laplacian similarities of miRNAs:
We used the projected scores of miRNA similarity networks in the weighted network MDdl to represent the miRNA–disease scores; the calculation is as follows:
(10) |
(3) Final integrated spatial projection scores based on Laplacian similarities of diseases and miRNAs:
Finally, we integrated the spatial projection scores based on the Laplacian similarities of diseases and spatial projection scores based on Laplacian similarities of miRNAs to calculate the final prediction scores, as shown below:
MDfs(i,j) = ω × MDpm^T(i,j) + (1 − ω) × MDpd(i,j) | (11) |
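The integration step in eqn (11) can be sketched as follows, reading "MDTpm" as the transpose of MDpm so that both score matrices share the same orientation. This reading, the helper names and the toy matrices are assumptions; the two projection score matrices are taken as given here because their defining equations are not reproduced in this text.

```python
def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def integrate(md_pm, md_pd, omega):
    """Eqn (11): omega * MDpm^T + (1 - omega) * MDpd, element-wise."""
    md_pm_t = transpose(md_pm)
    if len(md_pm_t) != len(md_pd) or any(
        len(r) != len(s) for r, s in zip(md_pm_t, md_pd)
    ):
        raise ValueError("MDpm^T and MDpd must have the same shape")
    return [[omega * p + (1 - omega) * q for p, q in zip(r, s)]
            for r, s in zip(md_pm_t, md_pd)]


# Toy scores: MDpd is 2 x 3, MDpm is 3 x 2 (orientation is an assumption).
MDpm = [[0.2, 0.6], [0.4, 0.8], [0.0, 1.0]]
MDpd = [[0.5, 0.5, 0.5], [1.0, 0.0, 0.5]]

MDfs = integrate(MDpm, MDpd, omega=0.3)
```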
Although many researchers have used Laplacian regularization to identify miRNA–disease associations (such as LRSSLMDA,61 MDA-SKF,63 and DLRMC74), our proposed LSGSP differs from these research approaches in the following three aspects:
Firstly, it differs in terms of the data preparation process. MDA-SKF used miRNA sequence similarity, but others (LSGSP, LRSSLMDA and DLRMC) did not. LSGSP uses miRNA family information, but others (MDA-SKF, LRSSLMDA and DLRMC) do not.
Secondly, it differs in terms of the purposes of Laplacian regularization utilization. LRSSLMDA, MDA-SKF and DLRMC used Laplacian regularization in the classification decision stage. LRSSLMDA built an objective function from the common miRNA/disease subspace for miRNA/disease feature spaces, an L1-norm constraint and Laplacian regularization, and finally combined these optimization results to attain the final prediction outcomes. MDA-SKF optimized objective Laplacian regularized least squares functions to obtain a predicted association matrix, which uncovered potential miRNA–disease associations. DLRMC used a matrix completion model to calculate the potential missing entries of the miRNA–disease association matrix, and then used dual Laplacian regularization to regularize the miRNA–disease association matrix. The purpose of using Laplacian scores of the graphs in LSGSP is to obtain global network similarity, and for missing miRNA–disease association calculations, a network projection method was used.
Thirdly, it differs in the type of model used. From a classifier perspective, LRSSLMDA, DLRMC and MDA-SKF all utilized a machine learning-based model for miRNA–disease association prediction, which needed to optimize objective functions to obtain prediction results. However, our LSGSP is a network analysis-based computable model, whose missing miRNA–disease association calculations do not need the optimal solution to obtain an objective function. The implementation process of LSGSP is simple, and the prediction results of LSGSP are intuitive and easy to interpret.
(1) The weight parameters θ and μ for similarity network construction.
The weight parameter θ represents the proportion of the functional similarity scores from Wang et al.19 used for constructing the miRNA similarity network. In order to find the optimal θ value, we first set the parameters to fixed values (μ = α = β = γ = δ = ω = 0.5), and changed the value of θ from 0.1 to 0.9. Through experiments involving cross-validating and calculating AUC values from the benchmark dataset, we found that the AUC value increased gradually from 0.9006 to 0.9010 when θ went from 0.1 to 0.2 and the AUC value decreased gradually from 0.9010 to 0.8892 when θ went from 0.2 to 0.9. From the changing curve shown in Fig. 2, the AUC value reached a maximum when θ = 0.2; therefore, we set θ = 0.2 to obtain good prediction performance.
The weight parameter μ from the disease similarity network indicates the semantic similarity score proportion in the constructed network. On the basis of θ = 0.2, we set the rest of the parameters to 0.5 (θ = 0.2, α = β = γ = δ = ω = 0.5). By taking 0.1 as the step size to increase the μ value, we found that the AUC value reached a maximum when μ = 0.3 and the AUC value decreased gradually when μ went from 0.3 to 0.9, as shown in Fig. 2. Therefore, we set μ = 0.3 for good prediction performance.
(2) The equilibrium parameters α and β for the global similarity network.
The Laplacian similarity equilibrium factor α, used for the disease similarity network, and the Laplacian similarity equilibrium factor β, used for the miRNA similarity network, were kept equal to each other; both were initially set to 0.1 and then increased together using a step size of 0.1. The other three types of parameters were held at fixed values (θ = 0.2, μ = 0.3, γ = δ = ω = 0.5) at the same time. As α and β increased, the AUC value gradually decreased from 0.9093 to 0.8805; therefore, the AUC value was optimal when α and β were set to 0.1.
(3) The equilibrium parameters γ and δ for miRNA–disease weight network construction.
Similarly, the third type of parameter included the equilibrium parameters γ and δ, used for miRNA–disease weight network construction; their values were set to the same value. The effects of the equilibrium parameters γ and δ on LSGSP were tested in the same way as before, and the AUC value reached an optimal value of 0.9113 when γ and δ were set to 0.1.
(4) The weight parameter ω for spatial projection scores.
Finally, in order to obtain the optimal ω value, we gradually increased the value of ω, taking 0.1 as the step size. Through experiment, we found that the AUC value increased gradually from 0.9113 to 0.9221 when the value of ω was increased from 0.1 to 0.3. When the value of ω was increased from 0.3 to 0.9, the AUC value decreased from 0.9221 to 0.8812. Therefore, we set ω = 0.3 to obtain the optimal AUC value, which indicated that our prediction results depended more on the spatial projection scores based on the Laplacian similarities of miRNAs.
In summary, our parameter selections from the benchmark dataset were: θ = 0.2; μ = 0.3; α = β = 0.1; γ = δ = 0.1; ω = 0.3. By using the same method, the parameter selections from the prediction dataset were: θ = 0.2; μ = 0.1; α = β = 0.1; γ = δ = 0.9; ω = 0.9. For the compare dataset, the parameter θ was set to 1 because family information was not used. From the same method as used before, the parameter selections from the compare dataset were: θ = 1; μ = 0.1; α = β = 0.1; γ = δ = 0.9; ω = 0.3.
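The tuning procedure described above is a one-parameter-at-a-time sweep: each parameter is varied from 0.1 to 0.9 in steps of 0.1 while the others stay fixed, and the best value is kept before moving on. A minimal sketch of that loop is given below. The function `toy_evaluate` is a hypothetical stand-in for "run LOOCV and return the AUC" and is not the real model, and only two of the parameters are shown for brevity.

```python
def sweep(name, params, evaluate, grid=None):
    """Vary one parameter over the grid with the others fixed.

    Returns (best_value, best_score); ties keep the first value found.
    """
    grid = grid or [round(0.1 * k, 1) for k in range(1, 10)]  # 0.1 ... 0.9
    best_value, best_score = None, float("-inf")
    for value in grid:
        trial = dict(params, **{name: value})
        score = evaluate(trial)
        if score > best_score:
            best_value, best_score = value, score
    return best_value, best_score


def toy_evaluate(p):
    # Hypothetical surrogate for the LOOCV AUC; peaks at theta=0.2, mu=0.3.
    return 0.9 - (p["theta"] - 0.2) ** 2 - (p["mu"] - 0.3) ** 2


params = {"theta": 0.5, "mu": 0.5}  # start from the fixed value 0.5
for name in ["theta", "mu"]:        # tune in order, keeping earlier choices
    params[name], _ = sweep(name, params, toy_evaluate)
```

Because each parameter is tuned with the others held fixed, this procedure finds a coordinate-wise optimum, which need not coincide with the optimum of a full grid search.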
(1) Reconstructing the miRNA network using family information;
(2) Reconstructing the disease network using miRNA–disease association pairs;
(3) Obtaining the global similarity network using the Laplacian scores;
(4) Constructing the miRNA–disease weighted network using the global disease similarity network, the global miRNA similarity network and miRNA–disease association information;
(5) Obtaining the prediction scores using vector space projection.
We evaluated the predictive performance of LSGSP in the following five situations:
(1) The predictive performance without considering miRNA network reconstruction and disease network reconstruction (LSGSP without NR);
(2) The predictive performance in the case of reconstructing the miRNA network (LSGSP with MNR);
(3) The predictive performance in the case of reconstructing the disease network (LSGSP with DNR);
(4) The predictive performance in the case of reconstructing the miRNA network and disease network without reconstructing the miRNA–disease weight network (LSGSP without MDWN); and
(5) The predictive performance with all relevant information (LSGSP with all information).
The LOOCV results in Fig. 3 show that the worst predictive performance occurred for LSGSP without MDWN, where the AUC value was only 0.7809. However, once the miRNA–disease weighted network was constructed, even without reconstructing the miRNA network and disease network (LSGSP without NR), the AUC value reached 0.8973, which indicates that constructing the miRNA–disease weighted network significantly improves prediction performance. For LSGSP with MNR, the AUC value increased from 0.8973 to 0.9135. After reconstructing the disease network by adding structural information from the known association network (LSGSP with DNR), the AUC value increased from 0.8973 to 0.9049, and with all information (LSGSP with all information) it increased to 0.9221. This shows that each component contributes to the performance of LSGSP in predicting the associations between miRNAs and diseases.
Fig. 3 ROC curves and AUC values based on LOOCV in different situations, using the benchmark dataset.
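For readers unfamiliar with the metric, the AUC reported throughout can be computed directly from prediction scores as the probability that a known association is scored above a candidate without known evidence. The sketch below uses this rank-based definition on made-up scores; treating unlabeled candidates as negatives is the usual evaluation convention and is an assumption here, not a detail confirmed by the text.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: two known associations (label 1) among five candidates.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 0, 0]
value = auc(scores, labels)  # 5 of 6 pairs are ordered correctly
```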
To avoid data dependence, the prediction dataset was used to further compare the four methods mentioned above. On the prediction dataset, which contains more known associations than the benchmark dataset, the accuracy of all four methods improved greatly. The AUC values of RLSMDA, IDNC, GSTRW and LSGSP for the prediction dataset were 0.9232, 0.9434, 0.9512 and 0.9745, respectively, as shown in Fig. 5. The AUC value of LSGSP was the highest, being 5.26%, 3.19% and 2.39% higher than those of RLSMDA, IDNC and GSTRW, respectively. These results show the excellent predictive ability of LSGSP, mainly due to the use of Laplacian scores and network projection, and LSGSP showed more pronounced advantages when fewer experimentally verified miRNA–disease associations were available.
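The quoted margins can be reproduced from the four AUC values. The figures match when each difference is divided by the AUC of LSGSP itself; this normalization is inferred from the numbers and is not stated explicitly in the text.

```python
lsgsp = 0.9745
others = {"RLSMDA": 0.9232, "IDNC": 0.9434, "GSTRW": 0.9512}

# Margin in percent, expressed relative to the AUC of LSGSP (inferred convention).
gains = {name: round(100 * (lsgsp - value) / lsgsp, 2)
         for name, value in others.items()}
```

Dividing by each competitor's AUC instead would give slightly larger percentages, so the convention matters when comparing against other papers.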
So far, LRSSLMDA,61 MDA-SKF63 and DLRMC74 have obtained good predictive results on the compare dataset using Laplacian regularization to identify miRNA–disease associations. To compare LSGSP fairly with these three methods, the AUC values of LSGSP, LRSSLMDA, MDA-SKF and DLRMC on the compare dataset given in Table 1 are the optimal values reported in the respective papers. When LSGSP, LRSSLMDA and DLRMC used the same available experimental data, without any family information, the AUC value of LSGSP was 0.9194, which was higher than those of LRSSLMDA and DLRMC, as shown in Table 1. MDA-SKF showed the best prediction results among these four methods, with an optimal AUC value of 0.9576, which was attributed to its accurate SKF network construction method. However, a direct comparison between MDA-SKF and LSGSP is unfair, because MDA-SKF used extra miRNA sequence similarity information but LSGSP did not. When SKF was used for network reconstruction in LSGSP (named LSGSP-SKF) so that it could be compared with MDA-SKF under the same experimental conditions, the AUC value was 0.9675, shown as LSGSP-SKF in Table 1; this value was the highest among all methods.
No. | Method | AUC |
---|---|---|
1 | LSGSP | 0.9194 |
2 | LRSSLMDA | 0.9178 |
3 | DLRMC | 0.9174 |
4 | MDA-SKF | 0.9576 |
5 | LSGSP-SKF | 0.9675 |
We implemented LOOCV on the benchmark dataset to evaluate the predictive performance of LSGSP for new miRNAs and isolated diseases. For each new miRNA verified, the associations between the miRNA and all diseases were removed to simulate a new miRNA. The ROC curves and AUC values predicted by LSGSP using the benchmark dataset are shown in Fig. 6, in which the AUC of LSGSP was 0.8597. Similarly, the associations between the disease and all miRNAs were removed to simulate an isolated disease, and the AUC value from the benchmark dataset was 0.7767. According to the prediction results, LSGSP showed excellent predictive performance in predicting new miRNA-related diseases and isolated disease-related miRNAs.
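The simulation of a new miRNA or an isolated disease amounts to blanking one row or one column of the Boolean association matrix MD before prediction. A minimal sketch is shown below; it assumes that rows index miRNAs and columns index diseases, and the helper names and toy matrix are hypothetical.

```python
import copy


def hide_mirna(md, i):
    """Copy of MD with every association of miRNA i (row i) removed."""
    out = copy.deepcopy(md)
    out[i] = [0] * len(out[i])
    return out


def hide_disease(md, j):
    """Copy of MD with every association of disease j (column j) removed."""
    out = copy.deepcopy(md)
    for row in out:
        row[j] = 0
    return out


# Toy Boolean network: 3 miRNAs (rows) x 2 diseases (columns).
MD = [[1, 0], [1, 1], [0, 1]]

MD_new_mirna = hide_mirna(MD, 1)           # miRNA 1 now looks "new"
MD_isolated_disease = hide_disease(MD, 0)  # disease 0 now looks "isolated"
```

The hidden entries then serve as the positives to be recovered when the model is run on the masked matrix.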
We used LSGSP for training and prediction, using 34 known associations between prostate neoplasms and miRNAs from the prediction dataset. Only 2 of the top 50 miRNAs predicted to be associated with prostate neoplasms were not confirmed by the updated HMDD, miR2Disease, and dbDEMC databases (shown in Table 2): hsa-mir-429 and hsa-mir-7 (ranked 23rd and 50th in the predictive results, respectively). However, we found evidence of associations between these two miRNAs and prostate neoplasms upon searching the latest literature. Ouyang et al.84 found that the down-regulation of hsa-mir-429 inhibited the proliferation of prostate cancer cells. Zhou et al.85 identified a total of 130 differentially expressed miRNAs via miRNA microarray studies and found that hsa-mir-7-1 was up-regulated. Sánchez et al.86 proposed synergy between miR-21-5p and miR-7p in the regulation of prostate carcinogenesis. Notably, these studies were all published after the last updates of the three databases, which further confirms the effectiveness of LSGSP.
Rank | miRNA name | Evidence | Rank | miRNA name | Evidence |
---|---|---|---|---|---|
1 | hsa-mir-18a | dbDEMC | 26 | hsa-mir-9 | dbDEMC |
2 | hsa-mir-19b | HMDD, dbDEMC, miR2Disease | 27 | hsa-mir-30d | HMDD, dbDEMC |
3 | hsa-let-7a | dbDEMC, miR2Disease | 28 | hsa-mir-15b | dbDEMC |
4 | hsa-mir-19a | dbDEMC | 29 | hsa-mir-30b | dbDEMC |
5 | hsa-mir-34a | HMDD, dbDEMC, miR2Disease | 30 | hsa-mir-302a | dbDEMC |
6 | hsa-let-7d | HMDD, dbDEMC, miR2Disease | 31 | hsa-mir-143 | HMDD, dbDEMC, miR2Disease |
7 | hsa-let-7e | dbDEMC, miR2Disease | 32 | hsa-mir-218 | dbDEMC, miR2Disease |
8 | hsa-mir-155 | dbDEMC | 33 | hsa-mir-92b | dbDEMC |
9 | hsa-let-7f | dbDEMC, miR2Disease | 34 | hsa-mir-302b | dbDEMC |
10 | hsa-mir-200b | HMDD, dbDEMC | 35 | hsa-mir-372 | dbDEMC |
11 | hsa-let-7b | HMDD, dbDEMC, miR2Disease | 36 | hsa-mir-200c | dbDEMC |
12 | hsa-let-7c | HMDD, dbDEMC, miR2Disease | 37 | hsa-mir-24 | dbDEMC, miR2Disease |
13 | hsa-mir-20b | dbDEMC | 38 | hsa-mir-181a | dbDEMC |
14 | hsa-let-7i | dbDEMC | 39 | hsa-mir-339 | hsa-miR-339-5p |
15 | hsa-mir-92a | dbDEMC | 40 | hsa-mir-302c | dbDEMC, miR2Disease |
16 | hsa-mir-34b | HMDD, dbDEMC | 41 | hsa-mir-151 | dbDEMC |
17 | hsa-mir-29a | HMDD, dbDEMC, miR2Disease | 42 | hsa-mir-27a | HMDD, dbDEMC, miR2Disease |
18 | hsa-mir-141 | HMDD, dbDEMC, miR2Disease | 43 | hsa-mir-215 | dbDEMC |
19 | hsa-mir-18b | dbDEMC | 44 | hsa-mir-320 | dbDEMC, miR2Disease |
20 | hsa-mir-126 | HMDD, dbDEMC, miR2Disease | 45 | hsa-mir-1 | dbDEMC |
21 | hsa-mir-200a | HMDD, dbDEMC | 46 | hsa-mir-29c | dbDEMC |
22 | hsa-mir-125a | dbDEMC, miR2Disease | 47 | hsa-mir-196a | dbDEMC |
23 | hsa-mir-429 | Unconfirmed | 48 | hsa-mir-383 | dbDEMC |
24 | hsa-let-7g | dbDEMC, miR2Disease | 49 | hsa-mir-195 | HMDD, dbDEMC, miR2Disease |
25 | hsa-mir-125b | dbDEMC, miR2Disease | 50 | hsa-mir-7 | Unconfirmed |
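The consistency figure for a case study is simply the share of top-ranked miRNAs that have at least one database entry. The sketch below computes it from an evidence column; the function name is hypothetical, and the input is a condensed stand-in that reproduces only the counts in the table above (48 confirmed, 2 unconfirmed), not the individual evidence labels.

```python
def confirmation_rate(evidence):
    """Fraction of entries whose evidence is anything other than 'Unconfirmed'."""
    if not evidence:
        raise ValueError("evidence list is empty")
    confirmed = sum(1 for e in evidence if e.strip().lower() != "unconfirmed")
    return confirmed / len(evidence)


# Condensed stand-in for the evidence column: 48 confirmed, 2 unconfirmed.
evidence = ["dbDEMC"] * 48 + ["Unconfirmed"] * 2
rate = confirmation_rate(evidence)
```

For these counts the rate is 48/50, which corresponds to the 96% consistency quoted for prostate neoplasms.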
Lung neoplasms are a common lethal disease and, owing to their low detection rate, pose a great threat to people's lives,87,88 especially in developing countries. Recent studies have found that miRNA dysregulation can serve as a diagnostic biomarker for lung neoplasms; for example, the expression of mir-1246 and mir-1290 can be a key driving factor promoting tumor initiation and progression in human non-small cell lung cancer.89 Lin et al.90 confirmed that mir-324-5p and mir-324-3p play carcinogenic roles with respect to lung cancer. MiR-101 represses lung cancer via down-regulating CXCL12.91 As more functions of lung neoplasm-related miRNAs are discovered, their study can provide more help for the early detection of lung neoplasms.
We used 72 lung neoplasm–miRNA associations from the prediction dataset to train LSGSP and then predicted the remaining unknown associations. We found supporting evidence in the above-mentioned three databases for all of the top 50 miRNAs predicted by LSGSP to be related to lung neoplasms (as shown in Table 3).
Rank | miRNA name | Evidence | Rank | miRNA name | Evidence |
---|---|---|---|---|---|
1 | hsa-mir-106b | dbDEMC | 26 | hsa-mir-302b | dbDEMC, miR2Disease |
2 | hsa-mir-93 | dbDEMC | 27 | hsa-mir-27a | HMDD, dbDEMC |
3 | hsa-mir-200b | HMDD, dbDEMC | 28 | hsa-mir-215 | dbDEMC |
4 | hsa-mir-20b | HMDD, dbDEMC | 29 | hsa-mir-151 | dbDEMC |
5 | hsa-mir-25 | dbDEMC | 30 | hsa-mir-339 | dbDEMC, miR2Disease |
6 | hsa-mir-127 | HMDD, dbDEMC | 31 | hsa-mir-373 | dbDEMC |
7 | hsa-mir-429 | dbDEMC | 32 | hsa-mir-302a | dbDEMC |
8 | hsa-mir-141 | dbDEMC | 33 | hsa-mir-367 | HMDD, dbDEMC, miR2Disease |
9 | hsa-mir-92b | HMDD, dbDEMC | 34 | hsa-mir-181a | dbDEMC, miR2Disease |
10 | hsa-mir-18b | dbDEMC | 35 | hsa-mir-148a | dbDEMC |
11 | hsa-mir-98 | HMDD, dbDEMC, miR2Disease | 36 | hsa-mir-15a | dbDEMC |
12 | hsa-mir-221 | HMDD, dbDEMC, miR2Disease | 37 | hsa-mir-520b | dbDEMC |
13 | hsa-mir-200a | dbDEMC | 38 | hsa-mir-103 | dbDEMC |
14 | hsa-mir-200c | dbDEMC, miR2Disease | 39 | hsa-mir-133a | dbDEMC |
15 | hsa-mir-222 | dbDEMC | 40 | hsa-mir-372 | HMDD, dbDEMC, miR2Disease |
16 | hsa-mir-16 | HMDD | 41 | hsa-mir-107 | HMDD, dbDEMC |
17 | hsa-mir-10b | HMDD, dbDEMC, miR2Disease | 42 | hsa-mir-99b | dbDEMC |
18 | hsa-mir-194 | HMDD, dbDEMC, miR2Disease | 43 | hsa-mir-130a | dbDEMC, miR2Disease |
19 | hsa-mir-195 | dbDEMC, miR2Disease | 44 | hsa-mir-451 | dbDEMC |
20 | hsa-mir-7 | dbDEMC | 45 | hsa-mir-15b | dbDEMC, miR2Disease |
21 | hsa-mir-181b | dbDEMC | 46 | hsa-mir-499 | dbDEMC, miR2Disease |
22 | hsa-mir-320 | HMDD, dbDEMC, miR2Disease | 47 | hsa-mir-204 | dbDEMC, miR2Disease |
23 | hsa-mir-296 | dbDEMC | 48 | hsa-mir-23b | dbDEMC |
24 | hsa-mir-135b | dbDEMC | 49 | hsa-mir-302d | dbDEMC |
25 | hsa-mir-302c | dbDEMC | 50 | hsa-mir-153 | dbDEMC |
Rank | miRNA name | Evidence | Rank | miRNA name | Evidence |
---|---|---|---|---|---|
1 | hsa-mir-21 | HMDD, dbDEMC, miR2Disease | 26 | hsa-mir-146a | HMDD, dbDEMC, miR2Disease |
2 | hsa-mir-155 | HMDD, dbDEMC, miR2Disease | 27 | hsa-mir-137 | dbDEMC |
3 | hsa-mir-15a | HMDD, dbDEMC, miR2Disease | 28 | hsa-let-7a | HMDD, miR2Disease |
4 | hsa-mir-377 | HMDD | 29 | hsa-mir-205 | dbDEMC |
5 | hsa-mir-373 | HMDD, dbDEMC | 30 | hsa-mir-141 | dbDEMC |
6 | hsa-mir-372 | HMDD, dbDEMC, miR2Disease | 31 | hsa-mir-302a | dbDEMC |
7 | hsa-mir-29c | HMDD, dbDEMC, miR2Disease | 32 | hsa-mir-181a | dbDEMC, miR2Disease |
8 | hsa-mir-34a | dbDEMC | 33 | hsa-mir-200b | HMDD, dbDEMC |
9 | hsa-mir-302b | dbDEMC | 34 | hsa-mir-30a | dbDEMC |
10 | hsa-mir-451 | HMDD, dbDEMC, miR2Disease | 35 | hsa-mir-143 | HMDD, dbDEMC, miR2Disease |
11 | hsa-mir-184 | dbDEMC, miR2Disease | 36 | hsa-let-7e | dbDEMC |
12 | hsa-mir-29a | HMDD | 37 | hsa-let-7b | HMDD, dbDEMC, miR2Disease |
13 | hsa-mir-16 | HMDD, dbDEMC, miR2Disease | 38 | hsa-mir-223 | HMDD, dbDEMC, miR2Disease |
14 | hsa-mir-19a | dbDEMC | 39 | hsa-let-7d | HMDD, dbDEMC, miR2Disease |
15 | hsa-mir-17 | HMDD, dbDEMC, miR2Disease | 40 | hsa-let-7c | HMDD, dbDEMC, miR2Disease |
16 | hsa-mir-211 | dbDEMC | 41 | hsa-let-7f | dbDEMC, miR2Disease |
17 | hsa-mir-20a | HMDD, dbDEMC, miR2Disease | 42 | hsa-let-7i | dbDEMC |
18 | hsa-mir-125b | dbDEMC | 43 | hsa-let-7g | dbDEMC, miR2Disease |
19 | hsa-mir-18a | HMDD, dbDEMC, miR2Disease | 44 | hsa-mir-9 | dbDEMC |
20 | hsa-mir-10a | dbDEMC, miR2Disease | 45 | hsa-mir-302c | dbDEMC |
21 | hsa-mir-221 | HMDD, dbDEMC, miR2Disease | 46 | hsa-mir-15b | HMDD, dbDEMC |
22 | hsa-mir-19b | dbDEMC | 47 | hsa-mir-145 | HMDD, dbDEMC |
23 | hsa-mir-92a | HMDD, dbDEMC | 48 | hsa-mir-92b | dbDEMC |
24 | hsa-mir-222 | HMDD, dbDEMC, miR2Disease | 49 | hsa-mir-302d | Unconfirmed |
25 | hsa-mir-181b | HMDD, dbDEMC, miR2Disease | 50 | hsa-mir-127 | dbDEMC |
Rank | miRNA name | Evidence | Rank | miRNA name | Evidence |
---|---|---|---|---|---|
1 | hsa-mir-21 | HMDD, dbDEMC, miR2Disease | 26 | hsa-mir-18a | HMDD, dbDEMC |
2 | hsa-mir-373 | dbDEMC | 27 | hsa-mir-137 | HMDD, dbDEMC |
3 | hsa-mir-29c | HMDD, dbDEMC, miR2Disease | 28 | hsa-mir-146a | HMDD, dbDEMC, miR2Disease |
4 | hsa-mir-302b | dbDEMC | 29 | hsa-mir-19b | HMDD, dbDEMC, miR2Disease |
5 | hsa-mir-451 | dbDEMC, miR2Disease | 30 | hsa-mir-92a | HMDD, dbDEMC |
6 | hsa-mir-34a | HMDD, dbDEMC | 31 | hsa-let-7a | HMDD, dbDEMC, miR2Disease |
7 | hsa-mir-184 | dbDEMC | 32 | hsa-mir-141 | dbDEMC, miR2Disease |
8 | hsa-mir-29a | HMDD, dbDEMC | 33 | hsa-mir-181a | HMDD, dbDEMC |
9 | hsa-mir-16 | dbDEMC, miR2Disease | 34 | hsa-mir-30a | HMDD, dbDEMC, miR2Disease |
10 | hsa-mir-372 | dbDEMC | 35 | hsa-mir-200b | HMDD, dbDEMC |
11 | hsa-mir-155 | HMDD, dbDEMC, miR2Disease | 36 | hsa-mir-223 | HMDD, dbDEMC |
12 | hsa-mir-148a | HMDD, dbDEMC, miR2Disease | 37 | hsa-let-7e | HMDD, dbDEMC, miR2Disease |
13 | hsa-mir-211 | dbDEMC | 38 | hsa-let-7b | HMDD, dbDEMC, miR2Disease |
14 | hsa-mir-148b | dbDEMC | 39 | hsa-let-7d | HMDD, dbDEMC, miR2Disease |
15 | hsa-mir-152 | dbDEMC | 40 | hsa-let-7c | HMDD, dbDEMC, miR2Disease |
16 | hsa-mir-15a | dbDEMC | 41 | hsa-let-7i | HMDD, dbDEMC |
17 | hsa-mir-125b | HMDD, dbDEMC, miR2Disease | 42 | hsa-let-7f | HMDD, dbDEMC, miR2Disease |
18 | hsa-mir-17 | HMDD, dbDEMC, miR2Disease | 43 | hsa-let-7g | HMDD, dbDEMC, miR2Disease |
19 | hsa-mir-19a | HMDD, dbDEMC, miR2Disease | 44 | hsa-mir-143 | HMDD, dbDEMC, miR2Disease |
20 | hsa-mir-221 | HMDD, dbDEMC, miR2Disease | 45 | hsa-mir-9 | HMDD, dbDEMC |
21 | hsa-mir-10a | dbDEMC | 46 | hsa-mir-302c | dbDEMC |
22 | hsa-mir-20a | HMDD, dbDEMC, miR2Disease | 47 | hsa-mir-302a | dbDEMC |
23 | hsa-mir-222 | HMDD, dbDEMC | 48 | hsa-mir-92b | HMDD, dbDEMC |
24 | hsa-mir-205 | HMDD, dbDEMC, miR2Disease | 49 | hsa-mir-302d | dbDEMC |
25 | hsa-mir-181b | HMDD, dbDEMC | 50 | hsa-mir-145 | HMDD, dbDEMC, miR2Disease |
In the case studies on prostate neoplasms and lung neoplasms, 96% and 100% of the top 50 miRNAs that LSGSP predicted as potentially disease-related were confirmed, respectively, and 98% and 100% were confirmed in the isolated disease setting, with supporting evidence drawn from the updated HMDD, miR2Disease and dbDEMC databases. Supporting evidence for the remaining miRNA–disease associations not verified in these three databases was found in recent literature. These results further demonstrate the excellent predictive performance of LSGSP for potential associations between miRNAs and diseases, which is helpful for understanding pathogenic mechanisms at the miRNA level and for finding disease-related miRNAs.
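The percentages quoted here are the fraction of top-ranked miRNAs with at least one supporting database entry; for example, 49 confirmed out of 50 gives 98%. A minimal sketch of that bookkeeping follows, assuming each prediction is stored with its set of supporting databases. The function name `confirmation_rate` and the dictionary layout are illustrative assumptions, and the four entries are rows copied from the table above that contains an unconfirmed prediction.

```python
def confirmation_rate(evidence_by_mirna):
    """Return the fraction of predicted miRNAs backed by at least one database.

    evidence_by_mirna maps a miRNA name to the set of databases that
    confirm it; an empty set means the prediction is unconfirmed.
    """
    if not evidence_by_mirna:
        raise ValueError("no predictions to evaluate")
    confirmed = sum(1 for evidence in evidence_by_mirna.values() if evidence)
    return confirmed / len(evidence_by_mirna)


# Four rows taken from the table (ranks 1, 48, 49 and 50), not the full top 50.
sample = {
    "hsa-mir-21": {"HMDD", "dbDEMC", "miR2Disease"},
    "hsa-mir-92b": {"dbDEMC"},
    "hsa-mir-302d": set(),  # listed as "Unconfirmed"
    "hsa-mir-127": {"dbDEMC"},
}

sample_rate = confirmation_rate(sample)  # 3 of the 4 sample rows are confirmed
```

Applied to all 50 rows of a table, the same function yields the reported consistency for that table.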
The excellent predictive performance of LSGSP is mainly attributed to the following factors. (1) The sound construction of the relationship networks: we reconstructed the disease similarity network and the miRNA similarity network using known miRNA–disease association information, disease semantic similarity, miRNA family information and miRNA functional similarity. (2) The full utilization of network topology characteristics: we used Laplacian scores of the graphs to obtain the global similarities of the miRNA network and the disease network. (3) The accurate construction of the weighted network: we integrated the global similarities of diseases, the global similarities of miRNAs and the experimentally validated miRNA–disease Boolean network to construct a miRNA–disease weighted network that portrays miRNA–disease relationships more accurately. (4) The use of a computable projection of the network space: we used vector projection to represent the degree of miRNA–disease association.
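The vector projection in factor (4) can be illustrated with a generic scalar projection. The sketch below is not the exact LSGSP formula. It assumes that a miRNA is represented by its row of the miRNA similarity matrix and a disease by its column of the miRNA–disease weighted network, and that the score is the length of the projection of the former onto the latter. The names `projection_score`, `mirna_space_scores`, `mirna_sim` and `weighted` are hypothetical.

```python
import math


def projection_score(similarity_row, weighted_column):
    """Scalar projection of similarity_row onto weighted_column.

    Returns 0.0 when the column is all zeros (a disease with no weighted
    associations), since the projection is undefined in that case.
    """
    norm = math.sqrt(sum(w * w for w in weighted_column))
    if norm == 0.0:
        return 0.0
    dot = sum(s * w for s, w in zip(similarity_row, weighted_column))
    return dot / norm


def mirna_space_scores(mirna_sim, weighted):
    """Project every miRNA similarity row onto every disease column."""
    n_mirnas = len(weighted)
    n_diseases = len(weighted[0])
    scores = [[0.0] * n_diseases for _ in range(n_mirnas)]
    for j in range(n_diseases):
        column = [weighted[k][j] for k in range(n_mirnas)]
        for i in range(n_mirnas):
            scores[i][j] = projection_score(mirna_sim[i], column)
    return scores


# Toy data (assumed): two miRNAs, two diseases, one known weighted association.
mirna_sim = [[1.0, 0.5],
             [0.5, 1.0]]
weighted = [[1.0, 0.0],
            [0.0, 0.0]]
toy_scores = mirna_space_scores(mirna_sim, weighted)
```

In this toy case the second miRNA has no known association with the first disease, yet it receives a non-zero score because it is similar to a miRNA that does. A disease-space projection would be built in the same way from the disease similarity matrix and the rows of the weighted network.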
Although LSGSP has achieved creditable predictive results, some aspects still need to be improved to make the model more efficient and general: (1) the time required to select the optimal parameters needs to be shortened; and (2) the accuracy of the miRNA–miRNA similarity representation needs to be improved further by incorporating additional biological data, such as lncRNA–miRNA interactions and miRNA expression profiles.
Footnote
† The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint first authors.
This journal is © The Royal Society of Chemistry 2019