Mingjian Jiang,
Zhen Li*,
Shugang Zhang,
Shuang Wang,
Xiaofeng Wang,
Qing Yuan and
Zhiqiang Wei
Department of Computer Science and Technology, Ocean University of China, China. E-mail: lizhen0130@gmail.com
First published on 1st June 2020
Computer-aided drug design uses high-performance computers to simulate the tasks in drug design, which is a promising research area. Drug–target affinity (DTA) prediction is the most important step of computer-aided drug design, which could speed up drug development and reduce resource consumption. With the development of deep learning, the introduction of deep learning to DTA prediction and improving the accuracy have become a focus of research. In this paper, utilizing the structural information of molecules and proteins, two graphs of drug molecules and proteins are built up respectively. Graph neural networks are introduced to obtain their representations, and a method called DGraphDTA is proposed for DTA prediction. Specifically, the protein graph is constructed based on the contact map output from the prediction method, which could predict the structural characteristics of the protein according to its sequence. It can be seen from the test of various metrics on benchmark datasets that the method proposed in this paper has strong robustness and generalizability.
Virtual screening is a widely used strategy in computer-aided drug design. Drug–target affinity (DTA) prediction is an important step in virtual screening, which can quickly match targets with drugs and speed up drug development. DTA prediction provides information about the binding strength of drugs to target proteins, which can be used to show whether small molecules can bind to proteins. For proteins with known structure and site information, molecular simulation and molecular docking can be used to carry out detailed simulations and thus obtain more accurate results, which is called structure-based virtual screening.16–18 Nevertheless, there are still many proteins for which there is no structural information. Even with homology modelling, it is still difficult to acquire structural information for many proteins. Predicting the binding affinity between proteins and drug molecules from sequences alone (sequence-based virtual screening) is therefore an urgent problem, and it is the focus of this paper. Because proteins and small molecules have complicated structures, accurately describing and featurizing the target and the drug is the most difficult part of affinity prediction. It is also a research hotspot in computer-aided medicine, especially with the rise of deep learning in the past decade.
At present, most of the latest sequence-based virtual screening prediction algorithms are based on deep learning. More specifically, for any pair of drug–target entries, the deep learning method is utilized to extract the representations of drug and target respectively, which will be concatenated into one vector for final prediction. In some cases, DTA prediction is treated as a binary problem. The model is a binary classifier used for determining whether the drug can bind to the target or not, such as NRLMF,19 KronRLS-MKL,20 and SELF-BLM.21
As the accuracy of neural networks improves and the demand for high-precision drug design grows, accurate DTA prediction has received more and more attention, and DTA is regarded as a regression problem. The output is the binding affinity between drug and target, and dissociation constants (Kd),22 inhibition constants (Ki)23 or the 50% inhibitory concentrations (IC50)22 are commonly used to measure the strength. Several methods have already achieved good performance in affinity prediction. For example, DeepDTA24 constructed two convolutional neural networks (CNNs) to extract the representations of the drug and the protein respectively, and the two representations were concatenated to predict the affinity. In addition, DeepDTA collected previous data and built two benchmark datasets, in which the drug is expressed as SMILES and the protein is described by its sequence, and it achieved good results on these benchmarks. WideDTA25 further improved on DeepDTA by introducing Ligand Maximum Common Substructure (LMCS) and Protein Domains and Motifs (PDM) information and using four CNNs to encode the inputs into four representations. Huang et al. proposed a novel fingerprint feature vector for the molecule, and represented the protein sequence as a Pseudo Substitution Matrix Representation (Pseudo-SMR) descriptor for drug–target interaction prediction.26 In addition, Lee et al. compared different target features for predicting drug–target interactions.27
For molecule representation, the molecular fingerprint is a common choice, which encodes the structure of a molecule into a string or binary digits, such as extended connectivity fingerprints,28 atom environment descriptors (MOLPRINT2D)29 and molecular access system keys (MACCS).30 MoleculeNet provided many open-source tools for molecular featurization and learning algorithms, which can also be used for molecule representation.31 Altae-Tran et al. reported how to learn meaningful small-molecule representations when only small amounts of data are available.32 There are also many works attempting to characterize proteins. Westen et al. summarized a total of 13 different protein descriptor sets.33 DeepLSTM represented proteins using position-specific scoring matrix (PSSM) and Legendre moment.34
Moreover, the graph neural network (GNN) has been widely used in various fields. A graph composed of nodes and edges is used as the input of GNN and there is no limit to the size of the input graph, which provides a flexible format to extract in-depth information of molecules. Graph convolutional network (GCN)35 and graph attention network (GAT)36 are widely used GNN models, and they have been gradually applied in computer-aided drug design, such as drug property prediction37 and molecular fingerprint generation.38 In addition, PADME utilized molecular graph convolution in drug–target interaction prediction, which suggests the potential of GNN in drug development.39 Similarly, GraphDTA40 introduced GNN into DTA prediction, which constructed a graph with atoms as nodes and bonds as edges to describe drug molecules. CNN was used to extract protein sequence representation, and GNN models were implemented on the molecular graph, which improved the DTA prediction performance.
However, GraphDTA used a CNN to obtain protein features from the sequence and did not construct a graph for each protein. Proteins contain a large number of atoms, so if the protein graph is constructed with atoms as nodes, the graph will be very large and the training cost very high. If the protein graph is constructed with residues as nodes, the result is only a long chain linked by peptide bonds, which can hardly be regarded as a graph for computation. Therefore, building a protein graph from a protein sequence remains an open problem.
Actually, a protein is not only a chain, but also a folded and complex structure formed by non-bonded interactions such as hydrogen bonds and van der Waals forces. If the spatial structure of a protein can be predicted and described through its sequence, it will be helpful for DTA prediction. Inspired by GraphDTA, GNN is also introduced in this work for DTA prediction. But unlike GraphDTA, we have not only constructed the graph of the drug molecule, but also constructed the protein graph. The number of residues of a protein is about several hundred, so it is suitable to construct graph with residues as nodes. However, the connection of residues is only a long chain without any spatial information. So the contact map is introduced in this paper. The contact map is a kind of representation of a protein structure, which is a 2D (two-dimensional) representation of the 3D (three-dimensional) protein structure,41 and it is often used as the output of protein structure prediction. More importantly, the output contact map, usually a matrix, is exactly consistent with the adjacency matrix in GNNs, which provides an efficient way to combine both data sources together. Therefore, how to introduce the contact map into the protein graph construction to improve the performance of affinity prediction is the focus of this work.
To bridge the huge gap between the speed of structure analysis and the speed of sequencing, protein structure prediction methods have emerged. These methods predict the 3D structure of proteins by mining the hidden information in the protein sequences. Contact maps (or distance maps) are the prediction results of many protein structure prediction methods, which show the interaction of residue pairs in the form of a matrix. RaptorX-Contact42 integrated both evolutionary coupling and sequence conservation information and used residual neural networks to predict protein contact maps. DNCON2,43 which consists of six CNNs, used various distance thresholds as features to improve precision and achieved strong performance in contact map prediction. SPOT-Contact44 utilized residual networks to capture short-range relations together with 2D Bidirectional-ResLSTMs and proved its usefulness in contact prediction. There are also other protein structure prediction methods, such as DeepContact,45 DeepConPred,46 MetaPSICOV,47 CCMpred,48 etc., which also perform well. Nevertheless, these methods require installing a large number of dependencies, which could slow down contact map prediction for large numbers of proteins, and thus they are not well suited to contact map prediction for DTA prediction. Pconsc4 (ref. 49) is a fast, simple and efficient contact map prediction method, and its performance is consistent with that of the current state-of-the-art methods. Therefore, Pconsc4 is used in this paper to construct the protein contact map and the protein graph.
In the interaction between protein and drug molecule, the structural information will directly affect their binding strength. The protein structure can be obtained by crystallization in the laboratory, and the process takes a lot of time and labor costs. In drug design, especially in DTA prediction, a large number of protein structures are unknown, and only the protein sequence is used as the input for the prediction method. So protein structure prediction, the output of which is the contact map, is utilized in this paper which provides more structural information for DTA. The protein graph based on the contact map of the protein is constructed firstly, and a new method called DGraphDTA (double graph DTA predictor) is proposed for DTA prediction, which encodes both small drug molecule and protein using GNN. As far as we know, the proposed method is the first attempt to construct a protein graph based on the contact map of the protein. We apply GNNs on both protein and molecular graphs to improve performance, and obtain good prediction results in the benchmark datasets.
(1) |
Number | Dataset | Proteins | Compounds | Binding entities |
---|---|---|---|---|
1 | Davis | 442 | 68 | 30056 |
2 | KIBA | 229 | 2111 | 118254 |
Because of memory limitations, one large protein and its related entries were removed from the KIBA dataset. Through testing on the two datasets, the prediction performance of the method can be measured comprehensively.
Number | Feature | Dimension |
---|---|---|
1 | One-hot encoding of the atom element | 44 |
2 | One-hot encoding of the degree of the atom in the molecule, which is the number of directly-bonded neighbors (atoms) | 11 |
3 | One-hot encoding of the total number of H bound to the atom | 11 |
4 | One-hot encoding of the number of implicit H bound to the atom | 11 |
5 | Whether the atom is aromatic | 1 |
All | | 78 |
The purpose of protein structure prediction is to analyse and construct the 3D structure of the protein according to the protein sequence. The structural information of a protein contains the connection angle and distance of different residue pairs. The contact map is a kind of output of structure prediction methods, which is usually a matrix. Assuming that the length of the protein sequence is L, then the predicted contact map M is a matrix with L rows and L columns, where each element mij of M indicates whether the corresponding residue pair (residue i and residue j) is contacted or not. Generally speaking, two residues are considered to be in contact if the Euclidean distance between their Cβ atoms (Cα atoms for glycine) is less than a specified threshold.41 In this paper, Pconsc4 is used to predict the contact map, which is a fast, simple, open-source and efficient method.
The model of Pconsc4 is implemented using U-net architecture,52 which operates on the 72 features calculated from each position in the multiple sequence alignment. The output of Pconsc4 is the probability of whether the residue pair contacts, then a threshold of 0.5 is set to get the contact map with a shape of (L, L), where L is the number of nodes (residues). The result just corresponds to the adjacency matrix of the protein. In the obtained adjacency matrix, the spatial information of protein is well preserved which can be extracted effectively through GNN.
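The thresholding step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and symmetrising the probabilities, treating the threshold as inclusive and adding self-loops are assumptions that the text does not specify.

```python
import numpy as np

def contact_probs_to_adjacency(probs, threshold=0.5, add_self_loops=True):
    """Binarise an (L, L) contact-probability matrix into an adjacency matrix."""
    probs = np.asarray(probs, dtype=float)
    if probs.ndim != 2 or probs.shape[0] != probs.shape[1]:
        raise ValueError("expected a square (L, L) matrix")
    # Assumption: contacts are undirected, so average the two triangles.
    sym = (probs + probs.T) / 2.0
    # Assumption: a pair is in contact when its probability is >= threshold.
    adj = (sym >= threshold).astype(np.int8)
    if add_self_loops:
        np.fill_diagonal(adj, 1)
    return adj

def adjacency_to_edge_index(adj):
    """Return a (2, E) array of (source, target) residue indices."""
    src, dst = np.nonzero(adj)
    return np.stack([src, dst])

# Toy 3-residue example with made-up probabilities.
toy_probs = [[0.9, 0.7, 0.1],
             [0.7, 0.9, 0.2],
             [0.1, 0.2, 0.9]]
toy_adj = contact_probs_to_adjacency(toy_probs)
toy_edges = adjacency_to_edge_index(toy_adj)
```

The edge-index form is the sparse layout that many graph libraries accept; the dense (L, L) matrix carries the same information.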
After getting the adjacency matrix of the protein, the node features need to be extracted for further processing. Because the graph is constructed with residues as nodes, the features should be chosen to describe the residue, which shows different properties due to its R group. These properties include polarity, charge, aromaticity and so on. In addition, PSSM53 is a common representation of proteins in proteomics. In a PSSM, each residue position is scored based on the sequence alignment result, and the scores are used as features of the residue node. In total, 54 features are used in this paper to describe each residue node. Details of these features are shown in Table 3. The shape of the node feature matrix is therefore (L, 54). The adjacency matrix and node features are then processed by the GNN to obtain the vector representation of the corresponding protein.
Number | Feature | Dimension |
---|---|---|
1 | One-hot encoding of the residue symbol | 21 |
2 | Position-specific scoring matrix (PSSM) | 21 |
3 | Whether the residue is aliphatic | 1 |
4 | Whether the residue is aromatic | 1 |
5 | Whether the residue is polar neutral | 1 |
6 | Whether the residue is acidic charged | 1 |
7 | Whether the residue is basic charged | 1 |
8 | Residue weight | 1 |
9 | The negative of the logarithm of the dissociation constant for the –COOH group64 | 1 |
10 | The negative of the logarithm of the dissociation constant for the –NH3 group64 | 1 |
11 | The negative of the logarithm of the dissociation constant for any other group in the molecule64 | 1 |
12 | The pH at the isoelectric point64 | 1 |
13 | Hydrophobicity of residue (pH = 2)65 | 1 |
14 | Hydrophobicity of residue (pH = 7)66 | 1 |
All | | 54 |
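As a rough sketch of how the three groups of features in the table combine into an (L, 54) matrix, the snippet below concatenates a 21-way one-hot encoding, a 21-column PSSM block and the 12 scalar residue properties. The alphabet (20 standard amino acids plus `X` for anything else) and the function names are assumptions for illustration; the PSSM and property values are supplied by the caller and are not computed here.

```python
import numpy as np

# Assumed alphabet: 20 standard amino acids plus 'X' for unknown residues.
RESIDUES = "ACDEFGHIKLMNPQRSTVWYX"

def one_hot_residues(sequence):
    """Return an (L, 21) one-hot matrix for a protein sequence."""
    index = {res: i for i, res in enumerate(RESIDUES)}
    out = np.zeros((len(sequence), len(RESIDUES)))
    for pos, res in enumerate(sequence):
        out[pos, index.get(res, index["X"])] = 1.0
    return out

def node_features(sequence, pssm, properties):
    """Concatenate one-hot (L, 21), PSSM (L, 21) and property (L, 12) blocks."""
    length = len(sequence)
    pssm = np.asarray(pssm, dtype=float)
    properties = np.asarray(properties, dtype=float)
    if pssm.shape != (length, 21):
        raise ValueError("pssm must have shape (L, 21)")
    if properties.shape != (length, 12):
        raise ValueError("properties must have shape (L, 12)")
    return np.hstack([one_hot_residues(sequence), pssm, properties])

# Placeholder inputs: zeros stand in for real PSSM scores and property values.
features = node_features("MKV", np.zeros((3, 21)), np.zeros((3, 12)))
```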
For PSSM calculation, in order to decrease computation time, its simplified calculation has been implemented. At first, a basic position frequency matrix (PFM)53 is created by counting the occurrences of each residue at each position, which is illustrated in eqn (2):
(2) |
(3) |
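The counting step behind the PFM can be sketched as below. This is an illustration under stated assumptions rather than the paper's exact formulation: the 21-symbol alphabet, skipping gap characters, and the pseudocount normalisation in `position_probability_matrix` (including its default value) are all choices made for this example.

```python
import numpy as np

# Assumed alphabet: 20 standard amino acids plus 'X' for unknown residues.
RESIDUES = "ACDEFGHIKLMNPQRSTVWYX"

def position_frequency_matrix(alignment):
    """Count each residue at each alignment column; returns a (21, L) matrix."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    index = {res: i for i, res in enumerate(RESIDUES)}
    pfm = np.zeros((len(RESIDUES), length))
    for seq in alignment:
        for col, res in enumerate(seq):
            if res == "-":
                continue  # assumption: gaps are not counted
            pfm[index.get(res, index["X"]), col] += 1
    return pfm

def position_probability_matrix(pfm, pseudocount=0.8):
    """Column-normalise the counts, spreading a pseudocount over all symbols."""
    n_symbols = pfm.shape[0]
    totals = pfm.sum(axis=0, keepdims=True)
    return (pfm + pseudocount / n_symbols) / (totals + pseudocount)

toy_alignment = ["ACD", "ACE", "A-D"]
toy_pfm = position_frequency_matrix(toy_alignment)
toy_ppm = position_probability_matrix(toy_pfm)
```

Transposing the result gives one 21-value row per residue position, which matches the per-node layout used for the PSSM block of the node features.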
When running Pconsc4 and calculating the PSSM, the input is the result of protein sequence alignment. So in the pre-processing stage, the alignments of all proteins in the benchmark datasets need to be computed first. To increase the computation speed, HHblits55 is used to carry out the protein sequence alignment. After alignment, the HHfilter55 and CCMpred48 scripts are applied to the results to get alignments in the PSICOV56 format.
(4) |
(5) |
(6) |
In DGraphDTA, GNNs are introduced to obtain the representations of molecule and protein. Fig. 5 shows the model architecture. In our experiment, we found that it is most effective to extract the features of small molecules and proteins by using three-layer convolution network. Implementation details can be found in the experiment part.
Fig. 5 The network of DGraphDTA. The graphs of molecule and protein pass through two GNNs to get their representations. Then the affinity can be predicted after multiple fully connected layers.
A unified GNN model is constructed for the different datasets, so the proposed method is simple and easy to implement. After the graphs of the drug molecule and the protein are constructed, they are fed into two GNNs for training. After several GNN convolution layers, the representations of both molecule and protein are extracted. The two representations are then concatenated to give the overall features of the corresponding small molecule–protein pair for DTA prediction. Finally, the prediction is carried out through two fully connected layers.
For small drug molecules, the atoms are connected by covalent bonds, and different atoms and structures give rise to different molecular properties and interact with the outside world through these connections. Graph convolution takes the relations between these atoms into account, so the representation of the molecule can be extracted effectively.
For the protein graph, another GNN is used to extract the representation. The protein structure holds much spatial information, which is important for the binding affinity between protein and molecule. The protein contact map obtained by the structure prediction method captures information about each residue, mainly the relative position and interaction of residue pairs. Through the vectors obtained by the GNN, the interactions of these residue pairs describe the spatial structure of the protein. In computer-aided drug design, it is a difficult task to obtain the representation of a protein from its sequence alone. By using a GNN, DGraphDTA can map the protein sequence to a representation with rich features, which provides an effective method for feature extraction of proteins. The proposed method uses Pconsc4 to construct the topological structure of the protein knowing only its sequence, and discovers hidden information about the whole protein structure that is useful for affinity prediction. In addition, many factors affect the performance of the network, such as the number of network layers, the choice of GNN model and the dropout probability. Because training takes a lot of time, some hyperparameters were selected based on experience. Other important hyperparameters were compared and determined in the experimental part.
For each graph of molecule and protein, the dimension of the feature of each node is fixed, but the number of nodes of each graph is not fixed which depends on the number of atoms or residues. So the size of the GNN output matrix varies with the number of nodes and global pooling is added after the two GNNs to ensure that the same size of representation can be output for proteins and molecules with different node numbers. Supposing the last GNN layer outputs the protein representation with shape (L, Fl), then the global pooling can be calculated as:
$H_{pi} = \mathrm{pool}\left(H_{l}(i)\right)$ | (7) |
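A minimal sketch of global pooling over the node dimension is shown below. The function name and the set of supported modes (mean, max, sum) are assumptions for illustration; the point is only that graphs with different node counts are reduced to vectors of the same length.

```python
import numpy as np

def global_pool(node_embeddings, mode="mean"):
    """Reduce an (L, F) node-embedding matrix to a single length-F vector."""
    h = np.asarray(node_embeddings, dtype=float)
    if h.ndim != 2:
        raise ValueError("expected a 2D (L, F) matrix")
    if mode == "mean":
        return h.mean(axis=0)
    if mode == "max":
        return h.max(axis=0)
    if mode == "sum":
        return h.sum(axis=0)
    raise ValueError(f"unknown pooling mode: {mode}")

# Two graphs with different node counts give vectors of the same size.
small_graph = np.ones((3, 4))
large_graph = np.ones((7, 4))
pooled_small = global_pool(small_graph)
pooled_large = global_pool(large_graph)
```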
(8) |
(9) |
MSE is also a common metric to measure the difference between the predicted value and the real value. For n samples, the MSE is calculated as the average of the sum of the square of the difference between the predicted value pi (i = 1, 2,…,n) and the real value yi. A smaller MSE means that the predicted values of the sample are closer to the real values:
$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(p_i - y_i\right)^2$ | (10) |
In WideDTA, another metric, the Pearson correlation coefficient,59 is used for performance comparison, which is calculated through eqn (11). In the equation, cov is the covariance between the predicted value p and the real value y, and σ indicates the standard deviation. In our experiment, the metric is also introduced to evaluate the prediction performance of the proposed method.
$\mathrm{Pearson} = \frac{\mathrm{cov}(p, y)}{\sigma(p)\,\sigma(y)}$ | (11) |
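The MSE and Pearson correlation coefficient described above can be computed in a few lines. This is a small sketch with hypothetical function names; it uses population statistics (dividing by n) for both the covariance and the standard deviations so that the ratio is consistent.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between real values y and predictions p."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    return float(np.mean((p - y) ** 2))

def pearson(y_true, y_pred):
    """Pearson correlation: cov(p, y) / (std(p) * std(y))."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    cov = np.mean((p - p.mean()) * (y - y.mean()))
    # np.std defaults to the population standard deviation (ddof=0),
    # which matches the population covariance computed above.
    return float(cov / (p.std() * y.std()))

# Made-up affinities, used only to exercise the functions.
real = [5.0, 6.2, 7.1, 8.4]
predicted = [5.3, 6.0, 7.4, 8.1]
example_mse = mse(real, predicted)
example_r = pearson(real, predicted)
```

`pearson` is undefined when either input is constant, because the corresponding standard deviation is zero.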
In addition, the metric rm2 index60 is involved in DeepDTA, which is also introduced as a measure in the proposed method. The calculation of rm2 is described in eqn (12):
(12) |
Hyperparameter | Setting |
---|---|
Epoch | 2000 |
Batch size | 512 |
Learning rate | 0.001 |
Optimizer | Adam |
Fully connected layers after GNN | 2 |
Fully connected layers after concatenation | 2 |
Model | Number of layers | Layer1(in, out, head) | Layer2(in, out, head) | Layer3(in, out, head) |
---|---|---|---|---|
GCN | 1 | GCN(54, 54) | — | — |
GCN | 2 | GCN(54, 54) | GCN(54, 108) | — |
GCN | 3 | GCN(54, 54) | GCN(54, 108) | GCN(108, 216) |
GAT | 1 | GAT(54, 54, h = 2) | — | — |
GAT | 2 | GAT(54, 54, h = 2) | GAT(54, 108, h = 2) | — |
GAT | 3 | GAT(54, 54, h = 2) | GAT(54, 108, h = 2) | GAT(108, 216, h = 2) |
GAT&GCN | 1&1 | GAT(54, 54, h = 2) | GCN(54, 108) | — |
GCN&GAT | 1&1 | GCN(54, 54) | GAT(54, 108, h = 2) | — |
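To make the three-layer GCN row of the table concrete, the sketch below runs a forward pass with layer widths 54, 54, 108 and 216 using plain NumPy. It assumes the commonly used GCN propagation rule (symmetrically normalised adjacency with self-loops, followed by ReLU), omits bias terms and dropout, and uses random weights, so it illustrates the shapes only and is not the trained model.

```python
import numpy as np

def normalise_adjacency(adj):
    """Compute D^-1/2 (A + I) D^-1/2 for an (L, L) adjacency matrix."""
    a_hat = np.asarray(adj, dtype=float).copy()
    np.fill_diagonal(a_hat, 1.0)  # self-loops keep every degree >= 1
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(adj, x, weights):
    """Apply one GCN layer per weight matrix, each followed by ReLU."""
    a_norm = normalise_adjacency(adj)
    h = np.asarray(x, dtype=float)
    for w in weights:
        h = np.maximum(a_norm @ h @ w, 0.0)
    return h

# Layer widths from the three-layer GCN row: 54 -> 54 -> 108 -> 216.
rng = np.random.default_rng(0)
dims = [54, 54, 108, 216]
weights = [rng.normal(scale=0.1, size=(n_in, n_out))
           for n_in, n_out in zip(dims[:-1], dims[1:])]

# Toy protein: a 5-residue chain with 54 random features per residue.
chain = np.eye(5, k=1) + np.eye(5, k=-1)
residue_features = rng.normal(size=(5, 54))
embeddings = gcn_forward(chain, residue_features, weights)  # shape (5, 216)
protein_vector = embeddings.mean(axis=0)                    # shape (216,)
```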
Model | Number of layers | CI (std) | MSE (std) | Pearson (std) |
---|---|---|---|---|
GCN | 1 | 0.891(0.003) | 0.221(0.004) | 0.852(0.006) |
GCN | 2 | 0.891(0.004) | 0.216(0.003) | 0.856(0.006) |
GCN | 3 | 0.894(0.002) | 0.216(0.003) | 0.856(0.006) |
GAT | 1 | 0.890(0.004) | 0.220(0.005) | 0.853(0.009) |
GAT | 2 | 0.893(0.002) | 0.216(0.004) | 0.856(0.008) |
GAT | 3 | 0.889(0.002) | 0.218(0.006) | 0.854(0.010) |
GAT & GCN | 1 & 1 | 0.892(0.005) | 0.218(0.004) | 0.854(0.008) |
GCN & GAT | 1 & 1 | 0.891(0.003) | 0.216(0.005) | 0.859(0.008) |
The three-layer GCN gives the most accurate protein representation, with the highest CI (0.894) and the lowest MSE in the table (0.216, shared with several other configurations), together with a Pearson correlation coefficient of 0.856. Comparing GCN with GAT, GCN performs better overall. GraphDTA used a combination of GCN and GAT, in which a GCN layer follows a GAT layer. In our implementation, both orderings of the combination were tested, but neither reached the best overall performance. It is possible that the protein features cannot be extracted effectively with the attention mechanism.
Fig. 6 illustrates that the performance is best, with a lower MSE value, when the dropout probability is 0.2. Too large a dropout probability leads to under-fitting, so the model cannot extract protein features effectively, while too small a probability cannot fully prevent over-fitting. So only an appropriate dropout probability can produce the best prediction effect.
The results indicate that the mean pooling achieves the best performance for the three metrics. The mean pooling could balance the influence of the different nodes by averaging node features across the node dimension; the averages are enough to describe proteins and small molecules.
Fig. 8 reveals that PSSM plays an important role in graph convolution and DTA prediction. PSSM is obtained by protein sequence alignment, which contains rich protein evolution information, influences the interaction between residues and ultimately determines the spatial structure and feature of protein. The PSSM could extract the information quickly and effectively, thus improving the accuracy of protein description and the prediction performance of DTA.
Method | Proteins and compounds | CI | MSE | Pearson |
---|---|---|---|---|
KronRLS | S–W & Pubchem Sim | 0.871 | 0.379 | — |
SimBoost | S–W & Pubchem Sim | 0.872 | 0.282 | — |
DeepDTA | S–W & Pubchem Sim | 0.790 | 0.608 | — |
DeepDTA | CNN & Pubchem Sim | 0.835 | 0.419 | — |
DeepDTA | S–W & CNN | 0.886 | 0.420 | — |
DeepDTA | CNN & CNN | 0.878 | 0.261 | — |
WideDTA | PS + PDM & LS + LMCS | 0.886 | 0.262 | 0.820 |
GraphDTA | GIN & 1D | 0.893 | 0.229 | — |
DGraphDTA | GCN & GCN | 0.904 | 0.202 | 0.867 |
Method | Proteins and compounds | CI | MSE | Pearson |
---|---|---|---|---|
KronRLS | S–W & Pubchem Sim | 0.782 | 0.411 | — |
SimBoost | S–W & Pubchem Sim | 0.836 | 0.222 | — |
DeepDTA | S–W & Pubchem Sim | 0.710 | 0.502 | — |
DeepDTA | CNN & Pubchem Sim | 0.718 | 0.571 | — |
DeepDTA | S–W & CNN | 0.854 | 0.204 | — |
DeepDTA | CNN & CNN | 0.863 | 0.194 | — |
WideDTA | PS + PDM & LS + LMCS | 0.875 | 0.179 | 0.856 |
GraphDTA | GAT + GCN & 1D | 0.891 | 0.139 | — |
DGraphDTA | GCN & GCN | 0.904 | 0.126 | 0.903 |
Compared with DeepDTA, WideDTA and GraphDTA, the proposed model with three-layer GCNs has significant performance improvement. All metrics for prediction, including CI, MSE and Pearson correlation coefficient, have been significantly improved. For MSE metric, DGraphDTA can reach 0.202 and 0.126 for two datasets. The spatial structure and topological information of molecule and protein contain a lot of binding information, especially proteins, whose spatial structure determines their binding sites and functions. By constructing their graphs and the corresponding GCNs, their features and spatial information can be effectively encoded into representation, and then the affinity can be predicted accurately.
In the benchmark proposed by DeepDTA, there is another metric, rm2. Therefore, for a more comprehensive assessment of DGraphDTA, rm2 is also used for a better evaluation. Tables 9 and 10 display the rm2 results of the predictions of DGraphDTA and other methods.
Method | Proteins and compounds | rm2 |
---|---|---|
KronRLS | S–W & Pubchem Sim | 0.407 |
SimBoost | S–W & Pubchem Sim | 0.644 |
DeepDTA | CNN & CNN | 0.630 |
DGraphDTA | GCN & GCN | 0.700 |
Method | Proteins and compounds | rm2 |
---|---|---|
KronRLS | S–W & Pubchem Sim | 0.342 |
SimBoost | S–W & Pubchem Sim | 0.629 |
DeepDTA | CNN & CNN | 0.673 |
DGraphDTA | GCN & GCN | 0.786 |
The two tables illustrate that DGraphDTA, which achieves rm2 values of 0.700 and 0.786 on the two datasets, outperforms DeepDTA (0.630 and 0.673). Thus, the prediction and generalization performances of DGraphDTA are better than those of the other methods.
(13) |
Metric | Threshold: 6 Å | Threshold: 8 Å | Threshold: 10 Å |
---|---|---|---|
Accuracy | 98.3% | 98.4% | 96.8% |
The table illustrates that the contact map predicted by Pconsc4 is basically consistent with the actual contact map, which can reach an accuracy of 98% with a threshold of 8 Å. It also indicates that the contact map predicted by Pconsc4 can show the spatial structure of the protein to a certain extent, so it can be used in the prediction of affinity.
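One way to carry out such a comparison is sketched below: an actual contact map is derived from residue coordinates with a distance threshold, and accuracy is taken as the fraction of residue pairs on which the two binary maps agree. Both function names are hypothetical, and this definition of accuracy is an assumption made for illustration; the coordinates are expected to be Cβ positions (Cα for glycine) in Å.

```python
import numpy as np

def contact_map_from_coords(coords, threshold=8.0):
    """Build a binary (L, L) contact map from (L, 3) coordinates in Å."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Two residues are in contact if their distance is below the threshold.
    return (dist < threshold).astype(np.int8)

def contact_accuracy(predicted, actual):
    """Fraction of residue pairs on which two binary contact maps agree."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    if predicted.shape != actual.shape:
        raise ValueError("contact maps must have the same shape")
    return float((predicted == actual).mean())

# Toy example: three residues placed on a line, made-up predicted map.
toy_coords = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
toy_actual = contact_map_from_coords(toy_coords, threshold=8.0)
toy_predicted = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
toy_accuracy = contact_accuracy(toy_predicted, toy_actual)
```

Because most residue pairs in a real protein are not in contact, a pairwise accuracy of this kind is dominated by true negatives, which is worth keeping in mind when reading high accuracy values.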
In addition, we used the actual contact map (with a threshold of 8 Å) and the contact map predicted by Pconsc4 to train two independent DGraphDTA models for affinity prediction, using the same training and test sets. There are 12016 drug–target pairs in the training set and 2451 drug–target pairs in the test set that cover these 35 proteins in the KIBA dataset. The results are shown in Table 12. It can be seen from Table 12 that the predictions using the contact map predicted by Pconsc4 are basically the same as those using the actual contact maps. The result with Pconsc4 is slightly better in MSE and Pearson correlation coefficient, while its CI is marginally lower. On the one hand, because the actual protein structures are missing some amino acid records, the actual contact map obtained is only a part of the whole map, which may lose some structural information. On the other hand, Pconsc4 combines predictions with different thresholds for further analysis, so the output contact map is not the result under a single threshold but a more comprehensive contact map. With either contact map, the prediction performance declines compared with the results obtained using the whole training set, because the 12016 drug–target pairs covering the 35 proteins are only a small part of the whole original training set (98585 pairs).
Contact map type | CI (std) | MSE (std) | Pearson (std) |
---|---|---|---|
Contact map (actual) | 0.863 | 0.228 | 0.810 |
Contact map (Pconsc4) | 0.861 | 0.212 | 0.825 |
The residues of proteins have various properties, such as hydrophobicity, aromaticity, solubility, etc. These properties are reflected in various non-bonded interactions such as hydrophobic forces and hydrogen bonds, and they influence the binding of proteins. So this information cannot be ignored when modelling binding with small molecules. In sequence-based DTA prediction, if only the residue type is considered, the sequence is treated as a symbol string and these important properties are ignored. In DGraphDTA, the residue properties and the topological connections between residues are convolved and extracted by the GNN, so the model can extract the spatial structure and attribute information of the protein and represent it more comprehensively.
It is worth mentioning that many protein structure prediction methods have emerged, so as their accuracy improves further, the performance of DGraphDTA is also expected to improve. At the same time, due to the limitation of our hardware environments, only up to three GNN layers were explored. With better GPUs, more GNN variants (such as deeper networks) could be explored, which may give better prediction results. In addition, the speed of the method depends heavily on the speed of the sequence alignment and of the contact map prediction by Pconsc4. Therefore, when these two steps are accelerated, the prediction will be faster. The code of DGraphDTA and the relevant data are freely available at: https://github.com/595693085/DGraphDTA.
This journal is © The Royal Society of Chemistry 2020