Qun Su,‡ Jike Wang,‡ Qiaolin Gou,‡ Renling Hu, Linlong Jiang, Hui Zhang, Tianyue Wang, Yifei Liu, Chao Shen, Yu Kang, Chang-Yu Hsieh* and Tingjun Hou*
College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China. E-mail: tingjunhou@zju.edu.cn; kimhsieh@zju.edu.cn
First published on 17th February 2025
Accurate estimation of protein–ligand (PL) binding free energies is a crucial task in medicinal chemistry and a critical measure of PL interaction modeling effectiveness. However, traditional computational methods are often computationally expensive and prone to errors. Recently, deep learning (DL)-based approaches for predicting PL interactions have gained enormous attention, but their accuracy and generalizability are hindered by data scarcity. In this study, we propose LumiNet, a versatile PL interaction modeling framework that bridges the gap between physics-based models and black-box algorithms. LumiNet utilizes a subgraph transformer to extract multiscale information from molecular graphs and employs geometric neural networks to integrate PL information, mapping atomic pair structures onto key physical parameters of non-bonded interactions in classical force fields, thereby enabling accurate absolute binding free energy (ABFE) calculations. LumiNet is designed to be highly interpretable, offering detailed insights into atomic interactions within protein–ligand complexes and pinpointing the relatively important atom pairs or groups. Our semi-supervised learning strategy enables LumiNet to adapt to new targets with fewer data points than other data-driven methods, making it more relevant for real-world drug discovery. Benchmarks show that LumiNet outperforms the current state-of-the-art model by 18.5% on the PDE10A dataset and rivals the FEP+ method in some tests with a speed improvement of several orders of magnitude. We applied LumiNet to the scaffold hopping process, where it accurately guided the discovery of the optimal ligands. Furthermore, we provide a web service for the research community to test LumiNet. The visualization of predicted intermolecular energy contributions is expected to provide practical value in drug discovery projects.
The recent trend of utilizing deep learning (DL) to predict molecular properties has sparked innovative data-driven approaches,10–15 which may be adapted to estimate ABFE efficiently and accurately. Notable examples of such approaches include IGN,16 OnionNet,17 and GIGN.10 These first-generation DL solutions predominantly rely on learning statistical patterns from training data, often neglecting well-established physical principles. However, the experimental data used for training and testing DL models, compiled from diverse sources, is not ideal. Undocumented or neglected variations in experimental conditions inevitably result in disparities in binding affinity reports for the same compound. Additionally, bias is prevalent in modern scientific data. Despite a wide range of data sources for binding affinity, the chemical space covered remains limited in most cases. These data issues severely impede DL models from accurately capturing the authentic patterns of protein–ligand interactions, making them susceptible to dataset-dependent bias.18,19
A promising approach to addressing the aforementioned challenge lies in minimizing the dependence of DL models on data volume and exploring interaction patterns grounded in physics.20–25 For instance, in 2022, Moon et al. proposed PIGNET, a physics-inspired DL scoring function,20 which achieved competitive results on the CASF-2016 (ref. 26) benchmark but showed limited improvement in scoring performance. In 2024, the same team proposed an enhanced version, PIGNET2,21 which further incorporated data augmentation for active compounds. This upgraded model notably outperformed those trained solely on PDBbind. However, while PIGNET aims to develop a physics-aware model, its integration of DL and physical principles still has room for improvement. The current methods directly map a high-dimensional latent vector (learned by a graph neural network, GNN) to a few physical parameters for scoring, without the direct feedback on these parameters that supervised learning would provide. This leads to an ineffective incorporation of domain knowledge, and the model's reliability remains heavily dependent on the quality of the training data. In another line of research, PBCNET22 has been proposed to predict the relative binding free energy (RBFE) between a pair of ligands bound to the same receptor. By utilizing approximately 0.6 million training samples containing ligand pairs and proteins, PBCNET achieved results comparable to the FEP+ method on both the FEP1 (ref. 27) and FEP2 (ref. 28) tests through active learning. This method focuses on structural variations between ligand pairs, facilitating data-driven learning of plausible RBFE values rather than directly modeling protein–ligand interaction patterns. Therefore, in the current scenario with limited labeled data, RBFE methods have a significant advantage in prediction accuracy. However, for practical ABFE prediction, estimating binding free energies through RBFE calculations limits their applicability. Some studies emphasize learning structural information to infer ABFE. For example, DSMbind24 introduced an interesting hypothesis: assuming that the crystal structure represents the lowest protein–ligand energy state, it employs SE(3) denoising score matching (DSM) to estimate the likelihood of complexes, thereby inferring binding free energy. GenScore,23 on the other hand, first utilizes a mixture density network (MDN) to fit the distance distribution of residue–atom pairs in protein–ligand complexes, learning richer structural information; the model is then fine-tuned with actual binding free energy labels, delivering remarkable results on the FEP2 dataset developed by Schindler.28 While both GenScore and DSMbind exhibit notable statistical correlation between predicted and experimental energies, their absolute errors remain too large for practical applications: when dealing with specific systems, direct application is challenging and requires a certain number of known ABFE values for correction. In summary, the reliance of current DL-based models on data may introduce biases, thereby limiting their generalizability and practical applications. Additionally, the inherent opacity of machine learning (ML) algorithms contributes to a lack of clear and intuitive physical explanations in existing models. Therefore, there is an urgent need to develop improved ABFE prediction models with robust generalization, wide applicability, and a high level of interpretability.
In this study, we present LumiNet, an innovative approach designed for robust PL interaction modeling and accurate calculation of ABFE. We adopt a 'divide and conquer' strategy, fully leveraging the powerful structural representation capabilities of deep learning alongside well-established physical principles. LumiNet transforms the original ABFE estimation task into a process that leverages structural data of protein–ligand complexes to calculate 'effective' atomic distances, which are then fed into a physics-driven scoring function for ABFE computation. Specifically, in LumiNet, we developed a subgraph transformer to extract multiscale information from molecular graphs and fit the distances between PL atom pairs, providing a comprehensive understanding of the underlying structural features. Then, by fine-tuning the atom-pair distances with geometric neural networks, we convert them into key parameters of classical force fields, used to calculate various non-bonded interactions encompassing van der Waals forces, hydrogen bond interactions, hydrophobic interactions, and metal interactions.29 Furthermore, we introduce Trotor to account for ligand entropy, thereby enhancing the model's generalization ability. These energy terms are computed for each atomic pair and can be visualized through an intuitive web interface, so that atom pairs or groups with significant interactions across various interaction types can be located swiftly, directly aiding analysis in drug discovery. The benchmark tests on CASF-2016 resulted in a Pearson correlation coefficient (PCC) of 0.85. When evaluated on the FEP1 and FEP2 holdout sets, the model achieved PCC values of 0.65 and 0.46, respectively, performance competitive with all current DL approaches for direct ABFE prediction. The notable strengths of LumiNet in prediction accuracy and computational efficiency were confirmed by predictions on the SARS-CoV-2 inhibitor dataset30 and the PDE10A inhibitor dataset.31 To improve portability, stability, and performance on target receptors, we adopted a semi-supervised learning approach that promotes structural awareness, yielding more reliable results. On the FEP1 and FEP2 datasets, the average PCC values are 0.696 and 0.534, respectively, a significant improvement over PIGNET2 on FEP1 (0.64) and GenScore on FEP2 (0.51). Impressively, fine-tuning with only 6 data points in a single iteration achieved a PCC of 0.73 on FEP1, with an RMSE of 0.82 kcal mol−1, closer to the experimental values than the FEP+ RMSE of 1.08 kcal mol−1. Moreover, we applied LumiNet to the scaffold hopping process reported by Wu et al.,3 where it accurately guided the discovery of the optimal ligands. We provide a web service to compute ABFE and pinpoint key atomic interactions in PL complexes at https://www.ai2physic.top.
The first module in LumiNet incorporates encoding methods for proteins and ligands, accompanied by an MDN designed to fit the pairwise distance dij. This MDN functions as a pre-training model, learning the structure-to-distance mapping. Extracting protein pockets at 5 Å yielded less comprehensive global information than extraction at 10 Å, but atomic-level features provided more refined insights. We explored using larger fully atomic pockets, but the computational complexity escalates quickly: with a 10 Å pocket, the training time increased roughly 11-fold, and while prediction accuracy improved, the gain did not justify the steep rise in computational cost. Conversely, using smaller pockets resulted in a significant drop in prediction performance. Ultimately, we found a 5 Å pocket to offer the best trade-off between computational efficiency and predictive accuracy. For protein encoding, a Graph Transformer is employed, whereas ligand encoding utilizes a Subgraph Transformer (SubGT); their respective coordinates are fed into the network separately. In the SubGT block, ligands are decomposed into substructures and subgraphs, encoded individually, and then aggregated via centroid, subgraph, and context blocks after 6 iterations to derive the feature representation of each atom. These representations are then concatenated with protein–ligand features to fully capture structural information, which the MDN module uses to fit the distance. The second module incorporates an interaction block and four energy terms for physical scoring, serving to fine-tune the pre-trained model. The information extracted by the pre-trained model, combined with real distance data, is input into the bidirectional EGCL module, which functions as an interaction block for additional structural refinement to improve the fit of the 'effective' distance. This block determines interaction edges based on protein–ligand atom distances, enabling authentic information exchange. Unlike the pre-training phase, which inferred atom-pair information in a result-driven manner, this method directly aggregates information across atom pairs. By integrating these two approaches, the objective of capturing structural information smoothly shifts from the crystallographic distance dij to the 'effective' distance used by the physical scoring function.
The resultant information is then evaluated by four energy terms and further integrated with the MDN score to compute the final score. A more detailed description of the model's architecture is provided in the Methods section.
First, we tested our pre-trained model on CASF-2016 to validate its screening power. We found that, through the efficient information extraction of the Subgraph Transformer and the sufficient fitting of the MDN, the enrichment factor for the top 1% of ligands reached a noteworthy value of 26.5, suggesting that the structure module has proficiently learned the structural information of protein–ligand interactions. Next, we fine-tuned our pre-trained model using two distinct loss functions and individually assessed their efficacy in predicting ABFE. Unfortunately, both approaches exhibited significant limitations, rendering them insufficient for achieving high-precision predictions of ABFE.
In the encoding module for small molecules, we used the Subgraph Transformer to enable the model to understand ligand information more deeply. This is logical given that small molecules collectively possess extensive structural diversity, and the model tends to use them to distinguish overall structural states. However, this approach may introduce bias, as the model comes to depend heavily on ligands for ABFE prediction. To mitigate this, we convert the structural information of protein–ligand interactions into complex distance distributions, encapsulating the overall structural context despite differing labels. The Subgraph Extractor explores each central atom and its adjacent atoms to construct new subgraphs, which are then aggregated via the GT module. This process surpasses the limitations of the first-order Weisfeiler–Leman (1-WL) isomorphism test,20 capturing higher-order information and integrating structural information at different levels, resulting in superior expressive power.
Our investigation revealed that direct fine-tuning substantially improves the scoring performance of the structure module, albeit with limitations. As illustrated in Table S1,† we examined two fine-tuning techniques: one using mean squared error (MSE) as the loss function and the other utilizing the Pearson correlation coefficient. The results indicate that the latter outperforms the former in terms of the Pearson correlation coefficient on the final test set. However, while the Pearson-based method predominantly yields results in the 100–140 range (Fig. 2), the MSE-based method aligns closely with the actual value distribution. This difference mainly stems from the direct mapping of the fitted distance values to ABFE through a simple logarithmic transformation. Notably, using the Pearson correlation coefficient as the loss minimizes trade-offs between the distance-fitting and scoring tasks owing to their inherent correlation. Conversely, with MSE as the loss, balancing both objectives becomes more challenging, resulting in slightly inferior performance. For ranking tasks, the Pearson-based method is preferable for training, whereas the MSE-based method is more suitable for precise ABFE calculations. To meet both requirements, we employ transfer learning on the pre-trained model to fine-tune the parameters of the physical energy terms, thereby achieving ABFE predictions with exceptional generalization and accuracy.
Despite the lack of intricate data preprocessing or extensive augmentation, the model still effectively learned the patterns governing protein–ligand interactions. This success can be attributed to the fusion of physical scoring and mixture density scoring. We acknowledge that energy-based scoring methods inherently rely on multidimensional approximations, especially for non-bonded interactions where energy terms overlap without clear boundaries. Consequently, we incorporated linear fitting when combining energy terms, enabling the model to learn their weight relationships independently. Furthermore, to bridge the gap between the four energy terms and ABFE, we transformed the MDN score into a bias energy term, thereby enhancing the rationality of each term (as detailed in the Methods section). For feedback propagation, we carefully designed multiple loss functions to ensure the model's interpretability. Although the actual values of individual energy terms were unknown during training, the predicted values proved to be meaningful. Naturally, if these energy values were derived through high-throughput calculations and integrated into our training, the model's performance would undoubtedly improve further.
However, it is evident that the model's predictive performance has markedly declined across various targets in the FEP2 dataset. In contrast to datasets like FEP1, molecules in FEP2 undergo more transformations, such as changes in net charge and charge distribution, ring openings and core hopping.28 These transformations lead to substantial alterations in solvent interactions. Additionally, the ligand sets display a slightly broader range of structural diversity compared to previous benchmarks. For our model, the influence of solvent interactions is significant. For instance, in the SYK target, a ligand with two aromatic rings extending into the solvent results in a notable discrepancy between predicted and actual values. Similar situations can be observed with small molecules in the PFKFB3 target. Since solvent molecules were not included in the modeling process, the model is less sensitive to changes in this aspect of the energy. Regarding net charge changes, the model can partially distinguish them. For ligands in the c-Met target, which involve six perturbations with a change in net charge, the predictive performance aligns with the average level in the FEP1 dataset. This is attributed to the inclusion of nuclear charge and covalent bonds in the input features. Although charge interactions are not directly included in the energy terms due to the computational complexity of accurately calculating atomic partial charges and discrepancies in semi-empirical charges, the bias term can be learned from embeddings, indirectly compensating for this process. Overall, the results on the FEP1 and FEP2 datasets highlight the model's accuracy and generalization ability for ABFE predictions. This represents a relatively successful attempt at AI scoring based on physics and structure.
Moreover, we designed a new test utilizing a dataset curated by Tosstorff et al.31 in 2022, which contains 1162 PDE10A inhibitors. PDE10A, a crucial regulator of the striatal signaling pathway, is a promising target for schizophrenia due to its capacity to ameliorate abnormal striatal conditions. It should be noted that the protein structures in this dataset differ significantly from those in the training set, rendering testing on this practically significant dataset more meaningful and convincing. To ensure experimental fairness, our testing protocol follows the approach outlined in Tosstorff's paper. Furthermore, instead of relying on models fine-tuned with ABFE labels from PDBbind, we directly used the protein–ligand structures and ABFE data provided in the article for fine-tuning, building exclusively upon the pre-trained model. We implemented seven different data partitioning techniques, including three rolling temporal splits where the test data postdates the training data, enabling an assessment of the model's prospective performance. Additionally, we used a random split and three structure-based binding mode splits, wherein the model is trained on the data from two binding mode classes to predict a third, assessing the model's extrapolation capability, akin to a scaffold hop scenario.
During LumiNet training, we again do not update the pre-trained model's parameters, opting instead to fine-tune only the physical scoring component while retaining the original structural parameters. This strategy ensures rapid adaptation to new systems. As shown in Table S4,† we selected several of the best-performing and most classical models reported for this dataset as baselines, including four DL-based models and four classical method-based models. Among them, the 2D3D hybrid model31 combines AttentiveFP37 and RF-PLP,38 while the "extend data" approach signifies the model's utilization of both docked poses and supplementary training with molecules possessing only 2D structures. Moreover, we included as baselines the models developed by Isert et al.39 in 2024, who predicted ABFE using electron density-based geometric DL and tested it on PDE10A.
The comprehensive results for the other models are shown in Tables 1, S4 and S5,† illustrating that LumiNet achieves a Spearman correlation coefficient higher than 0.5 across various data partitioning scenarios, achieving state-of-the-art results in five partitioning methods. This suggests the robust applicability of the models across different systems, with the structural knowledge gained from pre-training exhibiting significant transferability. However, a notable challenge lies in the models' inconsistency in maintaining stable performance across diverse data partitioning scenarios. Nevertheless, we remain optimistic that with the continual expansion of datasets in this field, this challenge can be overcome. As the model gains a deeper understanding of the interactions between PDE10A and ligands, it is anticipated to perform well even in the presence of temporal and binding mode variations. We eagerly look forward to the practical application of these findings to accelerate research advancements on this target.
| Model | Random | Split 2011 | Split 2012 | Split 2013 | Mode 1 | Mode 2 | Mode 3 | Average |
|---|---|---|---|---|---|---|---|---|
| BCP-based graph | 0.525 | 0.246 | −0.009 | 0.480 | 0.207 | 0.064 | 0.139 | 0.236 |
| NCP-based graph | 0.601 | 0.300 | 0.331 | 0.559 | 0.299 | 0.355 | 0.328 | 0.390 |
| 2D3D hybrid | 0.712 | **0.608** | 0.559 | 0.637 | 0.453 | 0.333 | **0.494** | 0.541 |
| LumiNet | **0.799** | 0.587 | **0.736** | **0.687** | **0.564** | **0.640** | 0.490 | **0.643** |

a The above seven data partitioning methods were used for each model. Modes 1, 2, and 3 define different protein–ligand binding modes, all sharing a key feature: a hydrogen-bond acceptor atom of the ligand lies within hydrogen-bonding distance of the amino group of the Gln726 sidechain. Structural variations of the ligands nevertheless lead to different binding configurations; on this basis, ligands are classified into three categories: aminohetaryl-C1-amide, C1-hetaryl-alkyl-C2-hetaryl, and aryl-C1-amide-C2-hetaryl.31 The Pearson correlation coefficient was used as the indicator; bold numbers represent the best results.
Recently, the core-hopping FEP method has been successfully applied to RBFE calculations for minor and limited scaffold hopping cases. However, most scaffold hopping procedures involve substantial topology changes across the entire ligand. To accurately predict the binding free energies of ligands following scaffold hopping, it is essential to use ABFE calculations rather than RBFE calculations. In 2022, Wu et al.3 utilized ABFE calculations to guide scaffold hopping in the design of PDE5 inhibitors. Following their reported procedure, we used Glide to dock L1 to the crystal structure of PDE5–tadalafil (PDB ID: 1XOZ) and to dock L2–L12 to the crystal structure of PDE5–L1 (PDB ID: 7FAQ). The LumiNet model was then used to perform ABFE predictions in a zero-shot setting. Notably, compounds L1 to L12 were designed based on tadalafil and LW1607, with significant structural differences from their initial structures.
L1 to L4 served as the initial structures. The design of L7 primarily aimed to study the structure–activity relationships (SAR) at the 2-position. As shown in Fig. 4, our prediction results were consistent with the bioassay results, indicating that replacing the 2-position with an oxygen-containing alkane chain led to a significant enhancement in inhibitory activity compared with L1 through L4. Building on the design of L7, compounds L5, L6, L8, L9, L10, and L12 were designed to explore how the length and charge characteristics of the substituents at the 2-position influence activity. Our prediction results remained consistent with the bioassay results, exhibiting comparable trends. Notably, L12, with a moderately long hydrocarbon chain at the 2-position and a difluoromethoxy substituent at the 5-position, was supported by both the prediction results and the SAR analysis.
The original article used the FEP-ABFEP calculation method and achieved a Pearson correlation coefficient of 0.72 between the computational results and the experimental activity values. Our LumiNet model, evaluated in a zero-shot setting, exhibited a correlation coefficient of 0.67 with the experimental values. Both the LumiNet method and the original FEP-ABFEP method therefore demonstrated consistent prediction accuracy and trends for lead compound optimization. Our model thus offers valuable insights for lead compound optimization and remains applicable even when significant structural differences exist between the optimized and initial structures.
To ensure practicality in real-world scenarios, we designed a semi-supervised workflow that accounts for the scarcity of labels for novel targets and the complexity of ligand molecules. We tried two strategies. The first optimizes over the entire test set by leveraging predictions from the previous round as pseudo-labels and incorporating them into the model's training process. This method, which shares the results across all targets using a common checkpoint, yielded promising outcomes, as evidenced by a Pearson correlation coefficient of 0.7 after two iterations (Table S6†). However, the deviation from the true values (RMSE) gradually increases, attributable to the uncertainty of the pseudo-labels. Although hyperparameter tuning can alleviate this issue, as evidenced by improved LumiNet indicators on the FEP2 dataset in Table S7,† the model remains susceptible to fluctuations due to insufficient monitoring. To enhance the workflow's robustness, we implemented a second strategy: randomly selecting data with two or more known true-value labels and integrating them into the training set for real-time monitoring. Furthermore, we incorporated the pseudo-labels of these data points during training to further strengthen the model. Training was terminated when the RMSE on the true data points reached a minimum.
To ensure the reproducibility and stability of the second-strategy experiment, we assigned five random seeds (0–4) for data extraction. However, this approach risks becoming trapped in a local optimum: the model tends to prioritize the best checkpoint for a limited dataset, complicating generalization to other candidate molecules for the target. It is therefore prone to overfitting when the monitoring data are scarce, and repeated semi-supervised iterative training can exacerbate this issue. To address these limitations, we incorporated more true-value data into training and supervision. As the volume of true-value data increases, the predictive performance of the model across various targets gradually improves. When the number of true values reached six, the performance on the FEP1 dataset matched that of the FEP+ method; we denote this variant LumiNet-opt (Fig. 3D and Table S8†). The RMSE further confirms that our method yields predictions closer to the true values. We did not continue to the next iteration, considering this performance a realistic benchmark for our current modeling approach. Although continued iterations may enhance performance on the current test set, they could introduce significant bias when applied to larger datasets, restricting us to a local optimum within the vast search space. Hence, we view this method as a valuable reference, particularly suitable for specific systems.
As data volume increases, we can eliminate this constraint while still achieving good results. Our model can quickly predict the energy values for each atom pair and visually demonstrate key protein–ligand interactions. As shown in Fig. 5, we predicted the affinity between the protein in PDB entry 6HT8 and its ligand, using color differentiation to represent the affinity of each atom pair and setting a threshold to highlight significant interactions. Consequently, our model readily reveals the interaction strength between atom pairs, pinpointing the relatively important atom pairs or groups. However, due to current data constraints, the detailed energies provided should be treated as references rather than exact predictions. Nevertheless, if accurate labels for each energy component were provided during training, the model could learn the true interactions, thereby improving prediction accuracy. In conclusion, LumiNet offers researchers a straightforward assessment of protein–ligand interactions.
The protein is likewise abstracted into a topological graph. Unlike many models, we represent each atom as a node rather than using residues, which facilitates the calculation of atom-pair energy terms in subsequent steps. Although proteins contain a large number of atoms, not all of them actively participate in protein–ligand binding; a protein pocket is therefore extracted to reduce the computational load. Additionally, when processing edges, using only covalent bonds to describe the connectivity of the entire topological graph may cause information loss in complex protein structures. Hence, we treat not only covalent bonds as edges but also atom pairs within a 6 Å distance. To differentiate them, we assign a feature value of 1 to covalent-bond edges and 0 to non-covalent edges. Finally, the protein is represented as $G_p = (H_p, E_p, X_p)$, where $H_p$ denotes node features, $E_p$ edge features, and $X_p$ atomic coordinates.
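For illustration, a minimal sketch of this pocket-graph construction (assuming NumPy arrays; the function name `build_protein_graph` and its inputs are ours, not LumiNet's actual API) could look as follows:

```python
# Minimal sketch of the pocket-graph construction described above: atoms as
# nodes, covalent bonds and atom pairs within 6 A as edges, with a binary
# edge feature (1 = covalent, 0 = spatial). Names are illustrative only.
import numpy as np

def build_protein_graph(coords, covalent_pairs, cutoff=6.0):
    """coords: (N, 3) pocket-atom coordinates; covalent_pairs: set of (i, j)."""
    n = coords.shape[0]
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    edges, edge_feats = [], []
    for i in range(n):
        for j in range(i + 1, n):
            covalent = (i, j) in covalent_pairs or (j, i) in covalent_pairs
            if covalent or dists[i, j] < cutoff:
                edges.append((i, j))
                edge_feats.append(1.0 if covalent else 0.0)
    return np.asarray(edges), np.asarray(edge_feats)
```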
$h_i^0 = W_h^0 h_i + b_h^0; \quad e_{ij}^0 = W_e^0 e_{ij} + b_e^0$  (1)

$q_i^{k,l} = W_Q^{k,l}\,\mathrm{Norm}(h_i^l)$  (2)

$k_j^{k,l} = W_K^{k,l}\,\mathrm{Norm}(h_j^l)$  (3)

$v_j^{k,l} = W_V^{k,l}\,\mathrm{Norm}(h_j^l)$  (4)

$e_{ij}^{k,l} = W_E^{k,l}\,\mathrm{Norm}(e_{ij}^l)$  (5)

$w_{ij}^{k,l} = \mathrm{Softmax}_{j \in N(i)}\!\left(\frac{q_i^{k,l} \cdot \left(k_j^{k,l} \odot e_{ij}^{k,l}\right)}{\sqrt{d_k}}\right)$  (6)

$\hat{h}_i^{l+1} = h_i^l + W_{h_0}^l\,\mathrm{Dropout}\!\left(\mathrm{Concat}_{k \in 1,\dots,H}\!\left(\sum_{j \in N(i)} w_{ij}^{k,l} v_j^{k,l}\right)\right)$  (7)

$\hat{e}_{ij}^{l+1} = e_{ij}^l + W_{e_0}^l\,\mathrm{Dropout}\!\left(\mathrm{Concat}_{k \in 1,\dots,H}\!\left(w_{ij}^{k,l}\right)\right)$  (8)

$h_i^{l+1} = \hat{h}_i^{l+1} + W_{h_2}^l\,\mathrm{Dropout}\!\left(\mathrm{SiLU}\!\left(W_{h_1}^l\,\mathrm{Norm}(\hat{h}_i^{l+1})\right)\right)$  (9)

$e_{ij}^{l+1} = \hat{e}_{ij}^{l+1} + W_{e_2}^l\,\mathrm{Dropout}\!\left(\mathrm{SiLU}\!\left(W_{e_1}^l\,\mathrm{Norm}(\hat{e}_{ij}^{l+1})\right)\right)$  (10)
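A condensed PyTorch sketch of one such layer, following our reading of eqs (2)–(10) (dropout omitted, the neighborhood softmax of eq (6) emulated with `index_add_`; class and attribute names are illustrative, not the released code), is given below:

```python
# One Graph Transformer layer over a sparse edge list, as a sketch of
# eqs (2)-(10). h: node features, e: edge features, src/dst: edge indices.
import torch
import torch.nn as nn

class GraphTransformerLayer(nn.Module):
    def __init__(self, dim, heads):
        super().__init__()
        assert dim % heads == 0
        self.H, self.d = heads, dim // heads
        self.norm_h, self.norm_e = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.WQ, self.WK = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.WV, self.WE = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.Wh0, self.We0 = nn.Linear(dim, dim), nn.Linear(heads, dim)
        self.ffn_h = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim),
                                   nn.SiLU(), nn.Linear(dim, dim))
        self.ffn_e = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim),
                                   nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, h, e, src, dst):
        """h: (n, dim); e: (m, dim); src/dst: (m,) edge endpoints (j -> i)."""
        n = h.size(0)
        q = self.WQ(self.norm_h(h)).view(n, self.H, self.d)[dst]   # eq (2)
        k = self.WK(self.norm_h(h)).view(n, self.H, self.d)[src]   # eq (3)
        v = self.WV(self.norm_h(h)).view(n, self.H, self.d)[src]   # eq (4)
        eh = self.WE(self.norm_e(e)).view(-1, self.H, self.d)      # eq (5)
        score = (q * k * eh).sum(-1) / self.d ** 0.5               # eq (6)
        w = torch.exp(score - score.max())          # softmax over j in N(i)
        denom = torch.zeros(n, self.H).index_add_(0, dst, w)
        w = w / (denom[dst] + 1e-9)
        agg = torch.zeros(n, self.H, self.d).index_add_(0, dst,
                                                        w.unsqueeze(-1) * v)
        h_hat = h + self.Wh0(agg.reshape(n, -1))                   # eq (7)
        e_hat = e + self.We0(w)                                    # eq (8)
        return h_hat + self.ffn_h(h_hat), e_hat + self.ffn_e(e_hat)  # (9)-(10)
```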
For an input graph $G = (H, E, X)$, the molecular graph is first split based on the adjacency matrix:
$\mathrm{SubGraph}_i = \{(i, j) \in E \mid H_i, H_j, H_{ij}, X_i, X_j\}$  (11)
For the i-th node, the nodes directly connected to it are identified, and these nodes and edges collectively form $\mathrm{SubGraph}_i$. The partitioning continues until the generated subgraphs completely cover the input molecular graph. Each subgraph is then processed through the Graph Transformer. For the l-th convolutional layer, the process is defined as follows:
$h_v^{(l+1)} = \mathrm{GT}^{(l)}\!\left(G^{(l)}[N_k(v)]\right), \quad l = 0, 1, \dots, L-1$  (12)

$h_G = \mathrm{POOL}\!\left(\{h_v^{(L)} \mid v \in V\}\right)$  (13)
This iteration repeats six times until the output layer is reached, where $\mathrm{Emb}(i \mid \mathrm{SubGraph}_i)$, $\mathrm{Emb}(j \mid \mathrm{SubGraph}_i)$, and $\mathrm{Emb}(i \mid \mathrm{SubGraph}_j)$ are computed. These three embeddings are concatenated, and a linear layer is applied to obtain the feature representation of each node.
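A minimal sketch of the one-hop partitioning of eq (11), with illustrative names, might read:

```python
# Sketch of the one-hop subgraph partitioning of eq (11); names are ours.
def extract_subgraphs(edges, n_nodes):
    """edges: iterable of (i, j) pairs. Returns, per central node, the node
    set and incident edges of its one-hop subgraph; together these cover
    the whole molecular graph."""
    neighbors = {v: set() for v in range(n_nodes)}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    return {i: ({i} | neighbors[i], [(i, j) for j in neighbors[i]])
            for i in range(n_nodes)}
```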
$m_{ij}^{(pl)}, m_{ji}^{(pl)} = \varphi\!\left(Z_{ij},\, h_i^{(pl)},\, h_j^{(pl)},\, \left\|x_i^{(pl)} - x_j^{(pl)}\right\|\right)$  (14)
$h_i^{(pl),\,l+1} = \phi_h\!\left(h_i^{(pl),\,l},\, \sum_{j \in N(i)} m_{ij}^{(pl)}\right)$  (15)
It is noteworthy that this module performs no coordinate-updating operations during its execution, as equivariance does not need to be enforced in this way. However, even without coordinate updates, we still use distance information to guide subsequent computations so as to more accurately capture interactions between proteins and ligands.
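The interaction block of eqs (14) and (15) could be sketched as follows (a feature-only EGCL-style update under our assumptions; the MLPs `phi` and `update` are illustrative):

```python
# Sketch of the EGCL-style interaction block of eqs (14)-(15): messages built
# from node features, pair features Z, and the real distance; node features
# are updated while coordinates are never modified.
import torch
import torch.nn as nn

class InteractionEGCL(nn.Module):
    def __init__(self, dim, pair_dim):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(2 * dim + pair_dim + 1, dim),
                                 nn.SiLU(), nn.Linear(dim, dim))
        self.update = nn.Sequential(nn.Linear(2 * dim, dim),
                                    nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, h, x, z, src, dst):
        d = (x[src] - x[dst]).norm(dim=-1, keepdim=True)          # distances
        m = self.phi(torch.cat([h[src], h[dst], z, d], dim=-1))   # eq (14)
        agg = torch.zeros_like(h).index_add_(0, dst, m)           # sum over N(i)
        return h + self.update(torch.cat([h, agg], dim=-1))       # eq (15)
```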
$h_{p,l} = \mathrm{Dropout}\!\left(\mathrm{ELU}\!\left(\mathrm{BatchNorm}\!\left(W_{p,l}\,\mathrm{Concat}(h_p, h_l) + b_{p,l}\right)\right)\right)$  (16)

$\mu_{p,l} = \mathrm{ELU}(W_\mu h_{p,l} + b_\mu) + 1$  (17)

$\sigma_{p,l} = \mathrm{ELU}(W_\sigma h_{p,l} + b_\sigma) + 1.1$  (18)

$\pi_{p,l} = \mathrm{Softmax}(W_\pi h_{p,l} + b_\pi)$  (19)
$L_{\mathrm{MDN}} = -\log \sum_{k=1}^{K} \pi_{p,l}^{k}\, \mathcal{N}\!\left(d_{p,l} \mid \mu_{p,l}^{k}, \sigma_{p,l}^{k}\right)$  (20)

$\mathrm{mdn\ score} = \sum_{d_{p,l} < d_{\mathrm{cutoff}}} \log \sum_{k=1}^{K} \pi_{p,l}^{k}\, \mathcal{N}\!\left(d_{p,l} \mid \mu_{p,l}^{k}, \sigma_{p,l}^{k}\right)$  (21)
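A compact sketch of the mixture density head and loss of eqs (16)–(20), under our reconstruction (the number of components `k` and the layer sizes are assumptions), is shown below:

```python
# Sketch of the MDN head (eqs (16)-(19)) and its negative log-likelihood
# loss (eq (20)) over protein-ligand atom-pair distances.
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    def __init__(self, dim, k=10):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.BatchNorm1d(dim),
                                 nn.ELU(), nn.Dropout(0.1))
        self.w_mu = nn.Linear(dim, k)
        self.w_sigma = nn.Linear(dim, k)
        self.w_pi = nn.Linear(dim, k)

    def forward(self, h_p, h_l):
        h = self.mlp(torch.cat([h_p, h_l], dim=-1))          # eq (16)
        mu = nn.functional.elu(self.w_mu(h)) + 1             # eq (17)
        sigma = nn.functional.elu(self.w_sigma(h)) + 1.1     # eq (18)
        pi = torch.softmax(self.w_pi(h), dim=-1)             # eq (19)
        return pi, mu, sigma

def mdn_nll(pi, mu, sigma, dist):
    """Eq (20): negative log-likelihood of the observed distances dist (P,)."""
    log_comp = torch.distributions.Normal(mu, sigma).log_prob(dist.unsqueeze(-1))
    return -torch.logsumexp(torch.log(pi + 1e-10) + log_comp, dim=-1).mean()
```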
(1) van der Waals interaction: the van der Waals interaction is primarily calculated using the Lennard-Jones potential formula, as shown in the following equation:
$E_{ij}^{\mathrm{vdw}} = c_{ij}\!\left[\left(\frac{d_{ij}'}{d_{ij}}\right)^{12} - 2\left(\frac{d_{ij}'}{d_{ij}}\right)^{6}\right]$  (22)
During the data preprocessing stage, we obtained the indices of non-metal atoms, pairing each index of protein atoms with those of ligand atoms to include as many potential pairs of van der Waals interactions as possible. The van der Waals interaction energy is then calculated for each pair. The parameter cij is initialized during preprocessing and updated during model iterations to better approximate the true van der Waals energy values. Finally, the van der Waals interaction energy between the protein and ligand is obtained by summing the energies of all atom pairs.
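Under the reconstruction in eq (22), the pairwise vdW energy could be computed as in the following sketch (tensor shapes and the name `d_ref` are ours):

```python
import torch

def vdw_energy(d, d_ref, c):
    """Sketch of eq (22): LJ-style pair energies. d: (P,) observed pair
    distances; d_ref: (P,) per-pair optimal distances initialized during
    preprocessing; c: (P,) learnable well depths c_ij. Names are ours."""
    r = d_ref / d
    e_pair = c * (r ** 12 - 2 * r ** 6)   # minimum of -c_ij at d == d_ref
    return e_pair.sum()                   # total protein-ligand vdW energy
```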
(2) Hydrogen bond, hydrophobic, and metal interactions: these interactions share the same expression but use different parameters (c1, c2 and ω), which are learned separately for each interaction type; eij denotes the energy between each atom pair, as shown in the following equation:
$e_{ij} = \begin{cases} \omega, & d_{ij} - d_{ij}' \le c_1 \\[4pt] \omega\,\dfrac{d_{ij} - d_{ij}' - c_2}{c_1 - c_2}, & c_1 < d_{ij} - d_{ij}' < c_2 \\[4pt] 0, & d_{ij} - d_{ij}' \ge c_2 \end{cases}$  (23)
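A sketch of the piecewise term of eq (23), as we have reconstructed it (the clamp-based form is our assumption), follows:

```python
import torch

def piecewise_energy(d, d_ref, omega, c1, c2):
    """Sketch of eq (23): energy omega while d - d_ref <= c1, ramping
    linearly to zero between c1 and c2 (c1 < c2); all parameters learned."""
    gap = d - d_ref
    ramp = (gap - c2) / (c1 - c2)          # 1 at gap == c1, 0 at gap == c2
    return (omega * torch.clamp(ramp, 0.0, 1.0)).sum()
```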
(3) MDN score as a bias term: because some energy contributions, such as charge interactions, are not considered, and because the various energy terms overlap or have computational deficiencies, we introduce a bias term for correction. Based on the initial model, we evaluate the distance distribution output by the MDN block against the actual distances and scale it with specified weight values to obtain our energy bias term, represented by the following formula:
$E_{\mathrm{bias}} = w_{\mathrm{mdn}} \times \mathrm{mdn\ score}$  (24)
(4) Calculation of absolute binding free energy: the rotation penalty term Trotor accounts for the entropy loss due to the free rotation of chemical bonds during protein–ligand binding within the binding pocket. We assume that the entropy loss is proportional to the number of rotatable bonds in the ligand molecule. Trotor can be described as follows:
$T_{\mathrm{rotor}} = 1 + C_{\mathrm{rotor}} \times N_{\mathrm{rotor}}$  (25)
The total binding free energy is described as a linear combination of multiple energy terms. We do not use a direct summation because some biases cannot be modeled well, and a learned linear combination minimizes such errors as much as possible.
$\Delta G = \frac{w_{\mathrm{vdw}} E_{\mathrm{vdw}} + w_{\mathrm{hbond}} E_{\mathrm{hbond}} + w_{\mathrm{hydro}} E_{\mathrm{hydro}} + w_{\mathrm{metal}} E_{\mathrm{metal}} + E_{\mathrm{bias}}}{T_{\mathrm{rotor}}}$  (26)
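Putting eqs (25) and (26) together, the final score could be assembled as in this sketch (the weight vector, its length, and the default `c_rotor` are illustrative assumptions):

```python
# Sketch of eqs (25)-(26): a learnable linear combination of the four
# interaction energies plus the bias term, divided by the rotor penalty.
import torch
import torch.nn as nn

class TotalScore(nn.Module):
    def __init__(self, c_rotor=0.05):
        super().__init__()
        self.w = nn.Parameter(torch.ones(4))   # vdw, hbond, hydrophobic, metal
        self.c_rotor = c_rotor

    def forward(self, e_terms, e_bias, n_rotor):
        t_rotor = 1.0 + self.c_rotor * n_rotor                 # eq (25)
        return ((self.w * e_terms).sum() + e_bias) / t_rotor   # eq (26)
```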
Given that our work primarily focuses on high-precision prediction of binding free energies, we mainly use the FEP1 and FEP2 datasets for testing. These two challenging datasets are widely employed as important benchmarks by models such as GenScore, PBCNet, and PIGNET2. The FEP1 dataset includes eight targets: BACE, CDK2, Jnk1, MCL1, p38, PTP1B, thrombin, and Tyk2, with a total of 199 data points. The FEP2 dataset comprises eight targets: cdk8, cmet, eg5, hif2a, pfkfb3, shp2, syk, and tnks2, with 264 data points. For all proteins, we select the portion within 5 Å of the ligand atoms as the molecular pocket and compute the distances between atoms within the protein. Atom pairs with distances less than 6 Å, as well as those connected by a covalent bond, are considered to be connected by an edge. Additionally, based on different interaction rules, we extract the corresponding atom indices for each energy term; the atom indices for the ligand atoms are extracted accordingly.
$L = L_{\mathrm{MDN}} + 0.001 \times L_{\mathrm{atom}} + 0.001 \times L_{\mathrm{bond}}$  (27)
$L_{\mathrm{MDN}} = -\log \sum_{k=1}^{K} \pi_{p,l}^{k}\, \mathcal{N}\!\left(d_{p,l} \mid \mu_{p,l}^{k}, \sigma_{p,l}^{k}\right)$  (28)
Next, the pre-trained model is fine-tuned on the same database, with ABFE used as the label. All data points are used for training, with the coreset serving as the test set and FEP1 and FEP2 as external test sets. The information learned by the SubGT block and GT block is input into the interaction block, which transforms the distance distribution into the 'effective' distances used to fit the parameters of the LJ potential formula. Multiple energy terms are linearly regressed to obtain the final ABFE. Three loss functions are involved in the training process.
$L = L_{\mathrm{physical\,score}} + L_{\mathrm{mdn\,score}} + L_{\mathrm{vdw}}$  (29)

$L_{\mathrm{physical\,score}} = \mathrm{MSE}(\mathrm{pred}, \mathrm{label})$  (30)

$L_{\mathrm{mdn\,score}} = \mathrm{Pearson}(\mathrm{mdn\_pred}, \mathrm{label})$  (31)

$L_{\mathrm{vdw}} = \left|\mathrm{MSE}(\mathrm{vdw_{pred}},\, 0.8 \times \mathrm{label}) - \delta\right|$  (32)
The physical score mainly consists of the van der Waals, hydrogen bond, hydrophobic interaction, and metal interaction energy terms, plus an MDN score term; of these, the vdW score and MDN score are the most important. To ensure the rationality of each energy term, separate loss functions are applied, enhancing the model's interpretability. In the end, the individual energy terms can be output separately for analysis.
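A sketch of the combined fine-tuning loss of eqs (29)–(32), with the Pearson term negated so that minimization maximizes correlation (the value of `delta` is an assumed placeholder), could look like this:

```python
import torch

def finetune_loss(pred, mdn_pred, vdw_pred, label, delta=1.0):
    """Sketch of eqs (29)-(32); delta is an assumed placeholder constant."""
    l_physical = torch.mean((pred - label) ** 2)                     # eq (30)
    # eq (31): Pearson term, negated so that minimizing the loss
    # maximizes the correlation between the MDN score and the labels
    mp, lb = mdn_pred - mdn_pred.mean(), label - label.mean()
    l_mdn = -(mp * lb).sum() / (mp.norm() * lb.norm() + 1e-9)
    # eq (32): keep the vdW term near a fixed fraction of the label
    l_vdw = (torch.mean((vdw_pred - 0.8 * label) ** 2) - delta).abs()
    return l_physical + l_mdn + l_vdw                                # eq (29)
```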
Finally, the effectiveness on specific systems is optimized through semi-supervised training. Taking the FEP1 dataset as an example, the scoring model fine-tuned in the previous step is first used to predict the ABFE of the 199 PL complexes in the FEP1 dataset, and the predicted values are treated as pseudo-labels. In semi-supervised training, only the physical scoring component is trained, with gradients retained solely for this part (536,772 parameters), leading to significantly improved training efficiency. Each epoch then trains the 199 data points from PDBbind together with the 199 data points from FEP1. Training is conducted on an RTX Tesla V100 GPU with a batch size of 32. On average, each batch iteration requires 0.67 seconds, so each epoch typically takes approximately 15 seconds. The overall loss function is as follows:
$L = L_{\mathrm{real}} + \alpha \times L_{\mathrm{vir}}$  (33)
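One semi-supervised update under eq (33) could be sketched as follows (the value of `alpha` and the batch structure are illustrative assumptions):

```python
import torch

def semi_supervised_step(model, optimizer, real_batch, virtual_batch, alpha=0.5):
    """Sketch of one update under eq (33). Only the physical-scoring
    parameters carry gradients, as described above; alpha is assumed."""
    x_r, y_r = real_batch        # PDBbind complexes with experimental ABFE
    x_v, y_v = virtual_batch     # FEP1 complexes with pseudo-labels
    loss_real = torch.mean((model(x_r) - y_r) ** 2)
    loss_vir = torch.mean((model(x_v) - y_v) ** 2)
    loss = loss_real + alpha * loss_vir          # eq (33)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```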
Footnotes
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4sc07405j
‡ Equivalent authors. |