Efficient calculation of protein–ligand binding free energy using GFN methods: the power of the cluster model
Abstract
Protein–ligand interactions are crucial in many biochemical processes and biomedical applications, yet accurately calculating their binding free energy remains challenging. In this work, we systematically assess the accuracy of binding free energies calculated with the generic force field GFN-FF and with the semi-empirical quantum mechanical (SQM) methods GFNn-xTB (n = 0, 1, 2). GFN-FF performs well for neutral-ligand systems, with a Pearson correlation coefficient (rp) of 0.70 and a mean absolute error (MAE) of 5.49 kcal mol−1, but it can fail for charged-ligand systems (MAE of 18.98 kcal mol−1). We also propose a cluster model, in which the protein is truncated at a given cutoff, for use with the SQM methods of the GFN family. Among the SQM methods, GFN2-xTB performs best, with MAEs of 4.91 kcal mol−1 and 10.25 kcal mol−1 for the neutral-ligand and charged-ligand systems, respectively, and it substantially outperforms GFN-FF for charged ligands. Notably, GFN2-xTB applied to an appropriately sized cluster model is computationally even cheaper than GFN-FF applied to the entire complex. This study highlights the potential of the GFN family for efficient binding free energy calculations in biological systems.
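The cluster-model truncation and the two accuracy metrics (rp and MAE) named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a residue is assumed to be kept if any of its atoms lies within the cutoff of any ligand atom, cut bonds are not capped (a real cluster model would typically saturate them, e.g. with hydrogens), and the helper names `cluster_residues`, `mae`, and `pearson_r` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def cluster_residues(protein: Dict[str, List[Coord]],
                     ligand: Sequence[Coord],
                     cutoff: float) -> List[str]:
    """Hypothetical helper: return IDs of residues having at least one atom
    within `cutoff` (same length unit as the coordinates) of any ligand atom.
    Assumed residue-based distance criterion; bond capping is omitted."""
    kept = []
    for res_id, atoms in protein.items():
        if any(math.dist(a, b) <= cutoff for a in atoms for b in ligand):
            kept.append(res_id)
    return kept


def mae(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute error between predicted and reference values."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy coordinates (illustrative only, not data from this study).
    protein = {"A1": [(0.0, 0.0, 0.0)],
               "A2": [(10.0, 0.0, 0.0)],
               "A3": [(3.0, 4.0, 0.0)]}
    ligand = [(0.0, 0.0, 1.0)]
    print(cluster_residues(protein, ligand, 5.0))
    print(cluster_residues(protein, ligand, 6.0))
```

With the toy coordinates, residue A3 lies about 5.1 units from the ligand atom, so it is excluded at a cutoff of 5.0 and included at 6.0; enlarging the cutoff trades computing cost for a more complete description of the binding environment.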