Shae-Lynn J.
Lahey
and
Christopher N.
Rowley
*
Memorial University of Newfoundland, St. John's, Newfoundland and Labrador, Canada. E-mail: crowley@mun.ca
First published on 23rd January 2020
Drug molecules adopt a range of conformations both in solution and in their protein-bound state. The strain and reduced flexibility of bound drugs can partially counter the intermolecular interactions that drive protein–ligand binding. To make accurate computational predictions of drug binding affinities, computational chemists have attempted to develop efficient empirical models of these interactions, although these methods are not always reliable. Machine learning has allowed the development of highly-accurate neural-network potentials (NNPs), which are capable of predicting the stability of molecular conformations with accuracy comparable to state-of-the-art quantum chemical calculations but at a billionth of the computational cost. Here, we demonstrate that these methods can be used to represent the intramolecular forces of protein-bound drugs within molecular dynamics simulations. These simulations are shown to be capable of predicting the protein–ligand binding pose and conformational component of the absolute Gibbs energy of binding for a set of drug molecules. Notably, the conformational energy for anti-cancer drug erlotinib binding to its target was found to be considerably overestimated by a molecular mechanical model, while the NNP predicts a more moderate value. Although the ANI-1ccX NNP was not trained to describe ionic molecules, reasonable binding poses are predicted for charged ligands, but this method is not suitable for modeling charged ligands in solution.
Development of accurate models of potential energy terms of protein–ligand binding and their optimal parameters is a longstanding objective in computational chemistry. The electrostatic,1,2 repulsive, and dispersion3,4 interaction terms have been developed actively; however, accurate representation of intramolecular potential energy of the ligand is particularly challenging and no complete, general solution has been developed. Current force fields approximate intramolecular forces using simple but generally effective terms that were introduced more than 50 years ago,5 where bond angles and stretches are described with harmonic potentials (i.e., spring-like) and torsional barriers are defined as the sum of a handful of cosine functions. Force fields for drug-like compounds are particularly difficult to develop because of the enormous variety of chemical motifs, which often feature complex chemical effects like conjugation, hyperconjugation, and aromaticity. This is compounded by the enormous variety of chemical motifs that are possible in chemical drug space, where each could require a distinct set of parameters. For example, the proprietary OPLS3 force field defines 124 atom types and 48142 torsional parameters.6 Other methods provide options to reparameterize force fields automatically using ab initio calculations,7–11 although this complicates the simulation workflow and can be computationally expensive.
Recently, machine-learned neural network potentials (NNPs) have emerged as an alternative to conventional MM force fields.12 The ANI models13,14 developed by Roitberg and coworkers define the atomic positions in terms of a set of “symmetry functions”,15 which are constructed from the position of a given atom relative to nearby atoms. A neural network is trained to reproduce the high-level ab initio electronic energies (i.e., CCSD(T)) from these data. These potentials are remarkably robust and predict the structures and relative stabilities of molecular conformations across a broad set of chemical structures with similar accuracy to the high-level ab initio data they were trained to reproduce. The computational cost of NNPs is comparable to molecular mechanical models, so they can be used to perform nanosecond (ns) length simulations of molecules containing dozens or hundreds of atoms routinely.
Here, we present a strategy to simulate protein–ligand complexes using a machine-learned NNP to represent the intramolecular interactions of the ligand. This model is embedded inside a conventional MM force field for the protein and solvent, so established models for these components can be used without modification. We call this method NNP/MM, as it functions the same as Quantum Mechanical/Molecular Mechanical (QM/MM) models do, but with the NNP used in place of the QM method. This method is tested for its ability to predict the poses of protein-bound drugs in comparison to electron density distributions determined by X-ray crystallography. The Gibbs energies for restraining the ligands to their bound conformations are calculated using NNP/MM and compared to the CGenFF force field.
(1) |
where r is the coordinates of the whole system, rMM is the coordinates of the ligand environment, and rNNP is the coordinates of the ligand. The MM region is represented using a conventional MM force field, so is calculated in the normal fashion for an additive force field. For non-covalent protein–ligand binding, the term is the conventional MM non-bonded interactions between the protein and the ligand, which is simply the sum of Lennard-Jones and pairwise coulombic interactions between the NNP atoms and MM atoms (eqn (2)).
(2) |
This functions similarly to mechanically-embedded QM/MM models,16 where the NNP serves as the “QM” model embedded within the MM system. This method can be employed in many established simulation codes without modification because they can implemented using existing QM/MM features, which allow the energy and forces of a critical subsection of the system to be calculated using an external method.
The immediate advantage of this method is that highly-accurate intramolecular forces can be calculated for ligands without parameterization and without modifications to current molecular simulation codes. A limitation of this approach is that the protein–ligand interactions are still calculated by the CGenFF/CHARMM electrostatic and Lennard-Jones terms. The development of efficient NNPs that are capable of describing the entire system could provide more accurate and non-empirical protein–ligand binding energies.
There have been several reports where QM/MM simulations were used to model protein–ligand complexes.17–19 The drawback of these QM methods is that typically they use semi-empirical quantum mechanics in order to calculate the energy and forces of the ligand sufficiently quickly to perform sufficiently long MD simulations. These methods generally are less accurate than the ANI NNPs for the calculations of the relative conformational stability of ligand conformations and the computational cost is generally greater. One advantage of QM/MM methods over the NNP/MM method used here is that the electron density of the ligand can be polarized by the protein and solvent (i.e., through electrostatic-embedding QM/MM16). This is not possible for the NNPs used here because these methods do make any calculation of the electron density of the ligand, so they are effectively mechanically embedded.
The calculation of the erlotinib potential energy surface was performed using ORCA 4.2.1.28 Optimizations with constraints on the amine torsional angle were performed using the resolution of identity 2nd-order Møller–Plesset theory (RI-MP2) with the def2-TZVP basis set.29 Single point energy evaluations were performed at these optimized structures using Domain-based Local Pair Natural Orbital – Coupled Cluster Singles and Doubles with perturbative triples30 with the def2-TZVP basis set (DLPNO-CCSD(T)/def2-TZVP//RI-MP2/def2-TZVP) to generate the QM potential energy surface.
This term can be calculated by defining the root-mean-square deviation (RMSD) of the ligand relative to its bound conformation (ζ) and then calculating the Gibbs energy required to impose a harmonic restraint on the RMSD so that the ligand is restricted to hold its bound conformation. This procedure is performed for the ligand in solution and in the site to obtain Gibbs energies for restricting the conformation of the ligand in each of these states. The difference of these energies provides the conformational or “strain” component of the absolute binding energy (ΔGcons).
Using umbrella sampling, the potential of mean force (PMF) can be calculated as a function of the RMSD. Integration of this PMF biased by the harmonic restraining function provides the ΔGcons,site/solvent (eqn (3)).
(3) |
These PMFs are calculated from an umbrella sampling simulation where the windows were separated by 0.5 Å and a harmonic biasing potential with a spring constant of 50 kcal mol−1 Å−2 was used. Each window was sampled by performing a 1 ns equilibration simulation followed by a 4 ns sampling simulation. The PMF was constructed from the umbrella sampling simulations using Weighted Histogram Analysis Method (WHAM) with statistical uncertainties of the profiles estimated by bootstrap analysis.36–38
These calculations are performed for the ligand bound to the protein and in solution to yield ΔGcons,site and ΔGcons,solvent, respectively. The difference of these two energies provides ΔGcons (eqn (4)).
ΔGcons = ΔGcons,site − ΔGcons,solvent | (4) |
One notable success of the NNP/MM potential is in predicting the binding pose of erlotinib to the epidermal growth factor receptor (EGFR). The core scaffold of this drug is composed of amine-linked ethynyl-phenyl and quinazoline rings. Crystallographic structures of the protein-bound complex show the quinazoline ring bound in the adenosine-binding site while the ethynyl-phenyl group binds in a pocket formed by the T702, T830, and K721 residues. The binding pose predicted by CGenFF is inconsistent with the XRD data, in which the two rings form a more acute angle relative to each other (ϕ1 = 63 ± 1°, ϕ2 = 4 ± 1°). The simulation using the NNP/MM model is more consistent with the crystallographic data, (ϕ1 = 44 ± 1°, ϕ2 = 4 ± 1°).
Surprisingly, the poses predicted for the ligands that contain charged functional groups (2HYY, 3ETA, and 3EIG) are reasonable even though the ANI-1ccX potential was not designed to describe charged species and none of the molecules this NNP was trained for were charged.
Amongst the neutral ligands, the NNP/MM conformational energies are generally similar in magnitude to the CGenFF strain energies. This indicates that the ANI-1ccX model can achieve similar results to the CGenFF model despite the lack of any explicit parameterization for these molecules. The conformational energies of 4HJO (erlotinib bound to EGFR) show the largest difference, with the NNP/MM strain energy being 4.7 kcal mol−1 smaller than the CGenFF strain energy. The high strain predicted by the CGenFF model is due to the amine functional group of erlotinib holding a pyramidal geometry in the solution simulations, creating a large energetic penalty to force the drug into its bound conformation. In the NNP/MM simulation of erlotinib in solution, the amine group remains close to a co-planar geometry with respect to the quinazoline ring, with a moderate skew in the dihedral angle between the phenyl group and the amine.
The ligands that contain charged functional groups (2HYY, 3ETA, and 3EIG) have anomalously high conformational energies. This issue originates from the use of the ANI-1ccX NNP, which was only trained on neutral molecules. This NNP predicts reasonable geometries of the ammonium and carboxylate groups in these molecules, but these ionic functional groups form spurious intramolecular contacts in the solution NNP/MM MD simulations. For example, the ligand of 3EIG adopts a conformation where the carboxylates groups are in close contact, rather than repelling each other like they should (see ESI†). This results in the stabilization of regions of the PMF corresponding to large structural deviations from the bound pose. As the NNP(ANI-1ccX) model was not designed for the description of charged molecules like this, it is unsuitable for calculating their conformational energies.
Extensive MD simulations are needed to calculate ΔGcons by calculating the PMF of the RMSD, but these simulations were completed at a modest computational cost because of efficient implementations of the ANI model for execution on graphical processing units (GPUs). For example, the NNP/MM MD simulations of imatinib (69 atoms) executed at a rate of 3.4 ns per day on a single Titan Xp NVIDIA GPU. Even faster performance is anticipated after the planned integration of NNPs directly into NAMD and other molecular simulation codes.
Empirical force fields are parameterized in an internally consistent manner, so it is possible that the MM parameters used to describe the non-bonded interactions between the ligand and its surroundings will not be optimal for the NNP/MM term. In particular, the balance between the MM ligand–water, ligand–protein, and the NNP ligand intramolecular dispersion interactions will not necessarily be consistent.3,4 This issue has been addressed in some QM/MM models by defining new parameters for the QM–MM Lennard-Jones terms.40,41 Nevertheless, the common practice has been to parameterize the intramolecular terms of ligands to gas phase potential energy surfaces, so the ANI-1ccX should a suitable replacement for these terms. If there were a serious inconsistency between the NNP and MM interaction energies, it would likely lead to a systematic difference in the conformational energies of the ligands, but the CGenFF and NNP/MM conformational energies are in close in magnitude for 1XOZ, 2W6N, 3EYG, and 4NCT.
Fig. 2 The potential of mean force for the deviation of the structure of erlotinib from its bound conformation when it is bound to EGFR (top, PDB ID: 4HJO) and when it is in solution (bottom) calculated using the hybrid NNP/MM and pure MM(CGenFF) methods. |
The geometry of the erlotinib amine linker and its aromatic substituents deviates sharply from the bound pose in the CGenFF solution structure (Fig. 3(b)); the amine is partially pyramidalized and the aromatic substituents are skew to each other. In contrast, in the NNP/MM simulation, the amine predominantly remains in a planar geometry, conjugated with the quinazoline and phenyl rings.
Fig. 3 (a) The fragment of erlotinib used to calculate the potential energy surface. Truncated groups are shown in grey. (b) Representative solution conformations of erlotinib for the CGenFF MM model (green) and NNP/MM model (red) overlaid with the ligand pose from the 4HJO crystal structure. (c) The relaxed potential energy surfaces for rotation around the erlotinib fragment amine bonds calculated using (i) DLPNO-CCSD/def2-TZVP//MP2/def2-TZVP (ii) NNP(ANI-1ccX) and (iii) the CGenFF MM model. Energies are in kcal mol−1. |
The potential energy surface corresponding to rotations around the amine torsion angles of erlotinib is presented in Fig. 3(c). The minima on the CGenFF surface corresponds to structures where the amine is significantly pyramidal and the substituent phenyl and quinazoline rings adopt angles that reduce steric repulsion between them. The ANI-1ccX surface is consistent with the DLPNO-CCSD(T) surface, where there is a broad global minimum centered around ϕ1 = 0°, ϕ2 = 0° and the amine nitrogen holds a planar arrangement with the aromatic groups.
The failure of the CGenFF force field stems from the lack of a distinct atom type for amines conjugated with aromatic rings. While it would be possible to adjust the parameters of the CGenFF force field to improve its description of the arylamine potential energy surfaces, this introduces a new fitting stage and requires computationally demanding QM calculations to provide the target data. Generally, it is not immediately apparent where a general-purpose force field will fail. By using NNPs to calculate these interactions, these issues are avoided entirely because energy surfaces with near-CCSD(T) accuracy can be generated efficiently and without the need to parameterize the intramolecular potential energy surface explicitly.
These methods can be incorporated directly into existing confine-and-release methods to calculate the absolute binding energy because these methods include a step where the ligand's conformation is constrained to its bound pose. In several cases, the conformational energies calculated using the NNP(ANI-1ccX)/MM model were similar to those predicted by the popular general-purpose CGenFF force field.
The scope of the ligands that can be modelled is limited by the choice of the NNP, which is particularly true for the ANI-1ccX model. Firstly, the ANI-1ccX model only supports molecules containing C, N, O, and H, although many important drug molecules also contain sulfur, phosphorus, boron, or halogen atoms. Secondly, ANI-1ccX NNP was not designed for modeling ionic species. Although the binding poses predicted for these compounds were reasonable, the NNP spuriously favored high-energy conformations in solution. This reflects that the ANI-1ccX training data did not include ionic species. ANI-type models that are trained to describe molecules containing sulfur and halogens, as well as charged molecules, are currently in development.42
Footnote |
† Electronic supplementary information (ESI) available: Details of crystallographic structures and example NNP/MM NAMD input file. See DOI: 10.1039/c9sc06017k |
This journal is © The Royal Society of Chemistry 2020 |