Lucía Díaz†
a,
Daniel Soler†a,
Gary Tresadern*b,
Christophe Buyckb,
Laura Perez-Benitob,
Suwipa Saen-Oona,
Victor Guallarcd and
Robert Soliva*a
aNostrum Biodiscovery, Jordi Girona 29, Nexus II D128, 08034, Barcelona, Spain. E-mail: rsoliva@nostrumbiodiscovery.com
bComputational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., Turnhoutseweg 30, B-2340 Beerse, Belgium. E-mail: gtresade@its.jnj.com
cBarcelona Supercomputing Center, Join IRB-BSC Program in Computational Biology, Spain
dICREA, Passeig Lluís Companys 23, E-08010 Barcelona, Spain
First published on 17th February 2020
In silico binding site location and pose prediction for a molecule targeted at a large protein surface is a challenging task. We report a blind test with two peptidomimetic molecules that bind the flu virus hemagglutinin (HA) surface antigen, JNJ7918 and JNJ4796 (recently disclosed in van Dongen et al., Science, 2019, 363). Tests with a series of conventional approaches such as rigid (receptor) docking against available X-ray crystal structures or against an ensemble of structures generated by quick methodologies (NMA, homology modeling) gave mixed results, due to the shallowness and flexibility of the binding site and the sheer size of the target. However, tests with our Monte Carlo platform PELE in two protocols involving either exploration of the whole protein surface (global exploration), or the latter followed by refinement of best solutions (local exploration) yielded remarkably good results by locating the actual binding site and generating binding modes that recovered all native contacts found in the X-ray structures. Thus, the Monte Carlo scheme of PELE seems promising as a quick methodology to overcome the challenge of identifying entirely unknown binding sites and modes for protein–protein disruptors.
The complexity of protein–protein interactions can be understood by considering that many proteins have evolved elongated structures whose domains are usually composed of α-helices and β-sheets placed parallel to the longest protein axis, stretching out a series of extended, narrow and shallow grooves, which are engaged by their protein partners via insertion of a series of side chains in a “ladder-shaped” pattern. Small molecules targeted at the surface interaction sites on such proteins must be fairly rigid and capable of inserting themselves in these grooves if they are to have good potency. However, this is not easily achieved, as the long grooves display a dynamic adaptation at different levels of the “ladder”, something that cannot be inferred from the static X-ray crystal structures. Examples of this phenomenon have been described both for extracellular proteins such as IL-2/IL-2R disruptors5,6 and intracellular proteins such as the androgen receptor.7
Drug discovery of inhibitors disrupting PPIs poses many challenges and traditional computational methods often struggle to overcome these. A recent systematic analysis revealed that PPI cavities show almost no overlap in property space with those of druggable protein ligand complexes,8 thus identifying chemical hit matter can be tough, and optimizing it to adhere to drug-like physicochemical properties is a further challenge.9 Still, classical docking approaches have been applied to find hits for PPI targets but the challenge for docking can be understood given that most scoring functions have evolved and been evaluated using available protein-ligand structure datasets which only contain a small proportion of ligands bound at PPI binding sites. Some successes have been reported but similar limitations of the available chemical molecules also apply; and obviously these methods are of no use when the binding sites maybe unknown. Alternatively, despite a few reports of success, hit/lead generation strategies such as modern fragment-based drug design also struggles for targeting large protein–protein interfaces.3
Qualitative computational methods have emerged as useful to identify possible binding site and interaction “hot-spots”.10,11 Site finding methods such as FTMap12 and SiteMap13 can provide guidance for regions where small molecules may interact, but their power to discriminate true from false sites breaks down for less druggable sites.14 Given the dynamic nature of protein–protein interactions,15 it is perhaps not surprising that methodologies based on molecular simulation are proving useful. So-called “mixed solvent MD” methods use molecular dynamics (MD) simulations performed with small co-solvent organic fragments to reveal protein surface binding sites.16,17 Meanwhile, sophisticated large-scale MD studies coupled with statistical modelling have been able to identify binding sites in a de novo fashion but at significant computational cost.18 It remains to be seen if these approaches can impact prospective drug discovery with the throughput and robustness to identify unknown binding sites and importantly, binding poses. In general, there are few reliable computational approaches that can be applied to PPI drug discovery. Here we demonstrate how Monte Carlo (MC) simulations performed with PELE19 are capable of identifying both the binding site and mode, with relative ease and via a simple and reproducible application protocol.
Hemagglutinin (HA) is one of the two proteins found on the surface of the influenza viruses. It is an antigenic glycoprotein assembled as a homo-trimer of identical subunits that are formed of two disulphide-linked polypeptides: membrane-distal HA1 and the smaller, membrane-proximal, HA2. HA is responsible for host cell binding and subsequent fusion in the endosome after the virus has been taken up by endocytosis. Therefore, its interaction with host-cell receptors is a critical step in the infectious cycle of the virus. Janssen researchers first developed potent, broadly neutralizing antibodies (bnABS) that bind the highly conserved HA stem region at the interface of HA1 and HA2.20 Subsequently, the study of the complementarity determining regions (CDRs) of those bnABS inspired the development of potent cyclic peptidic inhibitors21 whose X-ray complexes show they closely mimic the “ladder-type” interaction revealed for bnABS at the HA1/HA2 groove. Lastly, this rich structural information led to the development of small molecules targeted at the same site. Through an HTS campaign Janssen discovered micromolar compound JNJ7918 (1 in Fig. 1), which finally evolved to lead compound JNJ4796 (2, in Fig. 1).22 This peptidomimetic remarkably mimics bnABS and cyclic peptides and owes its potency to a series of slight, dynamic rearrangements in the “ladder-type” groove at HA1/HA2.
The goal of the present study was to test the performance of a series of relatively efficient approaches that can be employed to determine the binding site and binding mode for precursor JNJ7918 and its derived lead JNJ4796 on the whole surface of HA. The work was designed considering limited time and resources typical of some drug discovery projects and therefore slower, compute intensive techniques such as molecular dynamics were not considered. We found that the most simplistic approaches such as docking or ensemble docking with several techniques did not, for the most part, yield useful models. However, a variety of protocols based on the Monte Carlo program PELE consistently delivered good results in simulations whilst taking only a few hours on a modest compute cluster. Of note, at the time of the test, the work was performed in a blind fashion as the HA-JNJ4796 complexes (entries 6CF7, 6CFG) were not available in the protein data bank and not provided to scientists at Nostrum Biodiscovery.
For the present work, the adaptive version of the algorithm was applied. Adaptive PELE is composed of three main steps: sampling, clustering, and spawning, which are run iteratively.36 In the sampling phase a series of independent trajectories are run (typically from a few dozen up to thousands). These trajectories are generated with the classical PELE approach described above. We use rounds (epochs) of N simulations (trajectories) of length M, each one running on a computing core (using an MPI implementation). The clustering phase then cluster all conformations generated in all previous epochs. A number of approaches can be implemented. Typically, we use ligand RMSD as a metric for clustering. Each cluster has a central conformation and a similarity RMSD threshold, so that a structure belongs to that cluster if its RMSD with the central conformation is smaller than the threshold. When a structure does not belong to any cluster, a new one is created, defining a new cluster centre. In the clustering process, the maximum number of comparisons is k × n, where k is the number of clusters, and n is the number of explored conformations in the current epoch. The ruggedness of the energy landscape sets the most suitable RMSD value. The more complex the energy landscape is, the lower the RMSD thresholds should be to ensure a proper discretization in regions that are difficult to sample. Finally, the spawning phase chooses the seeds to be used for the next iteration (next epoch). By stopping simulations and adaptively spawning them with new initial structures for the next iteration, we bypass the problem of getting trapped in local minima. This effectively improves the search in poorly sampled regions. The selection strategy can also be used for biasing the sampling to interesting areas based on user knowledge of the system. A series of reward functions can be implemented, fine-tuning the degree of bias (the default protocol rewards poorly sampled regions).
In the present work, two different PELE protocols were applied, depending on the result to be achieved, namely:
In all calculations, PELE used the all atom OPLS2005 force field37 with an OBC implicit solvent model. Parameters for the ligand were obtained by the following protocol. The charges were derived from RESP methodology, calculated at M06/6-31G** level of theory using Jaguar. The force field parameters were transferred from OPLS2005. PELE energy profiles presented in the results section are the interaction energy plot against distance to the centre of mass (COM) of Thr318 (one of the key residues anchoring compound 1). Interaction energies are defined by E(AB) − E(A) − E(B), where AB stands for the complex, A for the receptor and B the ligand.
The rigid (receptor) docking results prompted us to investigate whether a productive pose could be found by first generating conformational ensembles for hemagglutinin, followed by docking both molecules on every ensemble conformer. Two short methodologies were chosen: normal mode analysis (NMA) and homology modelling (HM).
An NMA-generated ensemble with 10 conformers was built based on PDB entry 5W6T (HA bound to a cyclic peptide). All docking grids engulfed the whole protein structure. No binding mode close to the known binding site (Thr318) was found for compound 1. Additionally, only two out of all generated poses for compound 2 were close to the X-ray pose (as in 6CF7); the best in terms of RMSD being scored as the 5th best pose out of 100, reproducing the key hydrogen bond between the carbonyl oxygen of JNJ4796 and the hydrogen bond donor of Thr318 and yielding a native like conformation with 2.63 Å heavy atom RMSD with respect to the X-ray.
In addition, an HM-generated ensemble with 10 different conformers based on PDB entry 5W6T as template was also built. For compound 1, only one out of the 100 generated poses was close to the actual binding site (ranked as top 1), reproducing the key hydrogen bond to Thr318. However, for compound 2, no native poses could be found.
Therefore, rigid (receptor) docking failed at generating useful models when performed on X-ray crystal structures, whereas ensemble rigid (receptor) docking only gave mixed results: near native poses could be found for compound 1 in the case of HM-generated ensembles, and for compound 2 in the case of NMA-ensemble. Furthermore, the scoring itself was not accurate enough, as it is doubtful that the 5th ranking pose for compound 2 would have been selected in a prospective application scenario, assuming the “correct” final pose was not known. Probably, the ensembles built were not rich enough to properly capture the induce fit needed to adequately bind both compounds under study. Thus, in order to efficiently capture the dynamic rearrangements in the “ladder-type” groove at the HA1/HA2 interface in the presence of either compound, a series of PELE simulations were run.
Fig. 3 Interaction energy plot (left) against distance to Thr318 (one of the key residues anchoring compound 1). The superimposition of the lowest energy pose 1 (in salmon) with the crystal structure of related compound 2 6CF7 (in grey) is pictured on the right hand side image. |
The 7 minima highlighted in Fig. 4, which are far apart from one another, were then subjected to a second round of PELE simulations, now in local exploration mode (details on the simulation parameters can be found in the Methodologies section). The energy profile for the refinement of the 7 poses highlighted in Fig. 4 can be found in Fig. 5. Remarkably, it shows the pose that H-bonds to Thr318 as the one with the lowest interaction energy (−51 kcal mol−1). The lowest energy minimum circled in green Fig. 5 (left) is also seen in Fig. 5 (right). It overlaps with the binding mode disclosed in PDB entries 6CF7 and 6CFG (RMSD 1.47 Å). In fact, a detailed inspection of the predicted binding mode reveals most of the critical contacts responsible for the nanomolar activity of compound 2 are recovered in our model. The A ring (benzoxazole) is placed in the small hydrophobic cavity lined by Val40, Leu42 and Val52, Asn53 and Ile56, its amide function partly solvent exposed, with its terminal methyl group contacting Ser291 and Leu292. The critical CH–π contact with Val52 is also recovered. The B ring (pyridine) is placed between Thr49 and Thr318, the latter H-bonding directly to the carbonyl linker connecting rings B and C, as is found in the X-ray. The C–E rings are seen in the experimental structure to engage in CH–π contacts with His18, His38, Trp21 and Ile45.
Fig. 5 (Left) Interaction energy plot against distance to Thr318 for the PELE simulations starting with the 7 poses generated by the global exploration (Fig. 4); (right) the lowest interaction energy pose (green circle on the left plot) out of the second refinement simulation can be seen superimposed with the actual binding mode as subsequently published in van Dongen et al.22 and found in PDB entry 6CF7 (grey). The overlap with the experimental binding mode shows how the model captures the H-bond to Thr318, as well as the CH–π bonds of compound 2 with the residues lining the shallow groove. |
Thus, an exhaustive exploration of the surface of HA followed by a refinement simulation yields a model that is at a remarkable 1.47 Å RMSD with respect to the experimentally determined structure. This test places PELE as an efficient and effective approach to not only navigate the whole surface of a protein in search of dynamic hotspots but also to predict the actual binding mode of a small molecule protein–protein disruptor.
Since the first disclosure of the PELE software,19 the platform has undergone significant methodological improvements. A big upgrade was achieved when the original Fortran code was ported to C++ and the MPI parallelization was optimized with Paraver software to avoid overhead and maximize job performance.38 Along the years, more features have been developed to address specific problems that broadened PELE's applicability domain. For instance, adding PCA components to PELE to account for non-harmonic movements,31 a feature to perturb explicit water molecules in the MC algorithm,39 the adaptive PELE version (explained above) to enhance sampling and minimize the computational demand36 and a series of changes to make the code more flexible in term of input/output formats which minimize user storage needs. Early reports involving the study of cytochrome P450s, myoglobin or fatty acid binding protein19,40 have now been complemented with multiple recent studies involving kinases, GPCRs, HIV-1 protease, epoxide hydrolase and various nuclear hormone receptors.31,32,36,41–46 Thus the application of PELE as a reliable induced-fit binding tool has been amply proven. We show here for the first time, the application of PELE in the context of protein–protein disruptor design, where many in silico methodologies struggle.
Our tests revealed that rigid (receptor) docking could not locate the binding site nor mode. Rigid (receptor) could yield near-native poses when performed on ensembles of structures built with approaches such as NMA or homology modelling, but results were not amongst the best predicted or consistent and would have been difficult to identify in a prospective manner. However, the Monte Carlo platform PELE did find the experimental binding mode for compound 2 (and probably 1) in relatively quick simulations performed on a small compute cluster. Our results suggest PELE is a promising tool for the study of challenging protein–protein disruptors, one of the next frontiers in small molecule drug discovery.47–49
Footnote |
† Equally contributing authors. |
This journal is © The Royal Society of Chemistry 2020 |