Yong
Wei
a,
Amy X.
Chen
b,
Yuewei
Lin
c,
Tao
Wei
*d and
Baofu
Qiao
*e
aDepartment of Computer Science, High Point University, High Point, NC 27268, USA
bThomas Jefferson High School for Science and Technology, Alexandria, VA 22312, USA
cComputational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
dDepartment of Chemical Engineering and Department of Biomedical Engineering, University of South Carolina, Columbia, SC 29208, USA. E-mail: tao.wei@Howard.edu
eDepartment of Natural Sciences, Baruch College, City University of New York, New York, NY 10010, USA. E-mail: baofu.qiao@baruch.cunyu.edu
First published on 2nd February 2024
Allosteric regulation is common in protein–protein interactions and is thus promising in drug design. Previous experimental and simulation work supported the presence of allosteric regulation in the SARS-CoV-2 spike protein. Here the route of allosteric regulation in SARS-CoV-2 spike protein is examined by all-atom explicit solvent molecular dynamics simulations, contrastive machine learning, and the Ohm approach. It was found that peptide binding to the polybasic cleavage sites, especially the one at the first subunit of the trimeric spike protein, activates the fluctuation of the spike protein's backbone, which eventually propagates to the receptor-binding domain on the third subunit that binds to ACE2. Remarkably, the allosteric regulation routes starting from the polybasic cleavage sites share a high fraction (39–67%) of the critical amino acids with the routes starting from the nitrogen-terminal domains, suggesting the presence of an allosteric regulation network in the spike protein. Our study paves the way for the rational design of allosteric antibody inhibitors.
Coronavirus disease 2019 (COVID-19), due to infection of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has caused a global pandemic for three years, leading to over 6.9 million deaths and about 0.77 billion confirmed cases worldwide according to the report of the World Health Organization (https://covid19.who.int/). SARS-CoV-2 virus attacks human cells via the binding of its spike protein with the angiotensin-converting enzyme 2 (ACE2) receptor, which is highly expressed on the surface of type II cells.11–13 The coronavirus spike protein, typically known as the spike protein, is a trimeric glycoprotein. It appears on the virus surface as outward-facing 23 nm molecular “spikes”14 and thus plays a key role in binding receptors.15 A spike protein is composed of three subunits, each composed of around 1270 amino acids.16 Therefore, each trimeric spike protein has around 3800 amino acids, where there exists a huge amount of protein–protein interactions, standing for a highly complicated example of folded proteins.
The allosteric regulation of SARS-CoV-2 spike protein has been reported experimentally17–21 and in computer simulations.22 Specifically, Chi, et al.17 found that antibody 4A8, which was isolated from recovered patients, binds to the nitrogen-terminal domains (NTDs) of the spike protein. These NTDs are around 4–8 nm away from the binding interface between the spike's receptor-binding domain (RBD) and human cell receptor ACE2. Another antibody CR3022, also isolated from a recovered patient, was found to target a highly conserved epitope of SARS-CoV-2 (and SARS-CoV in 2013), which is distal from the spike RBD.18 And antibody 47D11 was reported to bind to a non-RBD epitope of the SARS-CoV-2 (and SARS-CoV) spike protein.19 Very recently, Tulsian et al.20 presented extensive studies on the allosteric regulation in the SARS-CoV-2 spike protein upon the binding of nine antibodies (four from their work and five existing ones, including 4A8 and CR3022). Same as 4A8, antibody LSI-CoVA-017 was found to bind to the spike NTD. Impressively, upon the LSI-CoVA-017 binding to NTD, the S1/S2 cleavage site and other distal domains of the spike protein displayed notable changes in the conformational dynamics using the hydrogen–deuterium exchange mass spectrometry (HDX-MS). Conformational changes were also observed in distal sites of NTD and the S2 subunits of the spike protein when the other antibodies were bound to the spike RBD. The allosteric regulation between the S1/S2 cleavage site and RBD was also predicted by Qiao and Olvera de la Cruz22 using all-atom explicit solvent molecular dynamics (MD) simulations and reported by Chen et al.21 using HDX-MS. Note that the influences of glucans on allosteric regulation in the SARS-CoV-2 spike protein were also examined20 as well as the possible allosteric regulation in ACE2.23,24 Taken together, these experimental observations and in silico prediction support that the allosteric regulation in the SARS-CoV-2 spike protein could reach up to a distance of around 4–8 nm between NTD and RBD,17,20 around 10 nm between the S1/S2 cleavage site and RBD,20–22 or more than 10 nm between the S2 subunit and RBD.20
Even though the allosteric regulation in the spike protein has been reported, the mechanism remains elusive. Specifically, the pathway of signal transmission from the allosteric sites (NTD, the S1/S2 cleavage site) to the active site (RBD) is unknown, which is nevertheless required for the rational design of allosteric neutralizing ligands for the SARS-CoV-2 spike protein. Inspired by the recent findings on the allosteric regulation between the S1/S2 cleavage site and RBD20–22 and between NTD and RBD,17,20 here we examined the routes for the allosteric regulation in the SARS-CoV-2 spike protein. All-atom explicit solvent MD simulations were carried out along with the contrastive learning25 and Ohm26 approaches. These methods collectively reveal the route of the allosteric regulation in the spike protein, which will be beneficial for our understanding of the mechanism of allosteric regulations as well as allosteric inhibitor design.
These simulations were performed using the package GROMACS (version 2019.6)27 at the Texas Advanced Computing Center. Like in our previous work,22 the CHARMM 36m potential28 was used, along with the recommended CHARMM TIP3P water model29 with the water structures constrained using the SETTLE algorithm.30
The spike–ACE2 complex structure was downloaded from the Zhang-Server.15 The subunit C of the spike protein was in the “Up” conformation and binding to ACE2. The spike–ACE2 complex was solvated in a water box with a size of 16 nm × 18 nm × 24 nm. A salt concentration of 0.15 M was applied. The system had 692370 atoms in total.
The energy minimization of the whole system was first conducted using the steepest descent algorithm to remove possible close contact between different molecules. Subsequent equilibrations were conducted for one simulation of 1 ps using the canonical ensemble (constant number of particles, volume, and temperature, NVT) and another simulation of 1 ps using the isothermal–isobaric ensemble (constant number of particles, pressure, and temperature, NPT). The velocity-rescale temperature coupling and the Berendsen pressure coupling were applied. Afterward, the solvated system was equilibrated for another 10 ns under the NPT ensemble with the Nosé–Hoover temperature coupling at 298 K and the Parrinello–Rahman barostat at 1.0 bar.31 The integration time step of 2 fs was used with all the hydrogen-involved covalent bonds constrained using the LINCS algorithm.32,33 In the equilibration simulations above, the coordinates of the non-hydrogen atoms of the spike protein trimer, ACE2, and the tetrapeptides were restrained using a force constant of 1000 kJ mol−1 nm−2 to preserve the binding structure. The restraints were then removed in the production simulations. The other parameters were the same as those in the production simulation. Each production simulation was carried out for a duration of 100 ns using the NPT ensemble. The simulation trajectory was saved at a frequency of 10 frames per 1 ns. A total of 1000 snapshots were thus extracted for each system to collect the contact map of the spike protein Cα atoms.
The contact map between all the Cα atoms of the spike protein from each extracted snapshot was calculated using gmx distance, a utility program of GROMACS. The evolution trajectory of the spike protein was represented by a sequence of contact maps. A contact map C was a two-dimensional matrix whose element, C(i, j), was the spatial Euclidean distance between the Cα atoms of the ith and jth amino acids of the spike protein at a particular moment.
It is noteworthy that additional simulations were performed which had three tetrapeptides EELE, each binding to one of the three PCSs on the spike trimer. These simulations suggested the relatively stronger binding affinity between PCS-A and the peptide EELE neighbor. Specifically, only the binding between PCS-A and the associated peptide EELE was stable for the whole simulation duration of 100 ns. In contrast, the peptides bound to PCS-B and PCS-C became dissociated at less than 100 ns: the peptide EELE bound to PCS-B became dissociated at around 10 ns in the first parallel simulation and was stable for 100 ns in the second simulation; the peptide bound to PCS-C became dissociated at around 40 ns and 30 ns in the two parallel simulations. This is qualitatively consistent with our previous observations.22 The observed dissociation of peptides bound to PCS-B and PCS-C also rationalized the simulation duration of 100 ns for protein–peptide interactions here, though it is likely too short for protein–protein interactions of specifically the large-sized spike protein. Moreover, calculations of the RBD–ACE2 binding affinity in the presence of varying number of neutralizing peptide EELE also supported the dominant role of PCS-A in destabilizing the distal RBD–ACE2 binding (Table S1, ESI†). Therefore, in what follows only the simulations with one peptide EELE bound to PCS-A were further analyzed.
Contrastive learning25 represents a state-of-the-art self-supervised learning algorithm. It is dataset-agnostic and has demonstrated its efficacy across a broad spectrum of applications, including the study of protein structures.41,42 As shown in Fig. 1, the contrastive learning algorithm learns the feature representations of contact maps by maximizing the agreement between a positive pair (i,
j) via a loss function, in which
i and
j are correlated views of the same contact map x, generated by stochastic data augmentations t ∼ T and t′ ∼ T, respectively. The loss function between a positive pair is defined in eqn (1).
![]() | (1) |
![]() | ||
Fig. 1 The framework of contrastive learning contact map feature extraction and protein structure transition stage detection. (![]() ![]() |
Here, the Ohm server was employed. By specifying the start of the allosteric pathway (PCS and NTD, see Table S2, ESI†) and the end which is the RBD on the subunit C (residues L455–Y505), Ohm reports all the critical amino acids on the allosteric route.
Note that antibody 4A8 primarily binds to two motifs of NTD3 and NTD5 of the spike protein.17 After examining the amino acids on the NTD3 and NTD5 epitopes, we found that the NTD3 epitope (L141GVYYHK147NNK150SWMESE156) is similar to PCS. Specifically, the central fragment K147NNK150 is positively charged and exposed, akin to PCS. It might be promising to design NTD3-targeting, negatively charged neutralizing peptides that could destabilize the spike–ACE2 binding given the fact that the spike protein and ACE2 are both negatively charged.22 Therefore, in the present work we are focusing on the NTD3 motif when identifying the allosteric regulation routes from NTD to the RBD on the subunit C.
We hypothesize that the strong electrostatic attractions between the positively charged PCS motif (R682RAR685) and the negatively charged EELE tetrapeptides trigger a local structural fluctuation that might eventually lead to a global conformational adaptation of the spike protein. In this work, we are primarily examining the route from the tetrapeptide EELE-triggered local structural fluctuation to the distal influence in destabilizing the RBD–ACE2 binding.
Fig. 3 shows the results of the contrastive ML analysis of one of the three parallel MD trajectories of spike protein structure transition in the process of protein–ACE2 binding. The corresponding contrastive ML analyses for the other two parallel MD trajectories are provided in Fig. S2 (ESI†). A sequence of 1000 contact maps of the spike protein was generated based on the atomistic MD trajectory. Feature vectors of the contact maps are extracted by the backbone feature extractor of the contrastive learning model, which is a deep resnet50 model43 in this work. The contrastive learning model is trained by maximizing the agreement of positive pairs (augmented views of the same contact maps). The details of contrastive learning are discussed in the Methods section. After the contrastive model is trained, the feature extractor is utilized to generate feature vectors of the contact maps. A feature vector in this work is a 2048 × 1 vector. These contact map feature vectors are then grouped using the k-means algorithm to find the stages of the protein structure transition. To find the optimal number of clusters k, cluster numbers ranging from 1 to 15 were tried, and the elbow method and the average silhouette scores method were utilized (Fig. 3(a) and (b)) to determine the optimal number of clusters. Both the elbow and silhouette score methods indicate an optimal number of clusters of k = 10. The contact map that is the closest to the centroid of a cluster is used as the representative of the state. As shown in Fig. 3(c), these contact map IDs are 26, 91, 222, 351, 467, 567, 644, 729, 860, and 954 (occurred at 2.6 ns, 9.1 ns, 22.2 ns, 35.1 ns, 46.7 ns, 56.7 ns, 64.4 ns, 72.9 ns, 86.0 ns, and 95.4 ns, respectively).
As demonstrated in Fig. 4(a), PCS-A displays the strongest structural fluctuation in the first stage which is ascribed to the tetrapeptide EELE binding, which became gradually weakened over the simulation time. The RBD on the subunit C, which directly binds ACE2, displayed elevated structural fluctuation from stage 2 till the end of the simulation (Fig. 4(c)). In contrast with the remarkable structural fluctuation of PCS-A, it is much weaker for the PCSs (R682RAR685) on the subunits B and C (Fig. 4(b) and (c)), indicating their negligible impacts.
The residues close to the N-terminal (residue numbers less than 300) display relatively larger fluctuations for all three subunits. This motif is actually the N-terminal domain (NTD), which is known to bind antibodies 4A817 and LSI-CoVA-017.20 The calculated RMSFs thus support the intrinsically flexible feature of NTD, which is desired for structural fluctuation and propagation. The relatively large fluctuations on the C-terminal residues (residues 1000–1273), which are located on the S2 subunits, are ascribed to the partially unstructured features and are thus ignored here.
We examined the allosteric regulation route from all three PCSs to the RBD on the subunit C. As illustrated in Fig. 5, the backbone atoms of the critical residues are highlighted to direct the pathway. Impressively, the allosteric regulation route is found to propagate across different subunits. For instance, the route starting with PCS-A propagates starting from subunit A to subunit C, to subunit B, and eventually to subunit C (Fig. 5(a)), indicating the nonlinear nature of allosteric regulation.
![]() | ||
Fig. 5 Pathway of the allosteric regulation in the spike protein obtained via Ohm from (a) PCS-A, (b) PCS-B, and (c) PCS-C, to the RBD on subunit C. The three subunits of the trimeric spike protein are colored in ice blue/cyan/orange for the subunits A/B/C, respectively. PCSs are colored in blue and the RBD on subunit C in orange, which is in direct contact with ACE2 (in silver). The backbone atoms on the connecting amino acids are indicated by magenta; the names of the connecting amino acids are provided in the illustration. The corresponding rotation movies are provided as Movie S1 (ESI†). |
Moreover, the protein backbone-labeled route is shown to be discontinuous at a couple of sites. See also the Movies (ESI†). It supports that in addition to the backbone atoms, the side chains of the critical residues are also involved in the structural propagation, which is absent in the contrastive ML (Fig. 3) and the RMSF calculations (Fig. 4). That said, for a complete understanding of the allosteric regulation route, different approaches are collectively desired.
Owing to the long distance of approximately 10 nm from the PCSs to the RBD, there exist 27 amino acids on the allosteric regulation routes from PCS-A to RBD, and 23 amino acids from the other two PCSs to RBD.
Given the significant role of NTD observed in the experiments17,20 and our RMSF calculations (Fig. 4), we also examined the route of the allosteric regulation from NTD to RBD (Fig. 6). Even though NTD-A is physically closer (∼9.1 nm) to the RBD–ACE2 binding interface than PCS-A to the interface (∼10.6 nm), there exist more critical residues than that for PCS-A (33 vs. 27), indicating the indirect feature of allosteric regulation. Surprisingly, the route from NTD-A to RBD shares a large number (i.e., 18) of critical residues with the route from PCS-A to RBD. This accounts for 54.5% (18/33) for the NTD-A–RBD route and 66.7% (18/27) for the PCS-A–RBD route, respectively. Similarly, the route starting with NTD-B shares 63.6% (14/22) critical residues with the route starting with PCS-B (14/23 = 60.9%), and the route from NTD-C shares 39.1% (9/23) critical residues with the one from PCS-A (9/23 = 39.1%). It thus indicates that the routes from the PCSs and the NTDs are likely correlated, that is, the presence of an allosteric regulation network in the spike protein (Movie S3, ESI†).
![]() | ||
Fig. 6 Pathway of the allosteric regulation in the spike protein obtained via Ohm from (a) NTD-A, (b) NTD-B, and (c) NTD-C, to the RBD on subunit C. NTDs are colored in dark blue. The other color codes are the same as those in Fig. 5. The corresponding rotation movies are provided as Movie S2 (ESI†). |
In summary, by coupling contrastive learning-based contact map feature extraction, all-atom explicit solvent MD simulations, and Ohm, we have revealed the route of allosteric regulation in the spike protein of SARS-CoV-2. Impressively, the NTDs are found to share the majority of route of allosteric regulations with the PCSs. This work thus sheds insights into the fundamental understanding of allosteric regulations in protein–protein interactions as well as into the rational design of allosteric drugs.
Footnote |
† Electronic supplementary information (ESI) available: Supplementary figures, table, and movies. See DOI: https://doi.org/10.1039/d4cp00106k |
This journal is © the Owner Societies 2024 |