Jiahui Chen‡a, Kaifu Gao‡a, Rui Wang‡a and Guo-Wei Wei*abc
aDepartment of Mathematics, Michigan State University, MI 48824, USA. E-mail: weig@msu.edu
bDepartment of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
cDepartment of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
First published on 13th April 2021
Antibody therapeutics and vaccines are among our last resorts to end the raging COVID-19 pandemic. They are, however, vulnerable to the more than 5000 mutations on the spike (S) protein uncovered by a Mutation Tracker based on over 200000 genome isolates. It is imperative to understand how mutations will impact vaccines and antibodies in development. In this work, we first study the mechanism, frequency, and ratio of mutations on the S protein, which is the common target of most COVID-19 vaccines and antibody therapies. Additionally, we build a library of 56 antibody structures and analyze their 2D and 3D characteristics. Moreover, we predict the mutation-induced binding free energy (BFE) changes for the complexes of the S protein with antibodies or human angiotensin-converting enzyme 2 (ACE2). By integrating genetics, biophysics, deep learning, and algebraic topology, we reveal that most of the 462 mutations on the receptor-binding domain (RBD) will weaken the binding of the S protein and antibodies and disrupt the efficacy and reliability of antibody therapies and vaccines. A list of 31 antibody-disrupting mutants is identified, and many other disruptive mutations are detailed as well. We also unveil that about 65% of the existing RBD mutations, including those in the variants recently found in the United Kingdom (UK) and South Africa, will strengthen the binding between the S protein and ACE2, resulting in more infectious COVID-19 variants. We discover a disparity between the extreme values of RBD mutation-induced BFE strengthening and weakening of the bindings with antibodies and ACE2, suggesting that SARS-CoV-2 is at an advanced stage of evolution for human infection, while the human immune system is able to produce optimized antibodies. This discovery, unfortunately, implies the vulnerability of current vaccines and antibody drugs to new mutations. Our predictions were validated by comparison with more than 1400 deep mutations on the S protein RBD.
Our results show the urgent need to develop new mutation-resistant vaccines and antibodies and to prepare for seasonal vaccinations.
The immune system is a host defense system that protects the host from pathogenic microbes, eliminates toxic or allergenic substances, and responds to invading pathogens.14 It has two major subsystems: the innate immune system and the adaptive immune system. The innate system provides an immediate but non-specific response, while the adaptive immune system provides a highly specific and effective immune response. Once a pathogen breaches the first physical barriers, such as the epithelial cell layers, the secreted mucus layer, and mucous membranes, the innate system is triggered to identify the pathogen through pattern recognition receptors (PRRs), which are expressed on dendritic cells, macrophages, and neutrophils.15 Specifically, PRRs identify pathogen-associated molecular patterns (PAMPs) on pathogens and then activate complex signaling pathways that induce inflammatory responses mediated by various cytokines and chemokines, which promote the eradication of the pathogen.16,17 Notably, SARS-CoV-2 is transmitted even by asymptomatic infected individuals, which may delay the early innate immune response.8 Another important line of host defense is the adaptive immune system. B lymphocytes (B cells) and T lymphocytes (T cells) are special types of leukocytes that are the acknowledged cellular pillars of the adaptive immune system.18 Two major subtypes of T cells are involved in the cell-mediated immune response: killer T cells (CD8+ T cells) and helper T cells (CD4+ T cells). Killer T cells eradicate cells invaded by pathogens with the help of major histocompatibility complex (MHC) class I molecules, which are expressed on the surface of all nucleated cells.19 When viruses infect nucleated cells, these cells first degrade foreign proteins via antigen processing.
Then, the peptide fragments are presented by MHC class I molecules, which activate killer T cells to eliminate the infected cells by releasing cytotoxins.20 Similarly, helper T cells cooperate with MHC class II, a type of MHC molecule that is constitutively expressed on antigen-presenting cells, such as macrophages, dendritic cells, monocytes, and B cells.21 Helper T cells express T cell receptors (TCRs) to recognize antigens bound to MHC class II molecules. However, helper T cells do not have cytotoxic activity and therefore cannot kill infected cells directly. Instead, activated helper T cells release cytokines that enhance the microbicidal function of macrophages and the activity of killer T cells.22 Notably, an unbalanced response can result in a “cytokine storm,” which is the main cause of death among COVID-19 patients.23 Correspondingly, B cells participate in the humoral immune response and identify pathogens by binding to foreign antigens with the B cell receptors (BCRs) on their surface. The antigens recognized by BCRs are degraded to peptides in B cells and displayed by MHC class II molecules. As mentioned above, helper T cells recognize the signal provided by MHC class II and upregulate the expression of the CD40 ligand, which provides extra stimulation signals to activate antibody-producing B cells,24 producing millions of copies of antibodies (Abs) that recognize the specific antigen. Additionally, when an antigen first enters the body, T cells and B cells are activated, and some of them differentiate into long-lived memory cells, such as memory T cells and memory B cells.
These long-lived memory cells can quickly and specifically recognize and eliminate an antigen that the host has encountered before, initiating a corresponding immune response upon re-exposure.25 Vaccination works by stimulating a primary immune response in the human body, which activates T cells and B cells to generate antibodies and long-lived memory cells that protect against infectious disease; it is one of the most effective and economical means of combating COVID-19 at this stage.
As mentioned above, antibodies are secreted by B cells of the adaptive immune system and can recognize and bind to specific antigens. Conventional antibodies (immunoglobulins) are Y-shaped molecules that have two light chains and two heavy chains.26 Each light chain is connected to a heavy chain via a disulfide bond, and the heavy chains are connected through two disulfide bonds in the mid-region known as the hinge region. Each light and heavy chain contains two distinct regions: the constant region (stem of the Y) and the variable region (“arms” of the Y).27 An antibody binds the antigenic determinant (also called the epitope) through the variable regions at the tips of the heavy and light chains. There is enormous diversity in the variable regions; therefore, different antibodies can recognize many different types of antigenic epitopes. To be specific, three complementarity-determining regions (CDRs) are arranged non-consecutively at the tip of each variable region. CDRs generate most of the variation between antibodies and determine the specificity of individual antibodies. In addition to conventional antibodies, camelids also produce heavy-chain-only antibodies (HCAbs). HCAbs contain a single variable domain (VHH), also referred to as a nanobody, that makes up the equivalent of the antigen-binding fragment (Fab) of conventional immunoglobulin G (IgG) antibodies.28 This single variable domain can typically acquire affinity and specificity for antigens comparable to those of conventional antibodies. Nanobodies can easily be constructed into multivalent formats and have higher thermal stability and chemostability than most antibodies.29 Another advantage of nanobodies is that they are less susceptible to steric hindrance than large conventional antibodies.30
Considering the broad specificity of antibodies, seeking potential antibody therapies has become one of the most feasible strategies to fight against SARS-CoV-2. In general, an antibody therapy is a form of immunotherapy that uses monoclonal antibodies (mAbs) to target pathogenic proteins. The binding of an antibody to a pathogenic antigen can facilitate an immune response, direct neutralization, radioactive treatment, the release of toxic agents, and cytokine storm inhibition (aka immune checkpoint therapy). SARS-CoV-2 entry into a human cell is facilitated by a series of interactions between its spike (S) protein and the host receptor angiotensin-converting enzyme 2 (ACE2), with the S protein primed by the host transmembrane protease serine 2 (TMPRSS2).31 As such, most COVID-19 antibody therapeutic developments focus on SARS-CoV-2 spike protein antibodies that were initially generated from the patient immune response, and on T-cell pathway inhibitors that block T-cell responses. A large number of antibody therapeutic drugs are in clinical trials. Fifty-five S protein antibody structures are available in the Protein Data Bank (PDB), offering a great resource for mechanistic analysis and biophysical studies.
Currently, most antibody therapy developments focus on the use of antibodies isolated from patients' convalescent plasma to directly neutralize SARS-CoV-2,32–34 although there are also efforts to alleviate the cytokine storm. A more effective and economical means to fight against SARS-CoV-2 is the vaccine,35 which is the most anticipated approach for controlling the COVID-19 pandemic. A vaccine is designed to stimulate effective host immune responses, including the production of antibodies, and to provide active acquired immunity by exploiting the body's immune system. It is made of an antigenic agent that resembles a disease-causing microorganism, a surface protein of that microorganism, or the genetic material needed to generate the surface protein. For SARS-CoV-2, the surface protein of first choice is the spike protein. There are four types of COVID-19 vaccines, as shown in Fig. 1. (1) Virus vaccines use the virus itself in a weakened or inactivated form. (2) Viral-vector vaccines use a genetically engineered weakened virus, such as measles virus or adenovirus, to produce coronavirus S proteins in the body. Both replicating and non-replicating viral-vector vaccines are being studied. (3) Nucleic-acid vaccines use DNA or mRNA to produce SARS-CoV-2 S proteins inside host cells to stimulate the immune response. (4) Protein-based vaccines directly inject coronavirus proteins, such as the S protein or the membrane (M) protein, or their fragments, into the body. Both protein subunits and virus-like particles (VLPs) are under development for COVID-19.36 Among these technologies, nucleic-acid vaccines are safe and relatively easy to develop.36 However, they had never been approved for human use before.
However, the general population's safety concerns are major factors that hinder the rapid approval of vaccines and antibody therapies. A major potential challenge is antibody-dependent enhancement (ADE), in which the binding of a virus to suboptimal antibodies enhances its entry into host cells. All vaccine and antibody therapeutic developments are currently based on the reference viral genome reported on January 5, 2020.37 SARS-CoV-2 belongs to the Coronaviridae family and the Nidovirales order, which has been shown to have a genetic proofreading mechanism regulated by non-structural protein 14 (NSP14) in synergy with NSP12, i.e., the RNA-dependent RNA polymerase (RdRp).38,39 Therefore, SARS-CoV-2 has higher fidelity in its transcription and replication than other single-stranded RNA viruses, such as the flu virus and HIV. Nevertheless, the S protein of SARS-CoV-2 has been undergoing many mutations, as reported in refs. 40 and 41. As of January 20, 2021, a total of 5003 unique mutations on the S protein had been detected in 203346 complete SARS-CoV-2 genome sequences. Among them, 462 mutations were on the receptor-binding domain (RBD), the most popular target for antibodies and vaccines. Therefore, it is of paramount importance to establish a reliable paradigm to predict and mitigate the impact of SARS-CoV-2 mutations on vaccines and antibody therapies. Moreover, the efficacy of a given COVID-19 vaccine depends on many factors, including the SARS-CoV-2 biological properties associated with the vaccine, mutation impacts, vaccination schedule (dose and frequency), idiosyncratic responses, and assorted factors such as ethnicity, age, gender, and genetic predisposition. The effect of COVID-19 vaccination also depends on the fraction of the population that accepts vaccines. It is essentially unknown at this moment how these factors will unfold for COVID-19 vaccines.
There is no doubt that any preparation that improves the effect of COVID-19 vaccination will be of tremendous significance to human health and the world economy. Therefore, in this work, we integrate genetic analysis, computational biophysics, artificial intelligence (AI), and advanced mathematics to predict and mitigate mutation threats to COVID-19 vaccines and antibody therapies. We perform single nucleotide polymorphism (SNP) calling41 to identify SARS-CoV-2 mutations. For mutations on the S protein, we analyze their mechanism,42 frequency, ratio, and secondary structural traits. We construct a library of 56 antibody structures available in the PDB as of January 1, 2021 and analyze their two-dimensional (2D) and three-dimensional (3D) characteristics. We further predict the mutation-induced binding free energy (BFE) changes of antibody and S protein complexes using a topology-based network tree (TopNetTree),43 a state-of-the-art model that integrates deep learning and algebraic topology.44–46 In this work, TopNetTree is trained with newly available deep mutation datasets on the S protein, ACE2, and some antibodies, and its predictions are validated with thousands of experimental data points. Our studies indicate that most mutations will significantly disrupt the binding of essentially all known antibodies to the S protein. Therefore, vaccines and antibody drugs that were developed based on the early SARS-CoV-2 genome will be seriously compromised by mutations. Additionally, we show that most known mutations will strengthen the binding between the S protein and ACE2, which gives rise to more infectious variants. Our studies also reveal that SARS-CoV-2 is at an advanced stage of evolution with respect to its ability to infect humans.
Although the human immune system is able to produce antibodies that are optimized with respect to a pathogen, these antibodies, once produced, are very vulnerable to new mutations.
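TopNetTree's topological feature construction is not detailed in this section. As a hedged illustration of the simplest algebraic-topology ingredient that such models build on, the sketch below computes a dimension-0 persistence barcode (recording when connected components merge as a distance cutoff grows) for a toy 3D point cloud, and a Betti-0 curve that turns the barcode into a fixed-length vector. The function names, the toy coordinates, and the choice of a Vietoris-Rips filtration on plain Euclidean distances are assumptions for illustration, not the authors' implementation.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Dimension-0 persistence barcode of a Vietoris-Rips filtration.

    Every point is born at filtration value 0. Edges are added in order of
    increasing length; each edge that joins two different components kills
    one of them at that length (a union-find / Kruskal sweep).
    Returns (birth, death) pairs, including one infinite bar.
    """
    parent = list(range(len(points)))

    def find(i):
        # Walk to the component root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            bars.append((0.0, length))
    bars.append((0.0, math.inf))  # the last surviving component never dies
    return bars


def betti0_curve(bars, radii):
    """Number of components alive at each cutoff r (birth <= r < death)."""
    return [sum(1 for b, d in bars if b <= r < d) for r in radii]


# Toy "atom" coordinates in angstroms (hypothetical, not from any PDB entry).
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
bars = h0_barcode(atoms)
print(bars)                               # [(0.0, 1.0), (0.0, 2.0), (0.0, inf)]
print(betti0_curve(bars, [0.5, 1.5, 2.5]))  # [3, 2, 1]
```

A full topology-based predictor would compute barcodes in several dimensions for selected atom subsets around the mutation site and pass the resulting vectors to a learner; none of that is shown here.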
As mentioned before, the S protein has become the first choice for antibody and vaccine development. Among the 203346 complete genome sequences, 5003 unique single mutations are detected on the S protein. The number of unique mutations (N_U) is obtained by counting the same type of mutation in different genome isolates only once, while the number of non-unique mutations (N_NU, i.e., the frequency) is obtained by counting the same type of mutation in different genome isolates repeatedly. The ratios R_U and R_NU are the percentages of each SNP type among all unique and all non-unique mutations, respectively. Table 1 lists the distribution of the 12 SNP types among unique and non-unique mutations on the S protein of SARS-CoV-2 worldwide. It can be seen that C > T and A > G are the two dominant SNP types, which may be due to the innate host immune response via APOBEC and ADAR gene editing.42
SNP type | Mutation type | N_U | N_NU | R_U | R_NU |
---|---|---|---|---|---|
A > T | Transversion | 454 | 5236 | 9.07% | 1.12% |
A > C | Transversion | 341 | 2571 | 6.82% | 0.55% |
A > G | Transition | 700 | 199015 | 13.99% | 42.56% |
T > A | Transversion | 356 | 1614 | 7.12% | 0.35% |
T > C | Transition | 779 | 19313 | 15.57% | 4.13% |
T > G | Transversion | 277 | 1940 | 5.54% | 0.41% |
C > T | Transition | 542 | 158898 | 10.83% | 33.98% |
C > A | Transversion | 313 | 10301 | 6.26% | 2.20% |
C > G | Transversion | 156 | 968 | 3.12% | 0.21% |
G > T | Transversion | 435 | 34421 | 8.69% | 7.36% |
G > C | Transversion | 225 | 6090 | 4.50% | 1.30% |
G > A | Transition | 425 | 27237 | 8.49% | 5.82% |
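The counting rules behind the unique and non-unique counts (N_U and N_NU), and the transition/transversion labels in Table 1, can be made concrete with a short sketch. It assumes mutations have already been called and are available as hypothetical (isolate_id, position, ref, alt) records; the record format and the function names are illustrative and are not the authors' SNP-calling pipeline.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def snp_class(ref, alt):
    """Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise transversion."""
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "Transition" if same_class else "Transversion"


def count_snp_types(records):
    """Return (N_U, N_NU) Counters keyed by SNP type such as "A > G".

    N_U counts each distinct (position, ref, alt) once, however many isolates carry it;
    N_NU counts every occurrence across isolates (the frequency).
    """
    unique = {(pos, ref, alt) for _, pos, ref, alt in records}
    n_u = Counter(f"{ref} > {alt}" for _, ref, alt in unique)
    n_nu = Counter(f"{ref} > {alt}" for _, _, ref, alt in records)
    return n_u, n_nu


def percentages(counts):
    """Share of each SNP type in percent (the R_U / R_NU columns)."""
    total = sum(counts.values())
    return {snp: 100.0 * n / total for snp, n in counts.items()}


# Hypothetical records: the same A > G change seen in three isolates.
records = [
    ("iso1", 101, "A", "G"),
    ("iso2", 101, "A", "G"),
    ("iso3", 101, "A", "G"),
    ("iso1", 202, "C", "T"),
    ("iso2", 303, "G", "T"),
]
n_u, n_nu = count_snp_types(records)
print(n_u["A > G"], n_nu["A > G"])  # 1 3

# Cross-check against Table 1: the N_U column reproduces the R_U column.
table1_n_u = {"A > T": 454, "A > C": 341, "A > G": 700, "T > A": 356,
              "T > C": 779, "T > G": 277, "C > T": 542, "C > A": 313,
              "C > G": 156, "G > T": 435, "G > C": 225, "G > A": 425}
r_u = percentages(table1_n_u)
print(sum(table1_n_u.values()), round(r_u["A > G"], 2))  # 5003 13.99
```

The same `percentages` call applied to the N_NU column yields the R_NU column.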
Moreover, 144 non-degenerate mutations occurred on the S protein RBD; these are relevant to the binding of the SARS-CoV-2 S protein to most antibodies as well as to ACE2. Additionally, the 218 mutations that occurred on the S protein N-terminal domain (NTD) (residues 14 to 226) are relevant to the binding of two other antibodies (4A8 and FC05) to the SARS-CoV-2 S protein.
Furthermore, since antibody CDRs are random coils, the complementary antigen-binding domains are expected to involve random coils as well. Table 2 lists the statistics of non-degenerate mutations on the secondary structures of the SARS-CoV-2 S protein, where the average rates AR_U and AR_NU are N_U and N_NU divided by the number of residues (Length) of each secondary-structure type. Here, the secondary structures are mostly extracted from the crystal structure of 7C2L,52 and the missing residues are predicted by RaptorX-Property.53 For both unique and non-unique cases, the average mutation rates on the random coils of the S protein are the highest. In particular, the 23403 A > G (D614G) mutation on the random coils has the highest frequency, 192284. Even if the 23403 A > G (D614G) mutations are excluded, the unique and non-unique average rates on the random coils of the S protein are still the highest (2.81 and 212.01), indicating that mutations are more likely to occur on the random coils. Consequently, the natural selection of mutations may tend to disrupt antibodies.
Secondary structure | Length (residues) | N_U | N_NU | AR_U | AR_NU |
---|---|---|---|---|---|
Helix | 249 | 516 | 9535 | 2.07 | 38.29 |
Sheet | 276 | 711 | 20422 | 2.58 | 73.99 |
Random coils | 748 | 2100 | 350659 | 2.81 | 468.80 |
Whole spike | 1273 | 3327 | 380616 | 2.61 | 298.99 |
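The average-rate columns of Table 2 are per-residue averages of the mutation counts. The sketch below recomputes them from the Table 2 counts (the dictionary layout and the function name are illustrative choices) and confirms that random coils rank highest on both measures.

```python
# (length in residues, N_U, N_NU) per secondary-structure type, copied from Table 2.
TABLE2 = {
    "Helix": (249, 516, 9535),
    "Sheet": (276, 711, 20422),
    "Random coils": (748, 2100, 350659),
}


def average_rates(length, n_u, n_nu):
    """Average number of unique / non-unique mutations per residue."""
    return n_u / length, n_nu / length


rates = {name: average_rates(*row) for name, row in TABLE2.items()}
# The whole spike is the column-wise sum of the three types (1273 residues).
whole = tuple(sum(row[i] for row in TABLE2.values()) for i in range(3))
rates["Whole spike"] = average_rates(*whole)

for name, (ar_u, ar_nu) in rates.items():
    print(f"{name:13s} AR_U={ar_u:.2f} AR_NU={ar_nu:.2f}")

highest = max(TABLE2, key=lambda name: rates[name][1])
print(highest)  # Random coils
```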
Fig. 3 Aligned structures of 46 complexes of the S protein with ACE2 or single antibodies. (a)–(j) The 3D alignment of the available unique 3D structures of the SARS-CoV-2 S protein RBD in binding complexes with 42 antibodies (MR17-K99Y is excluded because its binding mode is the same as that of MR17). (k) The 3D alignment of the three antibodies binding outside the RBD. (m) The 3D structure of the S protein RBD. The red, green, and blue colors represent the helix, sheet, and random coils of the RBD, respectively. A darker color represents a higher mutation frequency on a specific residue. The structures are (a) ACE2 (6M0J),57 BD-629 (7CH5), H11-H4 (6ZBP); (b) CC12.3 (6XC4),58 B38 (7BZ5),59 CR3022 (6XC3);58 (c) BD-604 (7CH4), MR17 (7C8W),56 Fab 2-4 (6XEY);56 (d) S304 (7JW0),60 CB6 (7C01),61 Fab 52 (7K9Z),62 S2H13 (7JV6),60 H11-D4 (6YZ5),63 Fab 298 (7K9Z);62 (e) CV30 (6XE1),64 BD23 (7BYR),65 SR4 (7C8V),56 S309 (6WPS);66 (f) CC12.1 (6XC2),58 EY6A (6ZCZ),67 BD-236 and nanobody (Nb) (7CHE),68 BD-368-2 (7CHH);68 (g) H014 (7CAH),69 COVA2-04 (7JMO),70 COVA2-39 (7JMP),70 P2B–2F6 (7BWJ);71 (h) P2C-1A3 (7CDJ), CV07-270 (6XKP),72 S2H14 (7JX3),60 A fab (7CJF), S2E12 (7K45);73 (i) CV07-250 (6XKQ),72 P2C–1F11 (7CDI), VH binder (7JWB),74 S2A4 (7JVA),60 COVA1-16 (7JMW);75 (j) C1A (7KFV),76 STE90-C11 (7B3O),77 Sb23 (7A29),78 S2M11 (7K43),73 P17 (7CWM);79 and (k) 4A8 (7C2L),52 FC05 (7CWU),54 and 2G12 (7L06).55
Fig. 4 Illustration of the contact positions of antibody and ACE2 paratopes with the SARS-CoV-2 S protein RBD, mapped onto the 2D RBD sequence. The corresponding PDB IDs are given in parentheses.
Fig. 3 reveals that, except for Fab 52,62 S309,57 CR3022,63 EY6A,67 4A8,52 FC05,54 and 2G12,55 all the other 38 antibodies have binding sites that spatially clash with that of ACE2. Notably, the paratopes of H014 (ref. 69) and S304 (ref. 60) do not overlap with that of ACE2 in the 2D sequence, but their binding sites still overlap in terms of 3D structures. This suggests that the bindings of these 38 antibodies are in direct competition with that of ACE2. In principle, this direct competition reduces the viral infection rate. Such antibodies with strong binding ability can directly neutralize SARS-CoV-2 without the need for antibody-dependent cellular cytotoxicity (ADCC), antibody-dependent cellular phagocytosis (ADCP), or other immune mechanisms.
The binding sites of S309, Fab 52, CR3022, and EY6A on the RBD are away from that of ACE2, leading to the absence of binding competition.66,67,80 One study shows that ADCC and ADCP mechanisms contribute to the viral control conducted by S309 in infected individuals.66 For Fab 52, it was suggested that its mechanism could involve S protein destabilization.62 For CR3022, one study indicates that it neutralizes the virus in a synergistic fashion.81 For EY6A, the hypothesis is that glycosylation of ACE2 accounts for at least part of the observed crosstalk between ACE2 and EY6A.67 More radical examples are 4A8, FC05, and 2G12. 4A8 binds to the NTD of the S protein (Fig. 3(k)), which is quite far from the RBD. It is speculated that 4A8 may neutralize SARS-CoV-2 by restraining the conformational changes of the S protein, which are very important for SARS-CoV-2 cell entry.52 FC05 is combined with P17 or H014 to form a cocktail.54 2G12 binds to the S2 domain of the S protein.55 Any antibody or drug that inhibits the priming of the S protein by the serine protease TMPRSS2 can effectively stop viral cell entry.31
In Fig. 4, the paratopes of 42 individual antibodies (excluding MR17-K99Y) and ACE2 are aligned on the 2D sequence of the S protein RBD, and their contact regions are highlighted. The figure shows that, except for Fab 52, S309, CR3022, EY6A, H014, and S304, all the other 36 antibodies have antigenic epitopes overlapping with the ACE2-binding site, especially on RBD residues 486 to 505. Although the paratopes of H014 and S304 do not overlap with that of ACE2 directly, their binding sites still overlap in 3D structures. Therefore, these 38 antibodies compete with ACE2 for binding, as revealed in Fig. 3.
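Reading overlap off a 2D sequence map like Fig. 4 amounts to intersecting sets of contacted RBD residues. The sketch below does this for two hypothetical epitopes against a hypothetical ACE2 footprint; the residue numbers are toy values chosen for illustration (not taken from the PDB entries listed for Fig. 3), and the one-shared-residue threshold is an assumption. As the H014 and S304 cases show, a sequence-level test can miss competition that only appears in 3D, so it complements rather than replaces structural alignment.

```python
def shared_residues(epitope, ace2_site):
    """RBD residue numbers contacted both by an antibody and by ACE2."""
    return sorted(set(epitope) & set(ace2_site))


def overlaps_in_sequence(epitope, ace2_site, min_shared=1):
    """Assumed rule: flag overlap if at least `min_shared` residues are shared."""
    return len(shared_residues(epitope, ace2_site)) >= min_shared


# Hypothetical footprints (RBD residue numbers), for illustration only.
ace2_site = {486, 487, 489, 493, 498, 500, 501, 505}
epitopes = {
    "antibody_A": [484, 486, 490, 493],
    "antibody_B": [340, 343, 344, 356],
}

for name, residues in epitopes.items():
    shared = shared_residues(residues, ace2_site)
    print(name, overlaps_in_sequence(residues, ace2_site), shared)
# antibody_A True [486, 493]
# antibody_B False []
```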
The sequence similarity among antibodies revealed by our multiple sequence alignment (MSA) indicates that the adaptive immune systems of different individuals share a common way of generating antibodies. On the other hand, the existence of five distinct clusters, as well as of antibodies 4A8, FC05, and 2G12, suggests diversity in the immune response. Note that we have also included ACE2 in our MSA as a reference, but none of the existing antibodies are similar to ACE2 because they arise from entirely different mechanisms.
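The MSA and clustering procedure itself is not shown in this section. One common way to quantify similarity between two aligned sequences is percent identity over the columns where neither sequence has a gap; the sketch below assumes that definition (other normalizations exist) and uses short hypothetical aligned fragments rather than real antibody sequences.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent identity of two equal-length aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments (not real antibody sequences).
aligned = {
    "seq1": "GFTF-SYAM",
    "seq2": "GFTFSSYAM",
    "seq3": "GYSITSGY-",
}

for p, q in combinations(aligned, 2):
    print(p, q, round(percent_identity(aligned[p], aligned[q]), 1))
# seq1 seq2 100.0
# seq1 seq3 28.6
# seq2 seq3 25.0
```

A pairwise identity matrix built this way can then be fed to any standard clustering method; which method underlies the five clusters is not specified here.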
Four antibody–S protein complexes are examined first in this section. Next, we present a library of mutation-induced BFE changes for all mutations on 51 antibodies as well as ACE2, followed by a statistical analysis of mutation impacts on antibodies.
Fig. 5 Illustration of SARS-CoV-2 mutation-induced binding free energy changes for the complexes of S protein and 4A8 (PDB: 7C2L). The blue color in the structure plot indicates a positive BFE change while the red color indicates a negative BFE change, and toning indicates the strength. Here, mutations R102I, W152C, W152L, S247N, and Y248H could potentially disrupt the binding of antibody 4A8 and S protein.
Next, we study the BFE changes (ΔΔG) induced by 80 mutations on the SARS-CoV-2 S protein RBD for the antibody Fab 2-4 (PDB: 6XEY) in Fig. 6. Antibody Fab 2-4 shares a similar binding domain with ACE2 and thus is a potential candidate for the direct neutralization of SARS-CoV-2. Most mutations induce small changes in the binding free energy, while mutations E484K, E484Q, F486L, and F490S have large negative BFE changes. Overall, 38 out of 80 mutations on the RBD lead to negative BFE changes, which means that about 48% of mutations will potentially weaken the binding between antibody Fab 2-4 and the S protein. For positive BFE changes, the largest value is only 0.55 kcal mol−1 and the average is 0.16 kcal mol−1. However, many of the negative BFE changes have a very large magnitude, indicating that antibody Fab 2-4 was an immune product optimized with respect to the original, unmutated S protein. In general, mutations on the S protein weaken Fab 2-4 binding and make it less competitive with ACE2, since most mutations strengthen the binding between the S protein and ACE2. It is interesting to note that E484K is a key mutation of the so-called South Africa variant, and our prediction indeed indicates a strong vaccine-escape effect for it.
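The per-antibody statistics quoted above (share of negative changes, largest and mean positive change) can be computed directly from a list of predicted ΔΔG values. The sketch below follows the sign convention of the figures (positive strengthens binding, negative weakens it) and uses hypothetical values, not the Fab 2-4 predictions; it also shows the arithmetic behind "38 out of 80, about 48%".

```python
from statistics import mean


def summarize_bfe_changes(ddg):
    """Summarize predicted BFE changes in kcal/mol (positive = stronger binding)."""
    negative = [x for x in ddg if x < 0]
    positive = [x for x in ddg if x > 0]
    return {
        "fraction_negative": len(negative) / len(ddg),
        "max_positive": max(positive) if positive else None,
        "mean_positive": mean(positive) if positive else None,
        "min_negative": min(negative) if negative else None,
    }


# Hypothetical predictions for six mutations (not the Fab 2-4 values).
toy = [-1.8, -0.2, 0.1, 0.3, -0.05, 0.0]
print(summarize_bfe_changes(toy))

# The text's figure: 38 negative changes out of 80 mutations.
print(f"{38 / 80:.1%}")  # 47.5%, i.e. about 48%
```

Zero changes are counted as neither strengthening nor weakening, which is a choice made here for illustration.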
Fig. 6 Illustration of SARS-CoV-2 mutation-induced binding free energy changes for the complexes of S protein and Fab 2-4 (PDB: 6XEY). The blue color in the structure plot indicates a positive BFE change while the red color indicates a negative BFE change, and toning indicates the strength. Here, mutations E484K, E484Q, F486L, and F490S could potentially disrupt the binding of antibody Fab 2-4 and the S protein.
In Fig. 7, we illustrate the mutation-induced BFE changes for antibody MR17 (PDB: 7C8W), which also shares its binding domain with ACE2. Five mutations, L452R, E484K, F486L, F490S, and S494L, have BFE changes below −1 kcal mol−1 as well as high frequencies. The remaining mutations induce changes of small magnitude. Only 27 of the 80 mutations have positive BFE changes, with the largest value less than 0.25 kcal mol−1. Our results indicate that antibody MR17 was likely developed at an early stage of the pandemic and was thus optimized against an early version of the SARS-CoV-2 virus. Mutations L452R, E484K, F486L, F490S, and S494L will reduce its competitiveness with ACE2 (Fig. 7).
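The MR17 discussion combines two criteria: a large negative BFE change (below −1 kcal mol−1) and a high observed frequency. A minimal filter expressing that rule is sketched below; the frequency cutoff, the record type, and all mutation labels and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MutationEffect:
    label: str       # hypothetical mutation label
    ddg: float       # predicted BFE change, kcal/mol (negative = weaker binding)
    frequency: int   # number of genome isolates carrying the mutation


def disruptive(effects, ddg_cutoff=-1.0, min_frequency=100):
    """Mutations that weaken binding beyond the cutoff and are frequent (assumed cutoffs)."""
    return [e.label for e in effects
            if e.ddg < ddg_cutoff and e.frequency >= min_frequency]


effects = [
    MutationEffect("mut1", -1.6, 2500),
    MutationEffect("mut2", -1.3, 12),    # strong effect but rare
    MutationEffect("mut3", -0.4, 9000),  # frequent but mild
    MutationEffect("mut4", 0.2, 300),
]
print(disruptive(effects))  # ['mut1']
```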
Fig. 7 Illustration of SARS-CoV-2 mutation-induced binding free energy changes for the complexes of S protein and MR17 (PDB: 7C8W). Blue in the structure plot indicates a positive BFE change while red indicates a negative BFE change, and toning indicates the strength. Here, mutations L452R, E484K, F486L, F490S, and S494L could potentially disrupt the binding of antibody MR17 and the S protein.
Fig. 8 Illustration of SARS-CoV-2 mutation-induced binding free energy changes for the complexes of S protein and S309 (PDB: 6WPS). The blue color in the structure plot indicates a positive BFE change while the red color indicates a negative BFE change, and toning indicates the strength. Here, mutations E340A, N354D, and K356R could potentially weaken the binding of antibody S309 and the S protein.
Finally, we consider the BFE change predictions for the complex of the S protein and antibody S309, whose epitope does not overlap with the ACE2-binding site (see Fig. 3(e)). The BFE changes induced by 80 mutations are predicted, 38 of which are positive. Similar to the aforementioned antibodies, most mutations lead to small changes in binding affinity, but three mutations, E340A, N354D, and K356R, induce moderate negative changes. Interestingly, none of the 80 RBD mutations has a major impact on S309. Although mutation R403K is unlikely to disrupt S309, it does weaken the binding of many other antibodies to the S protein. Antibodies serve a variety of functions in the human immune system, such as neutralization of infection, phagocytosis, and antibody-dependent cellular cytotoxicity, and their binding with antigens is crucial for these functions. Our analysis of BFE changes following mutations on the S protein suggests that some antibodies will be less affected by mutations, which is important for developing vaccines and antibody therapies.
Based on our earlier analysis, residues in the three types of secondary structure of the SARS-CoV-2 S protein have different mutation rates. Among them, random coils are major components of the RBD and the NTD, as shown in Fig. 3. Most RBD mutations (287 of 462) occur on coil residues, while 93 of the 462 mutations are on helices and 82 are on sheets. Therefore, mutations on the RBD are split into three categories based on their locations in the helix, sheet, and coil secondary structures. In Fig. 9, we present the BFE changes for the complexes of the S protein with antibodies or ACE2 induced by mutations on the helix residues of the S protein RBD. The frequency of each mutation is also presented. Most mutations on helix residues lead to negative BFE changes (pink squares), which weaken the bindings, while some mutations induce positive BFE changes (green squares). Notably, most mutations strengthen the binding between the S protein and ACE2, which is consistent with natural selection. Mutations N406G, I418N, N422K, D442H, Y505S, and Y505C have a strong weakening effect on most antibodies. The N439K mutation, which has the highest frequency, shows a positive BFE change on ACE2 but negative changes on most antibodies. Mutation D405Y appears to strengthen the binding of most antibodies.
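Statements such as "weakens most antibodies" summarize one row of the mutation-by-antibody heat maps. One way to make that precise, assuming that "most" means more than half of the antibodies and that "strong" means a mean change below a chosen cutoff, is sketched below with a hypothetical matrix; ACE2 is kept as a separate column so that it does not count toward the antibody majority.

```python
def weakens_most_antibodies(row, antibody_names, strong_cutoff=-0.5):
    """True if over half the antibodies have negative BFE changes and the mean is below the cutoff."""
    values = [row[name] for name in antibody_names]
    share_negative = sum(v < 0 for v in values) / len(values)
    return share_negative > 0.5 and sum(values) / len(values) < strong_cutoff


antibodies = ["Ab1", "Ab2", "Ab3", "Ab4"]  # hypothetical antibody columns
# Hypothetical BFE changes (kcal/mol); positive strengthens, negative weakens.
heatmap = {
    "mutA": {"ACE2": 0.3, "Ab1": -1.2, "Ab2": -0.9, "Ab3": -0.4, "Ab4": 0.2},
    "mutB": {"ACE2": 0.1, "Ab1": 0.4, "Ab2": -0.2, "Ab3": 0.3, "Ab4": -0.1},
}

for mutation, row in heatmap.items():
    flag = weakens_most_antibodies(row, antibodies)
    print(mutation, "strengthens ACE2" if row["ACE2"] > 0 else "weakens ACE2", flag)
# mutA strengthens ACE2 True
# mutB strengthens ACE2 False
```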
In Fig. 10, we present the BFE changes for the complexes of the S protein with antibodies or ACE2 following mutations on sheet residues of the S protein RBD. As in the previous case, most mutations lead to positive BFE changes for ACE2, indicating strengthened infectivity. Many disruptive mutations, such as R355W, F401I, F401C, I402F, C432G, I434K, A435P, Q493P, V510E, V512G, and L513P, will weaken most antibody–S protein complexes. On the other hand, most mutations strengthen certain antibodies but weaken others, which supports the effectiveness of antibody cocktails for better protection. The binding of antibody H014 and the S protein is strengthened by many mutations, particularly S375F, K378Q, R403K, and Y453F. Among them, Y453F is an infectivity-strengthening mutation with a relatively high frequency.
Figs. 11–13 present the BFE changes for the complexes of the S protein with antibodies or ACE2 following mutations on coil residues of the S protein RBD. Overall, most mutations on coil residues lead to mild negative BFE changes. However, mutations V350F, W353R, I401N, G416V, G431V, Y449D, Y449S, C480R, P491R, P491L, Y495C, and Q506P will weaken the binding of most antibodies to the S protein. Some residues, such as A348, N460, and P521, can produce many binding-strengthening mutations for most antibodies and ACE2. For the high-frequency mutation S477N in Fig. 13, the BFE changes are mild for ACE2 and antibodies. Additionally, the N501Y mutation, one of the defining mutations of the UK B.1.1.7 variant, strengthens infectivity but has mixed effects on antibody binding, as shown in Fig. 13.
Fig. 12 Illustration of SARS-CoV-2 coil-residue mutation-induced BFE changes for the complexes of S protein and 51 antibodies or ACE2 (continued from Fig. 11). Positive changes strengthen the binding while negative changes weaken the binding. Mutation frequency is presented for each mutation. Grey indicates that the PDB structures do not include the residues affected by those mutations.
Fig. 13 Illustration of SARS-CoV-2 coil-residue mutation-induced BFE changes for the complexes of S protein and 51 antibodies or ACE2 (continued from Fig. 12). Positive changes strengthen the binding while negative changes weaken the binding. Mutation frequency is presented for each mutation. Grey indicates that the PDB structures do not include the residues affected by those mutations.
Fig. 14 shows the extreme values (maximal in cyan and minimal in pink) and average values (positive in blue and negative in red) of the BFE changes of the complexes of the S protein with ACE2 or antibodies following mutations. The maximal BFE changes for mutations on the helix, sheet, and coil residues are 1.44 kcal mol−1, 1.94 kcal mol−1, and 1.00 kcal mol−1, respectively, while the minimal BFE changes are −3.87 kcal mol−1, −3.90 kcal mol−1, and −4.38 kcal mol−1, respectively. The disparity between the maximal and minimal values indicates the relatively optimal nature of the S protein and antibody binding complexes. It means that the human immune system has the ability to produce optimized antibodies for a given antigen. However, antibodies, once generated, are vulnerable to new mutants. The disparity shown in Fig. 14 also suggests that SARS-CoV-2 was at an advanced stage of evolution with respect to human infection: there is not much room for SARS-CoV-2 to improve its infectivity through single-site mutations.
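The disparity can be quantified directly from the extreme values reported for Fig. 14, for example as the ratio of the magnitude of the largest weakening change to the largest strengthening change within each secondary-structure class. This ratio is an illustrative summary chosen here, not a quantity defined in the study.

```python
# Extreme BFE changes (kcal/mol) reported for Fig. 14: (maximal, minimal).
EXTREMES = {
    "Helix": (1.44, -3.87),
    "Sheet": (1.94, -3.90),
    "Coil": (1.00, -4.38),
}

disparity = {ss: abs(lo) / hi for ss, (hi, lo) in EXTREMES.items()}
for ss, ratio in disparity.items():
    print(f"{ss}: weakening extreme is {ratio:.2f}x the strengthening extreme")
# Helix: 2.69x, Sheet: 2.01x, Coil: 4.38x
```

In every class the strongest weakening change is at least twice as large as the strongest strengthening change, which is the asymmetry the text interprets.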
Many antibody cocktails, such as CR3022/H11-D4, CC12.1/CR3022, BD-236/BD368-2, BD604/BD368-2, S309/S2H14/S304, and Fabs 298/52, are relatively insensitive to the current S protein mutations. However, some other antibodies, such as H11-D4, CV30, CC12.3, and S2H13, can be dramatically affected by SARS-CoV-2 mutations. Importantly, ACE2 binding is also affected by mutations and shows the largest average positive BFE change.
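One way to see why cocktails are less sensitive than single antibodies is that a mutation must disrupt every component to escape the mixture. The sketch below encodes that intuition under an assumed rule (escape only if all components fall below a hypothetical −1.0 kcal/mol cutoff); it is an illustration, not the criterion used in the paper, and the antibodies `AbX`/`AbY` and mutations `MUT1` to `MUT3` are placeholders.

```python
from typing import Dict, Iterable, List


def escapes_cocktail(ddg: Dict[str, float], cocktail: Iterable[str],
                     cutoff: float = -1.0) -> bool:
    """Assumed rule: escape only if every antibody in the cocktail is weakened below cutoff."""
    return all(ddg[ab] < cutoff for ab in cocktail)


def escaping_mutations(ddg_table: Dict[str, Dict[str, float]],
                       cocktail: List[str], cutoff: float = -1.0) -> List[str]:
    """Sorted list of mutations that escape the given cocktail."""
    return sorted(m for m, ddg in ddg_table.items()
                  if escapes_cocktail(ddg, cocktail, cutoff))


# Hypothetical BFE changes (kcal/mol); names and values are placeholders.
ddg_table = {
    "MUT1": {"AbX": -1.5, "AbY": -0.2},
    "MUT2": {"AbX": -1.8, "AbY": -1.3},
    "MUT3": {"AbX": 0.4, "AbY": -2.0},
}
print(escaping_mutations(ddg_table, ["AbX"]))         # ['MUT1', 'MUT2']
print(escaping_mutations(ddg_table, ["AbX", "AbY"]))  # ['MUT2']
```

Adding a second antibody shrinks the escape set from two mutations to one, mirroring the reduced sensitivity of the cocktails listed above.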
The second type of vaccine is the viral-vector vaccine, which uses a genetically engineered virus to produce coronavirus surface proteins in the human body without causing disease. There are two subtypes of viral-vector vaccine: non-replicating and replicating viral vectors. On February 25, 2021, the World Health Organization (WHO) granted an emergency use listing (EUL) for a non-replicating viral-vector vaccine developed by AstraZeneca and the University of Oxford. In addition, three non-replicating viral-vector vaccines are in phase III trials. The AstraZeneca/Oxford vaccine, for example, uses a modified chimpanzee adenovirus that carries the gene for the SARS-CoV-2 S protein. The vector causes a harmless infection in humans, but the S proteins it produces activate the immune system to recognize signs of a future SARS-CoV-2 invasion. Notably, booster shots may be needed to maintain long-lasting immunity. At this stage, only one replicating viral-vector vaccine is in phase II: the University of Hong Kong, in cooperation with Xiamen University and Wantai Biological Pharmacy, is developing such a vaccine, which tends to be safe and to provoke a strong immune response.
The third type of vaccine is the nucleic-acid vaccine, which has two subtypes: DNA-based and RNA-based vaccines. At least 40 teams are currently working on nucleic-acid vaccines, since they are safe and easy to develop. A DNA-based vaccine works by inserting a genetically engineered blueprint of a viral gene into small DNA molecules, such as plasmids, for injection. Electroporation is employed to create pores in cell membranes and increase DNA uptake into cells. In the nucleus of human cells, the injected DNA is transcribed into mRNA, which is then translated into viral proteins (mostly spike proteins). These proteins alert the immune system and should induce immunity. Currently, one DNA-based vaccine is in phase III. Similar to DNA-based vaccines, RNA-based vaccines provide immunity by introducing RNA, which is encased in a lipid coat that helps it enter cells. Two RNA-based vaccines have been granted emergency use authorization in many countries: one developed by BioNTech in partnership with Pfizer, and the other by Moderna.
The fourth type of vaccine is the protein-based vaccine, which injects viral proteins directly into the body to trigger immune readiness. One subtype is the protein subunit vaccine; more than 80 teams are working on vaccines built from viral protein subunits, such as spike proteins and membrane (M) proteins. Another subtype is the virus-like particle (VLP) vaccine. VLPs closely resemble viruses, but they are not infectious because they contain no viral genetic material. Because they cannot replicate, VLP vaccines provide a safer alternative to weakened-virus vaccines; the HPV vaccine and some newer flu vaccines are VLP vaccines. Currently, 22 teams are working on VLP vaccines for the future prevention of COVID-19.
Fig. 16 shows the secondary structure of the S protein, with helix, sheet, and random-coil residues shown in different colors. The S protein consists mostly of random coils, which suggests that, beyond the RBD, the S protein offers many other potential antigenic epitopes for antibody CDRs. We believe that the emphasis on direct binding competition with ACE2 in the past66,67,80 has led to the neglect of many important antibodies that do not bind to the RBD. Therefore, we suggest that researchers pay more attention to antibodies that do not bind to the RBD.
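The claim that the S protein "mostly consists of random coils" is a composition count over per-residue secondary-structure labels. A minimal sketch, assuming DSSP-style one-letter codes and one common three-state reduction (H, G, I as helix; E, B as sheet; everything else as coil); the paper does not state which assignment tool or mapping produced Fig. 16, and the toy string below is not the real S protein assignment.

```python
from collections import Counter

# One common 3-state reduction of DSSP codes (an assumption; other mappings exist).
THREE_STATE = {"H": "helix", "G": "helix", "I": "helix", "E": "sheet", "B": "sheet"}


def composition(dssp: str) -> dict:
    """Fraction of residues in helix, sheet, and coil for a DSSP code string."""
    if not dssp:
        raise ValueError("empty DSSP string")
    counts = Counter(THREE_STATE.get(code, "coil") for code in dssp)
    return {state: counts[state] / len(dssp) for state in ("helix", "sheet", "coil")}


# Toy 20-residue DSSP string (placeholder, NOT the S protein).
toy = "--HHHH--EEEE-TT-SS--"
print(composition(toy))  # {'helix': 0.2, 'sheet': 0.2, 'coil': 0.6}
```

Turns (T) and bends (S) fall into the coil class under this mapping, so the coil fraction depends on the reduction chosen.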
Fig. 16 The secondary structure of S protein. The red, green, and blue colors represent helix, sheet, and random coils of S protein.
As shown in Fig. 14, mutations could considerably weaken the binding between the S protein and antibodies and thus directly threaten vaccine efficacy. However, several obstacles hinder determining the exact impacts of mutations on COVID-19 vaccines. First, the four types of vaccine platforms can produce very different viral peptides, resulting in different immune responses and different antibodies. Second, even for a given vaccine platform, different antibodies may be produced because immune responses vary with gender, age, race, and other factors. Therefore, in this work, we propose to assess the impact of SARS-CoV-2 mutations on COVID-19 vaccines by statistical analysis. By evaluating the mutation-induced binding affinity changes for the complexes of the S protein with 51 existing SARS-CoV-2 antibodies, shown in Fig. 9 to 13, we can identify vaccine escape mutants that strengthen the binding between the S protein and ACE2 while disrupting the binding between the S protein and antibodies. Table 3 lists a collection of the most disruptive mutations. This list is not complete, however; Fig. 9 to 13 show many other antibody-disrupting mutations. For example, the infectivity-strengthening South Africa mutation E484K dramatically disrupts the binding of many antibodies, such as H11-D4, Fab 2-4, H11-H4, COVA2-39, and BD368-2, but it also enhances the binding of others, such as B38, CV30, CC21.1, Sb23, and Fabs 298/52. The infectivity-strengthening mutation N501Y in the UK B.1.1.7 variant disrupts the binding of only a subset of known antibodies, including B38, CC12.3, S2M11, NAB, S309, S2H12, S304, C1A-B12, and STE90-C11.
Table 3 The most antibody-disruptive RBD mutations, grouped by secondary structure
Secondary structure | Mutants |
---|---|
Helix | E406G, I418N, Y421D, N422K, D442H, Y505S |
Sheet | R355W, F400I, F400C, I402F, C432G, I434K, A435P, Q493P, V510E, V512G, L513P |
Coils | V350F, W353R, I410N, G416V, G431V, Y449D, Y449S, L461H, S469P, C480R, P491R, P491L, Y495C, Q506P |
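Table 3 can be treated as a small data structure. The sketch below transcribes it verbatim, parses each mutant into wild-type residue, position, and substituted residue, and checks that the three rows add up to the 31 antibody-disrupting mutants stated in the abstract. The `parse` helper and its regular expression are illustrative choices, not tooling from the paper.

```python
import re

# Table 3 transcribed as secondary structure -> list of mutants.
TABLE3 = {
    "helix": ["E406G", "I418N", "Y421D", "N422K", "D442H", "Y505S"],
    "sheet": ["R355W", "F400I", "F400C", "I402F", "C432G", "I434K", "A435P",
              "Q493P", "V510E", "V512G", "L513P"],
    "coil": ["V350F", "W353R", "I410N", "G416V", "G431V", "Y449D", "Y449S",
             "L461H", "S469P", "C480R", "P491R", "P491L", "Y495C", "Q506P"],
}

# One-letter wild type, residue number, one-letter substitution, e.g. "E484K".
MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse(mutation: str) -> tuple:
    """Split e.g. 'Q506P' into ('Q', 506, 'P')."""
    match = MUTATION.fullmatch(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation}")
    wild, pos, mut = match.groups()
    return wild, int(pos), mut


counts = {structure: len(mutants) for structure, mutants in TABLE3.items()}
total = sum(counts.values())
positions = {parse(m)[1] for mutants in TABLE3.values() for m in mutants}
print(counts, total)   # {'helix': 6, 'sheet': 11, 'coil': 14} 31
print(len(positions))  # 28
```

The 31 mutants occur at 28 distinct residues, because F400, Y449, and P491 each appear with two substitutions.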
In short, by building a SARS-CoV-2 antibody library and statistically analyzing mutation-induced binding free energy changes, we can estimate the impacts of SARS-CoV-2 mutations on COVID-19 vaccines and infer how strongly a specific mutation threatens them. This approach will work better as more antibody structures become available.
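The escape-mutant criterion described above (strengthens ACE2 binding while disrupting antibody binding) can be written as a filter over per-mutation predictions. This is a minimal sketch under assumed parameters: the −0.5 kcal/mol cutoff, the majority threshold, and the records `MUT_A` to `MUT_C` with antibodies `Ab1` to `Ab3` are all hypothetical, and the paper's exact statistical procedure may differ.

```python
from typing import Dict, List


def is_escape_candidate(ace2_ddg: float, antibody_ddg: Dict[str, float],
                        cutoff: float = -0.5, majority: float = 0.5) -> bool:
    """Positive BFE change for ACE2 (stronger binding) while more than
    `majority` of antibodies have a BFE change below `cutoff` (weaker binding)."""
    if not antibody_ddg:
        raise ValueError("no antibody predictions supplied")
    weakened = sum(1 for ddg in antibody_ddg.values() if ddg < cutoff)
    return ace2_ddg > 0 and weakened / len(antibody_ddg) > majority


def escape_candidates(records: Dict[str, dict], **kwargs) -> List[str]:
    """Sorted mutations that satisfy the (assumed) escape criterion."""
    return sorted(m for m, rec in records.items()
                  if is_escape_candidate(rec["ace2"], rec["antibodies"], **kwargs))


# Hypothetical predictions (kcal/mol); placeholders, not values from Fig. 9-13.
records = {
    "MUT_A": {"ace2": 0.3, "antibodies": {"Ab1": -1.0, "Ab2": -0.9, "Ab3": 0.1}},
    "MUT_B": {"ace2": -0.2, "antibodies": {"Ab1": -1.5, "Ab2": -1.1, "Ab3": -0.8}},
    "MUT_C": {"ace2": 0.4, "antibodies": {"Ab1": 0.2, "Ab2": -0.6, "Ab3": 0.3}},
}
print(escape_candidates(records))  # ['MUT_A']
```

`MUT_B` disrupts every antibody but also weakens ACE2 binding, so it is not flagged; the criterion deliberately requires both conditions.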
Mutation frequency is another important factor in prioritization; Fig. 9–13 provide frequency information from our SNP calling. Once a mutation is identified as a potential threat, it can be incorporated into the next generation of vaccines in a cocktail approach. In principle, all four types of vaccine platforms can accommodate new viral strains.
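Combining threat identification with frequency can be as simple as ranking flagged mutations by how often they are observed. A minimal sketch, assuming a set of already-flagged mutations and a frequency table from SNP calling; the names `MUT1` to `MUT4` and their counts are hypothetical.

```python
from typing import Dict, List, Set


def prioritize(threats: Set[str], frequency: Dict[str, int], top: int = 3) -> List[str]:
    """Rank threat mutations by observed frequency (highest first), ties by name."""
    ranked = sorted(threats, key=lambda m: (-frequency.get(m, 0), m))
    return ranked[:top]


# Hypothetical flagged mutations and SNP-calling counts.
threats = {"MUT1", "MUT2", "MUT3", "MUT4"}
frequency = {"MUT1": 12, "MUT2": 950, "MUT3": 12, "MUT4": 3}
print(prioritize(threats, frequency))  # ['MUT2', 'MUT1', 'MUT3']
```

Sorting on a `(−frequency, name)` key keeps the ranking deterministic when two mutations share a count.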
Fig. 17 compares experimental deep mutational scanning enrichment data on the RBD with machine-learning-predicted, RBD-mutation-induced BFE changes for the complex of the SARS-CoV-2 S protein and CTC-445.2. The heatmaps in Fig. 17 show that the predicted BFE changes are highly correlated with the experimental enrichment ratios. Both enrichment ratios and BFE changes describe how mutations alter the affinity of the protein–protein interaction. The high similarity between these heatmaps supports the reliability of our machine learning predictions of BFE changes following mutations on the S protein RBD.
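Agreement between the two heatmaps can be quantified with a correlation coefficient over paired values. Because a higher enrichment ratio and a positive BFE change (in this paper's convention) both indicate stronger binding, good agreement should appear as a positive correlation. The sketch below computes Pearson's r; the text does not report which statistic was used, and the paired values are placeholders, not data from Fig. 17.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired values per mutation (placeholders only).
predicted_bfe = [-1.2, -0.4, 0.1, 0.5, -2.0]
enrichment = [-1.0, -0.5, 0.2, 0.4, -1.8]
print(round(pearson(predicted_bfe, enrichment), 3))
```

A rank-based measure such as Spearman's correlation would be a reasonable alternative if the two scales are related monotonically but not linearly.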
Fig. 17 A comparison between experimental deep mutation enrichment data and TopNetTree predictions for the SARS-CoV-2 S protein RBD and CTC-445.2 complex (7KL9 (ref. 89)). Top left: deep mutational scanning heatmap showing the average effect on the enrichment for single site mutants of the RBD when assayed by yeast display for binding to CTC-445.2.89 Top right: the RBD colored by average enrichment at each residue position bound to CTC-445.2. Bottom: machine learning predicted BFE changes for the CTC-445.2 and S protein complex induced by single site mutations on the RBD.
We present the most comprehensive analysis and prediction of mutation threats to vaccines and antibody therapies. First, we identify existing mutations on the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike (S) protein, which is the main target for both vaccines and antibody therapies. We analyze the mechanism, frequency, and ratio of mutations along with the secondary structures of the S protein. Additionally, we build a library of 55 antibodies with structures available from the Protein Data Bank (PDB) and analyze their two-dimensional (2D) and three-dimensional (3D) characteristics by employing computational biophysics. We further predict the mutation-induced binding free energy (BFE) changes of S protein and antibody complexes using a model called TopNetTree, which is based on deep learning and algebraic topology. The performance of our model has been extensively validated against experimental deep mutational scanning data. Our significant findings are as follows. First, we reveal that none of the known mutations is safe for all antibodies. On average, most mutations (71%) will weaken the binding between the S protein and antibodies, which implies that vaccines will also be compromised by existing mutations. Additionally, we identify 31 antibody-disrupting mutants that dramatically weaken the binding between the S protein and most known antibodies. Moreover, we find that most RBD mutations (64.9%) will enhance the binding strength between the S protein and angiotensin-converting enzyme 2 (ACE2), which implies that most existing mutations will strengthen SARS-CoV-2 infectivity.
This result is consistent with the natural selection of mutations and our earlier findings.84 Finally, we discover that, for all antibodies, the maximal BFE change magnitudes of binding-strengthening mutations are much smaller than those of binding-weakening mutations, which shows that current human antibodies were optimized with respect to the original S protein and are vulnerable to S protein mutations. Our findings indicate the pressing need to keep developing mutation-resistant vaccines and antibody drugs and to prepare for seasonal vaccinations.
Footnotes
† Electronic supplementary information (ESI) available: (S1) Methods; (S2) multiple sequence alignments of antibodies and pairwise identity scores; (S3) random coil percentages of antibody paratopes; and (S4) additional analysis of antibody–S protein complexes. See DOI: 10.1039/d1sc01203g
‡ The first three authors contributed equally.
This journal is © The Royal Society of Chemistry 2021