Stuart Lang*
New Cambridge House, Bassingbourn Road, Litlington, Cambridgeshire, SG8 0SS, UK. E-mail: stuart.lang@cresset-group.com
First published on 17th April 2025
Treatment and prevention of HIV/AIDS represent a significant global challenge, with the disease causing a substantial number of deaths each year. HIV-CA, the protein responsible for protecting the viral RNA and facilitating reverse transcription, has emerged as an important target in drug discovery. This review applies various computational drug discovery tools to the analysis and understanding of not only the HIV-CA protein, but also the ligands reported to bind to the site at the NTD-CTD interface between two capsid monomer units. Combining this evaluation with reported experimental data highlights the effects that changes to the ligands have on binding affinity. This analysis, including the identification of areas of the ligand that have not been adequately explored, allows for the generation of guidelines that can be applied to the design of novel ligands that bind to HIV-CA.
The HIV-CA protein, consisting of an assembly of roughly 1200–1500 monomer units arranged as repeated hexamers (approximately 250) and pentamers (exactly 12) producing a fullerene cone geometry,3–5 provides an opportunity for therapeutic small molecule intervention to inhibit HIV infection. Two strategies have been proposed for small molecules interacting with HIV-CA: the first is premature uncoating of the protective HIV-CA protein before it has the opportunity to infect the cell; the second is stabilisation of the HIV-CA protein so that it is unable to release the viral RNA into the host cell.6,7
A key interaction that has been exploited in the design of molecules that bind to HIV-CA is one also demonstrated by two of the co-factors (CPSF6 and Nup153),8,9 which are both involved in capsid nuclear entry. This site, at the NTD-CTD interface, is predicted to be druggable using a pocket detection tool10 (Fig. 1A and B). The highlighted phenylalanine in both CPSF6 (Fig. 1C) and Nup153 (Fig. 1D) interacts, through the phenyl ring, with Leu56, Met66, and the sidechain of Lys70 and, through the N–H and CO, with Asn57.11 An interesting point of note is the interaction with the sidechain of Lys70. While, due to the amine head group, lysine can be considered a polar amino acid, it also contains a lipophilic chain that links this amine group to the protein backbone. This sidechain, as is seen here, can make interactions with lipophilic groups.
Fig. 1 HIV-CA binding site. (A) HIV-CA protein (PDB: 5HGN) with hydrophobicity surface added. Areas coloured yellow represent areas of high lipophilicity while areas coloured blue represent areas of high hydrophilicity with grey coloured areas falling in an intermediate range. The green, orange, and purple regions (B) were detected using a pocket detection tool and represent the NTD-CTD binding site covered in this review. This pocket has been shown to bind to both co-factors CPSF6 (C, PDB: 8CL1) and Nup153 (D, PDB: 8CKY).
These interactions are key recognition motifs present in all the small molecules that will be discussed in this review. This phenylalanine-glycine (FG) binding site12 is mainly lipophilic and is situated predominantly in the NTD. However, ligands that bind in this site can also, by accessing a suitable vector, extend into the interface between the NTD and the CTD. Lipophilic amino acids, particularly aromatic amino acids like phenylalanine, tyrosine, and tryptophan, often make key interactions in protein–protein/peptide interactions.13 This is because exposure of groups of this type to the polar aqueous environment that can exist outside the protein is energetically unfavourable; burying their lipophilicity in more suitable surroundings is preferred.14
One small molecule that takes advantage of this phenylalanine-based interaction with HIV-CA is PF-3450074 (1), herein referred to as PF-74 (Fig. 2A).15 This molecule contains a central phenylalanine unit that conserves the interactions, through the phenyl ring, with Leu56, Met66, and the sidechain of Lys70 along with the interactions, through the N–H and CO, with Asn57 (Fig. 2B).16 PF-74 (1) also contains a tertiary amide which, in the binding site, adopts a cis geometry allowing the phenyl ring to interact with the methyl group of Thr107.
Fig. 2 PF-74 (1) bound to HIV-CA (PDB: 4U0E). (A) PF-74 (1) 2D structure. (B) PF-74 (1) inside binding site of HIV-CA. (C) Hydrophobicity surface added to HIV-CA. 3D-RISM water analysis of PF-74 (1) bound to HIV-CA, with predicted thermodynamically favourable (green) and unfavourable (red) water molecules shown as spheres. (D) PF-74 (1) shown with positive (red), negative (blue), hydrophobic (gold), and van der Waals (yellow) field surfaces. (E) PF-74 (1) shown with positive (red), negative (blue), hydrophobic (gold), and van der Waals (yellow) field points. (F) PF-74 (1) shown with Electrostatic Complementarity™ (EC) to HIV-CA surface, with favourable (green) and unfavourable (red) regions highlighted.
PF-74 (1) makes a series of interactions with the amino group on the head of Lys70, with the CO making a hydrogen bonding interaction and the indole making two cation–π interactions (one with each ring). The pyrrole component of the indole also makes a cation–π interaction with Arg173, which is situated in the CTD. The N–H of the indole also makes a hydrogen bonding interaction with the C=O in the backbone of Gln63.
A key analysis of this system is mapping the regions in which water molecules are predicted to be energetically favoured or unfavoured.17 Using a 3D-RISM calculation18 the positions at which water is predicted to be located are shown as spheres, with the colours highlighting the water molecules that can be easily displaced (red) and those whose displacement by a ligand will incur a binding energy penalty (green). It is clear from this analysis that there is space to grow the ligand in the region of the phenyl ring next to the cis amide (Fig. 2C), with the water molecules in this area being predicted to be energetically disfavoured.
The water molecules in the area around the methyl group of the cis amide (Fig. 2C), toward the NTD-CTD, are not as energetically unfavourable as those around the phenyl ring. However, as the majority of these predicted water molecules are coloured white, there is still an opportunity to explore these regions of the protein when developing molecules that bind to this site in the HIV-CA protein, a strategy adopted in the discovery of lenacapavir (also known as GS-6207).
Lenacapavir (2)19 binds to the same site as PF-74 (1) and there are similarities in the binding mode adopted by the two molecules. One of the major problems with PF-74 (1) is its poor metabolic stability, resulting in a short half-life.20,21 The peptidyl nature of PF-74 (1) makes it susceptible to enzymatic degradation of the amide bond. This metabolism issue was a key consideration in the development of lenacapavir (2). Not only has the phenyl ring present in the phenylalanine component of the molecule been replaced with a difluoroaryl ring, but the labile tertiary cis amide has been locked in the desired bioactive conformation by introduction of a pyridine ring (Fig. 3B).22 The nitrogen atom of this pyridine makes the same interaction with the N–H of Asn57 that was seen with the amide CO that it replaced in PF-74 (1), representing an example of stabilising a molecule through bioisosteric replacement.23,24
Fig. 3 Lenacapavir (2) bound to HIV-CA (PDB: 6VKV). (A) Lenacapavir (2) 2D structure. (B) Lenacapavir (2) in binding site of HIV-CA. (C) Lenacapavir (2) shown with positive (red), negative (blue), hydrophobic (gold), and van der Waals (yellow) field surfaces. (D) Lenacapavir (2) shown with positive (red), negative (blue), hydrophobic (gold), and van der Waals (yellow) field points. (E) Lenacapavir (2) shown with Electrostatic Complementarity (EC) to HIV-CA surface, with favourable (green) and unfavourable (red) regions highlighted.
The phenyl ring that interacts with Thr107 has been replaced with a chloro-indazole. The chlorine atom makes a halogen bond with the CO of Asn74. There is an additional interaction between the N–H of Asn74 and a S=O, along with the sulfonamide N anion, of the sulfonamide substituent on the indazole ring. This indazole ring can also make a π–cation interaction with the amine on Lys70 which, although also possible with the phenyl ring of PF-74 (1), was not displayed in the crystal structure (Fig. 2B). An additional factor that may give an improvement in binding affinity is that the highly substituted indazole is considerably larger than the phenyl ring that it replaced. This increase in volume will have resulted in displacement of the energetically unfavourable water molecules predicted to be present in the PF-74 (1) system (Fig. 2C).
A key difference between lenacapavir (2) and PF-74 (1) is in the pyrazole derivative that has replaced the indole at the NTD-CTD interface. There are two particularly noticeable variations. The first is that the pyrazole no longer contains an N–H, therefore losing the H-bonding interaction with Gln63, although this is replaced with a hydrophobic interaction with the chain of Gln63. The second is that the carbocyclic ring fills a different portion of the NTD-CTD interface than the indole arene system of PF-74 (1). This new binding mode causes Met66 to move and bind to the group at the NTD-CTD interface, so that it no longer interacts with the difluoroaryl ring. In addition to the previously discussed interaction that the cyclopropane makes with Gln63, it also makes an additional hydrophobic interaction with the chain of Tyr169 in the CTD, along with the pyrazole ring conserving the CTD π–cation interaction with Arg173.
In addition to optimisation within the silhouette of PF-74 (1), lenacapavir (2) also grows into a previously unexplored pocket of the protein. This region was previously identified as a druggable pocket using the pocket detection analysis (Fig. 1A and B, highlighted in purple). Through use of an alkyne linker, lenacapavir (2) can make interactions through the SO groups of a sulfone with a N–H on Asn57 and the O–H of Ser43. An additional hydrophobic interaction is made between the Me group attached to the sulfone and the sidechain of Gln50.
PF-74 (1) and lenacapavir (2) can also be expressed in terms of their field surface (Fig. 2D and 3C respectively),25 which can be used to calculate a molecular interaction potential for the respective ligand. This molecular field surface can be simplified to field points (Fig. 2E and 3D), which can be used as a 3D molecular descriptor for ligand-based design and virtual screening.26 A field surface can be calculated not only for the ligand but also for the protein. Generation of the electrostatic field surface for both ligand and protein allows the Electrostatic Complementarity™ (EC) surface to be calculated (Fig. 2F and 3E).27 Areas that are a match, or complement each other, are shown in green, with electrostatically mismatched regions shown in red. From a visual perspective the EC surface can be represented on either the protein or the ligand (it is shown on the ligand in Fig. 2F and 3E). This EC surface can be used to assess the areas of the ligand that are electrostatically compatible with the protein and the areas where a clash is observed.
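The idea of electrostatic complementarity can be illustrated with a toy calculation. The sketch below is not the EC method used to generate the figures in this review; it simply assumes that ligand and protein electrostatic potentials have been sampled at the same surface points and scores complementarity as the negative Pearson correlation between them. The function names and the potential values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def toy_ec_score(ligand_esp, protein_esp):
    """Toy complementarity score (assumption, not the published EC score).

    +1 means the potentials are perfectly opposed (complementary),
    -1 means they are perfectly aligned (electrostatic clash).
    """
    return -pearson(ligand_esp, protein_esp)

# Hypothetical potentials (arbitrary units) at four shared surface points.
ligand_esp = [-2.0, -1.0, 0.5, 1.5]
protein_esp = [1.8, 1.1, -0.4, -1.6]

print(f"toy EC score: {toy_ec_score(ligand_esp, protein_esp):.2f}")
```

In this picture, a negative region of the ligand facing a positive region of the protein contributes to a high score (green in Fig. 2F and 3E), while like-charged regions facing each other lower it (red).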
A striking observation made from the analysis of the EC surface for PF-74 (1) is that, while there is a definite interaction between the π-systems of the indole group and the NH3 cation on Lys70, this interaction is not electrostatically favourable (Fig. 2F). This may be due to the indole ring being in a suboptimal position with respect to Lys70, because its N–H is making an electrostatically favourable interaction with the amide CO. Although not governed by any interaction with the amine group in Lys70, the phenyl ring in the phenylalanine component of PF-74 (1) is also not electrostatically compatible with its protein environment. While there is a slight improvement in the EC for lenacapavir (2), perhaps resulting from the introduction of electronegative fluorine atoms, there are still areas that have poor EC, most notably in the hydrophobic groups in the vicinity of the sulfone.
Lenacapavir (2) inspired the design of GSK878 (3),28 which maintains the difluoroaryl ring and the pyrazole that binds at the NTD-CTD interface (Fig. 4). It also contains the indazole ring; however, because of the replacement of the pyridine with a quinazolinone, the CH2CF3 group can be trimmed back to a methyl group and still lock its axial chirality, specifically atropisomerism.29 Another interesting observation when comparing the 6VKV crystal structure with lenacapavir (2) bound and the 8FIU structure containing GSK878 (3) is that the Thr107 residue has rotated, with the Me group making a lipophilic contact with the indazole of lenacapavir (2), while the OH makes an H-bonding interaction with the CO in the quinazolinone group of GSK878 (3). Furthermore, the replacement of the sulfone in lenacapavir (2) with a morpholine in GSK878 (3) shows an improved EC profile.
Fig. 4 GSK878 (3) bound to HIV-CA (PDB: 8FIU). (A) GSK878 (3) 2D structure. (B) GSK878 (3) in binding site of HIV-CA. (C) GSK878 (3) shown with Electrostatic Complementarity (EC) to HIV-CA surface, with favourable (green) and unfavourable (red) regions highlighted.
While the evolution of PF-74 (1) to lenacapavir (2) and subsequent design of GSK878 (3) represents a tour de force in rational structure-based drug design, an alternative approach to finding compounds that bind in this pocket is high throughput screening (HTS).30 An HTS of ∼60000 compounds was used as part of the process to identify BI-2 (4) (Fig. 5A), along with its structural analogue BI-1 (5).31 Crystallography of BI-2 (4) (Fig. 5B) shows that, while it binds to similar pockets in the NTD to PF-74 (1), it does not make any of the interactions at the NTD-CTD interface observed with the indole in PF-74 (1).16
Fig. 5 BI-2 (4) bound to HIV-CA (PDB: 4U0F). (A) BI-2 (4) 2D structure. (B) BI-2 (4) in binding site of HIV-CA. (C) BI-2 (4) shown with Electrostatic Complementarity (EC) to HIV-CA surface with favourable (green) and unfavourable (red) regions highlighted.
The unsubstituted phenyl ring of BI-2 (4) makes hydrophobic interactions with Leu56 and the sidechain of Lys70, like PF-74 (1). However, the position of the phenyl ring in BI-2 (4) is slightly different, allowing it to interact with Leu69. As was the case with lenacapavir (2), the interaction between this aryl system and Met66 is lost. However, as BI-2 (4) does not have a group at the NTD-CTD interface, there is no opportunity to regain this contact. An interesting observation when analysing the binding mode of BI-2 (4) (Fig. 5B) is that the amide group on the sidechain of Asn57 has rotated (compared to Fig. 2B, 3B, and 4B) to satisfy the alternative hydrogen bond donor/acceptor requirements with this scaffold.
The phenol ring of BI-2 (4), while located in the same pocket as the phenyl ring connected to the cis amide in PF-74 (1), does not interact with the same residues. In fact, Thr107, which made a hydrophobic interaction with the phenyl ring in PF-74 (1), has rotated in this system, as seen with GSK878 (3), allowing the alcohol oxygen to make an H-bond with the N–H of the lactam in BI-2 (4). This places the phenol ring deeper in this pocket, shown by the water analysis (Fig. 2C) of PF-74 (1) to contain energetically unfavoured water molecules. The aryl system of this phenol makes hydrophobic interactions with the side chain of Lys70 and Ile73. An additional interaction on this system, which would not be possible with BI-1 (5), is the phenol OH which can make an H-bond with the CO of Asn74, a residue that was also targeted with lenacapavir (2) and GSK878 (3).
While the phenyl and phenol rings of BI-2 (4) map well with their equivalent ring systems in compounds 1–3, the alternative orientation of the Asn57 primary amide makes growth toward Tyr169 and Arg173 in the CTD difficult with this scaffold. The vector exploited by compounds 1–3 is unavailable in BI-2 (4), with the pyrazole N acting as a hydrogen bond acceptor for the N–H of Asn57. Furthermore, the requirement of the pyrazole N–H to bind to the CO of Asn57 means that the vector is also not available for growing into the purple pocket (Fig. 1A and B). The presence of the lactam C=O in BI-2 (4) also makes growth into this pocket from that position of the scaffold challenging.
From this analysis, it is possible to map the key residues that are interacting with the ligand and are likely to be responsible for activity. The residues that interact with the phenylalanine unit (and its bioisosteres) in the FG binding site are essential for the function of the capsid and therefore unlikely to be mutated. The main residue responsible for H-bonding is Asn57, which functions as both an H-bond donor and acceptor. Leu56 and Lys70 are also key residues in this area, and all the ligands described in this analysis bind to them, with Met66 and Leu69 also providing binding opportunities.
Thr107 is an interesting amino acid residue in that, with both PF-74 (1) and lenacapavir (2), it makes a hydrophobic interaction with an aryl ring. However, if this group is rotated, exposing the alcoholic O–H group, it can make polar interactions such as H-bonding with the CO of the quinazolinone in GSK878 (3) and the lactam N–H in BI-2 (4). This highlights that, while in molecules such as lenacapavir (2) it is possible to take advantage of the highly lipophilic nature of the protein to improve potency, there is also an opportunity to interact with the polar groups that are presented within the pocket.
The Asn74 residue offers another opportunity within this pocket. Both lenacapavir (2) and GSK878 (3) make interactions with this residue via both a halogen bond with the chloro group and an H-bond with the SO and N anion of the sulfonamide that are both substituents on the indazole system. This residue also interacts with the phenolic OH of BI-2 (4), although interaction with Asn74 is not possible with either PF-74 (1) or BI-1 (5).
Interaction with amino acids that are in the CTD side of the NTD-CTD interface offers an attractive strategy, with both PF-74 (1) and lenacapavir (2) taking advantage of this. While many of the interactions made with groups at this interface are with amino acids in the NTD, such as Gln63 and Lys70, the π–cation interaction of PF-74 (1) with Arg173 provides an opportunity to also bind to the CTD, as does the lipophilic interaction with the cyclopropyl ring of lenacapavir (2) and GSK878 (3) to Tyr169.
This binding pose analysis also highlights the transient nature of the specific positions of the amino acids within the pocket, particularly those with flexible side chains. Along with the Thr107 rotation, we have also observed a rotation of the key Asn57 residue when binding to BI-2 (4) and BI-1 (5) compared to the other ligands, along with significant movement of the Met66 and Lys70 side chains. Similarly, the H-bonding interaction that the indole N–H of PF-74 (1) makes with CO in the amide side chain of Gln63 is completely different to the hydrophobic interaction that the cyclopropane of lenacapavir (2) and GSK878 (3) make with the lipophilic chain of Gln63. This means that care must be taken when designing new ligands as modifications made at one part of the molecule could have an impact on the binding of another group.
Water molecules fill the unoccupied pockets in a protein37 and, as demonstrated with the 3D-RISM analysis (Fig. 2C), can be energetically stable within the protein. Displacement of these stable water molecules may lower the binding affinity. Where water molecules are energetically unstable, their displacement could lead to an increased binding affinity.38
Not all the parameters responsible for the binding affinity are captured in the protein–ligand binding pose; the ligand's behaviour outside of the protein is also critical. A ligand that has a high degree of flexibility will exist in multiple low energy conformations while in solution.39 There will be an energy penalty associated with reorientating this ligand from its solution conformation(s) to the bioactive conformation, which may be higher in energy. Another factor that will affect a ligand's activity is its lipophilicity.40,41 Compounds that are highly lipophilic will prefer to bind to lipophilic areas of a protein target rather than remain in the more hydrophilic aqueous environment that predominantly exists outside the protein. This means that lipophilic molecules will appear to be more potent, although because they prefer to bind to proteins in general, as opposed to being in solution, there will be little specificity for the target protein compared to other proteins.42 This can lead to selectivity issues and can result in problems associated with toxicity43 and metabolism.44,45
To understand the contribution to binding energy for each part of a molecule it is necessary to look for patterns, or structure activity relationships (SAR).46 This SAR is not based on in silico analysis, but rather in finding patterns in the experimentally measured activity data that is associated with a series of compounds that share a binding site. As discussed, PF-74 (1), lenacapavir (2), GSK878 (3) and BI-2 (4) share a binding site, with shared interactions being made with common amino acid residues in the protein. This means that groups in similar positions can be compared. By analysis of the compounds that are structurally related (Fig. 6) it is possible to track the evolution of PF-74 (1) to lenacapavir (2) and its next generation analogues, including GSK878 (3).
Fig. 6 Key molecules that bind to the same HIV-CA binding site: evolution from PF-74 (1) to more advanced molecules.
The introduction of the pyridine ring to replace the cis amide of PF-74 (1), while eliminating the possibility of rotation around the amide bond, introduces an aspect of axial chirality to the molecule. By placing larger groups at the ortho positions to the biaryl bond, such as the CH2CF3 on lenacapavir (2) or the dual effect of an equivalent Me group coupled with the CO bond in molecules like 12 and GSK878 (3), it is possible to increase the energy needed to move from one form to another (Table 1).29
Compound | pIC50 | clogP | TPSA | LLE | LE | pIC50 (QSAR) |
---|---|---|---|---|---|---|
PF-74 (1) | 6.2 | 4.4 | 65 | 1.8 | 0.27 | 6.4 |
Lenacapavir (2) | 9.7 | 8.2 | 158 | 1.4 | 0.21 | 9.7 |
GSK878 (3) | 10.4 | 7.5 | 156 | 2.9 | 0.23 | 10.4 |
BI-2 (4) | 5.7 | 2.6 | 78 | 3.1 | 0.36 | 5.8 |
BI-1 (5) | 5.1 | 2.3 | 71 | 2.8 | 0.33 | 5.3 |
6 | 6.3 | 5.1 | 65 | 1.2 | 0.27 | 6.5 |
7 | 5.6 | 6.2 | 58 | −0.6 | 0.23 | 5.3 |
8 | 6.1 | 6.6 | 58 | −0.5 | 0.25 | 5.8 |
9 | 5.3 | 4.0 | 60 | 1.3 | 0.26 | 5.2 |
GS-CA1 (10) | 9.4 | 8.4 | 158 | 1.2 | 0.21 | 9.5 |
KFA-012 (11) | 10.1 | 7.9 | 158 | 2.2 | 0.23 | 10.1 |
12 | 10.3 | 6.9 | 144 | 3.4 | 0.26 | 9.9 |
In fact, it was shown that compound 12 in its more active axially chiral form is more than 1600 times more active than its less active partner. This represents an example of the benefit to activity of locking a molecule in a bioactive conformation by restricting its conformational flexibility.28
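To put a 1600-fold activity difference on the same scale as the pIC50 values in Table 1, it can be converted into log units and an approximate free energy gap. The sketch below assumes that the activity ratio approximates the ratio of binding constants and that the temperature is 298.15 K; the function names are illustrative only.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def fold_to_log_units(fold: float) -> float:
    """Convert a fold difference in activity into a difference in pIC50 units."""
    return math.log10(fold)

def fold_to_ddg(fold: float, temperature: float = 298.15) -> float:
    """Approximate free energy gap in kcal/mol, using RT * ln(fold).

    Assumes the activity ratio behaves like a ratio of binding constants.
    """
    return R_KCAL * temperature * math.log(fold)

delta_p = fold_to_log_units(1600)
ddg = fold_to_ddg(1600)
print(f"{delta_p:.1f} log units, about {ddg:.1f} kcal/mol")
# prints "3.2 log units, about 4.4 kcal/mol"
```

Under these assumptions, the two axially chiral forms differ by roughly 3.2 pIC50 units, which is comparable to the gap between PF-74 (1) and lenacapavir (2) in Table 1.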
Analysis of the LLE plot47 (Fig. 7A) shows that the increase in activity in moving from PF-74 (1) to lenacapavir (2) has been driven by an increase in logP. Despite lenacapavir (2) being 3.5 log units more active than PF-74 (1) (pIC50 9.7 vs. 6.2, Table 1), it has a lower LLE (1.4 compared to 1.8). The addition of the –Cl atom on the aromatic ring in compound 6, while allowing for an additional interaction with Asn74, gives only a modest increase in activity and, because logP increases by around 0.7, results in a drop in LLE, with no effect on LE being observed for this change. The introduction of the –OH in BI-2 (4) to give an equivalent contact with Asn74 gives a larger jump in activity, coupled with an improvement in LLE, when compared to BI-1 (5), which is unable to make this interaction.31
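The LLE values discussed here follow from the usual definition LLE = pIC50 − clogP. The sketch below recomputes LLE from the pIC50 and clogP columns of Table 1 and flags rows where the recomputed value differs from the tabulated one; such differences are assumed to arise from rounding of the tabulated inputs. The dictionary layout and function name are illustrative.

```python
# (pIC50, clogP, reported LLE) copied from Table 1.
TABLE_1 = {
    "PF-74 (1)": (6.2, 4.4, 1.8),
    "Lenacapavir (2)": (9.7, 8.2, 1.4),
    "GSK878 (3)": (10.4, 7.5, 2.9),
    "BI-2 (4)": (5.7, 2.6, 3.1),
    "BI-1 (5)": (5.1, 2.3, 2.8),
    "6": (6.3, 5.1, 1.2),
    "7": (5.6, 6.2, -0.6),
    "8": (6.1, 6.6, -0.5),
    "9": (5.3, 4.0, 1.3),
    "GS-CA1 (10)": (9.4, 8.4, 1.2),
    "KFA-012 (11)": (10.1, 7.9, 2.2),
    "12": (10.3, 6.9, 3.4),
}

def lle(pic50: float, clogp: float) -> float:
    """Lipophilic ligand efficiency: potency corrected for lipophilicity."""
    return pic50 - clogp

for name, (pic50, clogp, reported) in TABLE_1.items():
    computed = lle(pic50, clogp)
    # Flag rows where the rounded table inputs do not reproduce the table LLE.
    note = "" if abs(computed - reported) < 0.05 else "  (differs from table)"
    print(f"{name:16s} LLE = {computed:5.1f}{note}")
```

A compound only improves its LLE when potency rises faster than clogP, which is why lenacapavir (2) scores below PF-74 (1) despite being far more potent.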
Fig. 7 (A) LLE plot of key molecules that bind to the same HIV-CA binding site. (B) LE vs. LLE plot of key molecules that bind to the same HIV-CA binding site.
Analysis of the LE vs. LLE plot47 (Fig. 7B) shows that lenacapavir (2) has a lower LE than PF-74 (1). One reason for this is the addition of the pyridine ring. While this is required to reduce the metabolism of the molecule, compound 7 shows both a drop in potency and an increase in logP when compared with PF-74 (1).48 In the design of compound 7, no steps were taken to lock this biaryl system in the correct conformation in preference to the alternative conformations that exist as a result of the axial chirality that has been introduced, which is reflected in this drop in activity.29 While this modification only resulted in a moderate drop in LE when comparing compound 7 with PF-74 (1), the LLE for this compound is now negative. The replacement of the indole in compound 7 with the pyrazole in compound 9, while leading to a slight drop in potency, reduced the logP significantly, meaning that the LLE has now increased to 1.3 with the LE being at a similar level to that observed in compound 7.
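Ligand efficiency normalises potency by molecular size. A commonly used approximation is LE ≈ 1.37 × pIC50 / (number of heavy atoms), in kcal/mol per heavy atom. The sketch below assumes this convention was used for Table 1. The heavy-atom counts are not given in the text; they are assumptions taken from the molecular formulae C27H27N3O2 for PF-74 (1) and C39H32ClF10N7O5S2 for lenacapavir (2).

```python
def ligand_efficiency(pic50: float, heavy_atoms: int) -> float:
    """LE in kcal/mol per heavy atom, using the 1.37 * pIC50 / HAC approximation."""
    return 1.37 * pic50 / heavy_atoms

# (pIC50 from Table 1, assumed heavy-atom count from the molecular formula)
examples = {
    "PF-74 (1)": (6.2, 32),
    "Lenacapavir (2)": (9.7, 64),
}

for name, (pic50, hac) in examples.items():
    print(f"{name}: LE = {ligand_efficiency(pic50, hac):.2f}")
```

With these assumed atom counts the sketch gives 0.27 and 0.21, matching the LE column of Table 1: lenacapavir (2) gains about 3.5 log units of potency but doubles in size, so its efficiency per atom falls.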
Compound 9 represents the minimum pharmacophore that has been elaborated upon to build the more advanced key ligands covered in this analysis. The aryl system attached to the pyridine can be replaced with an indazole that has been engineered to control the axial chirality and interact with Asn74.49,50 The pyrazole has also been extended with hydrophobic groups that, while displacing water molecules at the NTD-CTD interface, also significantly increase the logP of the molecule. These more advanced molecules also explore the previously identified druggable pocket (Fig. 1A and B, highlighted in purple) using either a sulfone tethered by an alkyne (as shown in compounds 2, 10, and 11)51 or by replacement of the pyridine ring with a quinazolinone (as shown in compounds 3 and 12).28 The move from the pyridine to the quinazolinone, while only improving the activity slightly, reduces the logP considerably. This brings the LLE values of compounds 12 and GSK878 (3) in line with that observed for BI-1 (5) and BI-2 (4), but with significantly higher activity.31
QSAR is a ligand-based technique that does not require any protein information in the calculation.52,53 However, as in this analysis, the protein structural information can be used to generate the ligand alignments, ensuring that the ligands are in the correct conformation to interact with the protein.54 Using Activity Atlas™, a qualitative QSAR tool, it is possible to generate an activity cliffs analysis of the system, which can allow a visual representation.55,56 Using a set of 147 ligands reported in the literature15,22,28,31,48,51,57–69 it is possible to map, when a relevant crystallographic ligand16,22,28 is added as a reference, the areas in the molecule that benefit from positive or negative electrostatics (Fig. 8A) and those that favour and disfavour hydrophobic groups (Fig. 8B). This analysis highlights the preference for a negative electrostatic field in the indazole ring of GSK878 (3). It is also evident that much of the activity of these molecules benefits from the addition of hydrophobic groups in various locations. However, this strategy has resulted in molecules with a high logP. Another aspect that is highlighted by this analysis is the lack of diversity that exists in the phenyl/difluoro motif of the molecule that interacts with the FG binding site, meaning that it is not possible to use this method to predict the changes in this region that will improve the activity of the molecule.
Quantitative QSAR analysis53 can also lead to a better understanding of the system. Using the same ligand set that was used in the Activity Atlas analysis, it is possible to generate a QSAR model that can be used to predict the activity of compounds that are within the activity range of the set used to create the QSAR model (Table 1). Due to the significant crystallographic structural information that is available for this system, meaning the ligands can be aligned with a high degree of certainty, the model generated for this analysis has an R² = 0.92 for the test set.
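As an illustration of how such a statistic is obtained, the sketch below computes the coefficient of determination for the twelve measured and QSAR-predicted pIC50 values listed in Table 1. This is not the reported test-set statistic: the twelve compounds are only a small part of the 147-ligand set, and it is an assumption here that R² is defined as 1 − SS_res/SS_tot.

```python
# Measured and QSAR-predicted pIC50 values, in the row order of Table 1.
measured = [6.2, 9.7, 10.4, 5.7, 5.1, 6.3, 5.6, 6.1, 5.3, 9.4, 10.1, 10.3]
predicted = [6.4, 9.7, 10.4, 5.8, 5.3, 6.5, 5.3, 5.8, 5.2, 9.5, 10.1, 9.9]

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

print(f"R^2 for the Table 1 compounds: {r_squared(measured, predicted):.2f}")
```

Because the Table 1 compounds span a wide activity range and may include molecules used to train the model, the value obtained here is expected to be higher than a statistic computed on a held-out test set.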
In a recent study70 molecules with low molecular weight, or fragments,71,72 were shown to bind to HIV-CA. Fragments, due to their size, are often identified as low binding affinity molecules; however, the binding displayed can often be more efficient than that of molecules identified by HTS.73 While all the fragments reported have an aryl group in the FG binding site as expected (Fig. 9), it was demonstrated that an aromatic ring is not required in the pocket with Asn74 (Fig. 9B). In compound 13, the aromatic ring present in all previous examples is replaced with a lactam that H-bonds to the amide sidechain of Asn74.
Fig. 9 Selected fragments known to bind to HIV-CA. (A) Fragments 13–19 2D structure. (B) Fragment 13 in binding site of HIV-CA (PDB: 8QUK). (C) Fragment 14 in binding site of HIV-CA (PDB: 8QUL). (D) Fragment 15 in binding site of HIV-CA (PDB: 8QUW). (E) Fragment 16 in binding site of HIV-CA (PDB: 8QUX). (F) Fragment 17 in binding site of HIV-CA (PDB: 8QUY). (G) Fragment 18 in binding site of HIV-CA (PDB: 8QV9). (H) Fragment 19 in binding site of HIV-CA (PDB: 8QVA).
The crystallographic evidence presented demonstrates a preference for aromatic groups to be present in the phenylalanine region of PF-74 (1). These interactions are predominantly lipophilic, with the π-system of these ligands only occasionally making a π–cation interaction with Lys70 (Fig. 9B–D, and F). One example, compound 14 (Fig. 9C), showed that a pyridyl ring was tolerated in this position, as opposed to the regular phenyl group. The pyridyl nitrogen was able to make an interaction with the sulfur atom on the flexible sidechain of Met66, with the NH2 group making additional H-bonds with Asn57 and a water molecule that is predicted by 3D-RISM to be part of a stable water network, also binding to Gln63.
Care is needed when introducing polarity to the phenylalanine pocket: introducing a phenolic OH, as in compound 15 (Fig. 9D), caused the entire fragment to adopt an alternative binding pose. While this adjustment is possible with fragments, in a more complicated or optimised ligand, where adopting a new binding mode will not be possible, this type of change would result in a drop or complete loss of activity. However, in this case the new binding mode allows the phenolic OH to make a similar interaction with Asn74 as was seen with BI-2 (4) (Fig. 5B). Interaction with Asn74 provides an opportunity to introduce polarity to a molecule, in a region of the protein shown by 3D-RISM (Fig. 9B–H) to contain several energetically disfavoured waters.74 Maximising the occupancy of this pocket, by displacement of these water molecules,75 was a tactic used to optimise ligands such as lenacapavir (2)22 and GSK878 (3).28
It is also noteworthy that in some examples (Fig. 9D–H) an ethylene glycol molecule has been crystallised in this region. Ethylene glycol is commonly used as a co-solvent to obtain protein crystal structures, and its presence in the obtained crystal structure can be used to identify areas of easily displaceable water.76 Here, the water is displaced so readily that even ethylene glycol takes its place.
The absence of the phenolic OH allows compound 16 to adopt the more expected binding pose (Fig. 9E), with the lactam making H-bonds with Asn57. Introduction of an additional carbonyl, as seen in compound 17 (Fig. 9F), induces a rotation of Thr107 to allow an H-bond to be made between this CO and the OH of Thr107. This orientation of Thr107 has also been observed in the crystal structures of GSK878 (3) (Fig. 4B) and BI-2 (4) (Fig. 5B).
With the addition of a Br group at the 7-position of the benzimidazol-2-one in compound 18 (Fig. 9G), it was possible to increase the binding affinity for HIV-CA to a pKi of 5.3, with the Br group interacting with a water molecule77,78 that is shown to be stable in a 3D-RISM calculation. Taking advantage of the interactions seen with the NH2 group in compound 14 (Fig. 9C), it was possible to replace the Br with an NH2 to give compound 19 (Fig. 9H). Not only does compound 19 maintain the interaction with the stable water molecule, but it is also able to make an additional interaction with Asn57, with a pKi of 5.3 also being seen with this compound.
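To put a pKi of 5.3 into more familiar terms, the short Python sketch below converts it to an inhibition constant and an approximate binding free energy. It assumes the standard relations Ki = 10^(−pKi) M and ΔG = −RT ln(10) × pKi at 298.15 K with a 1 M standard state; the function names are illustrative and are not taken from the original work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def pki_to_ki_molar(pki: float) -> float:
    """Return Ki in mol/L, using the definition pKi = -log10(Ki)."""
    return 10.0 ** (-pki)


def pki_to_delta_g(pki: float, temperature_k: float = 298.15) -> float:
    """Approximate binding free energy in kcal/mol.

    Assumes dG = -RT * ln(10) * pKi with a 1 M standard state.
    """
    return -R_KCAL * temperature_k * math.log(10.0) * pki


if __name__ == "__main__":
    pki = 5.3  # value reported in the text for compounds 18 and 19
    print(f"Ki ~ {pki_to_ki_molar(pki) * 1e6:.1f} uM")
    print(f"dG ~ {pki_to_delta_g(pki):.1f} kcal/mol")
```

Under these assumptions a pKi of 5.3 corresponds to a Ki of roughly 5 µM, which is typical of a fragment-sized binder rather than an optimised ligand.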
• All active ligands have an aryl ring in the FG binding site, equivalent to the position of the phenylalanine in CPSF6 and Nup153. PF-74 (1) also contains a phenylalanine unit, with a lipophilic aryl replacement introduced to lenacapavir (2) and GSK878 (3) along with BI-2 (4). This phenyl system interacts with a pocket generated from Leu56, Met66, Leu69, Lys70, and Ile73, with the absence of predicted water molecules in 3D-RISM showing that this pocket is well occupied.
It has been shown that a pyridyl N can be introduced, as seen with compound 14, with an interaction being made with Met66. However, care needs to be taken when adding more polar groups, as introducing a phenolic OH, with compound 15, resulted in a change in binding mode. While there is potential to make a π–cation interaction with Lys70, the interactions are primarily lipophilic. It is unclear (based on the evidence presented) whether saturated lipophilic groups are also tolerated.
• H-bonding with the sidechain of Asn57 is beneficial. This interaction occurs with the amide groups of CPSF6, Nup153, and PF-74 (1). The pyridine heterocycle used to replace an amide in lenacapavir (2) and the quinazolinones used in GSK878 (3) function as bioisosteres to mimic interaction with the NH of Asn57. The Asn57 sidechain can rotate, as seen with BI-2 (4), but this orientation has not been exploited as extensively. This may be because of limitations in the vectors available to grow toward the CTD with ligands that induce this Asn57 orientation.
• The pocket generated from Lys70, Ile73, Asn74, and Thr107 was mainly filled using lipophilic groups with PF-74 (1), lenacapavir (2), and GSK878 (3), with the latter two compounds making an interaction with the NH of Asn74 using a Cl. BI-2 (4) showed that it is possible to interact with the CO of Asn74 using a more polar OH group, an interaction also replicated with compound 15. Compound 13 showed that aromaticity is not needed in this pocket, at least provided an interaction is made with Asn74. Like Asn57, Thr107 has demonstrated two different orientations, one with the Me group making lipophilic interactions [e.g. PF-74 (1) and lenacapavir (2)] and another with the OH making polar interactions [e.g. GSK878 (3) and BI-2 (4)]. 3D-RISM shows that there is a network of unstable water molecules in this pocket, which could be exploited to increase binding.
• The channel between the NTD and CTD is made primarily from Gln63, Met66, and Lys70 in the NTD and Tyr169 and Arg173 in the CTD. While not all classes of molecule have exploited this interface between two monomers of the HIV-CA hexamer, it has been a strategy employed by the more advanced ligands. The binding mode of the indole in PF-74 (1) is predominantly driven by the formation of a π–cation sandwich between Lys70, the indole of PF-74 (1), and Arg173. An additional H-bond exists between the indole NH and the CO in the sidechain of Gln63.
However, the system developed for lenacapavir (2) and used in GSK878 (3), while maintaining a single π-system to facilitate a π–cation sandwich, builds in a different direction to make more lipophilic interactions with Gln63, Met66 and Tyr169. While this strategy has, at least in part, allowed a 3–4 log unit increase in potency, it has done so at the expense of the physicochemical profile of the molecule.
• There is a network of stable water molecules that need to be displaced to access the NTD-CTD interface, as seen with 3D-RISM analysis in Fig. 9. This means that a significant increase in binding affinity must be achieved in accessing this channel to counteract the penalty associated with displacing these water molecules.
• Access to additional residues Ser41 and Gln50, and an additional interaction with the other face of Asn57, is achieved with lenacapavir (2), and interaction with Ile37 in the CTD is seen with GSK878 (3), by growing through a channel between Asn57 and Thr107. Owing to the limited published data, the benefit of these additional interactions to binding is unclear. However, as with building into the NTD-CTD interface used by the indole of PF-74 (1), 3D-RISM has shown that there are water molecules that need to be displaced to access these residues. While these were coloured white in the 3D-RISM (Fig. 2C), the increase in ligand size required to reach these residues means there may be limited benefit to growing in this channel.
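The guidelines above quote potency differences in log units and note that displacing stable waters carries a penalty that the new interactions must outweigh. As a rough illustration, the Python sketch below converts a change in pKi into a fold-change in affinity and an approximate free-energy difference, assuming ΔΔG = −RT ln(10) × ΔpKi at 298.15 K. The function names are hypothetical, and no water-displacement energies are taken from the source.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_change(delta_pki: float) -> float:
    """Fold-change in affinity for a given change in pKi (log units)."""
    return 10.0 ** delta_pki


def delta_delta_g(delta_pki: float, temperature_k: float = 298.15) -> float:
    """Approximate change in binding free energy in kcal/mol.

    Negative values mean tighter binding. Assumes ddG = -RT * ln(10) * dpKi.
    """
    return -R_KCAL * temperature_k * math.log(10.0) * delta_pki


if __name__ == "__main__":
    # The 3-4 log unit gain mentioned for the lenacapavir-type system
    for d in (3.0, 4.0):
        print(f"{d:.0f} log units: {fold_change(d):,.0f}-fold, "
              f"ddG ~ {delta_delta_g(d):.1f} kcal/mol")
```

On this basis each log unit is worth roughly 1.4 kcal/mol at room temperature, which gives a sense of the scale of gain needed to offset the cost of displacing a stable water.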
Visualisation of a ligand's binding to HIV-CA in 3D provides opportunities for generating designs, in effect allowing new ligands to be rationally designed based on the binding conformation of an existing ligand. Modelling the binding conformation of the ligand in the active site, and the interactions made with the protein, allows alternative scaffolds to be explored that provide access to novel vectors, with these designs prioritised using techniques like EC and QSAR.
Further information is obtained by conducting an analysis of the water molecules in the binding site of HIV-CA. Water molecules fill the volume of the protein that is otherwise unoccupied. Through an understanding of their binding energy, these water molecules can be classified as thermodynamically favourable or unfavourable. This information is particularly useful when growing ligands into additional pockets, for example in the optimisation of fragments.
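The classification described here can be sketched in a few lines of Python. This is a minimal illustration only: it assumes each water has already been assigned a free energy relative to bulk (negative meaning more stable than bulk), and the `Water` class, the `classify` function, the 0.5 kcal/mol neutral band, and the example values are all hypothetical placeholders rather than output from any 3D-RISM calculation in this review.

```python
from dataclasses import dataclass


@dataclass
class Water:
    label: str
    delta_g: float  # kcal/mol relative to bulk; negative = stable (assumed convention)


def classify(water: Water, neutral_band: float = 0.5) -> str:
    """Label a water as favourable, unfavourable, or neutral.

    neutral_band is an arbitrary, illustrative cut-off in kcal/mol.
    """
    if water.delta_g < -neutral_band:
        return "favourable"    # stable: costly to displace
    if water.delta_g > neutral_band:
        return "unfavourable"  # unstable: a candidate for displacement
    return "neutral"


if __name__ == "__main__":
    # Placeholder values for illustration only, not data from the review
    waters = [Water("W1", -2.0), Water("W2", 1.5), Water("W3", 0.1)]
    for w in waters:
        print(w.label, classify(w))
```

In a fragment-growing context, waters labelled unfavourable would mark the regions where extending the ligand is most likely to pay off.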
Computational drug discovery tools will never replace the requirement to experimentally synthesise and test molecules in suitable assays. However, they should be used to provide insights that may otherwise go unnoticed. This reduces the number of molecules that require preparation and allows focus on synthesising the molecules that will most efficiently progress the project.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5md00111k
This journal is © The Royal Society of Chemistry 2025