Toni T.
Metsänen‡
a,
Katrina W.
Lexa‡
*b,
Celine B.
Santiago
a,
Cheol K.
Chung
c,
Yingju
Xu
c,
Zhijian
Liu
c,
Guy R.
Humphrey
c,
Rebecca T.
Ruck
c,
Edward C.
Sherer
b and
Matthew S.
Sigman
*a
aDepartment of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, USA. E-mail: sigman@chem.utah.edu
bModeling and Informatics, MRL, Merck Sharp & Dohme, Rahway, New Jersey 07065, USA. E-mail: katrina.lexa@gmail.com
cProcess Research and Development, MRL, Merck Sharp & Dohme, Rahway, New Jersey 07065, USA
First published on 18th July 2018
Quantitative structure–activity relationships have an extensive history for optimizing drug candidates, yet they have only recently been applied in reaction development. In this report, the predictive power of multivariate parameterization has been explored toward the optimization of a catalyst promoting an aza-Michael conjugate addition for the asymmetric synthesis of letermovir. A hybrid approach combining 2D QSAR and modern 3D physical organic parameters performed better than either approach in isolation. Using these predictive models, a series of new catalysts were identified, which catalyzed the reaction to provide the desired product in improved enantioselectivity relative to the parent catalyst.
QSAR in the context of catalysis remains in its infancy.4–11 In the reported studies, two distinct approaches have been investigated: (1) the use of traditional QSAR-type descriptors evolved in the medicinal chemistry field, which are often 2D;4 and (2) the more recent use of parameters derived from quantum mechanics (QM) to accurately model key structural features involved in the hypothesized reaction mechanism.11 Examples of these descriptor types in the context of catalysis are depicted in Fig. 1. Clearly, these descriptors are innately unique and, if combined, may synergize to provide better statistical fitting of a dataset. Thus, we hypothesized that integrating these two descriptor sets had the potential to improve a recently developed enantioselective aza-Michael conjugate addition, representing a key step in the industrial synthesis of the approved pharmaceutical, Prevymis™ (letermovir; MK-8228).12,13
In the initial disclosure, the product enantioselectivity from this key step was 88.3% ee using catalyst 2c (Scheme 1) for the commercial process. Importantly, as part of the initial optimization effort, a library of 29 bistriflamide catalysts was generated and evaluated under the same reaction conditions with the resultant enantioselectivity (ee) and conversions measured in triplicate. This library provides a wide range of ee's, offering an excellent foundation to examine the potential benefits of QSAR (Scheme 1). Despite the range in ee, this library failed to demonstrate any intuitive SAR trends. For example, during library design, it was hypothesized that amine pKa would track with ee given the reaction mechanism; however, no relationship was observed. Furthermore, while both electron-withdrawing and -donating substituents (e.g., 2h and 2l, respectively) as well as steric bulk (2f and 2c) were tolerated, only four catalysts (2a–2d) provided improvement to the parent unsubstituted catalyst 2e.
On this basis, we set out to identify improved catalysts through the evaluation of QSAR models and hypothesized that the resultant models could provide a more detailed understanding of the salient features required for enantioselective catalysis. Herein, we present a stepwise analysis of both 2D and modern physical organic descriptor-based models as drivers of this optimization campaign. Ultimately, we determined that combining these descriptor sets enabled identification of enhanced catalysts with non-intuitive structural features through a highly predictive virtual screening campaign.
As all of the catalysts share a common C2-symmetric bistriflamide core, we envisaged that the parent arene could be used as an efficient surrogate structure to rapidly explore quantum mechanical-derived physical organic descriptor space. Importantly, a simple arene proxy significantly limited the number of low energy conformations to be considered. Multi-variate linear regression modeling17,18 produced a relatively simple model using two parameters (NBOC2 and P19) expressed in three terms (NBOC2, P, and NBOC2 × P) to achieve a good statistical fit between the predicted and the measured enantioselectivity for the initial training set of bistriflamides (R2 = 0.80, LOO-Q2 = 0.77, model B) (Fig. 3). Given the physical organic descriptors identified by our simple arene model, we could interrogate several new structural hypotheses. The unsubstituted C2 NBO charge (NBOC2) represents a general electronic parameter describing the overall electron density of the arene.20–22 Additionally, the NBOC2 term could describe a hydrogen bond interaction between the ortho C–H and the substrate. The partition-coefficient (P) describes the hydrophobicity of the arene. This parameter is often used in drug design to predict water solubility of molecules and due to its broad numeric range, is typically reported as logP. We postulated that, in the current case, the partition-coefficient could describe complex attractive interactions between the catalyst and the substrate.
2 | Ar | Predicted% ee (ΔΔG‡) | Measured% ee (ΔΔG‡) | |
---|---|---|---|---|
Model A | Model B | |||
a ΔΔG‡ given in parentheses in kcal mol−1. | ||||
ab | 2-F-4-Br-C6H3 | 86.7(1.64) | 85.9(1.60) | 90.2(1.85) |
ac | 2-F-4-Cl-C6H3 | 87.0(1.66) | 85.8(1.60) | 89.3(1.79) |
ad | 2-F-4-CF3-C6H3 | 87.7(1.70) | 83.6(1.50) | 88.3(1.73) |
ae | 2,4-Cl2-C6H3 | 73.3(1.16) | 85.3(1.58) | 86.2(1.62) |
af | 4-CF3-C6H4 | 84.9(1.56) | 81.3(1.43) | 85.3(1.58) |
ag | 4-F-C6H4 | 82.0(1.44) | 84.0(1.52) | 84.1(1.52) |
ah | 3-F-C6H4 | 82.8(1.47) | 76.6(1.26) | 79.9(1.36) |
ai | 3,4-Cl2-C6H3 | 69.8(1.07) | 81.9(1.44) | 78.9(1.33) |
aj | 2-F-4-CN-C6H3 | 87.1(1.66) | 84.6(1.54) | 78.5(1.32) |
ak | 3,4,5-F3-C6H2 | 62.3(0.91) | 66.3(0.99) | 75.9(1.24) |
Although both models were able to predict enantioselectivity reasonably well, some catalysts were captured better by one of the two QSAR approaches: 2,4- and 3,4-dichlorinated catalysts (2ae and 2ai, respectively) were more accurately predicted by model B (3D parameters), while model A (2D parameters) gave better predictions for the best three catalysts 2ab–2ad. It is worth noting that all of the best performing catalysts (2ab–2ad) were under-predicted by both models. Overall, the models displayed uneven predictive capability for the poorer performing catalysts (2ah–2ak).
We selected the 20 highest-weighted descriptors from model A and added these to the physical organic descriptor set. Both 2D and 3D QSAR parameters were thus available during the process of linear regression modeling.17 Interestingly, one 2D-QSAR parameter, FX1sp3CX2sp207, stood out as highly synergistic with the 3D physical organic descriptors. The FX1sp3CX2sp207 parameter is a substructural atom pair describing fluorine atoms seven bonds away from an sp2-hybridized carbon with two non-hydrogen substituents (Fig. 4). Within the bistriflamide catalyst scaffold, this represents a readout for the unsubstituted ortho and meta positions and, of particular note, ortho- and meta-fluorine substitutions contribute as well. This 2D-descriptor was decisively the best single descriptor (R2 = 0.62) and was found to be a significant predictor in nearly all hybrid models. The fundamental drawback of this parameter is the difficulty to extrapolate beyond current structures within the bisaryltriflamide scaffold.
From our model building process, we obtained four new hybrid models (C–F) in addition to our standalone models A′ and B′ (Scheme 2C, see ESI† for details). All four new models provided a statistically satisfactory fit to the training set (R2 = 0.78–0.90 and LOO-Q2 = 0.70–0.77) and were well-validated against the test set.
![]() | ||
Scheme 2 Physical organic descriptors (A), hybrid models (B), and virtual screen 2 (C); ΔΔG‡ given in parentheses in kcal mol−1. |
Hybrid model C incorporates the C1 NBO charge as an electronic parameter and the previously used partition-coefficient together with FX1sp3CX2sp207 parameter. Out of the virtual screening set, model C predicted the OCF3-substituted catalyst 2aq to be the most selective (94.3% ee).
The second hybrid model D combines the symmetrical aryl IR vibrational frequency26 and the average of C2 NBO charges as physical organic descriptors for the arene ring, and the substructural descriptor FX1sp3CX2sp207. Model D accurately predicted the best performing catalyst in the training and validation sets. Prediction of ee in VS2 with model D identified four catalysts with potential outputs above 90% ee. In agreement with model C, 2aq was predicted at 95.4% ee. Additionally, catalysts 2al, 2an, and 2ao were predicted to give 90.0%, 92.3%, and 91.2% ee, respectively.
The models E and F both include C1 and C2 NBO charge, FX1sp3CX2sp207, and a polarizability parameter. Additionally, both models show an excellent fit to the training and validation sets. The only difference between the two models was the means by which the polarizability was expressed. For model E, isotropic polarizability (polari) was used, which is the calculated average of the ax, ay, az polarizability vectors. In the arene models used in this study, the az term was typically small and the isotropic polarizability was effectively the average polarizability along the xy-plane. In the model F, anisotropic polarizability (polara) is used instead. Whereas isotropic polarizability can be readily used to compare the polarizability of equally spherical (or planar) molecules, anisotropic polarizability can be used to better describe asymmetrical structures.27 As expected, where most of the training set consists of flat structures with negligible polarizability along the z-axis, the two terms are collinear (Fig. 5). However, the situation changes dramatically when bulkier substituents that increase polarizability along the z-axis are incorporated: the 2-OTf substituted catalyst 2c is the greatest outlier from the training set. Due to this difference in the polarizability term, the predictions between the two models are dramatically different. While isotropic model E predicts the silicon (2al), tin (2ao), and boron (2an) substituted catalysts at 95.2–99.2% ee, the anisotropic model F predicts them to give only 85.5–86.4% ee. Interestingly, both models E and F predict the ferrocene catalyst 2ap at 90% ee.
Based on the predictions from the six models (Scheme 2c), as well as synthetic accessibility, we synthesized three catalysts that were expected to yield high product enantioselectivities: 2-F-4-SiMe3 (2al), 2-F-4-I (2am), and 2-F-4-Bpin (2an). Gratifyingly, the three catalysts all gave full conversion and good ee, with 2-F-4-SiMe3 giving the highest enantioselectivity observed for this reaction to date, 90.4% ee.
Of the six models used for the virtual screening, both the pure 2D and pure 3D models failed to accurately predict the behavior of the best catalysts. Although models E and F containing the polarizability terms accurately predicted the iodine-substituted catalyst 2am, they over- or underestimated the silicon (2al) and boron (2an) catalysts, respectively. The best effective predictions for VS2 were obtained with the hybrid model D.
2 | Ar | Measured% ee | |||||
---|---|---|---|---|---|---|---|
Anhydr. toluene | Wet toluene | MTBE | CPME | EtOAc | 2-MeTHF | ||
a ΔΔG‡ given in parentheses in kcal mol−1; MTBE = methyl tert-butyl ether; CPME = cyclopentyl methyl ether. | |||||||
ab | 2-F-4-Br-C6H3 | 90.2 | 90.2 | 86.8 | 88.6 | 77.8 | 79.0 |
al | 2-F-4-SiMe3-C6H3 | 90.4 | 91.8 | 93.7 | 92.3 | 88.9 | 89.2 |
am | 2-F-4-l-C6H3 | 88.8 | 88.6 | 87.1 | 87.0 | 78.1 | 80.5 |
an | 2-F-4-Bpin-C6H3 | 87.3 | 87.8 | 89.4 | 88.7 | 83.2 | 84.9 |
Our previous investigation of this reaction mechanism revealed that hydrogen bonding between the catalyst and substrate is a key driver for enantioselectivity.13 Furthermore, structural modifications of the backbone demonstrated that both sulfonamide N–H's are critical for conversion. DFT (density functional theory) calculations with 2c showed that ring closure most probably occurs via aza-Michael conjugate addition; however, we believe that, following substrate tautomerization, a 6-π electrocyclization mechanism can be energetically accessible with some catalysts as well. This structural information from the mechanistic work is consistent with our 2D, 3D, and hybrid models, indicating the utility of performing both machine learning as well as density functional calculations to fully capture the dynamics of a catalyst system.
Footnotes |
† Electronic supplementary information (ESI) available: Experimental details, computational details and characterization data. See DOI: 10.1039/c8sc02089b |
‡ These authors contributed equally. |
This journal is © The Royal Society of Chemistry 2018 |