Nicolai
Ree
a,
Andreas H.
Göller
*b and
Jan H.
Jensen
*a
aDepartment of Chemistry, University of Copenhagen, Universitetsparken 5, 2100, Copenhagen Ø, Denmark. E-mail: jhjensen@chem.ku.dk
bBayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096, Wuppertal, Germany. E-mail: andreas.goeller@bayer.com
First published on 8th January 2024
Reactivity scales such as nucleophilicity and electrophilicity are valuable tools for determining chemical reactivity and selectivity. However, prior attempts to predict or calculate nucleophilicity and electrophilicity are either not capable of generalizing well to unseen molecular structures or require substantial computing resources. We present a fully automated quantum chemistry (QM)-based workflow that automatically identifies nucleophilic and electrophilic sites and computes methyl cation affinities and methyl anion affinities to quantify nucleophilicity and electrophilicity, respectively. The calculations are based on r2SCAN-3c SMD(DMSO) single-point calculations on GFN1-xTB ALPB(DMSO) geometries that, in turn, derive from a GFNFF-xTB ALPB(DMSO) conformational search. The workflow is validated against both experimental and higher-level QM-derived data resulting in very strong correlations while having a median wall time of less than two minutes per molecule. Additionally, we demonstrate the workflow on two different applications: first, as a general tool for filtering retrosynthetic routes based on chemical selectivity predictions, and second, as a tool for determining the relative reactivity of covalent inhibitors. The code is freely available on GitHub under the MIT open source license and as a web application at https://www.esnuel.org.
In essence, organic chemistry is all about the interaction between electron-rich atoms or molecules and electron-deficient atoms or molecules.1
The concept of nucleophiles and electrophiles dates back to 1933,2 when it was introduced by Ingold based on Lewis's electronic theory of valency and the acid–base theory of Brøndsted and Lowry.3–5 Over the years, nucleophilicity and electrophilicity have been quantified to create reactivity scales that explain why some atoms or molecules are more reactive than others. Most noteworthy is the work by Mayr and co-workers, who experimentally studied the reactivity of organic molecules that led to the Mayr–Patz equation;6
log![]() | (1) |
Essential to eqn (1) is that k20°C can span from non-observable reactions (k20°C < 10−5 M−1 s−1) to diffusion-controlled reactions (k20°C > 109 M−1 s−1),7,8 and its applicability has been confirmed for various molecules resulting in Mayr's database that currently holds experimental reactivity parameters for 352 electrophiles and 1261 nucleophiles.7 However, measuring reaction rates to extract reactivity parameters is generally time-consuming and often difficult at the extremes of the reactivity scales. Thus, several in silico methods have been proposed to circumvent this problem.8,9 This includes estimating the rate constant using the Eyring equation10,11 and computing the reactivity parameters from frontier molecular orbital (FMO) energies12,13 or chemical affinities.14–16 Furthermore, a lot of new ML approaches based on Mayr's database have recently emerged.17–23
In this work, we will focus on the studies by Van Vranken and Baldi showing that calculated methyl cation affinities (MCAs) and methyl anion affinities (MAAs) of structurally different molecules correlate with Mayr's N × sN and E, respectively, when considering solvent effects.15,16 Based on these findings, they created two QM-derived datasets with reactivity parameters for 1232 nucleophiles and 1113 electrophiles (we have excluded 76 duplicates) covering ∼50 orders of magnitude in each case.24 The datasets were used to train different ML models by treating the affinities as either atomic, functional group, or molecular properties. The best-performing architecture was an atom-based graph attention network (GAT) achieving 10-fold cross-validation R2 coefficients of 0.92 ± 0.02 and 0.94 ± 0.02 for MCAs and MAAs, respectively.
However, it is worth noting that the two datasets contain molecules that are not in Mayr's database, and the ML models have not been validated against experimental reactivity parameters. Furthermore, the atom sites are identified by hand, which makes it difficult to apply their method to arbitrary molecules in an automated fashion. This work will address these limitations by introducing a fast and fully automated quantum chemistry-based workflow that detects relevant sites and calculates associated MCAs and MAAs. In addition, we will apply the workflow to different tasks to highlight the applicability of MCAs and MAAs as quantitative measures of chemical reactivity.
The calculated MCAs and MAAs are defined in eqn (2) and (3) as the negative energy difference (−ΔE) of a nucleophile (Nuc) reacting with a methyl cation (CH3+) and an electrophile (Elec) reacting with a methyl anion (CH3−), respectively.
![]() | (2) |
![]() | (3) |
The workflow is also applied to two different tasks. The first task employs experimental data from the work of Caputo et al.34 and a retrosynthetic route by Manifold from PostEra35 for the synthesis of Raltegravir. The data are used to highlight the applicability of the workflow within computer-aided synthesis planning (CASP) by predicting the selectivity of chemical reactions. The second task involves reactivity data for different covalent inhibitors including 249 acrylamides, 9 propargylamides, and 238 2-chloroacetamides.13 The dataset contains calculated activation energies for the reaction of the warheads with methanethiolate (CH3S−) and FMO-derived electrophilicities at the ωB97XD/cc-pVDZ CPCM(H2O) level of theory.
The top panels in Fig. 1 show that the workflow can reproduce a strong correlation between Mayr's experimental reactivity parameters and QM-calculated MCAs and MAAs as previously observed by Van Vranken and Baldi.15,16 Specifically, the R2 coefficients of 0.84 and 0.94 in Fig. 1a and b are somewhat similar to 0.89 and 0.96 for MCAs and MAAs, respectively, with the latter based on Gibbs free energies at the PBE0-D3(BJ)/DEF2-TZVP COSMO(∞) level of theory as reported by Van Vranken and Baldi.15,16
The ability to replicate higher-level results is further supported by the very strong correlation in the bottom panels of Fig. 1 with R2 coefficients of 0.98 and 0.99 for MCAs and MAAs, respectively. These results actually outperform the ML models by Tavakoli et al.,24 which achieved 10-fold cross-validation R2 coefficients of 0.92 ± 0.02 and 0.94 ± 0.02 for MCAs and MAAs, respectively. Of course, the better performance comes with a higher computational cost, although the median wall time for this data is less than two minutes using eight CPU cores (Intel(R) Xeon(R) CPU X5550@2.67 GHz). In fact, the timings can be further improved due to the handling of structures and conformers being embarrassingly parallel. Alternatively, omitting the single-point r2SCAN-3c SMD(DMSO) calculations will greatly reduce the computational cost without impacting the above results significantly as seen in Fig. S2 in the ESI.† It should be noted that the molecules in the bottom panels of Fig. 1 have on average ∼10 heavy atoms and ∼6 identified electrophilic and nucleophilic sites.
![]() | ||
Fig. 2 Chemical selectivity predictions for the synthesis of Raltegravir with respect to (a) experimental work by Caputo et al.34 and (b) a computer-aided synthesis planning (CASP) route by Manifold from PostEra.35 The most nucleophilic and electrophilic sites for each structure are highlighted in green and blue, respectively. The shaded highlights are predicted reaction centers with lower MCAs and MAAs, and the starred highlights are discussed in the text. The MCAs and MAAs are obtained at the r2SCAN-3c SMD(DMSO)//GFN1-xTB ALPB(DMSO) level of theory and additional values can be found in the ESI.† |
The results in Fig. 2a show that by locating the highest MCA and MAA among the two reactants (the values marked in bold), it is possible to predict the selectivity of the reaction. The most electrophilic site is the carbonyl carbon of 1a with a MAA of 395 kJ mol−1, which is 89 kJ mol−1 higher than the second-highest MAA. The MAA of the carbonyl carbon is marked with a star (“*”) as the chlorine atom acts as a leaving group during the geometry optimization when attaching the methyl anion to the carbonyl carbon. The most nucleophilic site in Fig. 2a is the primary amine of 2a with a MCA of 449 kJ mol−1, which is 56 kJ mol−1 higher than the second-highest MCA. Hence, the experimentally observed nucleophilic acyl substitution reaction can be correctly predicted by locating the highest MCA and MAA despite the two reactants 1a and 2a having a total of 27 nucleophilic and 20 electrophilic sites identified by the workflow.
Now, we will turn to Fig. 2b to analyze and validate the reaction steps proposed by Manifold and demonstrate how the workflow can be applied as a post-filtering tool. A CASP tool like Manifold usually outputs many different retrosynthetic routes, but some of the steps in the retrosynthetic routes may not be feasible due to selectivity issues. We propose using MCAs and MAAs as a tool to predict chemical selectivity and flag steps that are potentially incorrect. The retrosynthetic routes can then be ranked based on the number of warning flags similar to how retrosynthetic routes are commonly ranked based on the number of synthetic steps and the total price of building blocks.
The first reaction step in Fig. 2b involves an ester amidation between 1b and 2b. However, based on the highest MCA and MAA, a more favorable reaction would be an ester amidation between two 2b compounds with MCA and MAA of 478 and 242 kJ mol−1 for the primary amine and the carbonyl carbon, respectively. The latter is marked with a star (“*”) as the proton from the phenol group moves to the carbonyl oxygen during the geometry optimization when attaching the methyl anion to the carbonyl carbon. The MAA of the carbonyl carbon in 1b is 186 kJ mol−1, which is 56 kJ mol−1 lower than the highest MAA. In fact, another site in 1b has a MAA that is 35 kJ mol−1 higher than the MAA of the carbonyl carbon. This reaction step should therefore be flagged as the chance of this reaction step being a success is low. Instead, one could imagine a similar retrosynthetic route starting with a nucleophilic acyl substitution similar to the one shown in Fig. 2a by replacing 1b with 1a.
The second reaction step involves a Chan–Lam coupling reaction between the product of the first reaction step (3b) and methylboronic acid (4b). This reaction employs a copper catalyst, which makes the reaction mechanism more complex, and validating the reaction solely based on the highest MCA and MAA is probably not sufficient. However, looking at the MCAs and MAAs we see that none of the proposed reaction centers have the highest MCA and MAA. In terms of the MAAs, the workflow did not identify any electrophilic sites in 4b, and the highest MAA is therefore found in 3b. This MAA is marked with a star (“*”) as the attached proton moves to the neighboring nitrogen atom during the geometry optimization. The highest MCA is 428 kJ mol−1, which is only 17 kJ mol−1 higher than the MCA of the proposed reaction center. Removing the proton from the proposed reaction center would make it the most nucleophilic site. However, based on predicted pKa values by MarvinSketch from ChemAxon the phenol group is more acidic, and a deprotonation of this phenol group would lower the MCA of the proposed reaction center as seen in the ESI.†
The last reaction step involves an ester amidation reaction between the product of the second reaction step (5b) and 4-fluorobenzylamine (6b). The most nucleophilic site is the primary amine of 6b with a MCA of 489 kJ mol−1, and this site is also the proposed reaction center of 6b. The most electrophilic site is the highlighted carbon atom next to the fluorine atom of 6b with a MAA of 310 kJ mol−1. The MAA is marked with a star (“*”) as the fluorine atom acts as a leaving group during the geometry optimization when attaching the methyl anion to the highlighted carbon atom. The second most electrophilic site is the carbonyl carbon in 5b, which is the other proposed reaction center. This site is marked with a star (“*”) as the proton from the phenol group moves to the carbonyl oxygen during the geometry optimization when attaching the methyl anion to the carbonyl carbon.
In summary, the use of MCAs and MAAs to flag retrosynthetic steps for further inspection seems promising but will require further work. For example, the effect of using other reactivity probes than methyl anion and cation.
![]() | (4) |
![]() | (5) |
η = IP − EA = εLUMO − εHOMO | (6) |
![]() | ||
Fig. 3 Covalent inhibitor reactivity prediction for various acrylamides (top), propargylamides (middle), and 2-chloroacetamides (bottom). The calculated MAAs are obtained at the r2SCAN-3c SMD(DMSO)//GFN1-xTB ALPB(DMSO) level of theory and otherwise ωB97XD/cc-pVDZ CPCM(H2O). The black regression lines consider all points, whereas the blue regression lines are defined in the text. The red dots are considered outliers in the work of Hermann et al.13 |
In Fig. 3, we compare the calculated activation energies to the warhead-associated electrophilicity index and calculated MAAs for various acrylamides (top), propargylamides (middle), and 2-chloroacetamides (bottom). The calculated MAAs are obtained for the leftmost carbon atom in each of the depicted structures with the chlorides (Cl−) being removed for the 2-chloroacetamides.
The results of the acrylamides (Fig. 3a and b) show strong correlations for both the warhead-associated electrophilicity index and MAAs with R2 coefficients of 0.87 and 0.80, respectively. However, computing the warhead-associated electrophilicity index requires an analysis of the FMOs to select suitable orbitals, whereas the MAAs are straightforward to calculate. In fact, the R2 coefficient for the acrylamides without adjusting the electrophilicity index to the warhead-associated HOMO and LUMO energies (i.e., simply relying on the canonical MO-based HOMO and LUMO energies) is only 0.60,13 which is significantly worse than calculating MAAs.
The propargylamides (Fig. 3c and d) only include nine different structures shown in Fig. S11.† When considering all of them, the R2 coefficients are 0.64 and 0.67 for the warhead-associated electrophilicity index and calculated MAAs, respectively. Excluding the red entries in Fig. 3c, viewed as outliers in the work of Hermann et al.,13 results in a significantly better R2 coefficient of 0.89 for the warhead-associated electrophilicity index. Yet, the MAAs for these structures align well with the other entries. On the other hand, the blue entry in Fig. 3d is originally relatively far from the black regression line. This entry corresponds to a structure with bulky groups on both sides of the triple bond, and the transition vector could be pointing toward the neighboring SP-hybridized carbon atom. Unfortunately, the transition state structures are not available. Instead, we can calculate the MAA for this neighboring carbon atom resulting in a strong correlation with an R2 coefficient of 0.85. This approach is only possible for the atom-specific MAAs as the warhead-associated electrophilicity index uses FMOs primarily localized on both SP-hybridized carbon atoms.
The results of the 2-chloroacetamides (Fig. 3e and f) show no correlation for both the warhead-associated electrophilicity index and MAAs. This behavior is extensively studied in the work of Hermann et al.,13 and their arguments reflect the change from the Michael-type nucleophilic additions to an SN2 reaction. Specifically, they show that the LUMO energy correlates with the bond strength of both the C–Cl and C–SMe bonds. Thus, the electrophilicity index fails to capture the energetics of the reaction due to the simultaneous bond formation and rupture. However, the calculated MAAs behave similarly despite not depending on FMO energies, which raises questions about their reasonings. Furthermore, when using CH3S− instead of CH3−, we again find no correlation with an R2 coefficient of 0.01 as seen in the ESI.† This result is surprising as it contradicts the Bell–Evans–Polanyi principle (i.e., the change in activation energy for similar reactions being proportional to the change in reaction enthalpy), indicating that further analysis of the transition state structures should be carried out.
Additionally, we highlight two different applications of the workflow. The first application is within computer-aided synthesis planning (CASP), where the workflow can be used to predict chemical selectivity and detect potential problems in a retrosynthetic route. This is demonstrated using experimental data from the work of Caputo et al.34 and a retrosynthetic route by Manifold from PostEra35 for the synthesis of Raltegravir. The workflow correctly predicts the reported reaction from the work of Caputo et al.34 by locating the highest MCA and MAA despite the two reactants having a total of 27 nucleophilic and 20 electrophilic sites. However, some of the steps in the retrosynthetic route by Manifold are found problematic suggesting that another route should be preferred.
In the second application, we show that the workflow can be used to predict the reactivity of covalent inhibitors. We report a strong correlation between MAAs and calculated activation energies for various acrylamides and propargylamides similar to the work by Hermann et al.13 using a warhead-associated electrophilicity index. The results of the 2-chloroacetamides showed no correlation for both the warhead-associated electrophilicity index and MAAs, which could be due to errors in the calculated activation energies. The advantage of the MAAs over the warhead-associated electrophilicity index is that the MAAs are atom-specific and completely straightforward to calculate. Whereas, the warhead-associated electrophilicity index requires a selection of the highest occupied and lowest unoccupied orbitals that are associated with the warhead to match the performance of the MAAs.
Future work will use the QM-based workflow to calculate MCAs and MAAs for a large set of diverse molecules and train an atom-based ML model for each property similar to the one presented in Ree et al.37 The ML models will then be used to predict the chemical selectivity for a large set of experimentally reported reactions to provide a statical basis for using the MCAs and MAAs to predict chemical selectivity.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3dd00224a |
This journal is © The Royal Society of Chemistry 2024 |