Matthew A.
McDonald
,
Brent A.
Koscher
,
Richard B.
Canty
and
Klavs F.
Jensen
*
Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA. E-mail: kfjensen@mit.edu
First published on 29th May 2024
Reaction optimization and characterization depend on reliable measures of reaction yield, often measured by high-performance liquid chromatography (HPLC). Peak areas in HPLC chromatograms are correlated to analyte concentrations by way of calibration standards, typically pure samples of known concentration. Preparing the pure material required for calibration runs can be tedious for low-yielding reactions and technically challenging at small reaction scales. Herein, we present a method to quantify the yield of reactions by HPLC without needing to isolate the product(s) by combining a machine learning model for molar extinction coefficient estimation, and both UV-vis absorption and mass spectra. We demonstrate the method for a variety of reactions important in medicinal and process chemistry, including amide couplings, palladium catalyzed cross-couplings, nucleophilic aromatic substitutions, aminations, and heterocycle syntheses. The reactions were all performed using an automated synthesis and isolation platform. Calibration-free methods such as the presented approach are necessary for such automated platforms to be able to discover, characterize, and optimize reactions automatically.
In this study we describe an HPLC-based approach to reaction yield estimation requiring minimal sample preparation and no reference material. The method is targeted towards reactions to produce drug-like molecules and was developed for a chemistry platform that autonomously explores targeted regions of organic chemical space.11 The platform executes a variety of organic reactions at the ten-micromole scale, handling multiple 96-well plates simultaneously to complete multistep syntheses of proposed new molecules. To enable fast, automated analysis of these small-scale reactions, the approach uses only a single small-volume sample of crude reaction mixture and circumvents the need for reaction workup, mass balances, calibration standards, and HPLC method development; each of these steps can be lengthy on their own.9 The approach leverages two HPLC detectors—a photo-diode array (PDA) and mass spectrometer (MS)—, spectral peak resolution, and a machine-learning (ML) model of molar extinction coefficient (ε). While we developed and validated this technique to enable an autonomous chemistry platform it may be generally useful as a means of simplifying reaction yield estimation.
To validate the ML model and proposed method, we instructed the autonomous platform to synthesize a series of drug and drug-like molecules with a variety of reaction plans. The yield of each of the synthesized molecules was estimated with the method and compared to a measured calibration curve based on pure product samples. We also validated the method on a larger series of simulated reactions: solutions of drug and drug-like molecules prepared by the platform in known concentrations. An example reaction (the final step in the synthesis of the Hedgehog signaling pathway inhibitor sonidegib) and the workflow to estimate the reaction's yield is shown in Fig. 1. Overall, we found that the method has similar accuracy to other universal detectors, such as evaporative light scattering (ELSD), with typical yield estimates having less than 20% error on our platform. The error derives from both model uncertainty and automated liquid handling inaccuracy, which can be partially mitigated with internal standards; however, automated reaction discovery, scoping, and optimization at the drug-discovery scale are often tolerant of the method's error rate.
A sample of the analyte solution is injected into the HPLC system. In the automated workflow, a custom API is used to interface with the HPLC instrument control software to run samples automatically with an autosampler. The automated process quickly examines the most intense peak in the absorbance data to make sure that the signal is within the dynamic range of the PDA detector and can inject additional samples with varied volumes to ensure a robust signal.
Analysis of the chromatograms involves two data processing steps. First, the product peak is identified based on the known product mass in the MS chromatogram. Then, the corresponding peak in the absorbance data is deconvoluted from baseline and impurity species' signals. Fig. 1 shows each step of the method. The absorbance is proportional to the concentration of the target analyte by the Beer–Lambert law: where A is absorbance, ε is the molar absorption coefficient, is the path length of the PDA detector, and c is the concentration of the analyte. Integrating this equation over the entire peak leads to a linear relationship between the peak area (a) and the total number of moles of injected analyte: . This linearity is often used to build calibration curves, which avoids the need to measure the molar absorption coefficient and can account for irregular baselines and peak overlap of impurity species. Herein, we demonstrate an ML model that can predict the molar absorption coefficient coupled with resolution of the analyte spectrum from baseline and impurities' spectra to avoid the need to build a calibration curve. Instead, the yield of a reaction, or concentration of an unknown analyte, can be directly estimated with a single HPLC-MS sample.
Two models were trained on two different training sets: the publicly available Deep4Chem database of 3800 molecules,17 and a proprietary dataset from Reaxys of 38000 unique measurements (Reaxys registry numbers are provided in the ESI†). The model takes as input the target analyte and the solvent because extinction coefficients and absorption peaks depend on the molecular environment. The Deep4Chem database was compiled with a focus on optically active materials (dyes, fluorescent probes, etc.) whereas we desired a model of more general organic molecules. Therefore, when creating the 38k dataset, we used data for organic molecules weighting less than 800 g mol−1 with extinction coefficient measurements in common HPLC solvents. Table 1 shows a comparison of the two models with 10-fold cross-validation on the same 1000 molecule test set covering the four most common solvents (breakdown of solvents used in training in ESI†). The 38k training set outperforms the Deep4Chem model, with root-mean-square error (RMSE) and MAE both about 40% better, and the variance between folds (standard deviations of the RMSE and MAE) indicating that the model trained on the larger dataset has higher average confidence and better coverage of more diverse chemical structures. While RMSE is always more sensitive to outlying errors than MAE, the differences between RMSE and MAE within each model indicates substantial error stems from a handful of outlying points. When using the model, an ensemble of predictions are made from each of the 10 cross validation models, and the variance in the ensemble of predictions is used as a proxy for model uncertainty.18 Moving forward, we use the Reaxys 38k dataset (Fig. 2), but the same method can be implemented with the Deep4Chem dataset, albeit with slightly lower confidence. More model details, including hyperparameter optimization, are included in the ESI.†
10-Fold X-valid. | Deep4Chem 3.8k | Reaxys 38k |
---|---|---|
Average MAE | 0.367 | 0.195 |
Std. dev. of MAE | 9.70 × 10−3 | 2.86 × 10−3 |
Average RMSE | 0.511 | 0.279 |
Std. dev. of RMSE | 11.8 × 10−3 | 4.81 × 10−3 |
For analysis of reactions, in the simplest case, the peaks of each component elute separately, and the yield can be estimated from the area of the product peak. Typically, an HPLC gradient method that results in the product peak being baseline separated from all other species' peaks is desired.23 Tailoring methods for individual reactions is an intensive manual task, although it is becoming simpler with continued progress on automated gradient optimization algorithms.24 Our method avoids intensive method development by resolving non-baseline separated species based on spectral similarity. The components of the overlapped peaks can be separated provided the species do not substantially co-elute or have indistinguishable absorption spectra. Previous studies have developed a number of methods for resolving spectra, such as parallel factor analysis (PARAFAC), PARAFAC2,25,26 and shift-invariant tri-linearity (SIT),27 as well as chromatography specific applications such as MOCCA.14 These methods were developed for analyzing series of chromatograms, where each chromatogram represents a mixture of the same (sub)set of species. A robust measurement of the absorption spectrum of each species is possible when combining multiple chromatograms, even accounting for shifts in retention time (PARAFAC2) and spectral shape constraints (SIT).
Our method is meant to quickly estimate reaction yield/concentration from a single chromatogram, therefore in this study we resolve peaks by simpler and faster multivariate curve resolution (MCR) using least squares.28 MCR involves solving A = CST + E where A (I × J) represents the raw absorption data (over I time points and J wavelength channels), C (I × N) represents the resolved elution profiles (for N components), S (J × N) represents the pure spectrum of each component, and E is the error to be minimized. MCR is initially run on the baseline separated peak identified by MS as containing the target species assuming a single component. If the error minimization for the product-containing region is below a predetermined threshold (0.2 AU × s) then the peak is assumed to be pure. If the error remains high, additional components are added until the threshold is met, or more than three components are required. We chose three as the maximum number of resolvable components as it was a good balance between accuracy and the duration of our chosen HPLC method. Shorter solvent gradients cause more peaks to overlap and increase the burden of spectral resolution whereas longer gradients can separate more peaks, lessening the need for robust spectral resolution. In principle, the platform could automatically perform an analysis with a longer gradient on samples with more than three overlapping components, but we flagged those samples and withheld them from the analysis (only one of our test reaction systems required more than three components, enalapril overreaction, and is detailed in the ESI†). The target is assigned to the resolved peak with the closest matching MS retention time, which can become a source of substantial error if peaks overlap severely and the wrong assignment is made.
The integral of the intensity of the resolved target over the duration of the chromatographic peak at the wavelength of the reddest absorption peak, is combined with the predicted molar absorption coefficient and the pathlength of the PDA detector to estimate the number of moles of the product in the injected sample. This estimate, with the known injection volume, sample dilution volume, and reactant concentration gives the estimated reaction yield. The algebraic details of this calculation are illustrated in the ESI.†
For simulated reactions, between one and three non-reactive chemicals from the library were randomly mixed in dimethyl sulfoxide (DMSO) at a concentration of 5 mM and a total volume of 100 μL. The pipetting action has a typical error of about 0.5 μL (see ESI†), which is insignificant for pipetting nearly 100 μL of solvent but can become significant when dispensing more concentrated stock solutions. To overcome this issue, the stock DMSO was spiked with 4,4′-di-tert-butylbiphenyl used as an internal standard. The internal standard peak area was used to normalize all PDA data peak areas and correct for the error associated with liquid handling, the variability in the autosampler injection volume, and the sorption of atmospheric water by dry DMSO while the sample plate sat in the autosampler. None of the targets in the simulated reactions were included in the training set for the extinction coefficient model.
The reactions to form drug and drug-like products were also prepared with the automated platform in 96-well plates. The required reactants were dosed from the platform library and prepared in 200 μL of the designated solvent at 15 mM. For multistep syntheses, working in the retrosynthetic direction, the concentration of each reaction was doubled so that in a two-step synthesis the first step was at 30 mM and the second step was at 15 mM, with the appropriate amount of crude product being dosed into the second reaction based on the estimated yield of the first reaction. After preparation, the well plates were automatically placed on Inheco thermoshakes that heated or cooled the plate to the desired reaction temperature while agitating the reactions with orbital shaking. Palladium catalyzed reactions were prepared and run in a nitrogen-purged box and high temperature reactions were run in an array of 96-well glass vials in an aluminum block. After four hours the reactions were allowed to return to room temperature before being vacuum filtered through a 0.5 μm 96-well filter plate. 50 μL of crude reaction mixture were diluted with 100 μL of internal standard-spiked DMSO and filtered to remove any remaining solids, resulting in samples ready for analysis at approximately the same concentration as the simulated reactions. In this study, all products were sufficiently soluble in DMSO, however failure to dissolve the product will cause the yield to be underestimated.
The samples, including simulated reactions, were automatically transferred to the autosampler which injected 2.0 μL of sample onto a 50 × 4.6 mm, 1.8 μm, reverse phase C18 column. Each run consisted of an eight-minute water–acetonitrile gradient, starting with 0.5 min of 5% acetonitrile, then linearly increasing to 100% acetonitrile over 6 min, holding at 100% for 0.5 min, linearly decreasing to 5% acetonitrile over 0.5 min, then holding at 5% acetonitrile for the final 0.5 min. After analysis of all the samples, any chromatograms with saturated or too little absorption signal were automatically rerun with either half or twice the injection volume, respectively.
The typical error associated with this method is in the 10–25% range (median 18% among simulated reactions). The observed experimental error was found to be moderately correlated with the variance of the ensemble prediction for molar extinction coefficient (Pearson's correlation coefficient of 0.402). This further supports the assumption that model ensemble variance can serve as an indicator for model uncertainty in predicted extinction coefficient values. The prediction ensemble standard deviation can, therefore, inform whether the error of an experiment can be expected to be at the high or low end of the range. The error observed by this method is similar to that from so-called universal detectors like ELSD and Charged Aerosol Detectors (CAD). The accuracy of the presented method can be improved by adding data to the extinction coefficient training set, whereas decreasing the error from a universal detector requires hardware improvements or method redevelopment.
We observed that increasing the number of datapoints by an order of magnitude (3.8k for Deep4Chem to 38k for Reaxys) improved the model performance by about a factor of two (MAE decreased from 0.367 to 0.195). If used in, for example, an experimental campaign wherein the target molecules cover a narrower chemical space, one could train a new model specific to those experiments for improved yield estimation. Studies have shown that a small number of relevant measurements can anchor model predictions to increase accuracy significantly for similar molecules.30 The model could also be iteratively updated during the campaign to improve accuracy for later experiments.11 The model we present is meant to cover the broad space of small-molecule chemistry and can serve as a useful starting point for targeted campaigns.
The most significant outlier in Fig. 3 is 4-(dimethylamino)-benzoic acid. The chemprop interpretation feature indicates that the dimethylaniline substructure is the most significant factor that led to the nearly 2-fold error in predicted molar extinction coefficient. Dimethylaniline is a very common substructure in diverse families of dye molecules with large molar extinction coefficients (e.g., tetramethylrhodamine as a xanthine dye, methyl orange as an azo-dye, and crystal violet as a triarylmethane dye). The dataset(s) used to train the model is likely biased towards strongly absorbing dye molecules as they are more likely to have their optical properties recorded. Molecules that strongly resemble dyes but do not have dye-like properties, such as 4-(dimethylamino)benzoic acid, are therefore more likely to be overpredicted. At the same time, tests with molecules like indocyanine green, which are engineered to be strongly absorbing, show that the method considerably overestimates the concentration of such strongly absorbing species. Likewise with albumin, a biomolecule far larger than any molecule in the training set, the model underpredicts the molar extinction coefficient (see ESI†). Our model is meant to evaluate reactions for small-molecule drug-like chemistry and cannot be expected to extrapolate well to highly engineered or unfamiliar chemical spaces, such as those represented by indocyanine green and albumin. With data for these types of molecules, a new model could be trained and applied to strongly absorbing dyes or macromolecules.
We also demonstrate the method on a selection of real reactions; simulated reactions cannot account for the complexity of real automated chemistry. We chose reactions in route to six drug and drug-like molecules for which we had pure reference materials to verify yield estimates. The reference materials were used to build calibration curves using the framework developed in MOCCA, an open source chromatographic analysis program.14 The reactions represent the variety of transformations that are commonly encountered in medicinal chemistry.31,32 The reactions are shown in Fig. 4 and the conditions for each reaction in Table 2; 80 reactions were automatically run and analyzed between the different transformations and conditions tested. The plot in Fig. 4 shows how the method performs at automatically estimating reaction yields based on only a single HPLC run. The markers are colored by the chemical transformation executed (with several different conditions) and the marker shapes indicate which drug molecule was being synthesized. Multistep syntheses, such as the linear synthesis of camostat and convergent synthesis of sonidegib, were conducted with extraction and filtration as the only workup steps between reactions; the final steps in these reactions therefore have numerous peaks from impurities and by-products accumulated in early steps. The yield of these reactions is reported step-wise, and unsurprisingly the performance degrades with increased number of reaction steps.
Fig. 4 The reactions examined in this study to demonstrate the usefulness of calibration-free HPLC yield estimation. The yield as estimated using the method is plotted against the yield measured with a traditional calibration curve, the inset shows yield between 0 and 8%. The conditions (Roman numerals) for the reaction pathways (Arabic numerals) are given in Table 2. |
Reaction | Temperature | Reagent 1 | Reagent 2 | Noted |
---|---|---|---|---|
a Reagent added last. b Palladium catalysts, in order: XPhos Pd G4, tBuBrettPhos Pd G3, (dppf)PdCl2. c Bases, in order, DIPEA, Cs2CO3, KOtBu, DBU, DBU (1.5 eq.). d Abbreviations. EDAC: 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide, HOBt: N-hydroxybenzotriazole, HATU: 2-(7-aza-1H-benzotriazole-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate, DIPEA: diisopropyl ethyl amine, DIAD: diisopropylazodicarboxylate, DBU: 1,8-diazabicyclo[5.4.0]undec-7-ene, KOtBu: potassium tert-butoxide, TBAB: tetrabutylammonium bromide, CDI: carbonyl diimidazole, SOCl2: thionyl chloride, CuI: copper(I) iodide, PPh3: triphenylphosphine, PTC: phase transfer catalyst, X-pyr: halogenated nitropyridine. | ||||
1i-ii | 40 | EDAC | HOBt | Two reactions at 1.0 and 1.5 eq. |
1iii-iv | 40 | HATU, 1.5 eq. | DIPEA, 3.0 eq. | Two reactions, add acid v. amine first |
1v-vi | 40 | CDI, 1.0 eq. | DIPEA, 3.0 eq.a | Two reactions, add acid v. amine first |
1vii | 40 | SOCl2, 1.5 eq. | DIPEA, 3.0 eq. | |
2i-iii | 80 | KOtBu, 3.0 eq. | Pd catalystsb | |
3i-iii | 80 | CuI, 0.2 eq. | DBU, DBU, Cs2CO3, 3.0 eq. | TBAB, aliquat 336, no PTC |
4i | 40 | EDAC | HOBt | Two reactions at 1.0 and 1.5 eq. |
4ii-iii | 40 | HATU, 1.5 eq. | DIPEA, 3.0 eq. | Two reactions, add acid v. amine first |
4iv-v | 40 | CDI, 1.0 eq. | DIPEA, 3.0 eq.a | Two reactions, add acid v. amine first |
5i-v | 80 | Basesc, 3.0 eq. | Five different bases and eq. | |
5vi-viii | 40 | EDAC | HOBt | Two reactions at 1.0 eq., one at 1.5 eq. |
5viv | 40 | PPh3, 1.0 eq. | DIAD, 1.2 eq.a | |
5x-xi | 40 | CDI, 1.0 eq. | DIPEA, 3.0 eq.a | Two reactions, add acid v. amine first |
5xii-xiii | 40 | SOCl2 | DIPEA, 3.0 eq. | Two reactions at 1.0 and 1.5 eq. SOCl2 |
6i-xi | 80 | DIPEA | F-, Cl-, and Br-nitropyridine | DIPEA at 1, 2, 3, and 4 eq. each X-pyr |
6xii-xiii | 80 | DBU, 3 eq. | Pd catalystsb | |
6xiv-xvii | 40 | SnCl2 | HCl | At 1.0, 2.0, 5.0, and 10.0 eq. |
6xviii-xxi | 40 | EDAC | HOBt | Two reactions at 1.0, third at 1.5 eq. |
6xxii-xxiv | 40 | HATU, 1.5 eq. | DIPEA, 3.0 eq. | Two reactions, add acid v. amine first |
6xxv-xxvi | 40 | CDI, 1.0 eq. | DIPEA, 3.0 eq.a | Two reactions, add acid v. amine first |
6xxvii | 40 | SOCl2 | DIPEA, 3.0 eq. | Two reactions at 1.0 and 1.5 eq. SOCl2 |
The performance is robust across Suzuki couplings, Buchwald–Hartwig aminations, un-activated aminations, amide couplings, aryl and alkyl esterification, nucleophilic substitutions, and reductions, all of which are critical to pharmaceutical discovery and manufacturing, with an overall MAE of 8.3% and RMSE of 12.8% for yield prediction. The reactions themselves span a range of yields, skewed towards lower yield (Fig. 4 inset) because the automated chemistry platform was designed for flexibility and molecular discovery rather than reaction optimization.11 The Buchwald–Hartwig aminations failed for several conditions likely because of poisoning by trace water or oxygen. Likewise, the Suzuki couplings did not produce the near quantitative yield that is reported in the literature because the automated platform does not have access to completely air-free conditions, but rather uses a strong nitrogen purge to try and eliminate oxygen. At the same time, simpler reactions such as the aromatic nucleophilic substitution reaction with the fluoro-substituted pyridine (6af) gave nearly quantitative yield in all tested conditions.
Manual inspection of some of the other reactions revealed numerous interesting episodes. The nitro-reduction of 6c with insufficient reductant could produce small quantities of an azo-coupled dimer with very large extinction coefficient. The camostat precursor (5c) could be produced at significantly higher yield using a single equivalent of cesium carbonate and 1.2 equivalents of 5b in dimethylformamide (DMF), whereas reported syntheses use organic bases.33 Camostat (5e) itself posed a challenge for the method, as the basicity of the guanidine group causes it to interact strongly with acidic silanol groups present in the column. The extinction coefficient of camostat shifted by 19% and the wavelength of maximum absorptions shifted 15 nm because of the changing solvent composition over the long duration of its elution (see Fig. S3†). Those shifts resulted in MCR identifying additional components at high camostat yields, and caused the strong disagreement seen for the light green squares in Fig. 4.
While in general performance is agnostic of the transformation, there are potential instances when the method can fail because of particulars of the reactants and/or products. When two compounds have nearly identical absorption spectra and strongly overlapped peaks, MCR fails to differentiate them as separate compounds. This was the case for some enalapril syntheses, where more than one proline (1a) was coupled to the starting material (1b), making the yield appear higher than it was (blue triangles in Fig. 4). For the case of enalapril, these compounds could be differentiated by the MS chromatogram (see Fig. S4†), and while the performance on enalapril is still commendable, future improvements to our method could make quantitative use of the MS chromatogram in such instances where absorption spectra are insufficient.
Reliance on mass spectrometry brings its own challenges. In particular, the method cannot differentiate between isomers or isobars, and therefore may not accurately gauge the yield of reactions that are not regio- or stereo-selective. Reaction 2 was chosen to demonstrate the case of regioselectivity as there are two potential amination sites. In this case, we observed small peaks corresponding to both single amination isomers and were only able to differentiate them based on the elution profile of a reference sample. The yield estimate for reaction 2 is very low for both isomers, but had the reaction been higher yielding or strongly site selective the automatically calculated estimate could have been much less accurate. Additionally, as our setup relies on DUIS ionization, the yield cannot be estimated for products that do not ionize under those conditions; for this study we limited the reactions (both real and simulated) to those that included reliably ionizing groups common to drug molecules (such as an amine, acid, heterocycle, or halogen groups). Lastly, the method requires that the product be soluble in an HPLC-compatible solvent and has only been evaluated using reverse phase HPLC running a water–acetonitrile gradient, meaning appropriate solvents must be water-miscible with a polarity between water and acetonitrile. DMF and DMSO are the preferred solvents as they have broad solvating abilities and are non-volatile; however, they absorb atmospheric water that can precipitate more hydrophobic analytes. Normal phase HPLC could eliminate these issues without adversely impacting the calibration-free quantification method.
The error in the yield was also weakly correlated to the error in the predicted absorption maximum (Pearson's correlation coefficient of 0.250), as measured by the PDA detector. This could provide an alternative measure of confidence in the molar extinction coefficient prediction and therefore yield estimate based only on the single HPLC sample. The error in absorption maximum is an immediately available measure of how well the molecule fits the scope of the training data. However, using the error in absorption maximum to give a quantitative estimate of the error in the predicted extinction coefficient will require substantially more work and is beyond the scope of this study.18
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4sc01881h |
This journal is © The Royal Society of Chemistry 2024 |