Issue 39, 2023

The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions

Abstract

Accurate prediction of reaction yield is the holy grail for computer-assisted synthesis prediction, but current models have failed to generalize to large literature datasets. To understand the causes and inspire future design, we systematically benchmarked the yield prediction task. We carefully curated and augmented a literature dataset of 41 239 amide coupling reactions, each with information on reactants, products, intermediates, yields, and reaction contexts, and provided 3D structures for the molecules. We calculated molecular features related to 2D and 3D structure information, as well as physical and electronic properties. These descriptors were paired with 4 categories of machine learning methods (linear, kernel, ensemble, and neural network), yielding valuable benchmarks about feature and model performance. Despite the excellent performance on a high-throughput experiment (HTE) dataset (R2 around 0.9), no method gave satisfactory results on the literature data. The best performance was an R2 of 0.395 ± 0.020 using the stack technique. Error analysis revealed that reactivity cliff and yield uncertainty are among the main reasons for incorrect predictions. Removing reactivity cliffs and uncertain reactions boosted the R2 to 0.457 ± 0.006. These results highlight that yield prediction models must be sensitive to the reactivity change due to the subtle structure variance, as well as be robust to the uncertainty associated with yield measurements.

Graphical abstract: The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions

Supplementary files

Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article.

View this article’s peer review history

Article information

Article type
Edge Article
Submitted
27 Jul 2023
Accepted
12 Sep 2023
First published
13 Sep 2023
This article is Open Access

All publication charges for this article have been paid for by the Royal Society of Chemistry
Creative Commons BY license

Chem. Sci., 2023,14, 10835-10846

The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions

Z. Liu, Y. S. Moroz and O. Isayev, Chem. Sci., 2023, 14, 10835 DOI: 10.1039/D3SC03902A

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.

Read more about how to correctly acknowledge RSC content.

Social activity

Spotlight

Advertisements