Impact of noise on inverse design: the case of NMR spectra matching

Dominik Lemm; Guido Falk von Rudorff; O. Anatole von Lilienfeld

doi:10.1039/D3DD00132F

Dominik Lemm,^ab Guido Falk von Rudorff^cd and O. Anatole von Lilienfeld^efg

Author affiliations

^a University of Vienna, Faculty of Physics, Kolingasse 14-16, AT-1090 Vienna, Austria

^b University of Vienna, Vienna Doctoral School in Physics, Boltzmanngasse 5, AT-1090 Vienna, Austria

^c University Kassel, Department of Chemistry, Heinrich-Plett-Str. 40, 34132 Kassel, Germany

^d Center for Interdisciplinary Nanostructure Science and Technology (CINSaT), Heinrich-Plett-Straße 40, 34132 Kassel, Germany

^e Departments of Chemistry, Materials Science and Engineering, and Physics, University of Toronto, St. George Campus, Toronto, ON, Canada
E-mail: anatole.vonlilienfeld@utoronto.ca

^f Vector Institute for Artificial Intelligence, Toronto, ON M5S 1M1, Canada

^g Machine Learning Group, Technische Universität Berlin and Institute for the Foundations of Learning and Data, 10587 Berlin, Germany

Abstract

Despite its fundamental importance and widespread use for assessing reaction success in organic chemistry, deducing chemical structures from nuclear magnetic resonance (NMR) measurements has remained largely manual and time-consuming. To keep up with the accelerated pace of automated synthesis in self-driving laboratory settings, robust computational algorithms are needed to rapidly perform structure elucidations. We analyse the effectiveness of solving the NMR spectra matching task encountered in this inverse structure elucidation problem by systematically constraining the chemical search space and correspondingly reducing the ambiguity of the matching task. Numerical evidence collected for the twenty most common stoichiometries in the QM9-NMR database indicates a systematic trend: the more constrained the search space, the larger the machine learning prediction errors that can be tolerated. Results suggest that compounds with multiple heteroatoms are harder to characterize than others. After extending QM9 with ∼10 times more constitutional isomers, with 3D structures generated by Surge, ETKDG and CREST, we used ML models of chemical shifts trained on the QM9-NMR data to test the spectra matching algorithms. Combining both ¹³C and ¹H shifts in the matching process suggests that prediction errors twice as large can be tolerated compared with matching based on ¹³C shifts alone. Performance curves demonstrate that reducing ambiguity and search space can decrease machine learning training data needs by orders of magnitude.
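To make the spectra matching task concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a spectrum is a list of chemical shifts in ppm, that two spectra are compared by Euclidean distance after sorting, and that prediction error can be mimicked by Gaussian noise with standard deviation `sigma`. The function names and the three candidate spectra are hypothetical and chosen only for illustration.

```python
import random


def spectrum_distance(a, b):
    """Euclidean distance between two equal-length shift lists, compared after sorting."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of shifts")
    return sum((x - y) ** 2 for x, y in zip(sorted(a), sorted(b))) ** 0.5


def best_match(query, candidates):
    """Return the name of the candidate spectrum closest to the query."""
    return min(candidates, key=lambda name: spectrum_distance(query, candidates[name]))


def add_noise(spectrum, sigma, rng):
    """Perturb every shift with Gaussian noise (a stand-in for prediction error)."""
    return [shift + rng.gauss(0.0, sigma) for shift in spectrum]


def top1_accuracy(candidates, sigma, trials=200, seed=0):
    """Fraction of noisy queries that are matched back to their true candidate."""
    rng = random.Random(seed)
    names = sorted(candidates)
    hits = 0
    for _ in range(trials):
        target = rng.choice(names)
        noisy_query = add_noise(candidates[target], sigma, rng)
        hits += best_match(noisy_query, candidates) == target
    return hits / trials


# Hypothetical 13C shifts (ppm) for three made-up isomers; illustrative values only.
CANDIDATES = {
    "isomer_A": [14.1, 22.8, 31.9, 68.2],
    "isomer_B": [13.9, 23.5, 30.7, 71.0],
    "isomer_C": [18.4, 25.6, 64.3, 67.1],
}

if __name__ == "__main__":
    for sigma in (0.0, 0.5, 2.0, 10.0):
        print(sigma, top1_accuracy(CANDIDATES, sigma))
```

Under these assumptions, sweeping `sigma` for candidate pools of different sizes gives a simple way to explore how much error a matching task tolerates before the top-1 accuracy degrades, which is the kind of trade-off the abstract describes.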

Digital Discovery
