Issue 3, 2023

Synthetic data enable experiments in atomistic machine learning

Abstract

Machine-learning (ML) models are increasingly used to predict properties of atoms in chemical systems. There have been major advances in developing descriptors and regression frameworks for this task, typically starting from (relatively) small sets of quantum-mechanical reference data. Larger datasets of this kind are becoming available, but remain expensive to generate. Here we demonstrate the use of a large dataset that we have “synthetically” labelled with per-atom energies from an existing ML potential model. Because this process is cheap compared with obtaining the quantum-mechanical ground truth, we can generate millions of datapoints, which in turn enables rapid experimentation with atomistic ML models from the small- to the large-data regime. Here, this approach allows us to compare regression frameworks in depth and to explore visualisations based on learned representations. We also show that learning synthetic data labels can be a useful pre-training task for subsequent fine-tuning on small datasets. In the future, we expect that our open-sourced dataset, and similar ones, will be useful in rapidly exploring deep-learning models in the limit of abundant chemical data.
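To make the synthetic-labelling and pre-training idea concrete, here is a deliberately tiny, self-contained Python sketch. It is not the authors' pipeline. Everything in it is a hypothetical stand-in: the two-component feature vectors (in place of real atomic descriptors), the linear `teacher_energy` (in place of an existing ML potential), the slightly different `reference_energy` (in place of quantum-mechanical labels), and the helpers `fit_linear` and `mse`. The sketch labels many points cheaply with the teacher, pre-trains a linear model on them, then fine-tunes on five "reference" labels and compares the result against training from scratch with the same short budget.

```python
import random

# Toy, hypothetical stand-ins (NOT the paper's models or data):
# each "atom" is a 2-component feature vector x; teacher_energy plays the
# role of an existing ML potential that cheaply provides synthetic labels,
# and reference_energy plays the role of expensive quantum-mechanical labels.
def teacher_energy(x):
    return 1.5 * x[0] - 0.8 * x[1] + 0.2

def reference_energy(x):
    return 1.6 * x[0] - 0.7 * x[1] + 0.25

def fit_linear(data, params=(0.0, 0.0, 0.0), lr=0.1, epochs=200):
    """Fit E(x) = w0*x0 + w1*x1 + b by full-batch gradient descent on the
    mean squared error, starting from `params` (so it can also fine-tune)."""
    w0, w1, b = params
    n = len(data)
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for (x0, x1), y in data:
            err = w0 * x0 + w1 * x1 + b - y
            g0 += 2.0 * err * x0 / n
            g1 += 2.0 * err * x1 / n
            gb += 2.0 * err / n
        w0, w1, b = w0 - lr * g0, w1 - lr * g1, b - lr * gb
    return (w0, w1, b)

def mse(params, data):
    """Mean squared error of the linear model on (features, label) pairs."""
    w0, w1, b = params
    return sum((w0 * x0 + w1 * x1 + b - y) ** 2 for (x0, x1), y in data) / len(data)

rng = random.Random(0)

def sample_atoms(n):
    """Draw n random 2-component feature vectors in [-1, 1]^2."""
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]

# 1. Synthetic labels are cheap and abundant; reference labels are scarce.
synthetic = [(x, teacher_energy(x)) for x in sample_atoms(1000)]
small_reference = [(x, reference_energy(x)) for x in sample_atoms(5)]
test_set = [(x, reference_energy(x)) for x in sample_atoms(500)]

# 2. Pre-train on synthetic labels, then fine-tune briefly on the small set.
pretrained = fit_linear(synthetic, epochs=200)
finetuned = fit_linear(small_reference, params=pretrained, epochs=10)

# 3. Baseline: the same short training budget, starting from scratch.
scratch = fit_linear(small_reference, epochs=10)

print(f"fine-tuned test MSE:   {mse(finetuned, test_set):.4f}")
print(f"from-scratch test MSE: {mse(scratch, test_set):.4f}")
```

Because the toy teacher is close to the toy reference, the pre-trained parameters start near the answer, so a short fine-tuning run on five points should typically reach a lower test error than the scratch baseline. Real atomistic models, descriptors and potentials are of course far richer than this linear illustration.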

Graphical abstract: Synthetic data enable experiments in atomistic machine learning


Article information

Article type: Paper
Submitted: 09 Dec 2022
Accepted: 20 Mar 2023
First published: 20 Mar 2023
Open Access: Creative Commons BY license

Digital Discovery, 2023, 2, 651-662


J. L. A. Gardner, Z. Faure Beaulieu and V. L. Deringer, Digital Discovery, 2023, 2, 651 DOI: 10.1039/D2DD00137C

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.

