Investigating the error imbalance of large-scale machine learning potentials in catalysis

Kareem Abdelmaqsoud; Muhammed Shuaibi; Adeesh Kolluru; Raffaele Cheula; John R. Kitchin

doi:10.1039/D4CY00615A

Kareem Abdelmaqsoud,^a Muhammed Shuaibi,^b Adeesh Kolluru,^a Raffaele Cheula^ac and John R. Kitchin*^a

Author affiliations

* Corresponding authors

^a Department of Chemical Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA
E-mail: jkitchin@andrew.cmu.edu

^b Meta Fundamental AI Research, Menlo Park, California 94025, USA

^c Department of Physics and Astronomy, Aarhus University, Nordre Ringgade 1, 8000 Aarhus C, Denmark

Abstract

Machine learning potentials (MLPs) have greatly accelerated atomistic simulations for materials discovery. The Open Catalyst 2020 (OC20) dataset is one of the largest datasets for training MLPs for heterogeneous catalysis. Over the past two years, increasingly sophisticated models have brought the mean absolute error (MAE) on the dataset's energy target asymptotically close to about 0.2 eV. These errors are imbalanced across material classes, with non-metals having the highest errors. In this work, we investigate several potential sources of this imbalance. First, we examined material class-specific convergence errors in the density functional theory (DFT) calculations, including k-point sampling, plane-wave cutoff and smearing width. Significant DFT convergence errors, with a mean absolute value of ∼0.15 eV, were found on the total energies of non-metals, higher than for the other material classes. However, owing to cancellation of errors, convergence errors on adsorption energies have a mean absolute value of ∼0.05 eV across all material classes. Moreover, we found that the MAEs of the MLPs are not affected by these convergence errors. Second, we show that calculations with surface reconstruction can introduce inconsistencies into the adsorption energy referencing scheme that cannot be fit by the MLPs. Non-metals and halides were found to have the highest fraction of calculations with surface reconstructions. Removing calculations with surface reconstructions from the validation sets, without re-training, significantly lowers the MAEs by ∼35% and reduces the imbalance of the MAEs. Alternatively, MLPs trained on total energies resolve the surface reconstruction inconsistencies, since they eliminate the referencing issue, and have MAEs comparable to those of MLPs trained on adsorption energies.
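The validation-set comparison described in the abstract (per-class MAE with and without calculations flagged as surface reconstructions, no re-training) can be sketched as below. This is only an illustrative sketch, not the authors' code: the record fields `material_class`, `reference`, `predicted` and `reconstructed`, the function name `mae_by_class`, and all numbers are hypothetical and are not taken from the paper or from OC20.

```python
from collections import defaultdict


def mae_by_class(records, drop_reconstructed=False):
    """Return {material_class: mean absolute error in eV}.

    If drop_reconstructed is True, records flagged as surface
    reconstructions are excluded before averaging (the model
    predictions themselves are left unchanged).
    """
    abs_errors = defaultdict(list)
    for rec in records:
        if drop_reconstructed and rec["reconstructed"]:
            continue
        abs_errors[rec["material_class"]].append(
            abs(rec["predicted"] - rec["reference"])
        )
    return {cls: sum(errs) / len(errs) for cls, errs in abs_errors.items()}


# Hypothetical adsorption energies in eV, made up for illustration only.
records = [
    {"material_class": "nonmetal", "reference": -1.0, "predicted": -0.5, "reconstructed": True},
    {"material_class": "nonmetal", "reference": -0.5, "predicted": -0.25, "reconstructed": False},
    {"material_class": "intermetallic", "reference": -0.75, "predicted": -0.5, "reconstructed": False},
]

all_mae = mae_by_class(records)
filtered_mae = mae_by_class(records, drop_reconstructed=True)
```

Comparing `all_mae` with `filtered_mae` class by class shows how much of each class's error comes from the flagged calculations; in this toy data only the non-metal entry changes.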

This article is part of the themed collection: Digital Catalysis

Catalysis Science & Technology
