Selected machine learning of HOMO–LUMO gaps with improved data-efficiency

Bernard Mazouin; Alexandre Alain Schöpfer; O. Anatole von Lilienfeld

doi:10.1039/D2MA00742H

Bernard Mazouin,^a Alexandre Alain Schöpfer^b and O. Anatole von Lilienfeld*^cde

Author affiliations

* Corresponding authors

^a University of Vienna, Faculty of Physics and Vienna Doctoral School in Physics, Kolingasse 14-16, 1090 Vienna, Austria

^b Department of Chemistry, University of Basel, Klingelbergstrasse 70, 4056 Basel, Switzerland
E-mail: anatole.vonlilienfeld@utoronto.ca

^c Departments of Chemistry, Materials Science and Engineering, and Physics, University of Toronto, St. George Campus, Toronto, ON, Canada

^d Vector Institute for Artificial Intelligence, Toronto, ON, Canada

^e Machine Learning Group, Technische Universität Berlin and Institute for the Foundations of Learning and Data, 10587 Berlin, Germany

Abstract

Despite their relevance for organic electronics, quantum machine learning (QML) models of molecular electronic properties, such as HOMO–LUMO gaps, often struggle to achieve satisfying data-efficiency as measured by decreasing prediction errors for increasing training set sizes. We demonstrate that partitioning training sets into different chemical classes prior to training results in independently trained QML models with overall reduced training data needs. For organic molecules drawn from the previously published QM7 and QM9 data sets, we have identified and exploited three relevant classes corresponding to compounds containing either aromatic rings and carbonyl groups, or single unsaturated bonds, or saturated bonds. The selected QML models of band-gaps (considered at GW and hybrid DFT levels of theory) reach mean absolute prediction errors of ∼0.1 eV for up to an order of magnitude fewer training molecules than QML models trained on randomly selected molecules. Comparison to Δ-QML models of band-gaps indicates that selected QML exhibits superior data-efficiency. Our findings suggest that selected QML, e.g. based on simple classifications prior to training, could help to successfully tackle challenging quantum property screening tasks of large libraries with high fidelity and low computational burden.
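The idea of classifying molecules first and then training one model per class can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel ridge regression, the hyperparameters `sigma` and `lam`, the per-class mean centering, the function names, and the synthetic feature vectors, class labels and targets are all assumptions made for illustration. In practice the class labels would come from a chemical classifier (for example, a substructure check) and the features from a molecular representation.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def fit_selected(X, y, labels, sigma=1.0, lam=1e-8):
    """Train one independent kernel ridge regression model per class label."""
    models = {}
    for c in np.unique(labels):
        mask = labels == c
        Xc, yc = X[mask], y[mask]
        y_mean = yc.mean()  # centering is a convenience of this sketch
        K = gaussian_kernel(Xc, Xc, sigma)
        alpha = np.linalg.solve(K + lam * np.eye(len(Xc)), yc - y_mean)
        models[int(c)] = (Xc, alpha, y_mean)
    return models

def predict_selected(models, X, labels, sigma=1.0):
    """Route each query to the model of its class (KeyError for unseen classes)."""
    y_pred = np.empty(len(X))
    for c in np.unique(labels):
        Xc, alpha, y_mean = models[int(c)]
        mask = labels == c
        y_pred[mask] = gaussian_kernel(X[mask], Xc, sigma) @ alpha + y_mean
    return y_pred

# Synthetic stand-in data (hypothetical): 4-dimensional features, three classes,
# and a target whose baseline depends on the class.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
labels = rng.integers(0, 3, size=300)
offsets = np.array([3.0, 6.0, 9.0])  # toy per-class baselines, not real gaps
y = offsets[labels] + 0.1 * np.sin(X[:, 0])

models = fit_selected(X[:200], y[:200], labels[:200])
y_pred = predict_selected(models, X[200:], labels[200:])
train_pred = predict_selected(models, X[:200], labels[:200])
mae = np.mean(np.abs(y_pred - y[200:]))
print(f"test MAE on toy data: {mae:.3f}")
```

Because each model only sees its own class, it does not have to learn the large differences between classes, which is the intuition behind the reduced training data needs described in the abstract. The toy error printed here says nothing about the errors reported for QM7 or QM9.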

This article is part of the themed collection: Materials Informatics

Materials Advances
