Issue 4, 2018

Thermochemistry of gas-phase and surface species via LASSO-assisted subgraph selection

Abstract

Graph theory-based regression techniques, such as group additivity, have been widely used for fast estimation of the thermochemistry of large molecules. The essence of these techniques lies in the graphs into which molecules are decomposed. These graphs are selected based on heuristics. As a result, they may not give optimal accuracy, and they are hard to choose for non-nearest-neighbor electronic effects such as ring strain, steric hindrance, and resonance structures. Here, we explore LASSO, a feature selection algorithm, to select the optimal set of graph descriptors for predicting the standard enthalpy of formation, ΔfH°. We gather gas-phase hydrocarbon data from the NIST WebBook and Burcat's databases. We find that models using LASSO-selected graph descriptors from the exhaustively enumerated graph descriptor space predict ΔfH° more accurately than traditional group additivity. We also compare our framework with state-of-the-art machine-learning models on the QM9 data set. The resulting mean absolute error of 1.39 kcal mol−1 is comparable to those of published machine-learning models. To cope with the computational cost of complete enumeration, we present (1) a semi-supervised LASSO learning method and (2) an adsorbate subgraph mining algorithm. The former prunes the graph descriptor space on the fly during the LASSO regression and is applied to a gas-phase hydrocarbon data set. The latter enumerates a truncated graph descriptor space from the adsorbate graphs in surface science data. For lignin monomer adsorbates on Pt(111), considered here as an illustrative example, descriptors selected from the adsorbate subgraph space yield a mean absolute error of 2.08 kcal mol−1 and a root mean square error of 3.03 kcal mol−1. Finally, we discuss a simple method that identifies outliers in descriptor space that cause large model errors, so that accuracy can be improved by adding suitable data.
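For readers unfamiliar with LASSO, the minimal Python sketch below illustrates the generic idea of L1-penalized descriptor selection: cyclic coordinate descent on the objective (1/2n)‖y − Xw‖² + λ‖w‖₁, after which descriptor columns whose coefficients are driven exactly to zero are discarded. It is not the authors' implementation (it omits, for example, the on-the-fly pruning of the semi-supervised variant). The descriptor matrix, the target values, and λ = 0.05 are hypothetical toy inputs chosen so the result can be checked by hand.

```python
# Minimal sketch of generic LASSO descriptor selection via cyclic coordinate descent.
# NOT the paper's method: X (rows = molecules, columns = counts of hypothetical
# subgraph descriptors), y (hypothetical enthalpies), and lam are toy values.


def soft_threshold(rho, lam):
    """Proximal operator of the L1 norm: shrink rho toward zero by lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=500):
    """Minimize (1/2n)||y - Xw||^2 + lam*||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    # (1/n) * sum_i x_ij^2 for each descriptor column j.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # descriptor never occurs; leave its weight at zero
            # Correlation of column j with the partial residual that excludes j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w


# Hypothetical descriptor counts: 6 molecules x 4 descriptors.
X = [
    [1, 0, 0, 0],
    [2, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 1],
]
# Hypothetical targets: roughly 2*x0 - 3*x2, plus small contributions on x1 and x3.
y = [2.0, 4.0, 0.1, -3.0, -6.0, -0.1]

w = lasso_coordinate_descent(X, y, lam=0.05)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print(selected)  # → [0, 2]
```

Because the toy columns are orthogonal, the first sweep already converges. Descriptors 1 and 3, which correlate only weakly with y, are set exactly to zero, while descriptors 0 and 2 are kept with coefficients shrunk toward zero (about 1.94 and −2.94 instead of 2 and −3). In practice, one would typically standardize the columns and choose λ by cross-validation.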


Article information

Article type: Paper
Submitted: 18 Dec 2017
Accepted: 13 Feb 2018
First published: 13 Feb 2018

React. Chem. Eng., 2018, 3, 454-466


G. H. Gu, P. Plechac and D. G. Vlachos, React. Chem. Eng., 2018, 3, 454 DOI: 10.1039/C7RE00210F
