Global geometry of chemical graph neural network representations in terms of chemical moieties

Amer Marwan El-Samman; Incé Amina Husain; Mai Huynh; Stefano De Castro; Brooke Morton; Stijn De Baerdemacker

doi:10.1039/D3DD00200D

Amer Marwan El-Samman,*^a Incé Amina Husain,^a Mai Huynh,^a Stefano De Castro,^a Brooke Morton^a and Stijn De Baerdemacker^ab

Author affiliations

* Corresponding authors

^a University of New Brunswick, Department of Chemistry, 30 Dineen Dr, Fredericton, Canada
E-mail: aelsamma@unb.ca

^b University of New Brunswick, Department of Mathematics and Statistics, 30 Dineen Dr, Fredericton, Canada
E-mail: stijn.debaerdemacker@unb.ca

Abstract

Graph neural nets, such as SchNet [Schütt et al., J. Chem. Phys., 2018, 148, 241722] and AIMNet [Zubatyuk et al., Sci. Adv., 2019, 5, 8], provide accurate predictions of chemical quantities without invoking any direct physical or chemical principles. These methods learn a hidden statistical representation of molecular systems in an end-to-end fashion, from xyz coordinates to molecular properties, with many hidden layers in between. This naturally leads to the interpretability question: what underlying chemical model determines the algorithm's accurate decision-making? We shed light on an interpretation by analyzing the hidden-layer activations of QM9-trained graph neural networks, also known as “embedding vectors”, with dimension reduction, linear discriminant analysis and Euclidean-distance measures. The result is a quantifiable geometry of these models' decision-making that identifies chemical moieties and has a low parametric space of ∼5 important parameters, out of the fully trained 128-parameter embedding. The geometry of the embedding space organizes these moieties with sharp linear boundaries that can classify each chemical environment within <5 × 10⁻⁴ error. The Euclidean distance between embedding vectors provides a versatile molecular similarity measure, comparable to other popular hand-crafted representations such as Smooth Overlap of Atomic Positions (SOAP). We also reveal that the embedding vectors can be used to extract observables that are related to chemical environments, such as pK_a and NMR. While it does not present a fully comprehensive theory of interpretability, this work is in line with the recent push for explainable AI (XAI) and gives insight into the depth of modern statistical representations of chemistry, such as graph neural nets, in this rapidly evolving technology.
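The kind of analysis the abstract describes can be illustrated with a minimal sketch. It is not the authors' pipeline: the 128-dimensional vectors below are synthetic stand-ins for embedding vectors (no SchNet or AIMNet model is involved), the "moiety" labels are hypothetical class indices, PCA via SVD stands in for the dimension-reduction step, and a nearest-centroid rule based on Euclidean distance is used as a simple substitute for linear discriminant analysis. All function names are assumptions made for illustration.

```python
import numpy as np


def make_synthetic_embeddings(n_classes=4, n_per_class=50, dim=128,
                              latent_dim=5, seed=0):
    """Synthetic stand-in for embedding vectors (NOT real GNN activations).

    Class centres are placed in a random `latent_dim`-dimensional subspace
    of the `dim`-dimensional space, then small isotropic noise is added.
    """
    rng = np.random.default_rng(seed)
    # Orthonormal basis (dim x latent_dim) of a random low-dimensional subspace.
    basis, _ = np.linalg.qr(rng.normal(size=(dim, latent_dim)))
    centers = rng.normal(scale=5.0, size=(n_classes, latent_dim)) @ basis.T
    X = np.concatenate(
        [c + rng.normal(scale=0.1, size=(n_per_class, dim)) for c in centers]
    )
    y = np.repeat(np.arange(n_classes), n_per_class)  # hypothetical moiety labels
    return X, y


def pca_project(X, n_components=5):
    """Project centred data onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return Xc @ Vt[:n_components].T, explained[:n_components]


def nearest_centroid_labels(Z, y):
    """Assign each point to the class whose centroid is closest (Euclidean)."""
    classes = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[np.argmin(dists, axis=1)]


X, y = make_synthetic_embeddings()
Z, explained = pca_project(X, n_components=5)
pred = nearest_centroid_labels(Z, y)
accuracy = float(np.mean(pred == y))
print("reduced shape:", Z.shape)
print("variance captured by 5 components:", float(explained.sum()))
print("nearest-centroid accuracy:", accuracy)
```

With real embeddings, `X` would be replaced by hidden-layer activations extracted from a trained model and `y` by chemically assigned moiety labels; the numbers printed here say nothing about the results reported in the paper.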

Article information

https://doi.org/10.1039/D3DD00200D

Article type: Paper

Submitted: 06 Oct 2023

Accepted: 01 Feb 2024

First published: 15 Feb 2024

This article is Open Access

Digital Discovery, 2024, 3, 544-557

Permissions

A. M. El-Samman, I. A. Husain, M. Huynh, S. De Castro, B. Morton and S. De Baerdemacker, Digital Discovery, 2024, 3, 544 DOI: 10.1039/D3DD00200D

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.
