Feature selection in molecular graph neural networks based on quantum chemical approaches†
Abstract
Feature selection is an important and widely studied topic in data science. Graph neural networks (GNNs) and graph convolutional networks (GCNs) have recently been applied in chemistry. To improve the performance of GNNs and GCNs in this field, feature selection should also be examined in detail from a chemical viewpoint. This study therefore proposes a new feature vector for molecular GNNs and discusses its accuracy, the overcorrelation between features, and its interpretability. The feature vector is constructed from molecular atomic properties (MAPs) computed with quantum mechanical (QM) approaches. Although QM calculations require computational time, they give access to a variety of atomic properties, which should be useful for better prediction. In preparing feature vectors from MAPs, we employed a concatenation approach to mitigate overcorrelation in GNNs. Moreover, integrated gradients analysis showed that the machine learning model using the proposed feature vectors explained its prediction outputs reasonably.
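As an illustration of the concatenation idea, the following minimal sketch joins per-atom property blocks into one feature vector per atom and then scores how strongly the resulting feature dimensions correlate. It is not the implementation used in this study: the function names (`concat_atom_features`, `mean_abs_correlation`), the choice of mean absolute Pearson correlation as an overcorrelation proxy, and the numerical values are all assumptions made for illustration, and the values are placeholders rather than computed QM results.

```python
from itertools import combinations
from math import sqrt


def concat_atom_features(*property_blocks):
    """Concatenate per-atom property blocks into one feature vector per atom.

    Each block is a list with one row (a list of floats) per atom.
    Hypothetical helper, not taken from the paper.
    """
    n_atoms = len(property_blocks[0])
    if any(len(block) != n_atoms for block in property_blocks):
        raise ValueError("every block needs exactly one row per atom")
    return [
        [value for block in property_blocks for value in block[i]]
        for i in range(n_atoms)
    ]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def mean_abs_correlation(features):
    """Mean |Pearson r| over all pairs of feature dimensions.

    Assumed proxy for overcorrelation: values near 1 mean the
    dimensions carry largely redundant information.
    """
    columns = list(zip(*features))
    pairs = list(combinations(range(len(columns)), 2))
    if not pairs:
        return 0.0
    return sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)


# Placeholder values for a three-atom molecule (not QM results).
charge_block = [[-0.8], [0.4], [0.4]]                     # one scalar per atom
element_block = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]      # one-hot element type

features = concat_atom_features(charge_block, element_block)
score = mean_abs_correlation(features)
```

In this toy case the charge column is fully determined by the element type, so the score is close to 1. Under the same assumed proxy, a block of atomic properties that varies independently of element type would lower the score.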