Predicting the enthalpy of formation of energetic molecules via conventional machine learning and GNN†
Abstract
Machine learning (ML) provides a promising method for predicting molecular properties efficiently and accurately. Using ML models to predict the enthalpy of formation of energetic molecules enables fast screening of potential high-energy molecules, thereby accelerating the design of energetic materials. A persistent challenge is to determine the optimal featurization method for molecular representation and to select an appropriate ML model. In this study, we evaluate various featurization methods (CDS, ECFP, SOAP, GNF) and ML models (RF, MLP, GCN, MPNN), divided into two groups, conventional ML models and GNN models, for predicting the enthalpy of formation of potential high-energy molecules. Our results demonstrate that CDS and SOAP have advantages over ECFP, while the GNFs used in the GCN and MPNN models perform better. Furthermore, the MPNN model performs best among all models, with a root mean square error (RMSE) as low as 8.42 kcal mol⁻¹, outperforming even CDS-MLP, the best-performing conventional ML model. Overall, this study provides a benchmark for ML prediction of the enthalpy of formation and emphasizes the considerable potential of GNNs in property prediction.
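For reference, the RMSE used as the error metric above is the square root of the mean squared difference between predicted and reference values. The following is a minimal sketch in Python; the helper name `rmse` is hypothetical, and the numbers are made-up illustrative values, not data from this study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Squared residual for each (reference, prediction) pair.
    squared = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative enthalpies of formation in kcal/mol; NOT data from this study.
reference = [10.0, -25.0, 40.0, 5.0]
predicted = [13.0, -29.0, 40.0, 5.0]
error = rmse(reference, predicted)
print(f"RMSE = {error:.2f} kcal/mol")
```

Because the residuals are squared before averaging, a few large errors raise the RMSE more than many small ones, which is worth keeping in mind when comparing models on a single RMSE value.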