A machine learning approach using frequency descriptor for molecular property predictions†
Abstract
Machine learning algorithms are effective in predicting the properties of molecules and materials. Recently, a new strategy, Δ-machine learning, which uses low-level calculations as a baseline for predicting properties at a higher level of theory, was proposed to further reduce computational costs. It has been successfully applied to predictions of potential energy surfaces, bandgaps and chemical shieldings. Here we introduce a new descriptor for predicting molecular properties, the frequency descriptor (FD), which is built from harmonic vibrational frequencies. Specifically, we used harmonic vibrational frequencies computed with several semi-empirical methods (PM6, PM7 and GFN2-xTB) as the descriptor in Δ-machine learning. The energies, enthalpies and HOMO–LUMO gaps of 6095 C7H10O2 isomers obtained from high-level calculations were used as target properties to test the descriptor. We found that the FD generated by the GFN2-xTB method performs particularly well among the semi-empirical methods tested. Chemical accuracy can be achieved with a small training set when the FD is combined with single-point calculations at density functional theory levels. In addition, we included infrared intensities in the FD to give the FD-II, with which chemical accuracy for energies can be achieved with a small training set (3%), which represents the smallest sample size for the current dataset (C7H10O2 isomers). We expect that the FD and FD-II can also be used to accelerate predictions of other properties.
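
To make the Δ-machine learning idea concrete, the following minimal Python sketch fits a model to the difference between a high-level and a low-level energy, using sorted harmonic frequencies as the input vector. It is an illustration under stated assumptions, not the implementation used in this work: the abstract does not specify the regression model, so kernel ridge regression with a Laplacian kernel is assumed here, and the function names (`frequency_descriptor`, `fit_delta_krr`, `predict_high`), the kernel width `sigma`, the regularizer `lam` and all numerical data are hypothetical. The vector length of 51 follows from the 3N − 6 vibrational modes of a nonlinear 19-atom C7H10O2 isomer.

```python
import numpy as np


def frequency_descriptor(freqs, n_modes):
    """Sort harmonic frequencies (ascending) and zero-pad to n_modes entries."""
    vec = np.zeros(n_modes)
    f = np.sort(np.asarray(freqs, dtype=float))[:n_modes]
    vec[: f.size] = f
    return vec


def laplacian_kernel(A, B, sigma):
    """K[i, j] = exp(-||A[i] - B[j]||_1 / sigma). Assumed kernel choice."""
    dist = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=2)
    return np.exp(-dist / sigma)


def fit_delta_krr(X, delta, sigma, lam=1e-8):
    """Solve (K + lam * I) alpha = delta, where delta = E_high - E_low."""
    K = laplacian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), delta)


def predict_high(X_train, alpha, X_new, e_low_new, sigma):
    """Delta-learning prediction: low-level baseline plus learned correction."""
    return e_low_new + laplacian_kernel(X_new, X_train, sigma) @ alpha


# Synthetic stand-in data (NOT from the paper): 40 "molecules", 51 modes each.
rng = np.random.default_rng(0)
raw_freqs = rng.uniform(100.0, 3500.0, size=(40, 51))  # cm^-1, made up
X = np.array([frequency_descriptor(f, 51) for f in raw_freqs])
e_low = rng.normal(size=40)                 # placeholder low-level energies
e_high = e_low + 0.1 * rng.normal(size=40)  # placeholder high-level energies

sigma = 1.0e4  # hypothetical kernel width
alpha = fit_delta_krr(X[:30], e_high[:30] - e_low[:30], sigma)
e_pred = predict_high(X[:30], alpha, X[30:], e_low[30:], sigma)
```

Because the model learns only the correction `E_high - E_low`, the low-level energy of every new molecule must still be computed at prediction time; the descriptor and the baseline both come from inexpensive calculations, which is where the cost saving arises.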