Prediction of flavor and retention index for compounds in beer depending on molecular structure using a machine learning method†
Abstract
To make a preliminary prediction of the flavor and retention index (RI) of compounds in beer, this work applied machine learning methods to build models based on molecular structure. Towards this goal, the flavor compounds in beer were collected from the existing literature. The compounds in the resulting database were classified into four groups: aromatic, bitter, sulfury, and others. The RI values on a non-polar SE-30 column and a polar Carbowax 20M column were taken from the National Institute of Standards and Technology (NIST). The structures were converted to molecular descriptors calculated by Molecular Operating Environment (MOE), ChemoPy, and Mordred, respectively. After pretreatment of the descriptors, machine learning classifiers, including support vector machine (SVM), random forest (RF), and k-nearest neighbour (kNN), were used to build the beer flavor models. Principal component regression (PCR), random forest regression (RFR), and partial least squares (PLS) regression were employed to predict the RI. The SVM, RF, and kNN models were evaluated by their accuracy on the test set. Among them, the combination of Mordred descriptors and the RF model afforded the highest accuracy of 0.686. The R² of the optimal regression model reached 0.96. The results indicate that the models can be used to predict the flavor of a specific compound in beer and its RI value.
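As a rough illustration of the workflow summarized above (descriptor pretreatment, classification into flavor groups, and scoring by test-set accuracy and R²), the following minimal sketch uses only the Python standard library. It is not the authors' implementation: the two-column "descriptor" values are made-up toy numbers rather than MOE, ChemoPy, or Mordred output, the function names are hypothetical, autoscaling is assumed as the pretreatment step, and kNN is chosen only because it is the simplest of the three classifiers named.

```python
import math
from collections import Counter


def autoscale(train, test):
    """Z-score each descriptor column using training-set statistics only."""
    n_cols = len(train[0])
    means = [sum(row[j] for row in train) / len(train) for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((row[j] - means[j]) ** 2 for row in train) / len(train)
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero on constant columns

    def scale(rows):
        return [[(r[j] - means[j]) / stds[j] for j in range(n_cols)] for r in rows]

    return scale(train), scale(test)


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of test compounds assigned to the correct flavor group."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination, the metric reported for the RI regression."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy data (ASSUMPTION): two invented descriptor columns per compound.
train_x = [
    [1.0, 1.0], [1.2, 0.9], [0.9, 1.1],   # aromatic
    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1],   # bitter
    [1.0, 5.0], [1.1, 5.2], [0.8, 4.9],   # sulfury
    [5.0, 1.0], [5.1, 1.2], [4.8, 0.9],   # others
]
train_y = ["aromatic"] * 3 + ["bitter"] * 3 + ["sulfury"] * 3 + ["others"] * 3
test_x = [[1.1, 1.0], [5.1, 5.0], [1.0, 5.1], [5.0, 1.1]]
test_y = ["aromatic", "bitter", "sulfury", "others"]

# Pretreat the descriptors, classify each test compound, then score.
train_s, test_s = autoscale(train_x, test_x)
pred_y = [knn_predict(train_s, train_y, q, k=3) for q in test_s]
acc = accuracy(test_y, pred_y)
print("test-set accuracy:", acc)
```

The toy clusters are deliberately well separated, so the accuracy here says nothing about real beer compounds. With real descriptor tables, the same structure would apply, with a library classifier or regressor substituted for `knn_predict`.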