Design of fragrant molecules through the incorporation of rough sets into computer-aided molecular design
Abstract
Designing and screening fragrances through experiments or specialist experience alone can overlook potentially better fragrance products. To overcome this issue, a systematic mathematical programming-based approach is developed for the design of fragrant molecules. A novel data-driven rough set-based machine learning (RSML) model is used as a predictive or diagnostic modelling tool for odour properties. RSML generates deterministic rules that relate the topology of fragrant molecules to their odour characters, as extracted from an existing odour database. The generated rules are then integrated as constraints into a computer-aided molecular design (CAMD) problem. The CAMD framework also considers other relevant properties, namely diffusion coefficient, vapour pressure, viscosity, LC50 and solubility parameter, which are predicted using a group contribution (GC) method. Because different types of models are used to predict the various attributes, molecular signature descriptors serve as the common platform that links the machine learning model and the other predictive models in the CAMD problem. The new design method is demonstrated through a case study on the design of fragrant molecules for shampoo additives with desirable physical and environmental properties. The results show that the method can identify non-intuitive and promising fragrant molecules that can be used for various applications.
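As a rough illustration of how deterministic rules can be derived with rough sets and then used as a feasibility check, the following minimal sketch applies the classical lower-approximation idea to a toy decision table. It is not the RSML model of this work: the attribute names, the data, and the helper functions (`indiscernibility_classes`, `deterministic_rules`, `satisfies_odour`) are all hypothetical, and no attribute reduction is performed.

```python
from collections import defaultdict

# Hypothetical structural attributes; invented for illustration only.
ATTRS = ("has_ester", "has_aromatic_ring", "has_hydroxyl")

# Toy decision table: (condition attribute values, odour character).
# The data are made up and do not come from any odour database.
TABLE = [
    ((1, 0, 0), "fruity"),
    ((1, 0, 0), "fruity"),
    ((0, 1, 1), "floral"),
    ((0, 1, 0), "floral"),
    ((0, 1, 0), "woody"),   # same conditions as the row above, different odour
    ((0, 0, 1), "woody"),
]

def indiscernibility_classes(table):
    """Group decisions by identical condition attribute values."""
    classes = defaultdict(list)
    for conditions, decision in table:
        classes[conditions].append(decision)
    return classes

def deterministic_rules(table):
    """Keep only classes whose members all share one decision.

    These classes form the lower approximations of the decision
    classes, so each yields a certain rule: conditions -> odour.
    """
    rules = {}
    for conditions, decisions in indiscernibility_classes(table).items():
        if len(set(decisions)) == 1:
            rules[conditions] = decisions[0]
    return rules

def satisfies_odour(candidate, target, rules):
    """Constraint check: does a certain rule assign `target` to `candidate`?"""
    return rules.get(candidate) == target

RULES = deterministic_rules(TABLE)
for conditions, odour in RULES.items():
    clause = " and ".join(f"{a}={v}" for a, v in zip(ATTRS, conditions))
    print(f"IF {clause} THEN odour={odour}")
```

In a design problem, a check like `satisfies_odour` would play the role of an odour constraint on candidate structures. The conflicting rows for `(0, 1, 0)` produce no rule, because that class lies in the boundary region rather than in any lower approximation.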