Combining machine learning and quantum chemical calculations for high-throughput virtual screening of thermally activated delayed fluorescence molecular materials: the impact of selection strategy and structural mutations†
Abstract
In view of the theoretical importance and huge application potential of Thermally Activated Delayed Fluorescence (TADF) materials, it is of great significance to conduct High-Throughput Virtual Screening (HTVS) on compound libraries to find TADF candidate molecules. This research focuses on the computational design of pure organic TADF molecules. By combining machine learning and quantum chemical calculations, using cheminformatics tools, and introducing the concept of selection and mutation from evolutionary theory, we have designed a computational program for HTVS of TADF molecular materials, especially the impact of selection strategy and structural mutations on the results of HTVS was explored. An initial compound library (size = 103) constructed by enumeration of typical donors and acceptors was used to evolve by successively applying selection and 10 different structural mutations. And a group fingerprint similarity (ΔMSPR) index was proposed to account for the similarity between two compound libraries with comparable sizes. Based on the computed data, we have found that the mix of selection and mutations into the evolution map does have great impact on the HTVS results: (a) except the fast mutation Sub2, all the rest of the mutations can effectively concentrate ‘good’ molecules in a compound library, and hence give large material abundance (typically >0.8) for high mutation generations (ng ≥ 6). (b) The mean energy gap can exhibit a fast convergent trend toward very low values, hence the studied mutations (except Sub2) can cooperate very well with the studied DA substrates to generate optimal molecules, and the group fingerprint similarity can retain high enough values for large ng, which can be associated with the apparent convergence in molecular skeletons as ng increases. (c) The distribution of skeleton frequencies for a specific mutation is generally uneven with one dominant skeleton. The overall numbers of common and generic cores for all mutations are 11 and 7 as ng = 9. Hence, in a sense, the ‘optimal’ skeletons seem unique and useful in realizing low energy gaps. With these observations and the development of related HTVS software, we expect to provide insight and tools to the research community of HTVS of molecular (TADF) materials.