Chunyun Tu^a, Weijiang Huang^a, Sheng Liang^b, Kui Wang^a, Qin Tian^a and Wei Yan*^a
^a School of Chemistry and Materials Engineering, Guiyang University, Guiyang, 550005, P. R. China. E-mail: lrasyw@163.com; Tel: +86-180-9605-0905
^b School of Mathematics and Information Science, Guiyang University, Guiyang, 550005, P. R. China
First published on 31st October 2022
Given the theoretical importance and large application potential of thermally activated delayed fluorescence (TADF) materials, high-throughput virtual screening (HTVS) of compound libraries is a valuable route to TADF candidate molecules. This research focuses on the computational design of pure organic TADF molecules. By combining machine learning with quantum chemical calculations, using cheminformatics tools, and borrowing the concepts of selection and mutation from evolutionary theory, we have designed a computational program for HTVS of TADF molecular materials; in particular, we explored the impact of the selection strategy and of structural mutations on the HTVS results. An initial compound library (size = 10³), constructed by enumeration of typical donors and acceptors, was evolved by successively applying selection and 10 different structural mutations. A group fingerprint similarity (ΔMSPR) index was proposed to quantify the similarity between two compound libraries of comparable size. Based on the computed data, we found that introducing selection and mutation into the evolution map has a great impact on the HTVS results: (a) except for the fast mutation Sub2, all mutations effectively concentrate ‘good’ molecules in a compound library and hence give large material abundance (typically >0.8) at high mutation generations (ng ≥ 6). (b) The mean energy gap shows a fast convergent trend toward very low values, so the studied mutations (except Sub2) cooperate very well with the studied DA substrates to generate optimal molecules; the group fingerprint similarity retains high values at large ng, which can be associated with the apparent convergence in molecular skeletons as ng increases. (c) The distribution of skeleton frequencies for a specific mutation is generally uneven, with one dominant skeleton. The overall numbers of common and generic cores for all mutations are 11 and 7 at ng = 9. Hence, in a sense, the ‘optimal’ skeletons seem unique and useful in realizing low energy gaps. With these observations and the development of related HTVS software, we expect to provide insight and tools to the research community working on HTVS of molecular (TADF) materials.
So far, luminescent materials, the core materials of OLEDs, have undergone profound improvements: from the first generation of fluorescent materials (e.g., aluminum 8-hydroxyquinolinate), through the second generation of phosphorescent materials represented by organic complexes of heavy noble transition metals (e.g., bipyridine complexes of Ir(III)), to the third generation of TADF materials (e.g., organic donor–π-bridge–acceptor molecules).
Upon electrical excitation, TADF materials (compounds characterized by a very small energy gap between the first excited singlet and triplet states, ΔEST) are thermally activated to undergo efficient reverse intersystem crossing (rISC), in which triplet excitons are converted into singlet excitons, so that light is emitted predominantly from the emissive singlet excited state. The electroluminescence process of a TADF material is shown schematically in Fig. 1. Compared with noble metal–organic complex phosphorescent materials, TADF materials offer a larger material space, lower price, easier preparation and synthesis, easier fabrication of flexible screens, and more stable blue light emission. Therefore, in the last decade, as the most promising electroluminescent materials for modern OLEDs, they have been studied in depth experimentally,2,5–9 theoretically10–23 and through joint theory–experiment work.15,24,25
Fig. 1 Schematic diagram of the electroluminescence process of thermally activated delayed fluorescent materials.
Basically, two classes of TADF materials have been carefully explored.4 The first comprises pure organic D–A or D–π–A systems whose electron donor (D) or acceptor (A) units are mainly built from nitrogen-containing aromatic heterocycles. Their lowest excited states typically possess significant intramolecular charge transfer (CT) character. After reasonable design and optimization, the external quantum efficiency (EQE) of OLED devices based on such TADF materials can be as high as 30%. From a structural perspective, the best luminous efficiency usually corresponds to twisted D–A (or D–π–A) compounds, owing to sufficient steric hindrance between the donor and acceptor parts. The second class comprises transition metal (Cu(I), Ag(I), Zn(II), etc.) complexes with a d10 electronic configuration, whose lowest excited states usually have significant metal-to-ligand charge transfer (MLCT) character. The saturated d10 configuration of the central metal helps to reduce possible quenching through dπ–dπ* transitions in the complex and to achieve deep blue emission.
The experimental breakthroughs came mainly from Adachi and collaborators, who focused on designing organic molecules with D–π–A (and other) frameworks, and tuning the frameworks to achieve a small enough ΔEST while maintaining a suitable fluorescence radiation rate, so that efficient TADF becomes possible. Recently developed blue TADF OLED devices have an EQE approaching 37%, which is rather impressive considering the EQE of Tang and VanSlyke's 1987 version of fluorescent OLEDs is about 1%.1
In a review on molecular design patterns of organic TADF materials,3 Im et al. suggested that high-efficiency TADF materials should have at least a small ΔEST and a high photoluminescence quantum yield (PLQY). ΔEST is associated with upconverting triplet excitons to singlet excitons, while PLQY is closely related to the radiative transition probability. To obtain a small ΔEST, a strong donor/acceptor should be used and the molecular backbone should be twisted. To obtain a high PLQY, the molecule should have a phenyl bridge as the connecting unit, a delocalized and dispersed highest occupied molecular orbital (HOMO), and a double luminescent core. These strategies will undoubtedly provide useful guidance for further molecular design of TADF materials.
Contemporary electronic structure methods (e.g., density functional theory, DFT) can predict the optoelectronic properties of molecules (or materials) with relatively high accuracy.26,27 Theoretical research is playing an increasingly important role in understanding the structure–property relationships and luminescence mechanisms of TADF materials, and has a significant impact on the molecular design of such materials. As pointed out by Olivier and collaborators, theoretical research on this type of material requires careful consideration.19 Designing new molecules with efficient TADF emission is a difficult task, as they must exhibit fast conversion between triplet and singlet states (large krISC) without relying on heavy elements to enhance spin–orbit coupling. They should also show a large fluorescence rate (large kF) and, at the same time, a small energy difference between the excited singlet and triplet states (small ΔEST). In a feature article, Penfold et al. reviewed recent advances in theoretical and computational chemistry aimed at understanding TADF materials and mechanisms.20 For luminescence dynamics, simply assume krISC ≫ kF, and apply eqn (1)
(1) |
Commonly viewed as a branch of theoretical chemistry, cheminformatics has risen rapidly in recent years and is expected to have a great impact on chemical science. The continuous development of its theoretical foundations,28–33 together with the open-source release of many high-quality cheminformatics tools (e.g., RDKit, Mordred, stk etc.),34–37 makes it possible (even for non-experts) to efficiently manage large amounts of chemical information. The efficient in silico management of virtual molecules and molecular libraries with cheminformatics tools is crucial for large-scale computational design of (molecular) materials. On the other hand, in the computational design of (organic) molecules and (solid-state) materials, exploration of chemical compound space (CCS) using high-throughput virtual screening (HTVS) methods is becoming a routine procedure for molecular or material lead discovery. The important material categories involved include photovoltaic materials, optoelectronic materials, organic matrix flow battery materials, etc.38–40 By designing computational funnels to efficiently deploy computational programs, the HTVS approach allows researchers to make data-driven discoveries by observing trends in the data.
As one of the branches of artificial intelligence (AI), machine learning (ML) can efficiently extract hidden relationships from large amounts of complex data. With advances in algorithmic models and open-source tools (general purpose: Scikit-learn, TensorFlow, PyTorch etc.;41–44 chemistry or materials oriented: DeepChem, MLatom, MAST-ML etc.45–48), ML has profoundly changed the research paradigm of computational chemistry (or materials science) in the last decade.49 Representative algorithm developments and applications include: predicting molecular atomization energies;50 finding density functionals for model systems;51 improving high-level electron correlation methods, learning universal molecular force fields; predicting molecular thermochemical properties, chemical reaction active sites, molecular excited state properties, molecular crystallization behavior, etc.39,52–55 On the other hand, the establishment of open-source molecular databases has also promoted the development and calibration of models and algorithms that combine quantum chemistry with machine learning.45,56–59
Considering the rarity and high price of heavy transition metal complex phosphorescent materials, as well as the difficulty of achieving high-performance blue light emission with them, it is very attractive to design and develop stable and efficient blue TADF materials as an alternative.4 A pioneering attempt at high-throughput virtual screening of organic TADF materials was made by Aspuru-Guzik and collaborators. Using machine learning and time-dependent density functional theory, their screening procedure identified thousands of promising candidate TADF molecules from a search space of 1.6 million molecules; the best candidates were used to prepare OLED devices, which achieved external quantum efficiencies as high as 22%.15 In another distinguished study, the same authors designed a deep neural network incorporating a variational autoencoder (VAE),60 trained on hundreds of thousands of existing chemical structures to build three coupled functions: encoder, decoder and predictor.61 This model can convert discrete molecular representations to and from multidimensional continuous ones. Notably, the continuous representation allows the use of powerful gradient-based optimization to efficiently guide the search for optimal functional compounds.
This study focuses on the computational design of pure organic TADF molecules. By examining the effects of structural mutations and of the selection strategy on the results of high-throughput virtual screening of TADF materials, we expect to provide a theoretical basis and guidance for the optimization of organic (or metal-complex type) TADF lead materials in future, larger-scale explorations of chemical space.
Structural mutations can play an important role in tuning the electronic properties of molecular systems. Suppose our starting molecule is biphenyl, which has 10 aromatic C–H bonds (aC–H) in its structure, and that we allow two types of simple structural mutations:
(1) The whole aC–H unit is replaced by an aromatic N (aN),
aC–H → aN;
(2) The terminal H is substituted by a group G (G is a common simple electron donor or acceptor),
aC–H → aC–G
If we further set the substituent G to be F (fluoro group), then for this molecule we would virtually have 210 mutant offspring (assuming all positions are distinguishable), and the real size would be 210 after removing the duplicates. This only considers the consequences of a single mutation. If more than one mutation is possible at a single substitutable position, the number of combinations expands dramatically, beyond what can be computed with the resources typically available to a computational research group. Obviously, the size of our initial molecular library G0 will not be 1 (it is typically greater than 10³). Therefore, designing computational funnels based on a core property (or several core properties) of a material is crucial for efficient exploration of chemical compound space. For TADF materials in the present case, this property was chosen to be the energy difference between the first singlet excited state and the first triplet excited state (ΔEST).
Both single and mixed mutations have been taken into account. They are: N(slow), N(fast), F, CN, OMe, and NMe2; F or OMe, F or NMe2, CN or OMe, CN or NMe2. N(slow) and N(fast) denote different mutation speeds, where N(slow) restricts only one position to be substituted, and N(fast) allows at most two. These mutations are denoted symbolically as Sub1, Sub2, Sub3, …, Sub10, respectively.
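To make the difference in mutation speed concrete, the following minimal sketch counts the labelled position choices available in one generation. It assumes that "at most two" means one or two positions per generation, and it does not merge symmetry-equivalent positions; the helper name `position_choices` is hypothetical and is not part of the published code.

```python
from itertools import combinations


def position_choices(n_sites, max_sites):
    """List every labelled choice of 1..max_sites substitutable positions.

    Assumption: positions are treated as distinguishable, so choices that are
    equivalent by molecular symmetry are counted separately.
    """
    choices = []
    for k in range(1, max_sites + 1):
        choices.extend(combinations(range(n_sites), k))
    return choices


n_sites = 10  # aromatic C-H positions in biphenyl
slow = position_choices(n_sites, max_sites=1)  # N(slow): one position only
fast = position_choices(n_sites, max_sites=2)  # N(fast): one or two positions
print(len(slow), len(fast))
```

Under these assumptions the fast mutation already offers several times more labelled choices per parent molecule than the slow one, which illustrates why the two are treated as separate mutations.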
The designed computational framework for high-throughput virtual screening for TADF materials in this study is schematically presented in Fig. 2. The brief process is as follows:
(a) The control parameters are initialized. The convergence criteria for the loop are set as a combination of three quantities: the number of mutation generations (ng), the number of accumulated optimal molecules (nacc_opt_mols), and the material abundance (ωMA).
(b) Donor and acceptor fragments are rationally selected (30 donors and 43 acceptors, see Fig. S1 and S2 in the ESI†). Under the donor–acceptor (DA) structural framework, an initial molecular library G0 (size limited to 10³) is obtained by combinatorial enumeration of the fragments, using the open source cheminformatics package RDKit and the Simplified Molecular Input Line Entry System (SMILES).29
(c) Starting from this library, a subset of molecules is randomly selected (the selection ratio is set to 10%), and their initial molecular conformations are generated by the RDKit package, where the ETKDG algorithm35 is adopted.
(d) The core properties of the selected molecules are computed by quantum chemical calculations. The ground state geometry is optimized with the semi-empirical quantum chemical method PM6-D3.63,64 Based on the optimized geometry, the vertical energy gap (ΔEST) is calculated with the TD-ωB97XD/6-31G(d) method.65
The difference between ground state geometries computed at the B3LYP/6-31G(d) and PM6-D3 levels of theory is measured by the root-mean-square deviation (RMSD) of the computed molecules for Sub3 (ng = 0 only). The RMSDs are calculated with the Python code rmsd, using the Kabsch algorithm to align molecules.66,67 The frequency distribution of the RMSDs is given in Fig. S3.† The mean of the RMSDs is 0.59, and 75% of them are smaller than 0.69, which indicates that the difference in geometries might be acceptable. Hence, the PM6-D3 method is adopted. In addition, the effect of different ground state geometry optimization methods on the HTVS results has been briefly tested (see ESI†). Moreover, tuned range-separated hybrid functional methods (e.g., LC-ω*PBE, ω*B97XD and CAM-B3LYP) are typically chosen to accurately compute the related electronic properties of TADF molecules. In this study, owing to limited computational resources, the TD-ωB97XD/6-31G(d) method is used without tuning the range-separation parameter, in the hope that the tuned parameters would not deviate considerably from the default values or, if the deviations are considerable, that they would shift the distribution of the computed property of the compound library in the same direction.
(e) The molecular structures are featurized with a molecular fingerprint method and passed to the machine learning algorithm to train a model. The chosen fingerprint is ECFP, computed with the help of the DeepChem package, and the Random Forest (RF) Regressor68 of the machine learning package Scikit-learn is used. (For more details, refer to the related section in the ESI.†)
(f) Using the learned ML model, we predict the property for the entire molecular library so as to obtain the optimal molecules within the library.
(g) A certain proportion (10%) of the top-ranked molecules is taken out to generate a new generation of molecular libraries (named Gn, n = 1, 2, 3, …) by means of structural mutations. Here, selection and mutation are incorporated into the computational path.
(h) Molecular skeleton decomposition is performed to assess the corresponding evolution of the molecular skeletons in the library. The Murcko Skeleton Decomposition69 method in the RDKit package is adopted.
(i) An energy sieve is then applied to divide the molecules into regions of different colors. Molecules with predicted vertical first excited energies (ES1) larger than 2.80 eV, between 2.50 and 2.80 eV, and smaller than 2.50 eV are assigned to the blue, green, and red regions, respectively.
(j) The material abundance (ωMA) is computed and the optimal molecules are accumulated to give the number of accumulated optimal molecules (nacc_opt_mols). The threshold for a 'good' material is a predicted ΔEST < 0.15 eV. The material abundance is computed by eqn (2)
(2) |
(k) The accumulated optimal molecules are finally ranked by Synthetic Accessibility Score (SAS) to obtain the best TADF material candidates. A low SAS implies that a molecule is relatively easy to synthesize. Since repeated mutations of the molecular framework can profoundly disturb the structure, and may even make synthesis impossible, a final SAS control is necessary to separate bad structures from good ones.
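Steps (i) and (j) can be illustrated with a minimal sketch. It assumes that ωMA is the fraction of library molecules whose predicted ΔEST lies below the 0.15 eV threshold, and that the boundary values 2.50 and 2.80 eV belong to the green region; both are assumptions made for illustration. The function names, molecule labels and numerical values are hypothetical.

```python
def color_region(es1_ev):
    """Assign a colour region from the predicted vertical S1 energy (eV).

    Assumption: the boundaries 2.50 eV and 2.80 eV are counted as green.
    """
    if es1_ev > 2.80:
        return "blue"
    if es1_ev >= 2.50:
        return "green"
    return "red"


def material_abundance(gaps_ev, threshold_ev=0.15):
    """Fraction of molecules with predicted gap below the threshold.

    Assumption: this fraction is used as a stand-in for the material
    abundance defined in the text.
    """
    if not gaps_ev:
        return 0.0
    good = sum(1 for gap in gaps_ev if gap < threshold_ev)
    return good / len(gaps_ev)


# Hypothetical predictions: (label, ES1 in eV, singlet-triplet gap in eV)
predictions = [
    ("mol_A", 3.05, 0.08),
    ("mol_B", 2.65, 0.31),
    ("mol_C", 2.40, 0.12),
    ("mol_D", 2.80, 0.50),
]

regions = {label: color_region(es1) for label, es1, _ in predictions}
abundance = material_abundance([gap for _, _, gap in predictions])
print(regions, abundance)
```

Keeping the sieve and the abundance as separate functions mirrors the text, where colour assignment depends only on ES1 while the abundance depends only on ΔEST.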
Steps (c) to (j) are repeated using the newly formed library Gn to generate the next library Gn+1, until the preset loop convergence criterion is reached. A calculation completeness ratio is defined for geometry optimization and property evaluation to assist the automation of the related calculation routines. Unavoidably, the geometry or property calculations of some molecules (and their mutation offspring) may not converge under the chosen computational methods; therefore, only a preset completeness ratio is required to proceed past these steps. For geometry optimization and property calculation, the ratios are 0.80 and 0.90, respectively. Inside the loop, the interconversion of chemical files between different formats is facilitated by the Open Babel cheminformatics tool.70 The analysis of data is assisted by the Anaconda3 (ref. 71) scientific computing platform and the Spyder72 integrated development environment, where several numeric Python packages have been used, including NumPy, Pandas, SciPy and Matplotlib.73–76 The above computational program has been packaged and distributed as an open source Python code (SALAM).77
All quantum chemical computations were performed with the Gaussian 16 package.78
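The completeness check described above can be sketched as follows. This is only an illustration: it assumes the ratio is the number of converged jobs divided by the number of submitted jobs, and the function name and job counts are hypothetical rather than taken from the SALAM code.

```python
GEOMETRY_RATIO = 0.80  # required completeness for geometry optimization
PROPERTY_RATIO = 0.90  # required completeness for property calculation


def is_complete(n_converged, n_submitted, required_ratio):
    """Return True once enough jobs have converged to move on.

    Assumption: completeness = converged jobs / submitted jobs.
    """
    if n_submitted <= 0:
        return False
    return n_converged / n_submitted >= required_ratio


# Hypothetical batch: 100 geometry jobs, 83 converged; the 83 optimized
# structures are then sent to the property calculation, where 70 converge.
geometry_ok = is_complete(83, 100, GEOMETRY_RATIO)
property_ok = is_complete(70, 83, PROPERTY_RATIO)
print(geometry_ok, property_ok)
```

In this hypothetical batch the geometry step may proceed, while the property step would have to wait for more converged jobs before the loop continues.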
ng | Sub1 | Sub2 | Sub3 | Sub4 | Sub5 | Sub6 | Sub7 | Sub8 | Sub9 | Sub10 |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0.010 | 0.010 | 0.010 | 0.010 | 0.010 | 0.010 | 0.010 | 0.010 | 0.010 | 0.010 |
1 | 0.133 | 0.076 | 0.200 | 0.025 | 0.121 | 0.093 | 0.186 | 0.133 | 0.067 | 0.152 |
2 | 0.460 | 0.033 | 0.697 | 0.351 | 0.337 | 0.027 | 0.967 | 0.648 | 0.901 | 0.799 |
3 | 0.766 | 0.210 | 0.825 | 0.634 | 0.953 | 0.179 | 0.968 | 0.967 | 0.865 | 0.708 |
4 | 0.592 | 0.481 | 0.798 | 0.762 | 0.805 | 0.352 | 0.975 | 0.649 | 0.720 | 0.567 |
5 | 0.789 | 0.330 | 0.837 | 0.857 | 0.926 | 0.591 | 0.879 | 0.423 | 0.916 | 0.845 |
6 | 0.760 | 0.376 | 0.834 | 0.782 | 0.959 | 0.785 | 0.929 | 0.941 | 0.982 | 0.894 |
7 | 0.747 | 0.368 | 0.833 | 0.814 | 0.949 | 0.927 | 0.958 | 0.979 | 0.812 | 1.000 |
8 | 0.760 | 0.197 | 0.860 | 0.867 | 0.877 | 0.876 | 1.000 | 0.823 | 0.841 | 0.777 |
9 | 0.642 | 0.306 | 0.852 | 0.946 | 0.895 | 0.716 | 0.995 | 0.915 | 0.982 | 1.000 |
A comparison of Sub1 and Sub2 (which correspond to different mutation speeds: slow versus fast) shows that, for this type of mutation (aC–H → aN), a slow mutation speed is more favorable than a fast one for achieving high ωMA at large ng. At ng = 9, the ωMA values for Sub1 and Sub2 are 0.642 and 0.306, respectively. For the second type of mutation (aC–H → aC–G), the variation of ωMA appears relatively small once ng reaches high values. To sum up, except for the fast mutation Sub2, all the mutations effectively concentrate ‘good’ molecules in the compound library and hence give large ωMA values.
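As a quick check of this summary, the sketch below averages the tabulated ωMA values over the late generations (ng = 6 to 9) and lists the mutations whose average does not exceed 0.8. The numbers are copied from the ωMA table above; the choice of a simple arithmetic mean and the variable names are illustrative assumptions.

```python
from statistics import mean

# Material abundance for ng = 6, 7, 8, 9, copied from the table above
late_abundance = {
    "Sub1": [0.760, 0.747, 0.760, 0.642],
    "Sub2": [0.376, 0.368, 0.197, 0.306],
    "Sub3": [0.834, 0.833, 0.860, 0.852],
    "Sub4": [0.782, 0.814, 0.867, 0.946],
    "Sub5": [0.959, 0.949, 0.877, 0.895],
    "Sub6": [0.785, 0.927, 0.876, 0.716],
    "Sub7": [0.929, 0.958, 1.000, 0.995],
    "Sub8": [0.941, 0.979, 0.823, 0.915],
    "Sub9": [0.982, 0.812, 0.841, 0.982],
    "Sub10": [0.894, 1.000, 0.777, 1.000],
}

# Mean abundance over the late generations for each mutation
late_mean = {name: mean(values) for name, values in late_abundance.items()}

# Mutations whose late-generation mean does not exceed 0.8
below = [name for name, value in late_mean.items() if value <= 0.8]
print(below)
```

With these data only the two aC–H → aN mutations fall at or below 0.8 on average, with Sub2 far lower than Sub1, which is consistent with the word "typically" in the statement above.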
ng | Sub1 | Sub2 | Sub3 | Sub4 | Sub5 | Sub6 | Sub7 | Sub8 | Sub9 | Sub10 |
---|---|---|---|---|---|---|---|---|---|---|
0 | 18.3 | 18.3 | 18.3 | 18.3 | 18.3 | 18.3 | 18.3 | 18.3 | 18.3 | 18.3 |
1 | 15.7 | 14.8 | 14.9 | 14.9 | 14.9 | 14.9 | 15.8 | 15.9 | 15.8 | 15.7 |
2 | 13.7 | 11.1 | 11.3 | 10.0 | 12.9 | 10.7 | 13.1 | 12.6 | 14.7 | 12.8 |
3 | 11.8 | 7.8 | 10.1 | 11.1 | 9.4 | 8.9 | 17.8 | 10.5 | 13.8 | 14.5 |
4 | 11.9 | 6.5 | 6.3 | 12.5 | 8.0 | 15.6 | 11.1 | 9.3 | 16.6 | 15.1 |
5 | 11.1 | 5.5 | 6.2 | 10.9 | 6.9 | 19.2 | 10.7 | 13.4 | 19.2 | 20.1 |
6 | 11.1 | 5.7 | 5.4 | 7.9 | 7.2 | 18.1 | 11.3 | 6.1 | 14.0 | 21.1 |
7 | 11.2 | 5.8 | 5.3 | 8.0 | 5.9 | 20.2 | 10.5 | 4.8 | 10.5 | 20.4 |
8 | 10.9 | 5.5 | 5.0 | 7.8 | 6.3 | 18.2 | 10.6 | 4.4 | 7.8 | 4.0 |
9 | 10.8 | 5.4 | 4.9 | 8.2 | 5.4 | 16.0 | 9.7 | 3.6 | 7.5 | 2.9 |
Take Sub3 as an example: its naCH starts from a large value of 18.3 (ng = 0), decreases sharply to 6.3 (ng = 4), and finally drops to 4.9 (ng = 9). The related diagram for Sub3 is depicted in Fig. 4. The large starting naCH can be attributed to the large number of unsubstituted molecules with multi-cyclic aromatic structures in the library. A small final naCH can be attributed to a large number of heavily substituted molecules in the library, whereas a large final naCH can be attributed to the concentration of very large multi-cyclic aromatic molecules with few substituents. Thus, the seemingly anomalous alternating increase and decrease in naCH can be understood by tracing the evolution of the molecular skeletons of the compound library.
Compared with the slow mutation (Sub1), the naCH of the fast mutation (Sub2) drops very rapidly. However, this rapid drop in naCH is not sufficient to guarantee a meaningful increase in ωMA (compare Tables 2 and 1). For some mutations (Sub6 and Sub10), the anomalous alternating increase and decrease in naCH is a sign of a violent transformation of the dominant molecular skeletons under the selection and mutation process. Therefore, naCH can be used as an indicator to differentiate the skeleton transformation effects of different mutations on the same DA substrates. If naCH remains large as ng becomes large, a great number of relatively ‘big’ molecules accumulate in the library. Conversely, if naCH drops rapidly as ng increases, a great number of relatively ‘small’ molecules accumulate in the library. Generally, from the point of view of synthetic chemistry, the ‘small’ molecules are more favorable than the ‘big’ ones.
To sum up, analysis of the naCH of different mutations tells us that a uniform drop in naCH with increasing ng is not a sufficient condition to guarantee a meaningful increase in ωMA; rather, it can be used as an indicator to differentiate the skeleton transformation effects of different mutations.
The evolution of the number of accumulated optimal molecules (nacc_opt_mols) with increasing mutation generation (ng) for the different mutations is listed in Table 3. For any of the mutations, nacc_opt_mols can exhibit a sharp increase at low to middle ng values (0 < ng ≤ 5), followed by slower growth at middle to high ng (6 ≤ ng ≤ 9), and may finally flatten out. The probability of finding identical molecules in adjacent libraries increases as ng becomes large. This behavior is best demonstrated by the data of Sub3, as depicted in Fig. 5.
ng | Sub1 | Sub2 | Sub3 | Sub4 | Sub5 | Sub6 | Sub7 | Sub8 | Sub9 | Sub10 |
---|---|---|---|---|---|---|---|---|---|---|
0 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
1 | 83 | 83 | 202 | 34 | 125 | 97 | 191 | 137 | 75 | 156 |
2 | 288 | 108 | 791 | 341 | 418 | 122 | 1048 | 695 | 910 | 870 |
3 | 556 | 275 | 1429 | 807 | 1195 | 285 | 1894 | 1500 | 1679 | 1463 |
4 | 781 | 533 | 1808 | 1464 | 1607 | 576 | 2728 | 2043 | 2329 | 1947 |
5 | 1052 | 615 | 2041 | 2168 | 1841 | 1078 | 3455 | 2399 | 3143 | 2666 |
6 | 1345 | 696 | 2144 | 2667 | 2191 | 1768 | 4265 | 3104 | 4012 | 3453 |
7 | 1626 | 732 | 2182 | 3074 | 2380 | 2655 | 5083 | 3807 | 4730 | 4318 |
8 | 1896 | 741 | 2236 | 3433 | 2596 | 3423 | 5961 | 4328 | 5376 | 4645 |
9 | 2063 | 753 | 2273 | 3970 | 2730 | 4057 | 6833 | 4803 | 6172 | 4902 |
Compared with the fast mutation Sub2, the slow mutation Sub1 gives approximately 2.7 times as many accumulated optimal molecules (nacc_opt_mols). Thus, Sub1 is more favorable than Sub2 for producing optimal molecules. Taking ng = 9 as the basis, for the 4 terminal single mutations (Sub3 to Sub6) the precedence order is Sub3 < Sub5 < Sub4 ≈ Sub6, i.e., a strong donor (or acceptor) is superior to a weak one; for the remaining 4 mixed mutations (Sub7 to Sub10) the precedence order is Sub8 < Sub10 < Sub9 < Sub7, i.e., the weak–weak pair is superior to the others. The mixed mutations produce more optimal molecules, as expected, since they correspond to larger chemical compound spaces; however, the price is a significant increase in molecular complexity, which might eventually rule them out as materials because of the difficulty of experimental synthesis.
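The ratio and the two precedence orders quoted above follow directly from the ng = 9 row of the table. A minimal sketch, with the counts copied from that row and variable names chosen only for illustration:

```python
# Accumulated optimal molecules at ng = 9, copied from the table above
final_counts = {
    "Sub1": 2063, "Sub2": 753, "Sub3": 2273, "Sub4": 3970, "Sub5": 2730,
    "Sub6": 4057, "Sub7": 6833, "Sub8": 4803, "Sub9": 6172, "Sub10": 4902,
}

# Slow versus fast aC-H -> aN mutation
slow_to_fast = final_counts["Sub1"] / final_counts["Sub2"]

# Ascending order within the single and the mixed terminal mutations
single_order = sorted(["Sub3", "Sub4", "Sub5", "Sub6"], key=final_counts.get)
mixed_order = sorted(["Sub7", "Sub8", "Sub9", "Sub10"], key=final_counts.get)

print(round(slow_to_fast, 2), single_order, mixed_order)
```

The near-equality of Sub4 and Sub6 in the quoted order reflects a difference of fewer than 100 molecules out of roughly 4000.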
ng | Sub1 | Sub2 | Sub3 | Sub4 | Sub5 | Sub6 | Sub7 | Sub8 | Sub9 | Sub10 |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0.669 | 0.669 | 0.669 | 0.669 | 0.669 | 0.669 | 0.669 | 0.669 | 0.669 | 0.669 |
1 | 0.292 | 0.369 | 0.246 | 0.332 | 0.280 | 0.310 | 0.262 | 0.298 | 0.253 | 0.264 |
2 | 0.218 | 0.316 | 0.140 | 0.218 | 0.275 | 0.236 | 0.065 | 0.150 | 0.096 | 0.115 |
3 | 0.088 | 0.276 | 0.098 | 0.139 | 0.086 | 0.331 | 0.046 | 0.134 | 0.103 | 0.116 |
4 | 0.187 | 0.207 | 0.095 | 0.116 | 0.106 | 0.178 | 0.051 | 0.144 | 0.121 | 0.151 |
5 | 0.104 | 0.268 | 0.078 | 0.101 | 0.086 | 0.165 | 0.082 | 0.256 | 0.098 | 0.085 |
6 | 0.120 | 0.233 | 0.075 | 0.108 | 0.085 | 0.124 | 0.058 | 0.052 | 0.057 | 0.078 |
7 | 0.110 | 0.249 | 0.076 | 0.102 | 0.084 | 0.076 | 0.050 | 0.066 | 0.108 | 0.100 |
8 | 0.114 | 0.274 | 0.074 | 0.077 | 0.089 | 0.104 | 0.042 | 0.092 | 0.090 | 0.096 |
9 | 0.155 | 0.229 | 0.076 | 0.059 | 0.085 | 0.128 | 0.043 | 0.073 | 0.063 | 0.043 |
Regardless of the type of mutation, the fast convergent trend of the mean energy gap is rather impressive. The evolution of the mean energy gap versus mutation generation for Sub3 is depicted in Fig. 6. The mean energy gap starts from a value of 0.669 eV (ng = 0), drops sharply to 0.098 eV (ng = 3), and finally flattens out at 0.076 eV (ng = 9). There should be a clear correlation between the evolutionary behaviors of the mean energy gap and ωMA, since both are group quantities based on the ΔEST of the molecules.
To give more detail on the impact of the mutation across mutation generations, the evolution of the frequency distribution of energy gaps versus mutation generation for Sub3 is depicted in Fig. 7. The fast shift to low ΔEST is apparent as ng moves from 0 to 3; thereafter the distribution retains a large proportion in the very low value range and tails off in the low to medium range.
To sum up, regardless of the type of mutation, the mean energy gap can exhibit a fast convergent trend toward very low values; hence the studied mutations (except Sub2) cooperate very well with the DA substrates to generate optimal molecules.
The evolution of skeletons (common cores) versus mutation generation for Sub3 is depicted in Fig. 8, and that of skeletons (generic cores) is given in Fig. S4 in the ESI.† For simplicity, at most 9 dominant high-frequency skeletons and selected mutation generations are shown. For both types of cores, there is an explicit quantitative shrinkage in the number of dominant high-frequency skeletons. The common cores start from a relatively uniform frequency distribution over 9 cores (ng = 0), collapse to a very uneven distribution over 5 cores (ng = 5), and finally collapse further to a distribution over only 2 cores (ng = 9) (Fig. 8). The generic cores exhibit an even more profound collapse in the number of cores (Fig. S4 in the ESI†). This collapse is a sign of efficient convergence of the structures around ‘excellent’ molecules, and hence is beneficial for obtaining optimal molecules.
Fig. 8 The evolution of skeleton (core) with mutation generation for Sub3 (the numbers below structures denote the corresponding frequencies).
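The frequency analysis behind such a skeleton distribution can be sketched with the standard library alone. The sketch assumes that one scaffold SMILES per molecule has already been obtained (for example from a Murcko-type decomposition) and only performs the counting; the function name is hypothetical, and the example list is rebuilt from the two common-core frequencies reported for Sub3 at ng = 9.

```python
from collections import Counter


def skeleton_distribution(scaffolds):
    """Return (scaffold, count, fraction) tuples, most frequent first."""
    counts = Counter(scaffolds)
    total = sum(counts.values())
    return [(smi, n, n / total) for smi, n in counts.most_common()]


# One scaffold SMILES per library molecule, rebuilt from the reported
# Sub3 common-core frequencies at ng = 9 (303 and 203 molecules)
core_n = "c1ccc(N2c3ccccc3Nc3ccccc32)cc1"
core_o = "c1ccc(N2c3ccccc3Oc3ccccc32)cc1"
library_scaffolds = [core_n] * 303 + [core_o] * 203

distribution = skeleton_distribution(library_scaffolds)
for smiles, count, fraction in distribution:
    print(smiles, count, round(fraction, 3))
```

A distribution in which the leading fraction is much larger than the rest is what the text describes as an uneven distribution with one dominant skeleton.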
Following the concept of similarity for two molecules and based on molecular fingerprint representation, we propose a numerical method to calculate group fingerprint similarity (ΔMSPR) between two compound libraries. Here, the molecular fingerprint representation method is ECFP, and the similarity is measured by the Tanimoto metric.
The calculation of the group fingerprint similarity (ΔMSPR) is based on an algorithm that we name the Maximum Similarity Pairing Rule (MSPR) (refer to ESI†). The evolution of the number of molecules in the library (ntot), the number of intersection molecules (ninter), and the group fingerprint similarity between two libraries (ΔMSPR) with increasing mutation generation (ng) for Sub1, Sub3 and Sub7 is listed in Table 5.
ng | Sub1 ntot | Sub1 ninter | Sub1 ΔMSPR | Sub3 ntot | Sub3 ninter | Sub3 ΔMSPR | Sub7 ntot | Sub7 ninter | Sub7 ΔMSPR |
---|---|---|---|---|---|---|---|---|---|
0 | 1000 | — | — | 1000 | — | — | 1000 | — | — |
1 | 588 | 100a | 0.660a | 1000 | 100 | 0.542 | 1000 | 98 | 0.553 |
2 | 531 | 125 | 0.738 | 960 | 111 | 0.638 | 1000 | 127 | 0.666 |
3 | 487 | 127 | 0.778 | 929 | 140 | 0.787 | 1000 | 122 | 0.737 |
4 | 585 | 179 | 0.900 | 747 | 227 | 0.893 | 1000 | 135 | 0.735 |
5 | 541 | 161 | 0.834 | 618 | 282 | 0.949 | 1000 | 171 | 0.834 |
6 | 570 | 151 | 0.872 | 608 | 354 | 0.908 | 1000 | 126 | 0.779 |
7 | 588 | 180 | 0.898 | 546 | 350 | 0.932 | 1000 | 138 | 0.910 |
8 | 574 | 173 | 0.930 | 514 | 339 | 0.980 | 1000 | 120 | 0.891 |
9 | 565 | 212 | 0.928 | 506 | 364 | 0.992 | 1000 | 117 | 0.959 |
a Calculated with respect to the corresponding preceding ng.
For Sub1, ntot stays between roughly 490 and 590 for ng from 1 to 9, and ninter shows a slowly increasing trend in that range; accordingly, ΔMSPR rises from a low value of 0.660 (ng = 1) to a high value of 0.928 (ng = 9). For Sub3, the rather high values of ΔMSPR at high mutation generations (ng ≥ 5) can be ascribed to the steady decrease of ntot and increase of ninter. For Sub7, both ntot and ninter keep their sizes as ng increases. The high values of ΔMSPR at high mutation generations (ng ≥ 7) can be ascribed to the convergence in molecular skeletons, which keeps the similarity between pairs of molecules high.
To sum up, regardless of the type of mutation, the group fingerprint similarity (ΔMSPR) at high mutation generations retains high values (typically larger than 0.90), which can be associated with the apparent convergence in molecular skeletons at high mutation generations.
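Since the details of the pairing rule are given only in the ESI, the sketch below shows one plausible reading rather than the authors' algorithm: each molecule of the first library is paired with its most similar molecule in the second library (Tanimoto metric), and the best-pair similarities are averaged. Fingerprints are represented as sets of on-bit indices, a stand-in for ECFP bit vectors; all function names and example fingerprints are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return len(fp_a & fp_b) / union


def group_similarity(library_a, library_b):
    """Average, over library_a, of the best Tanimoto match found in library_b.

    Assumption: a one-sided nearest-neighbour average, which may differ
    from the Maximum Similarity Pairing Rule described in the ESI.
    """
    if not library_a or not library_b:
        raise ValueError("both libraries must contain at least one fingerprint")
    best = [max(tanimoto(fp, other) for other in library_b) for fp in library_a]
    return sum(best) / len(best)


# Hypothetical fingerprints for two small libraries
parent_library = [{1, 2, 3}, {4, 5}]
child_library = [{1, 2, 3}, {4, 6}]

similarity = group_similarity(parent_library, child_library)
print(round(similarity, 3))
```

With this definition a library compared with itself gives 1, so values approaching 1 at high ng would indicate that successive libraries contain nearly the same fingerprints.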
In principle, molecules with simpler and more symmetric structures are favored by the SAS sorting routine. Within the studied compound space, those compounds constructed by typical tri-cyclic donors connecting with (polyacetonitrile substituted) benzenes acceptors can possess the lowest SAS. In addition, they can exhibit low enough energy gaps. Therefore, from the point of view of synthetic chemistry, they are recommended as optimal TADF molecules (low ΔEST and SAS), although only a small energy gap might not be enough to guarantee the occurrence of TADF emission.
Notably, a sufficiently low ΔEST is only a necessary condition: for TADF emission to actually occur, a compound should at least also have an acceptable radiative fluorescence rate. Since the number of accumulated optimal molecules at high mutation generations is typically larger than 2000, we expect that molecules that additionally satisfy the radiative fluorescence rate requirement have a good chance of being found by further sieving the already obtained library of accumulated optimal molecules.
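The combined use of ΔEST and SAS described above can be pictured as a filter followed by a sort. The following is a minimal sketch only, not the routine used in the HTVS program: the `Candidate` record, the `select_optimal` helper, the 0.1 eV cut-off and all numerical values are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str      # placeholder label, not a real compound
    dE_ST: float   # singlet-triplet energy gap in eV (made-up value)
    sas: float     # synthetic accessibility score (made-up value)


def select_optimal(cands, max_gap=0.1):
    """Keep candidates with dE_ST <= max_gap, then rank by SAS (lower first)."""
    kept = [c for c in cands if c.dE_ST <= max_gap]
    # Ties in SAS are broken by the smaller energy gap.
    return sorted(kept, key=lambda c: (c.sas, c.dE_ST))


# Toy library with invented numbers, for illustration only.
library = [
    Candidate("mol_a", 0.05, 3.2),
    Candidate("mol_b", 0.30, 2.1),   # low SAS but gap too large: filtered out
    Candidate("mol_c", 0.08, 2.7),
]

ranked = select_optimal(library)
print([c.name for c in ranked])
```

A further sieve on the radiative fluorescence rate could be added as one more condition in the filter step, once such a property is available for each candidate.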
Mutations | SMILES | Freq |
---|---|---|
SMILES of common cores | | |
Sub3 | c1ccc(N2c3ccccc3Nc3ccccc32)cc1 | 303 |
 | c1ccc(N2c3ccccc3Oc3ccccc32)cc1 | 203 |
Sub4 | O=S(=O)(c1ccccc1)c1ccc(S(=O)(=O)c2ccccc2)c(N2c3ccccc3Cc3ccccc32)c1 | 640 |
 | O=S1(=O)c2ccccc2S(=O)(=O)c2cc(N3c4ccccc4Cc4ccccc43)ccc21 | 122 |
 | O=S1(=O)c2ccccc2S(=O)(=O)c2cc(N3c4ccccc4Oc4ccccc43)ccc21 | 80 |
 | c1ccc(N2c3ccccc3Oc3ccccc32)cc1 | 29 |
Sub5 | c1ccc(N2c3ccccc3Nc3ccccc32)cc1 | 243 |
 | c1ccc(N2c3ccccc3Cc3c2ccc2c3c3ccccc3n2-c2ccccc2)cc1 | 233 |
Sub6 | c1ccc(N(c2ccccc2)c2ccc3c(c2)Cc2cc(N(c4ccccc4)c4ccccc4)ccc2N3c2ccccc2)cc1 | 1000 |
Sub7 | c1ccc(N2c3ccccc3Cc3cc4c(cc32)c2ccccc2n4-c2ccccc2)cc1 | 984 |
 | c1ccc(N2c3ccccc3Sc3ccccc32)cc1 | 16 |
Sub8 | c1ccc(N2c3ccccc3Nc3ccccc32)cc1 | 662 |
 | c1ccc(N2c3ccccc3Oc3ccccc32)cc1 | 134 |
Sub9 | c1ccc(N2c3ccccc3Cc3cc4c(cc32)c2ccccc2n4-c2ccccc2)cc1 | 688 |
 | c1ccc(N2c3ccccc3Cc3ccccc32)cc1 | 176 |
 | c1ccc(N2c3ccccc3Cc3cc4c(cc32)Cc2ccccc2-4)cc1 | 136 |
Sub10 | c1ccc(N2c3ccccc3Nc3ccccc32)cc1 | 555 |
SMILES of generic cores | | |
Sub1 | C1CCC(C2C3CCCCC3CC3CC4C(CC32)C2CCCCC2C4C2CCCCC2)CC1 | 565 |
Sub2 | C1CCC(C2C3CCCCC3CC3CCCCC32)CC1 | 506 |
 | C1CCC(C2C3CCCCC3CC3CC4C(CC5CCCCC54)CC32)CC1 | 170 |
Sub3 | C1CCC(C2C3CCCCC3CC3CCCCC32)CC1 | 506 |
Sub4 | CC(C)(C1CCCCC1)C1CCC(C(C)(C)C2CCCCC2)C(C2C3CCCCC3CC3CCCCC32)C1 | 640 |
 | CC1(C)C2CCCCC2C(C)(C)C2CC(C3C4CCCCC4CC4CCCCC43)CCC21 | 202 |
 | C1CCC(C2C3CCCCC3CC3CCCCC32)CC1 | 29 |
Sub5 | C1CCC(C2C3CCCCC3CC3CCCCC32)CC1 | 243 |
 | C1CCC(C2C3CCCCC3CC3C2CCC2C(C4CCCCC4)C4CCCCC4C32)CC1 | 233 |
Sub6 | C1CCC(C(C2CCCCC2)C2CCC3C(CC4CC(C(C5CCCCC5)C5CCCCC5)CCC4C3C3CCCCC3)C2)CC1 | 1000 |
Sub7 | C1CCC(C2C3CCCCC3CC3CC4C(CC32)C2CCCCC2C4C2CCCCC2)CC1 | 984 |
 | C1CCC(C2C3CCCCC3CC3CCCCC32)CC1 | 16 |
Sub8 | C1CCC(C2C3CCCCC3CC3CCCCC32)CC1 | 796 |
Sub9 | C1CCC(C2C3CCCCC3CC3CC4C(CC32)C2CCCCC2C4C2CCCCC2)CC1 | 688 |
 | C1CCC(C2C3CCCCC3CC3CCCCC32)CC1 | 176 |
 | C1CCC(C2C3CCCCC3CC3CC4C(CC5CCCCC54)CC32)CC1 | 136 |
Sub10 | C1CCC(C2C3CCCCC3CC3CCCCC32)CC1 | 555 |
The distribution of frequencies of different skeletons for a specific mutation is generally uneven, with one dominant skeleton. The common core with SMILES = “c1ccc(N2c3ccccc3Nc3ccccc32)cc1” occurs for several mutations (Sub3, Sub5, Sub8 and Sub10) with frequencies of 303, 243, 662 and 555, respectively. The common core with SMILES = “c1ccc(N2c3ccccc3Oc3ccccc32)cc1” occurs for Sub3, Sub4 and Sub8 with frequencies of 203, 29 and 134, respectively. The common core with SMILES = “c1ccc(N2c3ccccc3Cc3cc4c(cc32)c2ccccc2n4-c2ccccc2)cc1” occurs for two mutations (Sub7 and Sub9) with frequencies of 984 and 688, respectively. In contrast, the common core with SMILES = “O=S(=O)(c1ccccc1)c1ccc(S(=O)(=O)c2ccccc2)c(N2c3ccccc3Cc3ccccc32)c1” occurs only for Sub4, with a frequency of 640. Different skeletons thus show a distinguishable preference for particular mutations. In short, a mutation can select the optimal skeleton(s) from thousands of original DA substrates to realize low energy gaps.
After removing duplicates, the structures of the optimal skeletons (common cores) for mutations Sub3 to Sub10 at ng = 9 are depicted in Fig. 10. The related diagram for generic cores is given in Fig. S6 in the ESI.† By the definitions of common and generic cores, the common cores for Sub1 and Sub2 cannot collapse; their generic cores, however, do belong to 1 or 2 skeletons (Table 6). It is interesting to note that the overall numbers of common and generic cores for all mutations are 11 and 7, respectively. Hence, in a sense, the ‘optimal’ skeletons seem unique and useful in realizing low energy gaps.
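As a simple consistency check on the tabulated frequencies, the counts of common and generic cores can be tallied per mutation, and the share of the dominant skeleton among the listed cores can be computed. The sketch below uses only the frequencies from the table of common and generic cores; the dictionary layout and the `dominant_share` helper are our own illustration and not part of the HTVS program. Note that the shares are relative to the listed cores only, not to the whole compound library.

```python
# Frequencies of common cores per mutation (Sub3 to Sub10), from the table.
common = {
    "Sub3": [303, 203],
    "Sub4": [640, 122, 80, 29],
    "Sub5": [243, 233],
    "Sub6": [1000],
    "Sub7": [984, 16],
    "Sub8": [662, 134],
    "Sub9": [688, 176, 136],
    "Sub10": [555],
}

# Frequencies of generic cores per mutation (Sub1 to Sub10), from the table.
generic = {
    "Sub1": [565],
    "Sub2": [506, 170],
    "Sub3": [506],
    "Sub4": [640, 202, 29],
    "Sub5": [243, 233],
    "Sub6": [1000],
    "Sub7": [984, 16],
    "Sub8": [796],
    "Sub9": [688, 176, 136],
    "Sub10": [555],
}


def dominant_share(freqs):
    """Fraction of the listed molecules that carry the most frequent core."""
    return max(freqs) / sum(freqs)


for name, freqs in common.items():
    # Collapsing common cores into generic cores should conserve the total.
    same_total = sum(freqs) == sum(generic[name])
    print(f"{name}: total={sum(freqs)}, "
          f"dominant share={dominant_share(freqs):.3f}, "
          f"matches generic total={same_total}")
```

For every mutation from Sub3 to Sub10 the two totals agree; for example, the common cores of Sub3 (303 + 203) collapse into a single generic core with frequency 506, and those of Sub8 (662 + 134) into one with frequency 796.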
(1) To access red color, Sub10 (CN or NMe2) is the best mutation group; it produces a considerable proportion of red molecules at mutation generation ng = 9.
(2) To access green color, Sub1 (N(slow)), Sub2 (N(fast)), Sub4 (CN), Sub9 (CN or OMe), and Sub10 (CN or NMe2) are the favored groups.
(3) To access blue color, all mutations seem valid. Sub3 (F) and Sub5 (OMe) are the recommended groups.
* Systems: organic molecules within DA, D–π–A, D–A–D, A–D–A and D3–A frameworks; organometallic complexes with conjugate organic aromatic ligands.
* Properties: electronic and electric properties based on ground state (and possibly excited state) geometry of molecule.
* Materials: organic and organometallic TADF, nonlinear optical and two-photon absorption materials; and possibly organic conductive and photovoltaic materials.
Obviously, there are some areas for further improvement. Whether other types of mutations are possible for conjugated aromatic systems, and whether a valid crossover operator can be designed to combine two parent molecules into offspring molecules, have not been explored yet. Moreover, the program is driven by only one core property, whereas sometimes several properties need to be optimized simultaneously; more design effort should therefore be devoted to supporting this kind of requirement. There should also be more support for (artificial neural network based) deep learning methods to improve the accuracy of the property prediction models. Additionally, more quantum chemical packages should be supported as computing engines for electronic structure. Notwithstanding the above-mentioned areas for improvement, we still hope that the designed HTVS program can provide some valuable insights into related fields.
(1) Except for the fast mutation Sub2, all of the mutations can effectively concentrate ‘good’ molecules in a compound library, and hence give large ωMA (typically >0.8) at high mutation generations (ng ≥ 6).
(2) Analysis of naCH for the different mutations shows that a uniform drop in naCH with increasing ng is not a sufficient condition to guarantee a meaningful increase in ωMA; rather, it can be used as an indicator to differentiate mutations by their skeleton transformation effect.
(3) The nacc_opt_mols for any of the mutations exhibits a sharp increase at low to middle ng values (0 < ng ≤ 5), followed by slower growth at middle to high ng (6 ≤ ng ≤ 9), and may finally tend to flatten out. Sub1 is more favorable than Sub2 in producing optimal molecules. For the 4 terminal single mutations, the precedence order is: Sub3 < Sub5 < Sub4 ≈ Sub6; for the remaining 4 mixed mutations, the precedence order is: Sub8 < Sub10 < Sub9 < Sub7. The mixed mutations produce more optimal molecules, as expected, but at the price of a significant increase in molecular complexity, which might eventually rule them out as materials because of the difficulty of experimental synthesis.
(4) The mean energy gap can exhibit a fast convergent trend toward very low values, hence the studied mutations (except Sub2) can cooperate very well with the DA substrates to generate optimal molecules.
(5) A group fingerprint similarity (ΔMSPR) index was proposed to account for the similarity between two compound libraries with comparable sizes. The ΔMSPR can retain high enough values (typically larger than 0.90) for large ng, which can be associated with the apparent convergence in molecular skeletons at high mutation generations.
(6) The distribution of frequencies of different skeletons for a specific mutation is generally uneven, with one dominant skeleton. The overall numbers of common and generic cores for all mutations are 11 and 7, respectively, at ng = 9. Hence, in a sense, the ‘optimal’ skeletons seem unique and useful in realizing low energy gaps.
With the above observations and the development of the HTVS software, we hope to provide both insight and a tool to the research community working on HTVS of molecular (TADF) materials.
Footnote |
† Electronic supplementary information (ESI) available: Structure of donors and acceptors; details for machine learning; definition of group molecular similarity; etc. See DOI: https://doi.org/10.1039/d2ra05643g |
This journal is © The Royal Society of Chemistry 2022 |