Platon P. Chebotaev a, Andrey A. Buglak *ab, Aimee Sheehan c and Mikhail A. Filatov *c
a Faculty of Physics, Saint-Petersburg State University, Universitetskaya Emb. 7-9, 199034 St. Petersburg, Russia. E-mail: andreybuglak@gmail.com
b Institute of Physics, Kazan Federal University, 18 Kremlyovskaya street, 420008, Kazan, Russia
c School of Chemical and Biopharmaceutical Sciences, Technological University Dublin, City Campus, Grangegorman, Dublin 7, Ireland. E-mail: mikhail.filatov@tudublin.ie
First published on 18th September 2024
Functional dyes that are capable of both bright fluorescence and efficient singlet oxygen generation are crucial for theranostic techniques, which integrate fluorescence imaging and photodynamic therapy (PDT). The development of new functional dyes for theranostics is often costly and time-consuming due to laborious synthesis and post-synthetic screening of large libraries of compounds. In this work, we describe machine learning methods suitable for simultaneous prediction of fluorescence and photosensitizing ability of heavy-atom-free boron dipyrromethene (BODIPY) compounds. We analysed the ratio between fluorescence quantum yield (ΦFl) and singlet oxygen quantum yield (ΦΔ) for over 70 BODIPY structures in polar (acetonitrile) and non-polar (toluene) solvents, which mimic hydrophilic and hydrophobic cell environments, respectively. QSPR models were developed based on more than 5000 calculated molecular descriptors, including quantum chemical and topological descriptors. We applied multiple linear regression (MLR), support vector regression (SVR), and random forest regression (RFR) methods for model building and optimization. The resulting models demonstrated robust statistical parameters (R2 = 0.73–0.91) for both polar and non-polar media. The relative contributions of the descriptors to the models were assessed, identifying Eig03_EA(dm), F01[C–N], and TDB06p as the most influential. These results demonstrate that QSPR machine learning methods are effective in predicting key photochemical parameters of BODIPY photosensitizers, thereby potentially streamlining the development of theranostic agents.
Dyes with switchable intersystem crossing (ISC) hold immense practical potential, as they can serve both as probes for fluorescence-based imaging and as initiators of photochemical transformations. There is currently increased interest in dyes that exhibit bright fluorescence emission and, at the same time, efficiently generate singlet oxygen (¹O₂), particularly in the area of theranostics. This is a treatment strategy in which therapy and diagnostics are combined through the use of a single compound: for example, a dye fluoresces in the presence of malignant cells and can then be light-activated to produce cytotoxic ¹O₂ and induce apoptosis.6 Typical organic dyes are usually capable of only one such function, since the higher the fluorescence quantum yield of a dye, the lower its photosensitizing efficiency, and vice versa. Designing a compound with a good balance of efficient fluorescence and reactive oxygen species (ROS) generation is therefore challenging.
Approaches for predicting and controlling the ISC process in organic dyes are in high demand. One common approach for switching between triplet and singlet excited states is based on reducing the singlet–triplet energy gap (ΔES–T), which makes the reverse intersystem crossing (RISC) process feasible.7 This process is often used to enhance the internal quantum efficiency of organic light-emitting diodes (OLEDs).8 However, the insufficient versatility of this approach is a major limitation for its use in modulating fluorescence and triplet state yields in organic dyes. Another common approach for enhancing ISC efficiency relies on the introduction of heavy atoms, such as halogens or transition metals, into the structure; these promote ISC via spin–orbit interactions.9 A representative example is shown in Fig. 1a: the halogen-substituted boron dipyrromethene (BODIPY) dye 1 possesses a triplet excited state yield (ΦT) of >80%, which accounts for its use as a photosensitizer (PS).10 In contrast, the heavy-atom-free compound 2 exhibits a high fluorescence quantum yield, while its triplet state yield is very low due to weak spin–orbit coupling.
Fig. 1 (a) Fluorescence and triplet state yields of BODIPYs 1–3. (b) Simplified Jablonski diagram illustrating photophysical processes in compound 3 in polar and non-polar solvents.12a
In recent years, the formation of triplet excited states in electron donor–acceptor dyads via the process of spin–orbit charge transfer intersystem crossing (SOCT-ISC) has attracted particular attention. In these systems, photoinduced electron transfer between the donor and acceptor subunits leads to formation of a charge-transfer state (1CT), which can further undergo charge recombination (CR) into the lowest triplet excited state (T1, Fig. 1b).11
SOCT-ISC has been observed in various BODIPY donor–acceptor dyads12 and dimers,13 with many reported molecular systems exhibiting singlet oxygen quantum yields (ΦΔ) comparable to or even higher than those of transition metal complexes and halogenated dyes.14 These dyes also possess additional advantages, including synthetic accessibility, high phototoxicity in cells with negligible dark toxicity (i.e. in the absence of UV-Vis light irradiation), long triplet excited state lifetimes and intense absorption in the 400–500 nm region. Unlike conventional dyes, in which fluorescence and ISC rates are predetermined by chemical structure of the molecule, SOCT-ISC dyes can exhibit either or both functions, depending on characteristics of the environment. Modulating the media polarity allows for ISC switching in these molecules. For instance, the dye can function as an efficient photosensitizer in polar media due to an efficient charge transfer process leading to high ΦT values, as illustrated in Fig. 1b. Conversely, in non-polar media, the same dye behaves as a fluorophore because, under these conditions, the energy of the charge transfer state is higher than that of S1, rendering SOCT-ISC inefficient. Ultimately, fluorescence and singlet oxygen quantum yields of such dyes can be programmed for specific environments, depending on the target application. Such dual performance, i.e., the combination of fluorescence and photosensitization abilities in a single molecule, offers access to a new generation of triplet–triplet annihilation upconversion (TTA-UC) systems15 and holds promise for applications in bioimaging,16 PDT17 and photocatalysis.18
There is potential for the application of quantitative structure–property relationship (QSPR) modelling in pre-synthetic screening of dyes and in predicting fluorescence and singlet oxygen quantum yields. While this approach is commonly employed in medicinal chemistry, its utilization in photochemistry remains limited. QSAR/QSPR analysis19 and deep neural network modelling20 have been applied in studies of the photophysics and photodynamic activity of BODIPYs.21 However, applying QSAR modelling to predict ISC in BODIPYs is still challenging. Recently, we introduced the first QSPR computational study for systems undergoing SOCT-ISC, presenting a method for predicting singlet oxygen generation quantum yields for various BODIPY structures in different media: non-polar, moderately polar, and highly polar.22 Our QSPR models integrate quantum mechanical molecular descriptors (frontier molecular orbital energies, HOMO–LUMO gap, excited state energies) and topological descriptors related to 3D molecular geometry, allowing for rapid and accurate prediction of quantum yields and enabling virtual screening of photosensitizers, thus expediting their development.
Here, we aimed to explore the feasibility of using QSPR for simultaneous prediction of both fluorescence and singlet oxygen generation quantum yields for heavy-atom-free BODIPYs. Such predictions would streamline the screening process for molecules with dual functionality and facilitate the identification of dyes for theranostics applications. To achieve this, we systematically investigated BODIPYs comprising electron-deficient and electron-rich aromatic subunits capable of charge transfer and SOCT-ISC processes. In this study, we: (1) identify the most significant descriptors for predicting quantum yields; (2) develop QSPR models capable of predicting the ΦFl/ΦΔ ratio; and (3) assess the accuracy of the obtained QSPR models.
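The MLR algorithm yields an equation of the following form:
$y = b_0 + b_1 x_1 + \cdots + b_n x_n$,
where $y$ is the predicted value of the dependent variable, $x_1, \ldots, x_n$ are the descriptor values, $b_0$ is the intercept and $b_1, \ldots, b_n$ are the regression coefficients.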
SVR and RFR models were obtained in Scikit-Learn using a grid search with 5-fold cross-validation, an approach that was successfully applied in previous QSAR/QSPR studies.29 The SVR model was developed by varying three parameters: C, epsilon and kernel (linear, polynomial, sigmoid or radial basis function). The search for the RFR model was carried out by varying five parameters: the number of estimators (“trees”), the maximum depth, min_samples_split, min_samples_leaf and max_features. The remaining parameters were left at their default values.
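Standard scaling of descriptor values was performed for the SVR and MLR models. The idea of standard scaling is that the values of each descriptor in a dataset are transformed to have zero mean and unit variance according to the expression:
$z = \dfrac{x - \mu}{\sigma}$,
where $x$ is the original descriptor value, and $\mu$ and $\sigma$ are the mean and the standard deviation of that descriptor over the dataset.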
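The scaling step can be illustrated with a minimal pure-Python sketch. It subtracts the mean of a descriptor and divides by its population standard deviation, which is the convention followed by Scikit-Learn's `StandardScaler`; the helper name `standard_scale` is hypothetical, and the four input numbers are TDB06p values reported in this work for BDP-1, BDP-2, BDP-21 and BDP-26, used here only as sample input.

```python
from statistics import fmean, pstdev


def standard_scale(values):
    """Return the values shifted to zero mean and scaled to unit variance."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation (ddof = 0)
    return [(v - mu) / sigma for v in values]


# TDB06p values of BDP-1, BDP-2, BDP-21 and BDP-26 (sample input only).
tdb06p = [2.573, 2.163, 2.952, 2.892]
scaled = standard_scale(tdb06p)

print([round(z, 3) for z in scaled])
```

In a real workflow the mean and standard deviation would be taken from the training set only and then reused for the test set, so that no information from the test compounds leaks into the model.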
In the RFR method, the importance of descriptors was estimated using the built-in function of Scikit-Learn: since this method has no coefficients, the importance is determined from the decrease in node impurity obtained when the sample is split on each feature. The relative descriptor contributions to the models are provided along with ALE (Fig. S1–S6, ESI†) and SHAP (Fig. S7–S12, ESI†) analyses.
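A minimal Scikit-Learn sketch of the grid search with 5-fold cross-validation and of the built-in importance estimate is given below. The data are synthetic placeholders, and the grid values are illustrative assumptions rather than the settings used in this study; only the names of the tuned parameters follow the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (NOT the BODIPY set): 40 "compounds", 6 "descriptors".
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.1, size=40)

# SVR: scaling sits inside the pipeline, so it is re-fitted on every training fold.
svr_search = GridSearchCV(
    Pipeline([("scale", StandardScaler()), ("svr", SVR())]),
    param_grid={
        "svr__C": [0.1, 1.0, 10.0],
        "svr__epsilon": [0.01, 0.1],
        "svr__kernel": ["linear", "poly", "sigmoid", "rbf"],
    },
    cv=5,
    scoring="r2",
)
svr_search.fit(X, y)

# RFR: the five parameters named in the text; all others keep their defaults.
rfr_search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={
        "n_estimators": [25, 50],
        "max_depth": [None, 4],
        "min_samples_split": [2, 4],
        "min_samples_leaf": [1, 2],
        "max_features": ["sqrt", 1.0],
    },
    cv=5,
    scoring="r2",
)
rfr_search.fit(X, y)

# Impurity-based importances of the best forest; they are normalised to sum to 1.
importances = rfr_search.best_estimator_.feature_importances_

print(svr_search.best_params_)
print(rfr_search.best_params_)
print(importances.round(3))
```

Random forests do not need scaled inputs, which is why only the SVR branch is wrapped in a pipeline with `StandardScaler`.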
R²train and R²test show how well the model performs on the training and test sets, respectively. They are calculated using the following formula:
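For reference, the conventional definition of the coefficient of determination is written out below, together with the root-mean-square error (RMSE) reported alongside it. It is assumed here, not confirmed by the text, that this is the definition applied to both the training and the test set.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}
               {\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}
```

Here $y_i$ is the experimental value for compound $i$, $\hat{y}_i$ is the value predicted by the model, $\bar{y}$ is the mean of the experimental values and $N$ is the number of compounds in the set considered.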
Q² is a measure of the internal stability of a model: when a compound is excluded from the training set, the performance of the model should not deteriorate significantly; in particular, q² should not fall below 0.5. To calculate this parameter, each molecule in the training set was excluded once and the y value of the excluded molecule was predicted using the model built on the remaining compounds. It is calculated according to the following formula:
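The leave-one-out procedure described above can be sketched in a few lines of pure Python. The sketch assumes the common definition q² = 1 − PRESS/SS_tot, where PRESS is the sum of squared leave-one-out prediction errors and SS_tot is the total sum of squares of the training set; a one-descriptor least-squares line stands in for the actual models, and the data points are hypothetical.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares)."""
    my = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def q2_loo(xs, ys):
    """Leave-one-out q2: refit without compound i, then predict compound i."""
    preds = []
    for i in range(len(xs)):
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(b0 + b1 * xs[i])
    return r_squared(ys, preds)


# Hypothetical descriptor values and responses, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]

b0, b1 = fit_line(xs, ys)
r2_train = r_squared(ys, [b0 + b1 * x for x in xs])
q2 = q2_loo(xs, ys)
print(round(r2_train, 4), round(q2, 4))
```

For a least-squares model q² can never exceed R²train, so a large gap between the two is a sign that the fit depends strongly on individual compounds.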
Fig. 3 General structures of BODIPY dimers, donor–acceptor dyads and reference compounds investigated in this work.
Machine learning models were obtained to predict the ratio of the logarithms of the fluorescence quantum yields and singlet oxygen generation quantum yields for both solvents, toluene and acetonitrile.
Parameter | MLR | SVR | RFR |
---|---|---|---|
R²test | 0.777 | 0.811 | 0.912 |
RMSEtest | 0.338 | 0.310 | 0.213 |
The QSPR model is considered effective if the following conditions are met: R²train > 0.6 and R²test > 0.5.30 The results presented in Table 1 and Table S2 (ESI†) show that all three models meet these criteria. The MLR model has the largest R²train and the smallest RMSEtrain, but among the models obtained it has the smallest R²test and the largest RMSEtest. The SVR model is balanced, since all of its parameters are intermediate in comparison with the other models. The RFR model has the highest value of the most important statistical parameter, R²test. The MLR, RFR and SVR models all possess q² > 0.5, which indicates that the models explain the variation of the dependent variable well on the basis of the molecular descriptors and have satisfactory generalizability. The SVR model is the most internally stable one (q² = 0.793). The RFR model has a q² of 0.556, which may indicate that it is the least internally stable. R²adjusted is > 0.6 for all the models, which means that the QSPR approach is effective for the chosen molecules and descriptors.
Descriptor | Coefficients | Relative contribution, % |
---|---|---|
Intercept | 0.72231073 | — |
VE1_RG | 0.20807172 | 11.7 |
G3u | −0.16770193 | −10.1 |
MATS8i | 0.10067028 | 5.7 |
TDB06p | 0.43276623 | 22.1 |
Electronegativity | 0.17996268 | 10.3 |
Eig03_EA(dm) | 0.54257344 | 27.9 |
Mor23i | 0.2239952 | 12.2 |
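As a worked illustration of how such a linear model is applied, the sketch below evaluates y = b0 + Σ bᵢxᵢ with the intercept and coefficients listed in the table above. It assumes that the coefficients refer to standard-scaled descriptors, which the scaling step described for MLR suggests but the table itself does not state; the function name `predict_y` and the two input compounds are hypothetical.

```python
# Intercept and coefficients copied from the MLR table for toluene.
INTERCEPT = 0.72231073
COEFFICIENTS = {
    "VE1_RG": 0.20807172,
    "G3u": -0.16770193,
    "MATS8i": 0.10067028,
    "TDB06p": 0.43276623,
    "Electronegativity": 0.17996268,
    "Eig03_EA(dm)": 0.54257344,
    "Mor23i": 0.2239952,
}


def predict_y(scaled_descriptors):
    """Evaluate y = b0 + sum(b_i * x_i) for one compound.

    `scaled_descriptors` maps descriptor names to standard-scaled values
    (assumption: the coefficients were fitted on scaled descriptors).
    """
    missing = set(COEFFICIENTS) - set(scaled_descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return INTERCEPT + sum(
        b * scaled_descriptors[name] for name, b in COEFFICIENTS.items()
    )


# Hypothetical compound with every descriptor at the dataset mean (all zeros).
average = {name: 0.0 for name in COEFFICIENTS}
y_average = predict_y(average)

# Hypothetical compound one standard deviation above the mean in TDB06p only.
shifted = dict(average, TDB06p=1.0)
y_shifted = predict_y(shifted)

print(y_average, y_shifted)
```

Under this assumption the intercept is the prediction for an "average" compound, and each coefficient is the change in y per standard deviation of the corresponding descriptor.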
Compound | Eig03_EA(dm) | TDB06p | Mor23i | y |
---|---|---|---|---|
BDP-1 | 0 | 2.573 | −0.633 | 0.108 |
BDP-2test | 0 | 2.163 | −0.235 | 0.228 |
BDP-21 | 0 | 2.952 | −1.21 | 0.111 |
BDP-26 | 0 | 2.892 | −1.093 | 0.153 |
BDP-27 | 0 | 3.239 | −1.529 | 0.374 |
BDP-28 | 0 | 3.326 | −2.249 | 1.618 |
BDP-29 | 0 | 3.221 | −1.91 | 0.121 |
BDP-30 | 0 | 3.104 | −1.596 | 1.460 |
BDP-31test | 0 | 3.085 | −1.635 | 0.049 |
BDP-32 | 0 | 3.484 | −1.593 | 2.218 |
BDP-33 | 0 | 3.196 | −1.712 | 0.046 |
BDP-37 | 0.6 | 3.283 | −2.457 | 1.060 |
BDP-39test | 0.6 | 3.246 | −3.606 | 0.083 |
BDP-40 | 0.86 | 3.301 | −3.625 | 0.308 |
BDP-42test | 0.8 | 2.933 | −2.467 | 0.240 |
BDP-43 | 0.6 | 2.864 | −0.856 | 0.082 |
BDP-44 | 0.8 | 2.752 | −0.412 | 0.705 |
BDP-45 | 0.6 | 3.105 | −1.604 | 0.437 |
BDP-46 | 0.8 | 2.872 | −0.553 | 0.076 |
BDP-47test | 0 | 3.098 | −2.21 | 0.135 |
BDP-48 | 0 | 2.867 | −1.888 | 0.472 |
BDP-49 | 0 | 3.142 | −2.75 | 0.095 |
BDP-50 | 0 | 3.013 | −2.614 | 0.393 |
BDP-51 | 0 | 3.100 | −4.862 | 0.767 |
BDP-52 | 0.8 | 3.047 | −4.187 | 0.389 |
BDP-53test | 0.986 | 3.227 | −4.31 | 0.569 |
BDP-54 | 0 | 3.090 | −2.429 | 1.054 |
BDP-55 | 0 | 3.226 | −3.529 | 0.280 |
BDP-56 | 0.8 | 3.144 | −5.38 | 0.585 |
BDP-57test | 0 | 2.683 | −1.321 | 0.317 |
BDP-58 | 0 | 3.076 | −1.583 | 2.113 |
BDP-59 | 0 | 2.952 | −2.366 | 0.027 |
BDP-60 | 0 | 2.541 | −1.852 | 0.635 |
BDP-61 | 0 | 2.643 | −0.425 | 0.980 |
BDP-63 | 0 | 3.000 | −0.392 | 2.000 |
BDP-64 | 0 | 2.833 | −1.63 | 2.343 |
BDP-65 | 0 | 3.473 | −2.428 | 0.286 |
BDP-66test | 0 | 3.529 | −2.237 | 0.092 |
BDP-67 | 0 | 3.677 | −3.593 | 0.538 |
BDP-68 | 0 | 3.657 | −3.708 | 0.502 |
BDP-69 | 0 | 3.445 | −3.218 | 2.308 |
BDP-70test | 0 | 3.306 | −3.753 | 1.905 |
BDP-71 | 0 | 3.182 | −1.102 | 0.818 |
BDP-72 | 0 | 3.259 | −0.933 | 1.806 |
BDP-73 | 0 | 3.863 | −2.782 | 2.463 |
TDB06p is a Dragon 7 descriptor: a 3D topological distance-based descriptor of lag 6, weighted by polarizability. It belongs to a class of descriptors that combine topological distance with distances in 3D space, i.e. it takes into account the spatial distance between atoms of a molecule that are separated by a given number of bonds. Lag 6 means that pairs of atoms separated by six interatomic bonds are considered. Polarizability determines the weight assigned to each distance: it reflects the ability of an atom to change its electron distribution when exposed to an electric field. Thus, this descriptor factors in not only the geometric properties of the molecule, but also its chemical properties related to polarizability. It has the second most significant relative contribution to the model (22.1%). The BDP-73 molecule has the highest TDB06p value (3.863), whereas BDP-2 has the lowest (2.163). In simple terms, BODIPY molecules with bulky substituents tend to possess high TDB06p values, whereas small molecules such as BDP-1 and BDP-2 have low values.
Experimental vs. predicted values for the studied BODIPYs in toluene are presented in Fig. 4. Mor23i is a 3D-MoRSE signal (signal 23) weighted by ionization potential. Signal 23 is a molecular descriptor associated with the electron density distribution in the molecule; it takes into account the influence of the electronic structure of a molecule on its chemical properties. Ionization potential is the minimum amount of energy required to remove an electron from a molecule. Weighting by ionization potential means that the descriptor value has been modified to take ionization potential into account. Thus, the descriptor depends on both the electronic structure of the molecule and its chemical properties related to ionization potential. The BDP-2 molecule has the highest value of this descriptor (−0.235), whereas BDP-56 has the lowest (−5.38).
Descriptors | Relative contribution, % |
---|---|
Eig03_EA(dm) | 28.0 |
TDB06p | 21.6 |
VE1_RG | 14.5 |
Electronegativity | 12.7 |
Mor23i | 10.1 |
G3u | 7.2 |
MATS8i | 5.9 |
The VE1_RG descriptor is the sum of the absolute values of the coefficients of the last eigenvector of the inverse square geometric matrix. An eigenvector is a vector whose direction does not change when the matrix is applied to it. The inverse square geometric matrix describes the geometry of a molecule; it is obtained by inverting the interatomic distances collected in a geometric matrix. The importance of this descriptor is 14.5% (Table 4). The largest value of VE1_RG is possessed by compound BDP-73, which also has the highest value of the dependent variable y. Compound BDP-63 has the lowest descriptor value (2.512). The VE1_RG descriptor is related to the eigenvalues of the Randic matrix of a BODIPY, and the first eigenvalue of the Randic matrix is in turn related to the stability and reactivity of the molecule. The descriptor can therefore provide insight into the electronic distribution and the potential energy surface of the BODIPY molecule.
Fig. 5 shows a comparison of the experimental and SVR predicted values for compounds in toluene. The close correspondence of the dots to the trend is confirmed by statistical parameters.
The highest relative contribution in model 3 is observed for the ATSC4e descriptor: 34.4% (Table 5). This descriptor characterizes the structure of a molecule together with its chemical properties. It is based on the centred Broto–Moreau autocorrelation method, which calculates the correlation between atoms or fragments of a molecule at a certain distance: lag 4 means a distance between atoms or fragments of four bonds. Because the atomic properties are centred, the autocorrelation considers both positive and negative correlations between atoms or fragments of a molecule. ATSC4e values are weighted by Sanderson electronegativity, which means that the Sanderson electronegativity of each atom or fragment enters the autocorrelation as its weight. Sanderson electronegativity is a measure of the ability of an atom to attract electrons in a molecule. Thus, ATSC4e takes into account both the structure of the molecule and its chemical properties related to the electronegativity of the atoms and fragments. The BDP-45 molecule has the highest ATSC4e value (0.784), and BDP-66 has the lowest (0.146). In general, the presence of electron-withdrawing groups decreases the ATSC4e value, whereas electron-donating alkyl and alkoxy groups increase it.
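A centred Broto–Moreau autocorrelation can be sketched as follows. The code assumes the textbook form, i.e. the sum over atom pairs at topological distance k of the product of their mean-centred atomic weights; Dragon may apply additional scaling, so the numbers are not expected to reproduce the tabulated ATSC4e values. The Sanderson electronegativities are commonly quoted values taken as an assumption, and the five-atom chain is a hypothetical toy fragment with hydrogens omitted.

```python
from collections import deque

# Commonly quoted Sanderson electronegativities (assumed values).
SANDERSON_EN = {"H": 2.592, "C": 2.746, "N": 3.194, "O": 3.654}


def topological_distances(n_atoms, bonds):
    """All-pairs shortest path lengths (in bonds) by breadth-first search."""
    adjacency = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    dist = [[None] * n_atoms for _ in range(n_atoms)]
    for start in range(n_atoms):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if dist[start][neighbour] is None:
                    dist[start][neighbour] = dist[start][current] + 1
                    queue.append(neighbour)
    return dist


def centred_autocorrelation(elements, bonds, lag):
    """Sum of (w_i - mean) * (w_j - mean) over atom pairs exactly `lag` bonds apart."""
    weights = [SANDERSON_EN[e] for e in elements]
    mean_w = sum(weights) / len(weights)
    centred = [w - mean_w for w in weights]
    dist = topological_distances(len(elements), bonds)
    n = len(elements)
    return sum(
        centred[i] * centred[j]
        for i in range(n)
        for j in range(i + 1, n)
        if dist[i][j] == lag
    )


# Hypothetical heavy-atom chain O-C-C-C-N (not one of the studied dyes).
elements = ["O", "C", "C", "C", "N"]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]
atsc4 = centred_autocorrelation(elements, bonds, lag=4)
print(round(atsc4, 4))
```

In this toy chain only the O/N pair lies four bonds apart, and both atoms are more electronegative than the molecular average, so the lag-4 term is positive; a pair with one atom above and one below the average would contribute a negative term.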
Descriptors | Relative contribution, % |
---|---|
ATSC4e | 34.4 |
Eig03_EA(dm) | 30.6 |
TDB01e | 20.9 |
IVDE | 10.5 |
F10[C–N] | 2.4 |
CATS2D_04_PL | 1.2 |
In model 3, the second most significant descriptor is Eig03_EA(dm) with a relative contribution of 30.6%. This descriptor also had a high contribution in the MLR and SVR models.
Another significant descriptor is TDB01e, with a contribution of 20.9%. TDB01e (3D topological distance-based descriptor – lag 1, weighted by Sanderson electronegativity) is based on the 3D topological distance method, which calculates the distance between atoms and molecular fragments in 3D space. In this case, lag 1 means that the distance between atoms or fragments of a molecule is equal to one (only the nearest neighbors of atoms or fragments are considered). Also, the TDB01e descriptor accounts for the Sanderson electronegativity for each atom or fragment, which means that the topological distance value is multiplied by the Sanderson electronegativity for each atom. Thus, TDB01e factors in both the structure of the molecule and its chemical properties related to the electronegativity. BDP-1 has the highest TDB01e value of 1.352, whereas BDP-59 has a TDB01e value of 1.281, which is the smallest one.
Fig. 6 demonstrates that the results obtained experimentally for BODIPYs in toluene are consistent with the predictions made using the RFR method. High statistical parameters are observed for the test set as well.
Therefore, the different ML models for toluene use similar descriptors. For example, in model 1 (MLR, toluene) the major descriptors are TDB06p and Eig03_EA(dm). Model 2 (SVR, toluene) utilizes Eig03_EA(dm) and TDB06p, whereas model 3 (RFR, toluene) uses the Eig03_EA(dm) and TDB01e descriptors. All three models involve the Eig03_EA(dm) descriptor, which, in general, reflects molecular symmetry. Apparently, the Eig03_EA(dm) descriptor makes it possible to separate highly symmetrical molecules (a feature not favorable for high ΦFl) from more asymmetrical ones, which tend to have low SOCT-ISC and high ΦFl. Moreover, models 1–3 all contain a TDB-type descriptor. TDB descriptors make it possible to distinguish molecules with polarized/electronegative substituents, which are typical of a high rate of SOCT-ISC, from BODIPYs with substituents favorable for high ΦFl. A global electronegativity descriptor also appears in two of the three toluene models. The dependence of the singlet oxygen generation quantum yield on electronegativity is in line with previous works by us24 and others.31
Parameter | MLR | SVR | RFR |
---|---|---|---|
R²test | 0.739 | 0.880 | 0.870 |
RMSEtest | 0.427 | 0.295 | 0.301 |
As shown in Table 6, all three models satisfy R²train > 0.6 and R²test > 0.5, indicating that they possess good predictive ability, while q² exceeds 0.5 for the MLR and RFR models. The RFR model has the highest R²train and the lowest RMSEtrain, but moderate R²test and RMSEtest values. The MLR model is balanced in terms of training indicators, but is the worst in terms of test indicators. The SVR model has the highest value of the most important statistical parameter, R²test, and the lowest RMSEtest. The MLR model has q² > 0.5 and is the most internally stable one for acetonitrile. The RFR and SVR models have q² values of 0.528 and 0.483, respectively. The SVR value lies slightly below the 0.5 threshold, which may indicate that this model has only an average ability to predict left-out data, as more than 50% of the variability in the data remains unexplained. However, the study used a small number of molecules, for which q² is not as informative as R². R²adjusted is also > 0.6 for all the models, which means that the QSPR methodology is effective for the dataset considered.
Descriptor | Coefficient | Relative contribution, % |
---|---|---|
Intercept | 1.34016439 | — |
CATS2D_06_PL | 0.74562573 | 15.3 |
VE1sign_G/D | 1.14360872 | 23.9 |
R3p+ | 0.17823064 | 3.4 |
F01[C–N] | 0.91168208 | 19.4 |
F04[C–N] | −0.49813696 | −10.3 |
VE2sign_G/D | −0.99441918 | −21.1 |
B06[N–O] | 0.32897972 | 6.7 |
VE2sign_G/D is the average coefficient of the last eigenvector from the distance matrix. The eigenvector, especially the last one in descending order of eigenvalues, often reflects the least significant structural changes in the molecule. Thus, the average coefficient of this vector can provide information about subtle but important aspects of the structure that may be related to its chemical and physical properties. The descriptor makes the second largest contribution to the model (−21.1%). A negative contribution indicates that the descriptor is inversely correlated with the target variable y. The BDP-15 molecule has the highest value of this descriptor (0.00618), and BDP-64 has the lowest (0). The VE2sign_G/D descriptor is related to the second eigenvalue of the Laplacian matrix of the BODIPY, i.e. to the stability and reactivity of the molecule. It provides a measure of the BODIPY topology and connectivity, weighted by the degrees of the atoms.
F01[C–N] (relative contribution of 19.4%) is the frequency with which carbon–nitrogen atom pairs occur in a molecule at a topological distance of 1. A topological distance of 1 means that the carbon and nitrogen atoms are directly bonded to each other. Among the molecules considered, this descriptor takes the discrete values 4, 5, 7, 8 and 9. Compound BDP-63 has the highest value, 9. For more than half of the molecules, the descriptor value is 4.
Fig. 7 demonstrates that the experimental data for BODIPY compounds studied in acetonitrile correlates well with the predictions made by model 4. Thus, model 4 is suitable for predicting the ratio of the logarithms.
The most significant descriptor in the SVR model is F06[N–B], with a relative contribution of 20.0% (Table 8). It is the number of times nitrogen and boron atoms occur at a topological distance of six bonds from each other in a molecule. Topological distance is measured by the minimum number of bonds that must be traversed to get from one atom to another. Only four molecules have non-zero values of this descriptor. The highest values, equal to 3, are found for compounds BDP-61 and BDP-63. These compounds have the highest value of the target variable y.
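Atom-pair descriptors of this kind can be illustrated with a short pure-Python sketch that counts element pairs at a given topological distance on a molecular graph. The function names and the six-atom chain are hypothetical, hydrogens are omitted, and Dragon's own implementation may differ in details such as hydrogen handling; the binary variant corresponds to presence/absence descriptors of the B-type.

```python
from collections import deque


def distances_from(start, adjacency):
    """Shortest path length (number of bonds) from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return dist


def atom_pair_frequency(elements, bonds, pair, distance):
    """Count atom pairs of the given two elements exactly `distance` bonds apart."""
    adjacency = {i: [] for i in range(len(elements))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    wanted = sorted(pair)
    count = 0
    for i in range(len(elements)):
        dist = distances_from(i, adjacency)
        for j in range(i + 1, len(elements)):  # each unordered pair once
            if dist.get(j) == distance and sorted((elements[i], elements[j])) == wanted:
                count += 1
    return count


def atom_pair_presence(elements, bonds, pair, distance):
    """Binary variant: 1 if at least one such pair exists, otherwise 0."""
    return int(atom_pair_frequency(elements, bonds, pair, distance) > 0)


# Hypothetical heavy-atom chain C-N-C-C-N-B (not one of the studied dyes).
elements = ["C", "N", "C", "C", "N", "B"]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]

f01_cn = atom_pair_frequency(elements, bonds, ("C", "N"), 1)
f04_nb = atom_pair_frequency(elements, bonds, ("N", "B"), 4)
b05_oo = atom_pair_presence(elements, bonds, ("O", "O"), 5)
print(f01_cn, f04_nb, b05_oo)
```

Because only the element types and the bond graph enter the calculation, descriptors of this type need no 3D geometry and are cheap to evaluate for large virtual libraries.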
Descriptors | Relative contribution, % |
---|---|
F06[N–B] | 20.0 |
F01[C–N] | 18.3 |
B05[O–O] | 15.3 |
LDI | 14.2 |
F04[C–B] | 12.6 |
LLS_02 | 10.5 |
E LUMO (eV) | 9.1 |
The second most important descriptor is F01[C–N], with a relative contribution to the model of 18.3%. This descriptor was also used in MLR model 4 for acetonitrile.
In model 5, the third most important descriptor is B05[O–O], which indicates the presence or absence of a pair of oxygen atoms at a topological distance of five bonds in the molecule. This means that the descriptor equals 1 if a shortest path of five bonds can be traced between two O atoms, and 0 otherwise. In cheminformatics, descriptors of this type are used to analyze the structural features of molecules and can help predict their chemical and physical properties, as well as biological activity. For example, certain distances between oxygen atoms can affect a molecule's ability to form hydrogen bonds or its reactivity in chemical reactions. Among the studied molecules, the descriptor value is non-zero for only two compounds, BDP-12 and BDP-13, for which it is equal to 1.
Fig. 8 illustrates sufficient agreement between the experimental results and the predictions made by the SVR method. The deviation of data points from the trend line is greater than for the SVR model in toluene (model 2), but model 5 can still be used to make predictions.
The largest relative contribution to the RFR model (39.6%) comes from the H2u descriptor (Table 9). The molecular descriptor H2u, or unweighted H autocorrelation for lag 2, is a statistical measure that evaluates the relationship between the atomic property values of hydrogen in a molecule separated by two chemical bonds. “Lag 2” means that the relationship between hydrogen atoms separated by two bonds is considered. “Unweighted” indicates that no weights are used for atoms or bonds when calculating the autocorrelation, that is, they are all considered equally important. Compounds BDP-67 and BDP-63 have the highest values of the descriptor: 4.173 and 4.118, respectively. BDP-2 has the lowest value: 2.052. Thus, the H2u descriptor makes it possible to distinguish topologically simple BODIPY molecules from topologically complex structures with multiple H–X–H groups (Table S8, ESI†).
Descriptors | Relative contribution, % |
---|---|
H2u | 39.6 |
P_VSA_logP_5 | 19.9 |
F01[C–N] | 19.8 |
Polar area(75) (Å2) | 10.2 |
GATS7p | 6.7 |
X1Av | 3.9 |
The second most important descriptor is P_VSA_logP_5 (relative contribution equals 19.9%). It is related to the van der Waals surface area (VSA) descriptors associated with the logarithm of the partition coefficient (logP). logP is one of the most popular descriptors and is a measure of the hydrophobicity of a molecule: it is the logarithm of the ratio of the concentrations of a compound in two phases: octanol and water. P_VSA_logP_5 denotes the fifth interval of van der Waals surface values that correlates with logP. This can be used to evaluate how a van der Waals surface of a molecule affects its hydrophobic properties. BDP-13 and BDP-14 have the highest value (61.470) of this descriptor. BDP-2 has the lowest descriptor value which is 6.371.
The third most significant descriptor is F01[C–N], with a relative contribution of 19.8%. In models 4 and 5, F01[C–N] made a similar relative contribution. It can therefore be considered robust, as its importance is confirmed by several modelling techniques.
Fig. 9 shows the experimental and predicted y values obtained using the RFR method for compounds in acetonitrile. The observed trend indicates the ability of the model to predict the target parameter.
Thus, in acetonitrile, a major contributor to models 4 and 5 is the F01[C–N] 2D atom-pair descriptor, which is one of the major descriptors in model 6 as well. This shows that the optical properties of BODIPYs can be modelled using an approach similar to that used in pharmacophore modelling. Model 6 is dominated by the H2u descriptor (39.6% relative contribution).
Finally, a comparison of the toluene and acetonitrile models shows that the QSPR models for toluene rely mainly on 3D topological descriptors (Eig03_EA(dm), TDB06p, etc.), which reflect molecular symmetry and take into account the presence of heteroatoms in the side substituents (through local dipole moments and electronegativities). In contrast, the acetonitrile models mostly utilize 2D atom-pair frequencies: primarily the F01[C–N] descriptor, but also F04[C–N], F06[N–B], etc. 2D atom pairs are usually exploited in pharmacophore modelling; however, in our case the 2D representation of BODIPYs was also beneficial. In particular, the frequency of C–N atom pairs at a distance of a single bond was representative for studying BODIPY photochemistry. In simple terms, the presence of multiple nitrogen atoms in the side substituents is favorable for low SOCT-ISC and high ΦFl.
In this study, we analysed the relationship between molecular descriptors and the ratio of fluorescence and singlet oxygen generation quantum yields (ΦFl/ΦΔ) for a series of BODIPY compounds using QSPR. Three machine learning methods—support vector regression (SVR), multiple linear regression (MLR), and random forest regression (RFR)—were employed to model two groups of compounds studied in toluene and acetonitrile, respectively.
The analysis revealed the significance of various descriptors, with those related to the electronic structure, polarizability, ionization potential, and topological features playing crucial roles. Notably, descriptors related to 2D atom pairs (the shortest path between two atoms in the molecule, measured by the number of bonds), particularly the arrangement of carbon and nitrogen atoms, emerged as highly influential for compounds in acetonitrile. High statistical parameters of the models demonstrated their accuracy in predicting the ΦFl/ΦΔ ratio, with the RFR model performing best for compounds in toluene and the SVR model for compounds in acetonitrile.
Our findings demonstrate the applicability of the QSPR methodology for studying the ΦFl/ΦΔ ratio, providing a valuable tool for pre-synthetic screening of promising structures. These predictive models offer a simple and effective means to expedite the search for novel functional dyes, replacing the need for random synthesis of new molecular libraries. Furthermore, they can guide the synthesis of dyes with a desired ΦFl/ΦΔ ratio in specific environments, such as solvents of varying polarity, potentially accelerating the search for new theranostic drugs.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4cp02471k
This journal is © the Owner Societies 2024