Andrew E. Sifain,‡ab Levi Lystrom,‡ab Richard A. Messerly,a Justin S. Smith,ab Benjamin Nebgen,ac Kipton Barros,ab Sergei Tretiak,abc Nicholas Lubbers*d and Brendan J. Gifford*abc
aTheoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA 87545. E-mail: giff@lanl.gov
bCenter for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, USA 87545
cCenter for Integrated Nanotechnologies, Los Alamos National Laboratory, Los Alamos, NM, USA 87545
dComputer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, USA 87545. E-mail: nlubbers@lanl.gov
First published on 29th June 2021
Phosphorescence is commonly utilized for applications including light-emitting diodes and photovoltaics. Machine learning (ML) approaches trained on ab initio datasets of singlet–triplet energy gaps may expedite the discovery of phosphorescent compounds with the desired emission energies. However, we show that standard ML approaches for modeling potential energy surfaces inaccurately predict singlet–triplet energy gaps due to the failure to account for spatial localities of spin transitions. To solve this, we introduce localization layers in a neural network model that weight atomic contributions to the energy gap, thereby allowing the model to isolate the most determinative chemical environments. Trained on the singlet–triplet energy gaps of organic molecules, we apply our method to an out-of-sample test set of large phosphorescent compounds and demonstrate the substantial improvement that localization layers have on predicting their phosphorescence energies. Remarkably, the inferred localization weights have a strong relationship with the ab initio spin density of the singlet–triplet transition, and thus infer localities of the molecule that determine the spin transition, despite the fact that no direct electronic information was provided during training. The use of localization layers is expected to improve the modeling of many localized, non-extensive phenomena and could be implemented in any atom-centered neural network model.
ML research has led to predictive models of various molecular properties,17–19 among which are general interatomic potentials of ground state energies.20–23 Such models can be trained to large datasets (e.g., ∼105–106 molecules) containing both equilibrium and off-equilibrium structures. This approach enables the construction of machine learned potential energy surfaces (PESs)24,25 for purposes such as dynamics26 and geometry optimization.27,28 Data curation and model training are computationally demanding tasks,29,30 but once trained, such models not only generate high-fidelity ab initio-quality predictions at low computational costs but also infer relationships within high-dimensional data31 and are transferable to molecules outside of the training dataset.24,31,32 ML for ground state chemistry has been explored at an appreciable depth, achieving models of high-accuracy theories33–35 and extremely large systems.36 However, its use for excited state processes37–39 such as phosphorescence is a new area of research.
The physics involved in describing an excited state transition is, however, fundamentally different from that of a ground state. For ground states, ML potentials incorporate the extensivity principle, which defines the total energy of a molecule as a sum over individual atomic contributions.40 This assumption is physically motivated, has been tested over a large chemical space, and provides the means for attaining transferable and extensible models.24,31,41 However, it breaks down in the case of electronic excitation energies, for which atoms may contribute to the transition energy in a disproportionate way. This disproportionality is evidenced by localized versus delocalized electronic transitions that are prevalent in chemical physics.42–45 In turn, the energy of an excitation does not scale strictly with the size of the molecule. Rather than using extensive predictions, it is reasonable to seek a method that can associate the energy of an excitation with specific regions of the molecule. Such an approach would have wide-ranging applicability since the notion of localization generalizes to many molecular phenomena and systems, such as charged species and radicals, and can potentially be applied to a broad class of problems centered around energy and charge carrier transport in molecular materials and solids.
In this work, we develop a model based on the Hierarchical Interacting Particle Neural Network (HIPNN)46 that accurately predicts the singlet–triplet energy gaps in a diverse chemical space of organic compounds at computational speeds that reach over one million times faster than the underlying ab initio method for single-point calculations. HIPNN has previously shown excellent performance for ground state energies,46 partial atomic charges,47 and molecular dipole moments.48 We present new atomistic localization layers that compute a weight for each atom corresponding to the atomic contribution to the molecule's energy gap. The mathematics of these new layers can be likened to the statistical mechanics of a chemical potential, or to the technology of attention mechanisms in the neural network literature.49,50 The localization layers improve phosphorescence energy predictions on an out-of-sample test set of molecules that are larger than those found in the training set and that are known to be phosphorescent from experiment while qualitatively inferring the changes to the electron density due to the transition as shown by direct comparison to reference quantum mechanical spin density calculations. This result is remarkable given that DFT spin density was not provided as a target during training and highlights the added benefit that electronic information could have toward ML models of localized, non-extensive molecular phenomena.
E = ∑i εi | (1)
εi = ∑a Wa zi,a + b | (2)
Note that the εi do not represent a direct physical quantity; rather, they form a factorization, or ansatz, of the total energy E, expressing the notion that the potential may be taken as a sum of contributions over latent variables εi, a useful form for an extensive total energy. For a single conformation of a system, many values of the εi are compatible with the total energy. However, the machine learning algorithm learns a model of εi so that E will fit a large database of calculations simultaneously. The high-dimensional feature vector zi,a contains information collected from the local environment of that atom, and is also learned, but is less readily interpretable than εi.
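As a minimal NumPy sketch of eqn (1) and (2), assuming a single shared linear readout from the learned atomic features (the array names and shapes are hypothetical, not the authors' implementation):

```python
import numpy as np

def extensive_energy(z, W, b):
    """Illustrative sketch of the extensive energy model, eqn (1)-(2).

    z : (N_atoms, N_features) learned atomic feature vectors z_{i,a}
    W : (N_features,) linear readout weights, shared across atoms
    b : scalar bias
    Returns per-atom energies eps_i and their extensive sum E.
    """
    eps = z @ W + b        # eqn (2): eps_i = sum_a W_a z_{i,a} + b
    E = eps.sum()          # eqn (1): E = sum_i eps_i
    return eps, E

# toy example: 5 atoms, 8 features
rng = np.random.default_rng(0)
eps, E = extensive_energy(rng.normal(size=(5, 8)), rng.normal(size=8), 0.1)
```

Because E is a plain sum, adding an atom always adds its εi to the total, which is exactly the extensivity assumption that the localization layers below relax.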
This model can be applied in either the singlet ground state (S0) or first triplet excited state (T1). In addition to being trained to energy, our models are also trained to atomic forces. The inclusion of atomic forces as targets has been shown to improve energy predictions.51,52 Throughout the paper, we refer to the extensive HIPNN model that solely employs eqn (1) to compute energy as HIPNN (Scheme 1a, left).
In HIPNN, singlet (ES) and triplet (ET) energies are determined from independent Self-Consistent Field (SCF) calculations, and both ES and ET are learned independently using separate linear prediction with the form of eqn (1). Their difference,
ΔE = ET − ES | (3)
is the singlet–triplet energy gap.
We described how one could learn ES and ET and compute ΔE following their predictions (eqn (3)). This approach was carried out for the HIPNN model. Alternatively, one could learn ES and ΔE, with ET defined as ET = ES + ΔE. This definition represents ET as a sum of an extensive (ES) and non-extensive (ΔE) property, the latter of which motivates the HIP-loc model. Atomic forces on the singlet state are the same as those used in the HIPNN model, whereas the forces associated with ΔE are calculated as the negative gradient with respect to the atomic coordinates, Fi = −∂(ΔE)/∂Ri. We now introduce the principal difference between HIPNN and HIP-loc (Scheme 1a). Instead of modeling ΔE as a simple sum over atom-centered energies (eqn (1)), we model it as a non-extensive quantity by weighting atomic energies by normalized weights wi
ΔE = ∑i wi εi | (4)
This form is physically motivated by the fact that the electron density may be localized on certain atoms. We show that the inclusion of these weights is instrumental in accurately predicting ΔE, as opposed to eqn (1), which assumes equal atomic contributions. We use a softmax function for wi,
wi = exp(ai)/∑j exp(aj) | (5)
The form of eqn (4) and (5) are similar to those of attention models49,50 and elementary statistical mechanical models. In a statistical mechanical analogy, the propensity ai plays the role of a (negative) chemical potential, whereas the weights wi are interpreted as a probability derived from the propensity using a Boltzmann weighting, and the excitation corresponds to an observable of the system. Such an analogy gives rise to intuition about the relationship between the propensity and the weights: if the propensities are narrowly distributed, the weights will be roughly evenly distributed across the molecule. If a few propensities are larger than the rest, the weights will be concentrated on those few atoms. In our results, we show that these localization weights effectively infer the region of the triplet excitation, as determined by ab initio spin density calculations.
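A minimal sketch of eqn (4) and (5), assuming the network has already produced per-atom energies εi and propensities ai (the function and variable names are hypothetical):

```python
import numpy as np

def localized_gap(eps, a):
    """Illustrative sketch of the non-extensive gap, eqn (4)-(5).

    eps : (N_atoms,) atomic energy contributions eps_i
    a   : (N_atoms,) atomic propensities a_i
    Returns softmax weights w_i and the weighted gap dE.
    """
    # eqn (5): Boltzmann-like softmax; shifting by max(a) is a standard
    # numerical-stability trick and leaves the weights unchanged
    w = np.exp(a - a.max())
    w /= w.sum()
    # eqn (4): dE = sum_i w_i eps_i, a weighted (non-extensive) sum
    return w, np.dot(w, eps)

# narrow propensities -> nearly uniform weights (delocalized transition)
w_flat, _ = localized_gap(np.ones(4), np.zeros(4))
# one dominant propensity -> weight piles onto that atom (localized transition)
w_peak, _ = localized_gap(np.ones(4), np.array([10.0, 0.0, 0.0, 0.0]))
```

This reproduces the intuition from the statistical mechanical analogy: narrowly distributed propensities spread the weights across the molecule, while a few large propensities concentrate them on a few atoms.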
The molecules of the dataset were randomly sampled from the GDB13 dataset55 and contain C, H, N, O, S, and Cl atoms. Ab initio computations were performed using the ωB97X density functional56 and 6-31G* basis set57 with Gaussian 16.58 All molecules were optimized in vacuum in the T1 state. Thermal conformers (T = 300 K) near the minimum of T1 were obtained with a normal mode sampling scheme.24 Doing so enables learning PESs at molecular conformations near points of stability (Scheme 1b). Subsequent single-point calculations of the entire dataset, including optimized and thermal conformers, were carried out to determine ES and ET. The entire dataset consisted of ∼702k structures with a median molecule size of 8 heavy atoms (excluding H) and 16 total atoms, and a maximum molecule size of 12 heavy atoms and 30 total atoms (Fig. S2†). A more detailed breakdown of the dataset, including the number of molecules sampled from each GDB dataset, is available in Table S1.†
The trained models were also tested on a challenging set of large aromatic molecules known to be phosphorescent from experiment with median molecule size of 17 heavy atoms (excluding H) and 27 total atoms with a maximum molecule size of 31 heavy atoms and 51 total atoms (Fig. S2†). The compounds of this extensibility set were taken from ref. 59–62 and make up 35 unique compounds and 912 thermal conformers. A sample set of the training and test set compounds is shown in Fig. 1. Further details regarding the neural network architecture and model training can be found in the ESI† under the section labeled Neural Network Architecture and Training Procedure. It is worth noting that a full end-to-end ML-based approach is ultimately desirable for molecular design workflows. Ideally, the ML model should optimize the triplet geometry, which makes up a significant portion of the computational time needed to predict emission energies from first principles. In this work, we concentrate on using localization layers in the modeling of localized, non-extensive properties with emphasis on phosphorescence energies as the application of interest. Thus, we eliminate the error of an ML-based optimization in our results, using geometries obtained by normal mode sampling of conformations near DFT-optimized structures of T1. We do present encouraging results with T1 optimized with ML in ESI,† including the singlet–triplet energy gaps and root-mean-square deviation (RMSD) of the ML-optimized geometries compared with DFT-based optimization.
Despite HIPNN's success in predicting ΔE for molecules with sizes comparable to those found in the training set, its performance substantially worsens for phosphorescent molecules of the extensibility set, which consist of on average ∼2× the number of atoms (Fig. S2†). Yet we show that HIP-loc remedies this issue. HIPNN erroneously predicts negative ΔE energies (i.e., T1 lower in energy than S0) for many molecules, contradicting reference results (Fig. 2c). This result is particularly troublesome given that these are phosphorescent molecules for which DFT correctly predicts T1 to be higher in energy than S0. Altogether, HIPNN's performance on the extensibility set is quite poor, with an RMSE of ∼53 kcal mol−1 (∼2.3 eV), well outside the error commonly expected between ab initio and experimental values.63 HIPNN's predictions are especially poor for molecules composed of more than 35 atoms (Fig. S8 and S9†). In contrast, Fig. 2d shows HIP-loc significantly outperforming HIPNN on the extensibility set. The RMSE of HIP-loc on the extensibility set is ∼13 kcal mol−1, amounting to an ∼4× improvement, with a median percent error of ∼13% (Fig. S3†). The erroneous negative ΔE energies predicted by HIPNN (Fig. 2c) are all but eliminated with HIP-loc (Fig. 2d and S10†).
HIP-loc strongly outperforms HIPNN, and the only difference between the two models is the usage of localization layers to model the non-extensive part of ET (i.e., ΔE) by weighting local atomic environments. The extensibility of the model is substantially improved, but relatively large disagreement between HIP-loc and DFT is observed for the largest molecules in the extensibility set comprising several bonded aromatic fragments (Fig. S11†). Nevertheless, HIP-loc's stronger performance demonstrates the importance of model engineering to account for the localized nature of spin transitions. Parity plots for all molecules of the extensibility set, categorized by chemical similarity, are shown in Fig. S11 through S16.†
Additionally, we investigated whether the learned localization weights in HIP-loc have some physical significance. We compared the localization weights to DFT spin density differences between the T1 and S0 states. In order to compare the two methods, DFT spin density was approximated as an atom-centered quantity, which we obtained using the Hirshfeld charge partitioning scheme.64 Fig. 3a visualizes this comparison for a subset of representative molecules selected from a random sample of the held-out test set. Remarkably, there is qualitative agreement between HIP-loc and DFT for most molecules, suggesting that the HIP-loc localization weights provide physical insight. Moreover, the correspondence between the quantum mechanical and inferred localizations is somewhat correlated with the accuracy in predicted energies; the absolute error in ΔE for the last molecule shown in Fig. 3a is relatively high and, correspondingly, the atom-centered DFT spin density does not as strongly resemble the HIP-loc weights as compared to the other molecules shown. We also find a rough, yet highly significant correlation between these learned weights and the DFT spin densities (Fig. 3b). It is important to note that the results of Fig. 3a and b are not meant to assert that learned weights are intrinsically related to spin densities in a rigorous way, especially since they are atom-centered quantities and charge partitioning is ambiguous, but rather that inferring localization through the use of energies and forces across a large dataset leads to similar assignments of localization.
We also analyzed the agreement between the ab initio and inferred localization over a large sample of the held-out test set. For this analysis, we made use of a localization metric η, defined as the ratio of the distance between the centers of localization computed using HIP-loc weights and DFT spin density to the radius of gyration (Rg), which quantifies the spatial size of the molecule: η = |rHIP-loc − rDFT|/Rg. For η ≪ 1, the centers of localization are in close proximity to one another, whereas η ∼ 1 signifies that the centers of localization differ by approximately the radius of the molecule and therefore there is very little or no agreement in the predicted localization. See ESI† under Localization Metric for more details. Fig. 3c shows a histogram of η computed for our sample set. The distribution is concentrated at low η (mean = 0.15), indicating that the center of HIP-loc's inferred localization is in close proximity to that determined from DFT for the majority of compounds.
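The metric η can be sketched as follows, assuming atomic coordinates, normalized HIP-loc weights, and atom-centered spin densities are available (this uses an unweighted geometric radius of gyration; the exact convention is detailed in the ESI and may differ):

```python
import numpy as np

def localization_metric(coords, w_ml, q_dft):
    """Illustrative sketch of the localization metric eta.

    coords : (N_atoms, 3) atomic positions
    w_ml   : (N_atoms,) HIP-loc localization weights (sum to 1)
    q_dft  : (N_atoms,) atom-centered DFT spin densities (non-negative)
    """
    q = q_dft / q_dft.sum()                  # normalize spin density to weights
    r_ml = w_ml @ coords                     # center of ML-inferred localization
    r_dft = q @ coords                       # center of DFT localization
    center = coords.mean(axis=0)             # unweighted geometric center
    # radius of gyration: rms distance of atoms from the geometric center
    Rg = np.sqrt(((coords - center) ** 2).sum(axis=1).mean())
    return np.linalg.norm(r_ml - r_dft) / Rg  # eta = |r_ml - r_dft| / Rg
```

When the two weightings agree atom-for-atom the centers coincide and η = 0; weightings concentrated on opposite ends of the molecule push η toward ∼1.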
This agreement is also observed for molecules of the phosphorescent extensibility set. Similar to Fig. 3a, Fig. 4a visualizes the similarities between DFT and HIP-loc for two example molecules that perform particularly poorly with HIPNN. Additional visualizations of a similar nature are available in ESI Fig. S17 through S20.† The spatial localities associated with the singlet–triplet transitions in these molecules are confined to a relatively small number of atoms compared to the molecules' total number of atoms. This result suggests, albeit not too surprisingly, that the advantages provided by HIP-loc are accentuated in the case of strongly localized transitions. In order to fully probe the improvement provided by HIP-loc for modeling localized versus delocalized transitions, we study the participation ratio (PR). The PR is described as follows: given an N-body wavefunction expanded in terms of atom-localized states with expansion coefficients ci, the PR is expressed as PR = 1/∑i|ci|4, ranging from 1 (fully localized to a single atom) to N (equally delocalized across all atoms).42 Although we are not working with wavefunctions, we apply the concept of PR to estimate the number of atoms involved in the singlet–triplet transition based on the ab initio (DFT) spin density and thereby quantify the degree of localization. The squares of the expansion coefficients are approximated from the atom-centered DFT spin density (qi): ci2 ≈ qi/∑j qj. The PR is complementary to the previously utilized localization metric η. The latter quantifies the agreement in proximity of the ab initio and ML-inferred centers of localization, whereas the former estimates the number of atoms involved in the transition and is therefore a more appropriate measure of quantifying the degree to which a transition is localized.
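Under the approximation ci2 ≈ qi/∑j qj, the PR reduces to a one-line computation on the atom-centered densities; a minimal sketch:

```python
import numpy as np

def participation_ratio(q):
    """Illustrative sketch of the PR from atom-centered spin densities.

    With c_i^2 approximated as q_i / sum_j q_j, the inverse participation
    sum gives PR = 1 / sum_i (c_i^2)^2, ranging from 1 (fully localized
    on one atom) to N (equally delocalized across all N atoms).
    """
    c2 = q / q.sum()              # normalized squared coefficients c_i^2
    return 1.0 / np.sum(c2 ** 2)  # PR = 1 / sum_i |c_i|^4

# all density on one atom  -> PR = 1 (fully localized)
# equal density on N atoms -> PR = N (fully delocalized)
```

The same function applied to the HIP-loc weights wi in place of the normalized densities gives the PRHIP-loc used later for the torsional-scan comparison, since the wi already sum to one.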
Fig. 4b shows the relationship between absolute error in ΔE and the degree of localization as computed by PR/N. For molecules in which the density is dispersed more homogeneously across the atoms (or larger PRDFT/N edging closer to ∼1), energy is accurately predicted using eqn (1) of HIPNN. In fact, HIPNN and HIP-loc achieve comparable performance in this regime. However, for molecules in which the density is strongly localized on only a small handful of atoms (or PRDFT/N closer to ∼1/N), HIP-loc is superior. This result underscores the main motivation of our work in that HIPNN does not accurately predict the singlet–triplet energy gap for strongly localized transitions, whereas HIP-loc's localization weights allow the model to weight regions of the molecule that are attributed to the transition, resulting in better energy prediction.
Finally, in order to relate the agreement in the centers of localization to the accuracy in predicted energy, we also show absolute error in ΔE versus η (Fig. 4c). The distribution is concentrated in the regime of low η and low ΔE error, but in contrast to the held-out test set (Fig. 3c), there is also a significant number of molecules that lie in the regime of high η. The distribution of ΔE errors in this high η regime is sporadic and extends to relatively high error. Altogether, these results suggest that robust energy prediction is generally improved when the inferred localization determined by the HIP-loc weights more closely resembles the localization determined by the quantum mechanical spin density.
To further demonstrate HIP-loc's feature of localization without changing stoichiometry (and therefore drastically changing total molecular energy), we examined the predicted energy and concomitant variation in localization while scanning a single torsional angle. The set of molecules investigated consists of three to five six-membered aromatic rings and a central carbon–carbon single bond. A relaxed torsional scan over the dihedral angle around the carbon–carbon bond is performed, and HIPNN and HIP-loc predictions are made for each conformation. The PR is calculated for both the DFT spin density and HIP-loc weights in order to compare the overall localization using both methods. The analogous form for PRHIP-loc uses HIP-loc weights (wi) in lieu of atom-centered spin density, that is, PRHIP-loc = 1/∑i wi2.
Our conformational analysis shows that when errors in ΔE predictions are relatively low across the conformational space, localization is accurately predicted by means of the PR. Rotating the molecule's dihedral angle changes its aromaticity and shifts spatial localization of electron density. As a result, the S0 and T1 PESs, as well as the ΔE gap, vary with conformation. Fig. 5a shows absolute error in ΔE as a function of dihedral angle for a representative molecule containing three six-membered aromatic rings, computed with HIPNN and HIP-loc. HIP-loc outperforms HIPNN for all scanned geometries. Additionally, there is qualitative agreement in the trends of the PRs computed using HIP-loc weights and DFT spin density (Fig. 5b). However, HIP-loc infers more delocalized transitions compared to DFT (PR of ∼13 versus ∼6), but in spite of that, for both methods the PR in the planar structure (dihedral of 0° and 180°) is delocalized across atoms on each ring, whereas the PR in the non-planar structure (dihedral of 90°) is localized to slightly more atoms exclusively on the larger ring. The net effect is an increase in PR for the non-planar conformation. An animation in ESI† showing a torsional scan of the molecule in Fig. 5 illustrates this point. Altogether, we find a correspondence between the accuracy of energy prediction and the progression in the degree of localization, quantified by PR. These observations are consistent for all the molecules studied (Fig. S21 through S25†).
Our main contribution is a new set of localization layers for learning target excitation energies. These localization layers do not depend on the detailed structure of the underlying model and could be implemented in any atom-centered neural network. Combined with HIPNN, the new approach (denoted HIP-loc) defines energy as a weighted sum, where inferred localization weights assigned to the atoms determine the atoms' contribution to the target energy (eqn (4)). This minor yet profound modification substantially improved prediction quality on a more challenging set of experimentally verified phosphorescent compounds that consist of, on average, ∼2× more atoms than the molecules found in the training set. Extensibility is an important practical advantage of the model because it allows screening large molecules for which ab initio calculations are computationally prohibitive. Moreover, the superior energy prediction of HIP-loc over HIPNN was not readily visible on the held-out test set, but only on the phosphorescent extensibility set for molecules with strongly localized transitions, showing that the extensibility test provides important characterization of the model's performance. We achieve RMSEs of ∼4 kcal mol−1 (∼5% error) on the held-out test set and ∼13 kcal mol−1 (∼13% error) on the phosphorescent extensibility set (Fig. 2). The physical significance of HIP-loc's localization weights, to our surprise, is that they qualitatively correlate to the quantum mechanical spin density (Fig. 3 and 4). Thus, the model inferred localities of the electron density associated with the singlet–triplet transition in order to make more accurate and transferable predictions of the singlet–triplet energy gap. This result is remarkable given that DFT spin density was not provided as a target during training.
Instead, the neural network was provided with somewhat limited information, yet discovered this result nonetheless with a modest change in how target energy is calculated that effectively took into account the relative contribution of the atoms.
The results and performance of the new HIP-loc model lead us to conjecture that predicting localized molecular properties are likely to improve if reference electronic properties (e.g., electron densities) are used explicitly as targets during training. This idea sets the stage for many interesting applications of ML in physical chemistry, where learned properties are dependent on, or correlated to, the spatial distribution of the many-electron wavefunction. We also envision applying localization layers to other localized molecular phenomena and systems such as anions, cations, radicals, or other excited states. Additionally, further improvement of excited state ML potentials for geometry optimization could lend itself useful for a full end-to-end ML-based tool for phosphorescent materials screening.
Footnotes
† Electronic supplementary information (ESI) available: Details regarding ML training procedures and localization metric η. Ab initio energies ES0, ET1, and ΔE as a function of number of atoms, distributions of molecule size for the held-out and extensibility test sets, histograms of percentage error in ΔE prediction on T1 thermal conformers in the held-out and extensibility test sets using HIPNN and HIP-loc, training and testing parity plots of predicted versus true ΔE on thermal conformers sampled around equilibria of S0 and T1 using HIPNN and HIP-loc, RMSD of optimized geometries using the HIP-loc T1 potential and energy error plots at those geometries, absolute errors in ΔE as a function of number of atoms, parity plots of predicted versus true ΔE for the extensibility set categorized by chemical similarity, localization of singlet–triplet transition for select molecules of the extensibility set computed from DFT spin density and HIP-loc weights, conformation-dependent localization of singlet–triplet transitions in molecules with a single torsional angle, and molecular animations of torsional scans including that of the molecule in Fig. 5. See DOI: 10.1039/d1sc02136b
‡ These authors contributed equally to this work.
This journal is © The Royal Society of Chemistry 2021 |