Tudur David,*a Nik Khadijah Nik Aznan,b Kathryn Garsideb and Thomas Penfolda
aChemistry, School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK. E-mail: tom.penfold@newcastle.ac.uk
bResearch Software Engineering Group, Catalyst Building, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
First published on 29th August 2023
X-ray absorption near-edge structure (XANES) spectroscopy is widely used across the natural sciences to obtain element-specific, atomic-scale insight into the structure of matter. However, despite its increasing use owing to the proliferation of high-brilliance third- and fourth-generation light sources such as synchrotrons and X-ray free-electron lasers, decoding the wealth of information encoded within each spectrum can be challenging and often requires detailed calculations. In this article we introduce a supervised machine learning method that aims to extract structural information directly from a XANES spectrum. Using a convolutional neural network trained on theoretical data, our approach performs this direct translation of spectral information and achieves a median error in first coordination shell bond lengths of 0.1 Å when applied to experimental spectra. By combining this with the bootstrap resampling approach, our network is also able to quantify the expected uncertainty, providing non-experts with a metric for the reliability of each prediction. This work sets the foundation for future work on techniques that can accurately quantify structural information directly from XANES spectra.
In contrast, at low photoelectron energies (<50 eV above the edge) associated with the XANES region, spectral features arise from the interference of scattering pathways between multiple atoms and therefore this region contains information about the three-dimensional structure usually within ∼6 Å of the absorbing atom.5 Qualitative insight from these spectra can be obtained using empirical rules, such as shifts in the absorption edge with oxidation state,6 changes in structural symmetry reflected in the pre-edge7,8 or shifts in above-ionisation resonances which can reflect bond length changes (Natoli's rule).9 However, quantitative decoding of the high information content within XANES spectra usually requires detailed theoretical calculations,10 and therefore unlike FT-EXAFS, there is no direct way of extracting structural insight from a spectrum.
To address the challenges associated with the analysis of XANES spectra, there has recently been a substantial research effort seeking to exploit supervised machine-learning/deep learning algorithms to predict spectral shape from an input structure or property.10–14 This so-called forward mapping approach is akin to the approach used in first-principles calculations, i.e. an input structure is used to solve the electronic Schrödinger equation and compute a particular spectrum, which is subsequently compared to the experiment one is trying to analyse. However, for the interpretation of experimental spectra, the reverse mapping problem, i.e. converting a spectrum into a property/structure, is in many ways the more natural, as it connects directly to the focus of the analysis, i.e. what structural information is contained within the spectrum?
Towards achieving this, Timoshenko et al. applied a multi-layer perceptron (MLP) model to identify the average size, shape, and morphology of platinum nanoparticles. The network was trained using theoretical XANES data, calculated with FEFF and FDMNES, and was subsequently applied successfully to interpret experimental spectra. The authors have extended this work to other related areas15–18 and while the results are very encouraging, they are always system-specific or restricted to a narrow class of systems. Consequently, a new set of theoretical calculations and a new model would be required to apply the approach to a different system.
In contrast, Carbone et al.19 used an MLP and a convolutional neural network (CNN) to classify the local coordination environment around an absorbing atom from K-edge XANES spectra across each of the first row transition metals. They demonstrated that both approaches were able to classify, with ∼86% accuracy, the symmetry of the coordination environment from a spectrum. In addition, they showed that for octahedral and tetrahedral complexes, this could be largely achieved using only the pre-edge region of the spectrum, which is well known to be important in determining the coordination geometry around the absorbing atom.7 Torrisi et al.20 extended this work using a random forest model to extract coordination numbers, average first coordination shell bond lengths and the atomic charge of the absorbing atom. However, although both works demonstrate highly effective models, both were trained and applied entirely on theoretical data, which does not address the true purpose of such models, namely application to experimental data.
These previous works have largely focused upon classification models, translating spectra into structural properties such as coordination numbers. In contrast, Kiyohara et al.21 and Higashi et al.22 have both implemented MLP-based approaches to convert calculated XANES spectra into radial distribution functions at the oxygen K-edge. In this work we instead implement a CNN that converts a given spectrum into a pseudo-radial distribution function, based upon the 2-body terms in the weighted atom-centered symmetry function (wACSF) descriptor. We demonstrate and explain its performance based upon simulated and experimental iron K-edge data. We show that our approach achieves a median error in first coordination shell bond lengths of 0.1 Å when applied to experimental spectra. In addition, by combining this with the bootstrap resampling approach, our network is also able to quantify the expected uncertainty, providing non-experts with a metric for the reliability of each prediction. Finally, we discuss limitations of the present model and propose ways in which it can be developed in future work.
Gradients of the empirical loss with respect to the internal weights were estimated over mini-batches of 32 samples, and the weights were updated iteratively according to the Adaptive Moment Estimation (Adam)23 algorithm. The learning rate was set to 2 × 10⁻³. The internal weights were initially set according to ref. 24. Unless explicitly stated otherwise in this article, optimisation was carried out over 50 iterative cycles through the training data, commonly termed epochs.
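The optimisation settings above can be illustrated with a minimal NumPy sketch of mini-batch Adam. Only the batch size (32), learning rate (2 × 10⁻³) and number of epochs (50) are taken from the text. The linear model, the synthetic data and the Adam constants (β1 = 0.9, β2 = 0.999, ε = 10⁻⁸, the commonly used defaults) are assumptions made for illustration, and the published code relies on PyTorch rather than a hand-written update like this one.

```python
import numpy as np

def adam_fit(X, Y, lr=2e-3, batch_size=32, epochs=50, seed=0):
    """Fit Y ~ X @ W with mini-batch Adam on a mean-squared-error loss."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(X.shape[1], Y.shape[1]))
    m = np.zeros_like(W)                    # first-moment estimate
    v = np.zeros_like(W)                    # second-moment estimate
    beta1, beta2, eps = 0.9, 0.999, 1e-8    # assumed default constants
    step = 0
    history = []
    for _ in range(epochs):
        order = rng.permutation(len(X))     # reshuffle once per epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            err = X[idx] @ W - Y[idx]
            grad = 2.0 * X[idx].T @ err / err.size   # gradient of the batch MSE
            step += 1
            m = beta1 * m + (1 - beta1) * grad
            v = beta2 * v + (1 - beta2) * grad**2
            m_hat = m / (1 - beta1**step)            # bias-corrected moments
            v_hat = v / (1 - beta2**step)
            W -= lr * m_hat / (np.sqrt(v_hat) + eps)
        history.append(float(np.mean((X @ W - Y) ** 2)))
    return W, history

# Synthetic stand-in for (spectrum, descriptor) pairs.
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 8))
Y = X @ rng.normal(size=(8, 3))
W, history = adam_fit(X, Y)
```

Here `history` holds the full-dataset MSE after each epoch, which is the quantity one would monitor to check that 50 epochs are sufficient.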
The DNN is programmed in Python 3 with PyTorch.25 The Atomic Simulation Environment26 (ASE) API is used to handle and manipulate molecular structures. The code is publicly available under the GNU General Public License (GPLv3) on GitLab.27
Consequently, we focus upon converting the spectra into the two-body G2 terms of the weighted atom-centered symmetry function (wACSF) descriptor of Gastegger, Marquetand et al.,28 which encodes the local environments around X-ray absorption sites by dimensionality reduction. This descriptor has previously been used in the forward problem, i.e. converting atomic structures into spectra,13,14 and its use in the present work, as opposed to a simple radial distribution function, is motivated by the future objective of achieving cyclic consistency between the two models. The G2 terms take the form:
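$$G_i^{2} = \sum_{j \neq i} Z_j \, e^{-\eta (r_{ij} - \mu)^2} f_c(r_{ij}) \qquad (1)$$

where Z_j is the atomic number of neighbouring atom j, r_ij is its distance from the absorbing atom i, η and μ set the width and centre of the Gaussian, and f_c is the cosine cutoff function:

$$f_c(r_{ij}) = \begin{cases} \tfrac{1}{2}\left[\cos\left(\dfrac{\pi r_{ij}}{r_c}\right) + 1\right], & r_{ij} \le r_c \\ 0, & r_{ij} > r_c \end{cases} \qquad (2)$$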
The cutoff radius, rc, supplied to fc has to be sufficiently large to include an appropriate number of nearest neighbours. From the perspective of an absorbing atom in X-ray spectroscopy, rc has to reflect the maximum distance to which XANES is sensitive and therefore we have used rc = 6.0 Å. Throughout this work, we adopt a feature vector containing 50 G2 functions, constructed according to the “shifted” scheme.13
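As a concrete illustration of the descriptor, the sketch below evaluates atomic-number-weighted Gaussians multiplied by a cosine cutoff for a single absorber, using rc = 6.0 Å and 50 functions as stated above. The grid of Gaussian centres (evenly spaced from 0 to rc) and the shared width η = 1/(2Δ²) are assumptions standing in for the "shifted" scheme, whose exact parameters are not given in this section; the function names are hypothetical.

```python
import numpy as np

def cosine_cutoff(r, r_c):
    """Cosine cutoff: decays smoothly from 1 at r = 0 to 0 at r = r_c."""
    return np.where(r <= r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def g2_wacsf(distances, atomic_numbers, n_functions=50, r_c=6.0):
    """Two-body wACSF vector for one absorber.

    distances: absorber-neighbour distances in angstrom.
    atomic_numbers: atomic number Z_j of each neighbour.
    """
    r = np.asarray(distances, dtype=float)
    z = np.asarray(atomic_numbers, dtype=float)
    # Assumed "shifted" grid: evenly spaced centres sharing one Gaussian width.
    mu = np.linspace(0.0, r_c, n_functions)
    delta = mu[1] - mu[0]
    eta = 1.0 / (2.0 * delta**2)
    gauss = np.exp(-eta * (r[None, :] - mu[:, None]) ** 2)
    weights = z * cosine_cutoff(r, r_c)      # Z_j * f_c(r_ij) per neighbour
    return (gauss * weights[None, :]).sum(axis=1)

# Hypothetical octahedral shell: six oxygen atoms at 2.0 angstrom.
g2 = g2_wacsf([2.0] * 6, [8] * 6)
```

The resulting vector peaks at the grid point closest to 2.0 Å, which is why the descriptor can be read as a pseudo-radial distribution function in which heavier neighbours contribute more strongly.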
We note that previous work has transformed XANES spectra directly into radial distribution functions,21,22 rather than the wACSF used here. The main difference between the two is the weighting of contributions in the G2 wACSF by atomic number. This is consistent with the physical processes responsible for the features in XANES spectra, as different elements exhibit different backscattering amplitudes and consequently a distinction between atomic contributions is advantageous. We also retain a wACSF descriptor here to be consistent with previous work mapping the forward problem, i.e. structure to spectrum.13
Fig. 2a shows a plot of the first two t-distributed stochastic neighbour embedding (t-SNE) components of the wACSF descriptor encoding each local geometry against the first t-SNE component of the spectra (colour bar). In contrast to the more commonly used linear dimensionality reduction approach of principal component analysis (PCA), t-SNE is a non-linear approach which seeks to preserve the local structure of data by minimising the Kullback–Leibler (KL) divergence between distributions with respect to the locations of the points in the map. t-SNE is not a black box, but instead requires user-defined hyperparameters: the perplexity, learning rate, and number of iterations which, to produce Fig. 2a, were set to 50, 60, and 1000, respectively. The plot shows well-defined regions of correlation between the structural and spectral t-SNE components, which makes the dataset amenable to learning. Fig. 2b shows the same t-SNE plot as Fig. 2a, but in this case the colours represent a pseudo-coordination number, i.e. the number of atoms within 2.5 Å of the absorbing atom. This highlights that the pseudo-coordination number is a significant factor determining the t-SNE data distribution shown in Fig. 2a. The peach coloured region in Fig. 2a, which does not appear in Fig. 2b (i.e. is not directly associated with coordination number), is associated with complexes exhibiting multiple absorbers and a strong presence of linear bonds, such as CO and CN, which strongly modulate the shape of the XANES spectrum.31
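The pseudo-coordination number used to colour Fig. 2b is simple to compute from Cartesian coordinates. The sketch below is one possible implementation; the function name, the toy geometry and the choice of an inclusive 2.5 Å threshold are assumptions, since the text only specifies "within 2.5 Å".

```python
import numpy as np

def pseudo_coordination_number(positions, absorber_index, radius=2.5):
    """Count atoms within `radius` angstrom of the absorber (absorber excluded)."""
    positions = np.asarray(positions, dtype=float)
    d = np.linalg.norm(positions - positions[absorber_index], axis=1)
    d = np.delete(d, absorber_index)         # drop the absorber's zero distance
    return int(np.count_nonzero(d <= radius))

# Hypothetical geometry: absorber at the origin, six atoms at 2.0 angstrom,
# plus one second-shell atom at 3.0 angstrom that should not be counted.
coords = [[0, 0, 0],
          [2, 0, 0], [-2, 0, 0], [0, 2, 0], [0, -2, 0], [0, 0, 2], [0, 0, -2],
          [3, 0, 0]]
cn = pseudo_coordination_number(coords, absorber_index=0)
```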
Fig. 3 Learning curve showing the performance of the network, assessed by five-times-repeated five-fold cross-validation, as a function of the number of training samples.
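Five-times-repeated five-fold cross-validation means that the data are shuffled and split into five folds on five separate occasions, giving 25 train/test evaluations in total. A minimal sketch of the index bookkeeping is given below; the function name and the random seed are hypothetical, and the authors' own splitting code may differ.

```python
import numpy as np

def repeated_kfold_indices(n_samples, n_splits=5, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        # A fresh shuffle for every repeat gives different fold memberships.
        folds = np.array_split(rng.permutation(n_samples), n_splits)
        for k in range(n_splits):
            test_idx = folds[k]
            train_idx = np.concatenate(
                [fold for i, fold in enumerate(folds) if i != k]
            )
            yield train_idx, test_idx

splits = list(repeated_kfold_indices(100))
```

Each point on a learning curve would then be the average score over these 25 splits for a training set of the given size.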
Fig. 4a shows a histogram of the MSE for the 3500 spectra in the held-out dataset, composed entirely of theoretical spectra.29 The value of the MSE alone can be misleading and therefore, to add context, Fig. 4b–d show example theoretical (black) and predicted (grey) G2 wACSF with MSE of 0.004, 0.1, and 0.72 (see grey dashed lines in Fig. 4a). The spectrum with the median error of 0.1 is represented in Fig. 4c and corresponds to the material DALYOG. The G2 wACSF predicted with this error typically exhibits the correct shape, and the error is predominantly associated with the prediction of the intensity of the peaks. The mean error for this held-out dataset is 0.2, slightly larger than the median because the mean is more sensitive to the worst predictions, e.g. MIGDAT. The coefficient of variation (Cv) for these held-out predictions is 1.73, indicating, consistent with the histogram, a relatively small variability of points which are typically placed towards the higher-performance end.
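The summary statistics quoted here (median, mean and coefficient of variation of the per-sample MSE) can be computed as in the sketch below. It assumes that Cv is the standard deviation divided by the mean and uses the population standard deviation; the text does not state which convention was used, and the toy arrays are illustrative only.

```python
import numpy as np

def mse_summary(y_true, y_pred):
    """Per-sample MSE plus its median, mean and coefficient of variation."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2, axis=1)   # one value per sample
    mean = float(mse.mean())
    return {
        "mse": mse,
        "median": float(np.median(mse)),
        "mean": mean,
        "cv": float(mse.std() / mean),   # population standard deviation / mean
    }

# Toy example: three "predictions" with per-sample MSE of 0.0, 1.0 and 4.0.
y_true = np.zeros((3, 2))
y_pred = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
stats = mse_summary(y_true, y_pred)
```

In this toy case the mean (about 1.67) exceeds the median (1.0) because of the single poor prediction, the same asymmetry described for the held-out set.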
The peaks in the G2 wACSF shown in Fig. 4b–d indicate atomic distances from the absorbing atom and are therefore the most important features for assessing the accuracy of the predictions. In the subsequent analysis, we quantify the accuracy of the peak positions predicted by the network in two regions: close to the absorber (1–3.5 Å) and far from the absorber (3.5–6 Å). Close to the absorbing atom, the median and mean errors in the peak position are 0 Å and 0.07 Å, respectively. Given that 50 G2 functions span a range of 5.0 Å, the mean error is comparable to the grid-point spacing of ∼0.1 Å. The interquartile range for peak position errors is 0.1 Å, indicating that overall there is high accuracy in the predictions for this region of the G2 wACSF. In the region further from the absorbing atom, the median and mean errors increase to 0.2 Å and 0.22 Å, respectively, and the interquartile range is 0.2 Å. This increase in error is expected because, as illustrated in Fig. 4b–d, this region exhibits lower intensities than the vicinity of the absorbing atom. Consequently, during training, an inadequate description of this region contributes much less to the MSE and is penalised less. Despite this, an error of ∼0.2 Å is still considered acceptable for this distant region of the G2 wACSF.
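One way to obtain peak-position errors of this kind is to locate local maxima in the expected and predicted G2 wACSF and, for each expected peak in the region of interest, measure the distance to the closest predicted peak. The sketch below follows that idea; the peak-matching rule, the function names and the 0.1 Å toy grid are assumptions, as the text does not describe how peaks were matched.

```python
import numpy as np

def peak_positions(signal, grid):
    """Grid positions of strict interior local maxima of a 1-D signal."""
    s = np.asarray(signal, dtype=float)
    is_peak = (s[1:-1] > s[:-2]) & (s[1:-1] > s[2:])
    return np.asarray(grid, dtype=float)[1:-1][is_peak]

def nearest_peak_errors(expected, predicted, grid, r_min, r_max):
    """Distance from each expected peak in [r_min, r_max) to the closest predicted peak."""
    ref = peak_positions(expected, grid)
    ref = ref[(ref >= r_min) & (ref < r_max)]
    pred = peak_positions(predicted, grid)
    if ref.size == 0 or pred.size == 0:
        return np.array([])
    return np.abs(ref[:, None] - pred[None, :]).min(axis=1)

grid = np.linspace(1.0, 6.0, 51)                  # assumed grid, 0.1 angstrom spacing
expected = np.exp(-((grid - 2.0) / 0.2) ** 2)     # toy peak at 2.0 angstrom
predicted = np.exp(-((grid - 2.1) / 0.2) ** 2)    # toy peak displaced by one grid point
errors = nearest_peak_errors(expected, predicted, grid, 1.0, 3.5)
```

Because peaks can only sit on grid points, errors computed this way are multiples of the grid spacing, which is consistent with a median error of exactly 0 Å alongside a non-zero mean.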
Having established the performance of the network, in the following we assess the factors influencing its predictions, with a particular focus on factors that may influence the performance when applying the trained network to experimental data. Theoretical predictions of absolute transition energies are often challenging34 and consequently, Fig. 5 shows the effect on the G2 wACSF predictions when a constant energy shift of 1.0 and 2.0 eV is applied to the calculated spectra for VELFEX and ATOFEW. The former is in the top 10% of predictions shown in Fig. 4, while ATOFEW is in the bottom 10%. Fig. 5 shows that the spectral shift does not have a strong effect on the peak positions in either case, but the effect is clearly larger for ATOFEW, especially in terms of G2 wACSF intensities. Fig. S2† shows a similar case, but here the spectra chosen exhibit more distinct intensity changes. In this case, for the spectrum which yields an accurate G2 wACSF prediction (XABHIU), the changes are larger than those observed in Fig. 5 for VELFEX, but remain much smaller than those for PIFNUO, which offers a poor prediction of the G2 wACSF that is also strongly sensitive to spectral shifts.
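A constant energy shift of this kind can be applied to a spectrum on a fixed energy grid by interpolation, as sketched below. The energy grid, the single-peak toy spectrum and the function name are hypothetical; padding the edges with the end values is also an assumption about how out-of-range points are handled.

```python
import numpy as np

def shift_spectrum(energy, intensity, shift_ev):
    """Resample a spectrum after moving it by `shift_ev` along the energy axis."""
    energy = np.asarray(energy, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    # A feature at E moves to E + shift_ev; np.interp pads with the end values.
    return np.interp(energy, energy + shift_ev, intensity)

energy = np.linspace(7100.0, 7200.0, 401)            # hypothetical grid, 0.25 eV step
spectrum = np.exp(-((energy - 7130.0) / 3.0) ** 2)   # toy single-peak "spectrum"
shifted = shift_spectrum(energy, spectrum, 2.0)      # peak now sits at 7132 eV
```

Passing both `spectrum` and `shifted` through a trained model and comparing the two outputs would give the kind of sensitivity test shown in Fig. 5.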
Fig. 6 shows the influence of resolution, with spectra increasingly broadened using a Gaussian function with a full width at half maximum (FWHM) between 0.5 and 3 eV. Overall for both spectra, and for the spectra showing more prominent features (Fig. S3†), the broadening has very limited influence on the G2 wACSF prediction. Similar to the example for spectral shifts, it is evident that the largest changes are observed in the cases when the original spectrum provides a poor prediction of the structure (in terms of MSE) when compared to the expected G2 wACSF. This implies that when the network's performance is subpar, it becomes more susceptible to variations in absolute energy and spectral resolution, thus presenting a potential metric that can be utilised to evaluate confidence in predictions.
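The broadening test can be reproduced in outline by convolving a spectrum with a normalised Gaussian whose width is set from the FWHM, using σ = FWHM / (2√(2 ln 2)). The sketch below assumes a uniform energy grid, truncates the kernel at ±4σ and pads the edges with the end values; these choices, the toy spectrum and the function name are assumptions rather than the authors' procedure.

```python
import numpy as np

def gaussian_broaden(energy, intensity, fwhm_ev):
    """Convolve a spectrum on a uniform energy grid with a unit-area Gaussian."""
    energy = np.asarray(energy, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    step = energy[1] - energy[0]
    sigma = fwhm_ev / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    half = int(np.ceil(4.0 * sigma / step))          # kernel half-width in points
    offsets = np.arange(-half, half + 1) * step
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()                           # normalise to unit area
    # Pad with the end values so the edges are not artificially lowered.
    padded = np.pad(intensity, half, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

energy = np.linspace(7100.0, 7200.0, 401)            # hypothetical grid, 0.25 eV step
spectrum = np.exp(-((energy - 7130.0) / 3.0) ** 2)   # toy single-peak "spectrum"
broadened = {fwhm: gaussian_broaden(energy, spectrum, fwhm) for fwhm in (0.5, 1.0, 3.0)}
```

The FWHM values in the last line fall within the 0.5 to 3 eV range explored in Fig. 6.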
Fig. 7 assesses the performance of the network when the spectral shape is adjusted. For VASYUAL, the energy gap between the first and second above-ionisation resonances is gradually increased. This, as shown in the G2 wACSF, gives rise to a shift in the first peak of the first coordination shell to smaller distances. This change is consistent with expectations based upon Natoli's rule,9 which states that ΔE·R ∼ constant, where ΔE is the energy gap between above-ionisation resonances. Consequently, as ΔE increases, R should decrease, as observed. For NEGQEV, we monitor the effect of the pre-edge intensity on the structural predictions. Previous work35 has demonstrated that lowering the symmetry of the complex from octahedral increases the intensity of the pre-edge associated with 3d/4p mixing. At present, our network does not exhibit any modifications in the G2 wACSF linked to pre-edge changes. There are two potential sources for this: (i) limitation of the training set: simulating spectra within the muffin-tin approximation, as implemented in FEFF, can give rise to inaccuracies close to the absorption edge, i.e. in the pre-edge region, which depends more strongly on the precise details of the potential. (ii) Unphysical changes in the spectrum: a change in coordination number will alter not only the pre-edge shape, but also the above-ionisation features. The present pre-edge-only change may therefore create a spectrum which cannot normally be simulated and consequently lies outside the present training set. It is likely that both factors contribute to the performance shown in Fig. 7, and improving the model's response to and description of the pre-edge region should ultimately be the primary focus of future research efforts for this network.
Fig. S4† shows illustrative examples of the G2 wACSF taken from around the median (45th–55th percentile), lower (0th–10th percentile) and upper (90th–100th percentile) ranges when performance is ranked over all held-out DNN predictions by MSE. The light grey traces indicate ±2σ on the predicted G2 wACSF obtained using the bootstrap resampling approach. From Fig. S4,† we observe that as the predictions become worse, the standard deviations become visibly larger, consistent with the parity plot shown in Fig. 8b. For most of the examples shown in Fig. S4,† the 2σ largely follows the MSE, i.e. it is large in regions where the MSE is greater. However, the aforementioned slight over-confidence is highlighted in DAGPUX, between 3–5 Å, where the error is significant but σ is small. The origin of this over-confidence is associated with the Fe–S bonds in the first coordination shell and is discussed in more detail in the following section.
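The ±2σ bands come from bootstrap resampling: the training set is resampled with replacement several times, a model is fitted to each resample, and the spread of the resulting predictions is taken as the uncertainty. The sketch below shows the procedure with a linear least-squares model standing in for the CNN; the stand-in model, the number of resamples (20) and the synthetic data are all assumptions, as they are not specified in this section.

```python
import numpy as np

def bootstrap_predict(X_train, Y_train, X_new, n_boot=20, seed=0):
    """Mean and standard deviation of predictions over a bootstrap ensemble."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample the training set with replacement
        # Stand-in model: linear least squares (a network would be retrained here).
        W, *_ = np.linalg.lstsq(X_train[idx], Y_train[idx], rcond=None)
        preds.append(X_new @ W)
    preds = np.stack(preds)                # shape: (n_boot, n_new, n_outputs)
    return preds.mean(axis=0), preds.std(axis=0)

# Synthetic stand-in for (spectrum, descriptor) pairs.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
Y = X @ rng.normal(size=(5, 3)) + 0.1 * rng.normal(size=(200, 3))
mean, sigma = bootstrap_predict(X, Y, X[:4])
lower, upper = mean - 2.0 * sigma, mean + 2.0 * sigma   # the +/- 2 sigma band
```

Note that σ measures only how much the prediction varies with the training sample. If all resamples share the same systematic bias, σ stays small even when the prediction is wrong, which is the over-confidence described for DAGPUX.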
Fig. 9 shows six G2 wACSF predicted from the experimental spectra of Fe(acac)3,41 [Fe(bpy)3]2+,42 MbO2,38 [Fe(CN)6]4−,48 Fe(CO)5 (ref. 44) and Fe(dedtc)3.49 The first three are within the top 10 predictions when ranked by MSE, while the latter three are in the bottom 10 predictions. The MSE corresponds to the difference between the predicted and expected wACSF. We note that defining the expected wACSF for an experimental spectrum can be challenging, as it does not come directly from the experiment. In the present work the experimental spectra have been carefully chosen to be from well-characterised, single-component systems with reported structures, as shown in Table S1.† These single static structures, reported in the publications from which the spectra have been obtained, are derived either from crystallography or from fitting the XANES spectra. While this could potentially be a source of error, it is expected to be small, given the well-characterised nature of the spectra and systems chosen. The predictions associated with the remaining 16 experimental spectra are shown in Fig. S4–S6.†
Fig. 9 Example G2 wACSF predicted from experimental spectra. The source of the experimental spectra is given in Table S1.† The grey lines are the predicted structures with light grey regions showing ±2σ calculated from the bootstrap resampling. The black traces show the expected G2 wACSF from experimentally reported structures. The upper three panels show three of the top 10 predictions, while the bottom three panels show examples from the worst performers. The remaining examples of transformed experimental spectra are shown in the ESI.†
For Fe(acac)3, the network predicts four peaks corresponding to the Fe–O and three Fe–C distances on the acetylacetonate ligands. The Fe–O distance is 2.0 Å, in excellent agreement with the expected structure (black line). The prediction for [Fe(bpy)3]2+, in its low-spin ground state, captures the first two bands corresponding to the Fe–N and first shell of Fe–C distances. Fig. S4† shows the structural prediction associated with the high-spin state of [Fe(bpy)3]2+. The uncertainty in this prediction, as indicated by the grey shaded area, is larger, but the prediction remains sufficient to capture the 0.2 Å elongation of the Fe–N bond upon switching from the low- to the high-spin state.42 MbO2 shows good agreement between the expected and predicted G2 wACSF, consistent with similar observations made for MbNO and MbCO shown in Fig. S5.† In contrast, the predictions for the related compounds deoxyMb and cytochrome C (Fig. S5†) show worse agreement and a larger uncertainty. This is associated with the underlying theoretical spectra of similar systems in the training sets. In contrast to MbNO, MbCO and MbO2, deoxyMb is a pentacoordinated iron complex, and cytochrome C has a long Fe–S bond (2.60 Å), meaning that for these latter two systems the approximated interstitial region in the muffin-tin potentials is large. Therefore, for systems like these, the theoretical spectra will not provide good agreement with experiment. The error in these predictions therefore reflects limitations of the underlying training sets.
Three spectra that exhibit poor predictions are shown in the lower three panels of Fig. 9. [Fe(CN)6]4− and Fe(CO)5, like the other systems with carbonyl and cyanide ligands shown in Fig. S4–S6,† show a significant difference between the predicted and expected G2 wACSF, but more importantly a substantial uncertainty. In XANES spectra the scattering pathways along these linear bonds play a larger role than for similar structures containing non-linear bonds, due to the focusing effect.52 Recent work addressing the forward mapping (i.e. structure to spectrum) highlighted limitations of the wACSF descriptor in capturing the focusing effect, giving rise to large uncertainties in the associated predictions,32 and our present results clearly exhibit similar limitations for the reverse spectrum-to-structure mapping. Fe(dedtc)3 also gives a poor prediction but, in contrast to the previous examples, a low uncertainty, with the model exhibiting a distinct over-confidence. This is because the structure, consisting of three N,N′-diethyldithiocarbamate ligands, forms an octahedral coordination shell with 6 Fe–S bonds. Such coordination environments are commonly observed in the training set; however, the Fe–S bond length of 2.3 Å leads to a large approximated interstitial region in the muffin-tin potentials, resulting in theoretical spectra that do not agree well with experimental data for such systems. Consequently, although the network is trained on molecules exhibiting a similar structure, which gives it high confidence in the G2 wACSF predictions, this confidence is misplaced because the training data do not accurately represent the experimental observations in such cases.
This work sets the foundation for developing reliable models that can routinely translate experimental XANES spectra to provide structural insight. Within the present framework, based solely on theoretical data, limitations of the model will arise where theory does not offer good agreement with its experimental counterpart in terms of peak positions and intensities. The training sets in the present work were developed using multiple scattering theory within the muffin-tin approximation. While computationally inexpensive, this approach provides accurate spectra for a large region of the spectrum, especially higher in energy above the absorption edge.30 This is demonstrated by our model reproducing expected physical trends, such as Natoli's rule,9 which rely on the above-ionisation resonances.
Although the muffin-tin (MT) approximation works well for large regions of the spectrum, close to the edge the excited electron is often sensitive to the fine details of the atomic potential, leading to a breakdown of the approximation. Such problems are most commonly encountered in the case of open structured systems (i.e. long bond lengths to the absorbing atom)53 or when the absorbing atom is not fully coordinated. In both cases, the approximated interstitial region is large. Our model demonstrates this limitation in that its structural predictions do not respond to changes in the pre-edge region of the spectrum. It can also be observed in our present analysis of experimental spectra, with poor performance for complexes such as Fe(dedtc)3. The obvious solution is to use higher levels of theory, which go beyond the MT potential. However, care must be taken to incorporate the many-body effects associated with the high-energy photoelectrons, which are often important in the XANES intensities.54
Two additional factors that differ between experiment and theory, and which may affect the performance of the network, are the absolute energies and the spectral broadening. In this work we have demonstrated that both have a rather limited influence on the predicted structure unless the prediction is poor. Consequently, these tests, alongside the bootstrap resampling, could serve as a metric for assessing confidence.
In summary, this work provides an exciting foundation to deliver quantitative analysis of XANES spectra, equivalent to the FT analysis of EXAFS. The previous discussions highlight that improvements to the current network should focus upon the training data and its use. The most obvious approach would be to train future networks using experimental data; however, despite the increasing capacity to record XANES spectra arising from developments such as laboratory-based X-ray spectroscopy,55–57 it remains a tall order to record the >1000 spectra required to train a network. One future approach is to incorporate experimental data into the training process, through either mixed training sets or a transfer learning approach.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3dd00101f
This journal is © The Royal Society of Chemistry 2023 |