Keisuke Kameda,* Takaaki Ariga, Kazuma Ito, Manabu Ihara* and Sergei Manzhos*
Department of Chemical Science and Engineering, School of Materials and Chemical Technology, Tokyo Institute of Technology, 2-12-1 Ōokayama, Meguro-ku, Tokyo, 152-8552, Japan. E-mail: manzhos.s.aa@m.titech.ac.jp; mihara@chemeng.titech.ac.jp; kameda.k.ac@m.titech.ac.jp
First published on 16th August 2024
The development of novel functional ceramics is critically important for several applications, including the design of better electrochemical batteries and fuel cells, in particular solid oxide fuel cells. Computational prescreening and selection of such materials can help discover novel materials but is also challenging due to the high cost of electronic structure calculations which would be needed to compute the structures and properties of interest such as the material's stability and ion diffusion properties. The soft bond valence (SoftBV) approach is attractive for rapid prescreening among multiple compositions and structures, but the simplicity of the approximation can make the results inaccurate. In this study, we explore the possibility of enhancing the accuracy of the SoftBV approach when estimating crystal structures by adapting the parameters of the approximation to the chemical composition. Specifically, on the examples of perovskite- and spinel-type oxides that have been proposed as promising solid-state ionic conductors, the screening factor – an independent parameter of the SoftBV approximation – is modeled using linear and non-linear methods as a function of descriptors of the chemical composition. We find that making the screening factor a function of composition can noticeably improve the ability of the SoftBV approximation to correctly model structures, in particular new, putative crystal structures whose structural parameters are yet unknown. We also analyze the relative importance of nonlinearity and coupling in improving the model and find that while the quality of the model is improved by including nonlinearity, coupling is relatively unimportant. 
While using a neural network showed practically no improvement over linear regression, the recently proposed GPR-NN method, which is a hybrid between a single-hidden-layer neural network and kernel regression, showed substantial improvement, enabling the prediction of structural parameters of new ceramics with accuracy on the order of 1%.
Density functional theory (DFT)18,19 is in principle sufficiently accurate to ascertain the required properties of a ceramic material with a putative composition and (crystal) structure. It can provide mechanistic insights, control, and resolution not easily achievable experimentally, but the relatively high computational cost of DFT calculations makes prescreening of all conceivable structures, let alone all ionic conduction paths, in a wide range of candidate materials too tedious. Such prescreening can in principle be done at the force field level if a force field framework is available that can be used for a wide range of ceramics and provide sufficient accuracy without requiring refitting of the force field for every new composition and structure. The most promising material candidates can then be subjected to more detailed analysis with DFT and ultimately experimental verification.
The soft bond valence approximation (SoftBV) developed by Adams and co-workers provides such a framework.20–22 It is a type of two-body force-field approximation that incorporates assumptions about the physics of bonding interactions. It is based on the bond valence approximation23 and the inclusion of screened Coulombic interactions, which is appropriate for sufficiently ionic bonding. In this approach, one introduces a Bond Valence Site Energy (BVSE), which is a sum of contributions from all cations i:21,22
(1)
(2)
(3)
(4)
The effective charges qi and qj of anions and cations in eqn (1) are typically calculated as21,25
(5)
This ensures, in particular, overall charge neutrality. A relationship between Rcutoff,ij and the other parameters has also been proposed:22
(6)
The SoftBV approach provides a measure of a material's stability via the Global Instability Index (GII):21
(7)
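Although the equation images are not reproduced here, the bond valence ingredients of the GII can be illustrated with a short sketch. The following is a minimal, illustrative Python snippet (not the SoftBV code): it assumes the standard bond valence expression s = exp((R0 − R)/b) and the usual root-mean-square form of the GII over the deviations of bond valence sums from nominal valences; all numbers are made up for illustration.

```python
import numpy as np

def bond_valence(R, R0, b):
    """Bond valence of one cation-anion contact: s = exp((R0 - R)/b)."""
    return np.exp((R0 - R) / b)

def global_instability_index(V_sums, V_nominal):
    """GII: root-mean-square deviation of bond valence sums from nominal valences."""
    d = np.asarray(V_sums, dtype=float) - np.asarray(V_nominal, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

# Illustrative numbers only: two cation sites whose bond valence sums
# slightly miss their nominal valences of +2 and +4.
gii = global_instability_index([1.95, 4.12], [2.0, 4.0])
```

With these toy numbers, the resulting GII is below the 0.1 threshold mentioned later in the text as typically indicating a stable structure.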
SoftBV is often used for fixed crystal structures. Comparisons of properties (site energies, diffusion paths, etc.) at any level of theory are only meaningful if the structure is known with sufficient accuracy. For materials with new, putative compositions, the optimal structures are unknown. It is desirable for the force field to be accurate enough to find the correct structure directly with SoftBV, without engaging in much more expensive DFT calculations or experiments. The ability to predict the structure would facilitate using more accurate methods (such as DFT) for energetic analysis, as the cost of structure optimization is then saved. It is in principle possible to improve the accuracy of the SoftBV approximation by adjusting its parameters, for example by making them depend on the composition or chemical environment of the atom. While bij, D0,ij, R0,ij, Rcutoff or NC, and the charges can still be treated as tunable parameters and made to depend on the chemical environment (see e.g. ref. 40), this would come at the cost of tampering with the foundations of the SoftBV approach, unless restrictions are imposed enforcing interrelations between the parameters such as those indicated above. This issue does not arise when tuning or parameterizing sf. When the structure of a material is known, sf can be set automatically to minimize the pressure, thus effectively tuning sf to the structure (lattice constants).21 This value will in the following be called sfauto. When prescreening new materials with putative compositions, where the optimal (correct) structure is not known, this approach in principle results in a non-optimal value of sf (i.e. an sfauto value optimal for a wrong structure).
In this study, we therefore aim to determine an optimal value of the screening factor when the structure is not known, as a function of composition. We use linear and neural network (NN) models and show, on the examples of perovskite-41–43 and spinel-type oxides44 which have been proposed as promising solid-state ionic conductors, that this can noticeably improve the ability of the SoftBV approximation to model structures, in particular new, putative crystal structures whose structural parameters are yet unknown. We show that due to the smallness of the training dataset, there is no improvement with a neural network over the linear regression in spite of the higher expressive power of an NN. We employ a recently proposed machine learning method (called in the following GPR-NN) that is a hybrid between a neural network and kernel regression; in particular, it avoids nonlinear parameter optimization that is a cause of overfitting. GPR-NN allows building optimal nonlinear functions and controlling the inclusion of coupling between the features,45 to analyze the importance of nonlinearity and of coupling. We find that while the quality of the model is improved by including nonlinearity, the coupling is relatively unimportant. Overall, GPR-NN allowed the most accurate estimation of the optimal screening factor as a function of composition, enabling the prediction of structural parameters of new ceramics with accuracy on the order of 1%.
Fig. 1 Crystal structures of (a) perovskite (A – green, B – violet, O – red) and (b) spinel (A – green, B – violet, O – red) oxides.
Considering the relatively large dimensionality of the feature space, the number of data points (number of structures) is small.50 We therefore perform the following procedure, as shown in Fig. 2: from each real structure obtained from the database (called the reference structure in the following), we form structures with lattice vectors isotropically expanded or contracted by 10%; these are called sample structures in the following. A 10% strain is sufficient to cover the range of possible lattice parameters of a given type of crystal structure with different chemical compositions. sf is then set to values from 0.55 to 0.75 at 0.0125 intervals, and structure optimization is performed in SoftBV. The error Er – the relative difference between the reference and SoftBV-optimized lattice constants (defined below) – is then collected, resulting in a dataset of bij, R0,ij, Rcutoff,ij, ri, NC, sf, and Er for each composition. In this way, the number of data points is expanded severalfold. In the case of perovskite-type oxides, SoftBV optimization does not result in any changes in the fractional positions of atoms or distortions of the rectilinearity of the unit cell, and Er is defined as the mean relative error in the lattice vectors a = b = c between the reference structure obtained from the database and the structure optimized by SoftBV (i.e. Er = (areference − aopt)/areference = (breference − bopt)/breference = (creference − copt)/creference). In the case of spinel-type oxides, SoftBV optimization results in small changes in the fractional positions of atoms within the unit cell. We define the change in fractional positions per ion as Δsite = (1/N) Σi √(Δxi2 + Δyi2 + Δzi2), where N is the number of atoms in the cell and Δxi, Δyi, and Δzi are the errors in the fractional coordinates of atom i. Δsite was lower than 0.01 for most of the spinel-type oxides, i.e. the error in structural parameters is mostly due to the lattice constants. Therefore, Er as defined above was also used for optimizing the crystal structures of the spinel-type oxides.
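The dataset-expansion procedure above can be sketched as follows (Python; `strain_lattice` and `relative_error` are illustrative helpers standing in for the SoftBV workflow, not part of the SoftBV code):

```python
import numpy as np

# Grid of screening-factor values: 0.55 to 0.75 at 0.0125 intervals (17 values).
sf_grid = np.arange(0.55, 0.75 + 1e-9, 0.0125)
# Strains applied to each reference structure: contracted, reference, expanded.
strains = (-0.10, 0.0, +0.10)

def strain_lattice(a_ref, strain):
    """Isotropically scale a cubic lattice constant by (1 + strain)."""
    return a_ref * (1.0 + strain)

def relative_error(a_ref, a_opt):
    """Er for a cubic cell: (a_reference - a_optimized) / a_reference."""
    return (a_ref - a_opt) / a_ref
```

Each (sample structure, sf value) pair would then be passed to a SoftBV structure optimization, and the resulting Er recorded together with the composition descriptors.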
We define a D = 11-dimensional vector of descriptors
x = (Er, R0,AO, bAO, NC,AO, Rcutoff,AO, rA, R0,BO, bBO, NC,BO, Rcutoff,BO, rB) ≡ (x1, x2, …, x11)
We perform linear regressions using the “regress” function in MATLAB (version R2021a of MATLAB was used in this work):
sf = c0 + c1x1 + c2x2 + … + cDxD (8)
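As a sketch of this step, the following Python snippet (a stand-in for MATLAB's `regress`, with synthetic data in place of the real descriptor tables) fits such a linear model and evaluates it with the Er component set to zero to estimate the optimal sf:

```python
import numpy as np

# Synthetic stand-in data: M samples of the D = 11 descriptors, with x1 = Er
# as the first component (real values come from the SoftBV parameter tables).
rng = np.random.default_rng(0)
M, D = 40, 11
X = rng.normal(size=(M, D))
true_c = rng.normal(size=D) * 0.01
sf = 0.65 + X @ true_c

# Ordinary least squares with an intercept (the paper used MATLAB's `regress`).
A = np.hstack([np.ones((M, 1)), X])
coef, *_ = np.linalg.lstsq(A, sf, rcond=None)

def sf_opt(x, coef):
    """Estimated optimal sf: evaluate the fit with the Er component set to 0."""
    x = np.array(x, dtype=float)
    x[0] = 0.0
    return float(coef[0] + x @ coef[1:])
```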
We also perform non-linear regression using a feed-forward neural network (NN):51
sf = NN(x) (9)
The NN regressions are performed in MATLAB using the “trainlm” function; the Levenberg–Marquardt algorithm52 was used to train the NN. We considered different numbers of hidden layers and neurons. The “tansig” neuron activation function is used in the following; other activation functions were tried but gave no improvement (not shown). The estimated optimal sf (sfopt) is obtained from eqn (8)–(10) by setting Er = 0, i.e. sfopt = NN(0, x2, …, x11). SoftBV optimization of crystal structures with expanded or contracted lattices was carried out using sfauto and sfopt, and the resulting Er values were compared to evaluate the accuracy of SoftBV.
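For reference, MATLAB's `tansig` activation is mathematically identical to tanh; a minimal sketch of the corresponding single-hidden-layer forward pass (the weights here are placeholders, not trained values) is:

```python
import numpy as np

def tansig(x):
    """MATLAB's `tansig` activation: 2/(1+exp(-2x)) - 1, identical to tanh."""
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def nn_forward(x, W1, b1, w2, b2):
    """Single-hidden-layer network: sf = w2 . tansig(W1 x + b1) + b2."""
    return w2 @ tansig(W1 @ x + b1) + b2
```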
For the analysis of the relative importance of nonlinearity and coupling among the features, we use the GPR-NN method of Manzhos and Ihara.45 The reader is referred to ref. 45, 53 and 54 for more details and context; here, we only briefly summarize the key properties of the method relevant to the purpose of the present work. The target function sf(x) is expressed as
sf(x) = f1(y1) + f2(y2) + … + fN(yN), y = Wx (10)
This is a first-order additive model in (generally) redundant coordinates y = Wx, where W is a matrix of coefficients. The rows of W are defined as elements of a D-dimensional Sobol sequence,55 although other ways of setting W are possible.45 The shapes of the functions fn are computed using first-order additive GPR53,54,56–58 in y. That is, GPR of sf(y(x)) is performed in y with an additive kernel K(y, y′) = Σn k(yn, yn′). In this way, all functions fn are computed simultaneously, in one linear step. The shapes of the functions fn are optimal for the given data and given W in the least-squares sense.56 The original coordinates {xn} are also included in the set of {yn}. If only {xn} are included, the method defaults to first-order additive GPR.45,56,57 The representation of eqn (10) is equivalent to a single-hidden-layer NN with activation functions that are optimal and individual to each neuron, and with weights fixed by rules rather than optimized. The matrix W is equivalent to the matrix of NN weights, while the biases are subsumed in the definitions of fn. One can say that eqn (10) is an NN in x and a first-order additive GPR in y. Because no nonlinear optimization is done, the method does not suffer from overfitting as the number of ‘neurons’ N grows beyond optimal,45 combining the high expressive power of an NN with the robustness of linear regression (with nonlinear basis functions) that GPR provides.59
In this work, we use an additive RBF kernel in y, K(y, y′) = Σn k(yn, yn′), where k(yn, yn′) = exp(−(yn − yn′)2/2l2). The data are normalized so that an isotropic kernel with a single length parameter l can be used. We use this method to probe the importance of coupling terms by testing different N: in the limit of large N, the model fully includes all coupling among the features, while in the limit y = x ∈ RD, no coupling is included. On the other hand, the construction of optimal shapes of fn in the method is used to study the importance of nonlinearity. Similar to the case of an NN fit, sfopt is computed from the model of eqn (10) by setting Er = 0, i.e. sfopt = f(0, x2, …, x11).
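A dependency-free Python sketch of this construction is given below. Unlike the paper, it draws the rows of W at random rather than from a Sobol sequence, purely to keep the example short; the key point it illustrates is that the fit is a single linear solve, with no nonlinear parameter optimization.

```python
import numpy as np

def additive_rbf_kernel(Y1, Y2, l):
    """K(y, y') = sum_n exp(-(y_n - y'_n)^2 / (2 l^2)) over the redundant coordinates."""
    diff = Y1[:, None, :] - Y2[None, :, :]
    return np.exp(-diff ** 2 / (2.0 * l ** 2)).sum(axis=-1)

def gpr_nn_fit(X, t, W, l, sigma):
    """One linear solve gives the model coefficients alpha."""
    Y = X @ W.T                                   # redundant coordinates y = Wx
    K = additive_rbf_kernel(Y, Y, l)
    alpha = np.linalg.solve(K + sigma ** 2 * np.eye(len(t)), t)
    return Y, alpha

def gpr_nn_predict(Xnew, W, Ytrain, alpha, l):
    return additive_rbf_kernel(Xnew @ W.T, Ytrain, l) @ alpha

# The paper draws the rows of W from a Sobol sequence; random rows are used
# here only to keep this sketch dependency-free. Data are synthetic.
rng = np.random.default_rng(1)
D, N, M = 11, 30, 50
W = rng.uniform(size=(N, D))
X = rng.normal(size=(M, D))
t = np.sin(X[:, 0]) + 0.1 * X[:, 1]
Y, alpha = gpr_nn_fit(X, t, W, l=2.0, sigma=1e-3)
pred = gpr_nn_predict(X, W, Y, alpha, l=2.0)
```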
With all methods, the loss function minimized by the regression was the SSE (sum of squared errors), SSE = Σi=1M [sf(x(i)) − fmodel(x(i))]2, where {(x(i), sf(x(i)))}, i = 1, …, M is the training set of size M and fmodel denotes the fitted model.
The linear and single-hidden layer NN regressions of sf for perovskite- and spinel-type oxides were carried out 100 times using different combinations of training and testing data. Fig. 4 shows the distributions of root mean square error (RMSE) values of estimated sf from these regressions.
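The repeated-random-split protocol can be sketched as follows (Python, with synthetic, exactly-linear data standing in for the real descriptor set):

```python
import numpy as np

def fit_lin(X, t):
    """Ordinary least squares with an intercept."""
    A = np.hstack([np.ones((len(t), 1)), X])
    c, *_ = np.linalg.lstsq(A, t, rcond=None)
    return c

def predict_lin(c, X):
    return c[0] + X @ c[1:]

def split_rmse_runs(X, t, fit, predict, n_runs=100, test_frac=0.2, seed=0):
    """Repeat random 80/20 splits; collect (train RMSE, test RMSE) per run."""
    rng = np.random.default_rng(seed)
    M = len(t)
    n_test = int(round(test_frac * M))
    out = []
    for _ in range(n_runs):
        idx = rng.permutation(M)
        te, tr = idx[:n_test], idx[n_test:]
        model = fit(X[tr], t[tr])
        rmse = lambda s: float(np.sqrt(np.mean((predict(model, X[s]) - t[s]) ** 2)))
        out.append((rmse(tr), rmse(te)))
    return np.array(out)

# Synthetic, exactly-linear stand-in data (real descriptors come from SoftBV).
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 11))
t = 0.65 + X @ (rng.normal(size=11) * 0.01)
runs = split_rmse_runs(X, t, fit_lin, predict_lin)
```

The distributions of the two RMSE columns over the 100 runs correspond to the histograms summarized in Fig. 4 and Table 1.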
Table 1 summarizes the maximum, minimum, and median RMSE and R2 values over the 100 runs. The RMSE for the training data decreases with an increasing number of nodes in the NN regression, as expected, while the median RMSE for the testing data was lowest for the NN regressions with only 1–3 nodes. The results did not change when the number of hidden layers was varied between 2 and 12. The NN regressions with 1–3 nodes show a smaller median RMSE for both training and testing data than the linear regression. Therefore, the nonlinearity or coupling effects present in an NN might improve the accuracy (this is analyzed in Section 3.2), but the small amount of data makes this difficult to exploit.
Methods | Training max (RMSE/R2) | Training min | Training median | Testing max (RMSE/R2) | Testing min | Testing median
---|---|---|---|---|---|---
Perovskite | ||||||
Linear | 0.032/0.83 | 0.025/0.73 | 0.031/0.75 | 0.049/0.89 | 0.024/0.56 | 0.03/0.76 |
NN, N = 1 | 0.024/0.90 | 0.019/0.84 | 0.023/0.86 | 0.036/0.93 | 0.016/0.71 | 0.024/0.85 |
NN, N = 2 | 0.023/0.95 | 0.014/0.86 | 0.02/0.89 | 40/0.93 | 0.017/0.00 | 0.029/0.80 |
NN, N = 3 | 0.021/0.97 | 0.011/0.88 | 0.017/0.92 | 24/0.95 | 0.014/0.00 | 0.029/0.79 |
NN, N = 4 | 0.018/0.97 | 0.010/0.91 | 0.014/0.95 | 2.5/0.94 | 0.015/0.01 | 0.036/0.71 |
NN, N = 6 | 0.013/0.99 | 0.006/0.95 | 0.009/0.98 | 38/0.91 | 0.018/0.00 | 0.051/0.60 |
NN, N = 9 | 0.007/1.00 | 0.003/0.99 | 0.005/0.99 | 12/0.85 | 0.026/0.00 | 0.088/0.34 |
NN, N = 12 | 0.004/1.00 | 0.002/0.99 | 0.003/1.00 | 6.2/0.76 | 0.040/0.00 | 0.14/0.17 |
Spinel | ||||||
Linear | 0.023/0.88 | 0.021/0.86 | 0.022/0.87 | 0.028/0.90 | 0.02/0.79 | 0.024/0.85 |
NN, N = 1 | 0.019/0.93 | 0.016/0.90 | 0.018/0.92 | 0.029/0.96 | 0.012/0.80 | 0.02/0.90 |
NN, N = 2 | 0.016/0.96 | 0.013/0.93 | 0.015/0.94 | 0.087/0.94 | 0.015/0.33 | 0.019/0.91 |
NN, N = 3 | 0.015/0.97 | 0.010/0.94 | 0.012/0.96 | 1.4/0.95 | 0.014/0.00 | 0.02/0.90 |
NN, N = 4 | 0.013/0.98 | 0.008/0.96 | 0.010/0.97 | 28/0.96 | 0.012/0.00 | 0.022/0.88 |
NN, N = 6 | 0.009/0.99 | 0.005/0.98 | 0.007/0.99 | 15/0.95 | 0.015/0.00 | 0.031/0.80 |
NN, N = 9 | 0.005/1.00 | 0.003/0.99 | 0.004/1.00 | 9.2/0.91 | 0.019/0.00 | 0.073/0.41 |
NN, N = 12 | 0.003/1.00 | 0.002/1.00 | 0.002/1.00 | 3.9/0.83 | 0.028/0.00 | 0.11/0.23 |
Perovskite + spinel | ||||||
Linear | 0.031/0.80 | 0.027/0.74 | 0.030/0.76 | 0.038/0.86 | 0.024/0.63 | 0.029/0.77 |
NN, N = 1 | 0.024/0.88 | 0.021/0.84 | 0.023/0.86 | 0.030/0.91 | 0.019/0.77 | 0.024/0.85 |
NN, N = 2 | 0.023/0.92 | 0.018/0.86 | 0.021/0.88 | 0.035/0.92 | 0.017/0.72 | 0.024/0.86 |
NN, N = 3 | 0.022/0.92 | 0.017/0.87 | 0.019/0.90 | 1.2/0.93 | 0.016/0.01 | 0.023/0.86 |
NN, N = 4 | 0.019/0.94 | 0.014/0.90 | 0.017/0.92 | 1.3/0.92 | 0.017/0.00 | 0.024/0.86 |
NN, N = 6 | 0.016/0.96 | 0.012/0.93 | 0.014/0.95 | 3.5/0.93 | 0.018/0.00 | 0.031/0.77 |
NN, N = 9 | 0.013/0.98 | 0.009/0.96 | 0.011/0.97 | 0.62/0.90 | 0.020/0.00 | 0.039/0.70 |
NN, N = 12 | 0.01/0.99 | 0.007/0.97 | 0.008/0.98 | 4.3/0.86 | 0.024/0.00 | 0.051/0.59 |
The RMSE for the testing set could be decreased, especially for the NN regressions with larger numbers of nodes, by increasing the amount of data, i.e. by combining the perovskite- and spinel-type oxide data into one dataset. These results show that a key issue is overfitting due to the small number of data points. Fig. 5 shows the distributions of the data for selected pairs of parameters (among bij, R0,ij, Rcutoff,ij, ri, NC). Even from two-dimensional projections, which allow only limited insight into a multivariate distribution, one can appreciate the rather uneven and sparse sampling when the data are based on individual crystal structure types. This indicates that the accuracy of SoftBV can be improved by estimating sf as a function of the SoftBV parameters encoding composition, provided the space of descriptors can be adequately sampled using data for oxides of various compositions.
The crystal structures were optimized using the average sfopt computed from the five linear, NN, and GPR-NN (shown in Section 3.2) regression models with the highest R2 values among the 100 runs (Fig. 6). These models used both the perovskite- and spinel-type oxide data for training. The use of sfopt improved the accuracy of structure optimization compared to using sfauto. The mean absolute error (MAE) and the standard deviation (STD) of the distributions of Er are summarized in Table 2. Although an NN in principle has a higher expressive power and should be able to produce a better fit, the MAE and STD for the linear model were equal to or even slightly better than those for the NN model. This ultimately has to do with the small number of data points and the associated overfitting (see Fig. 4). Overall, there is no significant improvement in sf fitting quality with the NN vs. linear regression, and the NN fit does not lead to an improvement in the estimation of the optimal sf or in the quality of structure optimization. While the accuracy improved on average, the distribution of Er with the linear or NN regression is relatively broad, with Er for some materials exceeding 0.1. The GPR-NN regressions (described in the following section) have the highest accuracy for optimizing the crystal structures, with a narrower distribution of Er (MAE = 0.014 and STD = 0.025).
Oxides | Metric | Auto | Linear | NN (node = 1) | GPR-NN
---|---|---|---|---|---
Perovskite oxides | MAE | 0.13 | 0.026 | 0.031 | 0.014 |
STD | 0.066 | 0.038 | 0.044 | 0.024 | |
Spinel oxides | MAE | 0.10 | 0.023 | 0.022 | 0.013 |
STD | 0.032 | 0.024 | 0.026 | 0.026 | |
Both oxides | MAE | 0.12 | 0.025 | 0.026 | 0.014 |
STD | 0.053 | 0.032 | 0.036 | 0.025 |
Fig. 7 shows the relationship between the GII values obtained from the optimized and reference structures. GII is an index of chemical stability: GII < 0.1 is typically taken to mean that the structure is stable, while GII > 0.2 is considered a warning that the structure may be unstable.21 A better GII value should be obtained when a better structure is used, because the error in GII stems from the error in the bond valence sums of eqn (7), in other words, from the error in the distances between cations and anions. The GII values of structures optimized using sfauto are larger than those of the reference structures and show no correlation with them. On the other hand, there is a correlation between the GII values of structures optimized using sfopt and those of the reference structures, especially for the perovskite-type oxides. This reflects the improved accuracy of structure optimization with the ML-estimated sf.
An NN performs nonlinear operations on linear combinations of the inputs {xn}, introducing both nonlinearity and coupling; this is true even for an NN with a single hidden neuron. We can separate these two effects with the help of the GPR-NN method. We first perform simulations where y = x, i.e. an additive model in x, f(x) = f1(x1) + f2(x2) + … + fD(xD). We perform a two-dimensional scan of the hyperparameters, the length parameter l and the GPR noise parameter σ. At each (l, σ), we perform 100 fits differing by random splits into training and test data (whereby 20 percent of the materials are used for testing and 80 percent for training). Note that when l becomes large (l ≫ 1 for data scaled to the unit cube), kernel resolution is lost61 and the component functions fn(xn) become near-linear. This is illustrated in Fig. 8 for the case l = 200, log(σ) = −3, where we show the shapes of fn in this limiting case as well as the correlation plots between the exact (target) values of sf and those predicted by the model for a representative run. In this case, the average/min/max/standard deviation (over 100 runs) of the training set R2 are 0.80/0.78/0.84/0.02, and of the test set R2 0.77/0.59/0.85/0.06 – similar to traditional linear regression. The average/min/max/standard deviation of the RMSE are 0.031/0.028/0.032/0.001 for the training set and 0.032/0.028/0.039/0.002 for the test set.
The optimal hyperparameters were chosen as those simultaneously maximizing the average test set R2 and minimizing its variance (over multiple runs); they are l = 7 and log(σ) = −3. With these hyperparameters, the average/min/max/standard deviation (over 100 runs) of the training set R2 are 0.89/0.88/0.92/0.01, and of the test set R2 0.85/0.71/0.93/0.05. The average/min/max/standard deviation of the RMSE are 0.022/0.019/0.023/0.001 for the training set and 0.024/0.017/0.038/0.006 for the test set. This is a noticeable improvement over the linear regression and the NN, even though this model has no coupling. The correlation plots between the exact (target) values of sf and those predicted by the model, as well as the shapes of fn, are shown in Fig. 9 for a representative run; the fn are highly nonlinear. Nonlinearity improves the quality of the model and also influences the relative importance of the variables: in both the linear and the nonlinear model, the most important variables (by the magnitude of fn(xn)) are x1 (Er), x4 (R0,AO), and x8 (rA). The least important is x10 (NC,AO) in the nonlinear model with the optimal l = 7, while it is x11 (NC,BO) in the (practically) linear model obtained with l = 200. The order of importance of variables with small magnitudes of fn(xn) may differ; it is normal that the relative importance of features differs between methods.62,63
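One possible realization of this selection criterion is sketched below: the mean test-set R2 is penalized by its spread over the repeated splits. Here `evaluate` is a hypothetical callback that would run the 100 split fits for a given (l, σ) and return their test-set R2 values.

```python
import numpy as np

def pick_hyperparameters(grid_l, grid_log_sigma, evaluate):
    """Scan an (l, log sigma) grid; favor high mean test R^2 and low spread.

    `evaluate(l, sigma)` is assumed to return a sequence of test-set R^2
    values (e.g. over 100 random train/test splits).
    """
    best, best_score = None, -np.inf
    for l in grid_l:
        for ls in grid_log_sigma:
            r2 = np.asarray(evaluate(l, 10.0 ** ls), dtype=float)
            score = r2.mean() - r2.std()   # one way to trade mean vs variance
            if score > best_score:
                best, best_score = (l, ls), score
    return best
```

The specific score (mean minus standard deviation) is an illustrative choice; the paper only states that the average test R2 and its variance were considered jointly.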
We now fix l and σ at their optimized values and test if adding coupling terms further improves the model. The results are summarized in Fig. 10. We do not observe any further improvement due to the inclusion of coupling among the features. The coupling terms are either unimportant or unrecoverable due to the low density of sampling.
Finally, in Fig. 6, we show the distribution of structural parameter errors achieved with the GPR-NN method (using optimal hyperparameters). The method is clearly superior over the linear regression and the NN in terms of the average error as well as the width of the error distribution, which are listed in Table 2. The optimal shapes of the nonlinear functions used with each variable, and the absence of nonlinear parameter optimization in GPR-NN allow capitalizing on the superior expressive power of a nonlinear method while retaining the robustness of linear regression.
We showed that the sampling density of the space of descriptors is an important limiting factor in the possible improvement of the sf model, and may even prevent one from exploiting the superior expressive power of nonlinear models. In this work, this was mitigated on one hand by combining data from different crystal structures having structural similarity (perovskite and spinel oxides in this case) and on the other hand by producing synthetic sample points from strained structures. Only a slight improvement in the screening factor regression was obtained with an NN over linear regression, while no improvement over linear regression was observed in the quality of structure optimization with sf predicted by the NN model.
We then applied to this problem the recently developed GPR-NN method that allows obtaining a superior expressive power of a nonlinear approximation while avoiding nonlinear parameter optimization during regression. The method is a hybrid between an NN and kernel regression; it builds optimal shapes of nonlinear basis functions (neuron activation functions) and permits including coupling among features in a controlled way. We analyzed the relative importance of nonlinearity and coupling and found that while nonlinearity helps obtain a more accurate model, coupling terms were not important or were unrecoverable from the data. The sf predicted by GPR-NN showed the best quality of structure optimization with SoftBV and a significant improvement over linear and NN regressions.
As our tests on perovskites, spinels, and combined data show, there is a degree of portability of the machine-learned model to other crystal structures with similar coordination environments of ions. The models are in principle not portable to crystal structures with significantly different coordination environments simply because other parts of the feature vector x, which are environment-dependent SoftBV parameters, will be different and unsampled. While this is a limitation, there is still much value in exploring different compositions of particular crystal symmetries, of which there is typically a finite number of interest in particular applications.
Footnote
† Electronic supplementary information (ESI) available: The list of structures used in this work together with their respective database identifiers, as well as the dataset used for machine learning. See DOI: https://doi.org/10.1039/d4dd00152d |
This journal is © The Royal Society of Chemistry 2024 |