Byung Do Lee‡
a,
Jin-Woong Lee‡a,
Joonseo Parka,
Min Young Choa,
Woon Bae Park*b and
Kee-Sun Sohn*a
aFaculty of Nanotechnology and Advanced Materials Engineering, Sejong University, Seoul 05006, Republic of Korea. E-mail: kssohn@sejong.ac.kr
bDepartment of Advanced Components and Materials Engineering, Sunchon National University, Chonnam 57922, Republic of Korea. E-mail: wbpark@scnu.ac.kr
First published on 31st October 2022
When constructing a partially occupied model structure for use in density functional theory (DFT) and ab initio molecular dynamics (AIMD) calculations, the selection of appropriate configurations has been a vexing issue. Random sampling and the ensuing low-Coulomb-energy entry selection have been routine. Here, we report a more efficient way of selecting low-Coulomb-energy configurations for a representative solid electrolyte, Li6PS5Cl. Metaheuristics (genetic algorithm, particle swarm optimization, cuckoo search, and harmony search), Bayesian optimization, and modified deep Q-learning are utilized to search the large configurational space. Ten configuration candidates that exhibit relatively low Coulomb energy values and thereby lead to more convincing DFT and AIMD calculation results are pinpointed along with computational cost savings by the assistance of the above-described optimization algorithms, which constitute an integrated optimization strategy. Consequently, the integrated optimization strategy outperforms the conventional random sampling-based selection strategy.
When the configuration issue is to be considered, virtual crystal approximation (VCA)16–20 and coherent pseudopotential approximation (CPA)21 are viable solutions. The VCA (i.e., an effective pseudopotential represented as a weighted sum of different pseudopotentials) has been suggested for solid solutions, but the VCA is available only for limited compounds that are composed of metallic elements with similar electronic configurations. The CPA approximates a random configuration with an effective medium that is determined self-consistently from the condition of stationary scattering.21 Although the CPA has actually outperformed the VCA,21 it is not well suited to total energy calculations and geometry optimization.22 Likewise, the configurational average approximation also seems unrealistic for relatively large supercells because the enhanced configurational diversity in a large supercell would lead to more scattered ab initio calculation results.23,24 In addition, the configurational average strategy could increase both cost and time.25–28 The cluster expansion method (CEM)29–33 could be an alternative approach to address the configuration issue in a solid electrolyte material. Since the cluster expansion is fitted using a DFT-calculated property for a set of representative configurations, the CEM still requires extra and expensive DFT-calculated data in the regression fitting procedures. Therefore, none of the strategies described above seems ideally suited to address the configurational issue in solid electrolyte materials.
A general and simple solution to address the configuration issue, which has been widely accepted in the field,34–39 is random sampling from the large number of viable configurations (for instance, the total number of viable configurations for Li6PS5Cl amounts to ∼1013 even for a single unit cell model) and the ensuing choice of some configurations based on a certain selection criterion. The criterion utilized for the configuration selection following the random sampling is the electrostatic energy (Coulomb potential or energy),34–39 which can be evaluated by a very fast, low-cost computation and is reliable enough to roughly approximate the DFT-calculated total energy, although the structure that minimizes the electrostatic energy is not always the lowest energy structure based on DFT calculations. Despite the swift evaluation of the Coulomb energy that is much faster than the DFT calculations, the complete enumeration for the huge number of configurations is practically impossible. The selection of a few low-Coulomb-energy entries from the nominal number (∼1000) of random configurations, which is a current method in the field, would be inefficient to find the lowest-Coulomb-energy configurations that might be better matched with real-world ground state structures.
While the computationally inexpensive Coulomb energy is still being utilized as a selection criterion, we propose a more systematic methodology to nominate plausible configurations for use in the ensuing DFT and MD calculations—an optimization algorithm that enables us to nominate a greater number of low-Coulomb-energy configurations with incredible cost savings. This optimization algorithm makes it possible to search the enormous configuration search space and nominate a number of configurations that could represent real-world ground state structures for argyrodite. A series of optimization algorithms are applied in parallel to select argyrodite configurations that have minimum Coulomb energy values and thus could be used as input model structures for the subsequent DFT and MD calculations.
The adopted optimization algorithm includes four well-known metaheuristics: the genetic algorithm (GA),40–42 particle swarm optimization (PSO),43,44 harmony search (HS),45 and cuckoo search (CS).46 In addition to metaheuristic algorithms, we introduce Bayesian optimization (BO).47,48 Although the deep Q-network (DQN)49 is not dogmatically designed for optimization tasks, we modify it for use in a simple optimization task such as the argyrodite configuration search. The metaheuristic, which is inspired by generally acknowledged disciplines such as biology, ethology, and music, is known to outperform the traditional Jacobian/Hessian-based optimization.50 The metaheuristic has the merit of offering a higher possibility of escaping from local optima. We take on the four most frequently utilized metaheuristics: GA, PSO, HS, and CS. Along with the metaheuristic, we also employ Bayesian optimization, which has been known to be more efficient for optimization tasks that require the extremely expensive evaluation of an objective function. DQN, a recently devised reinforcement learning algorithm, represents a merger between traditional Q-learning and a more recent deep neural network.51 In principle, the DQN is not designed for use in such a simple optimization but for more complicated dynamic problems such as games. The DQN capability is far beyond a simple optimization. A truncated (or modified) DQN algorithm, called a single-episode DQN, is fully adapted to the optimization task.
We systematically compare all the above-described optimization algorithms for the argyrodite configuration search. In addition, the validity of each optimization algorithm is preconfirmed using well-known benchmark test functions. Although some previous reports have dealt with a single optimization algorithm, such as a genetic algorithm or particle swarm optimization, for a similar configuration selection task,52–55 we introduce six optimization algorithms, involving four metaheuristics, a Bayesian optimization, and a reinforcement learning algorithm (=DQN), which constitute an integrated optimization strategy.
Method | Hyperparameter | Coulomb energy (eV f.u.−1) |
---|---|---|
Random | — | −247.20 |
GA | Mutation probability: 0.1, elite size: 0.01, crossover probability: 0.5 | −249.27 |
PSO | C1: 2, C2: 2, w: 0.85 | −248.10 |
HS | HMCR: 0.7, PAR: 0.3, bw: 0.01 | −248.72 |
CS | Pa: 0.25, β: 1.5 | −248.80 |
BO | α: 10−5, β: 10−7 | −247.54 |
DQN | Episode length: 2500, learning rate: 0.001 | −248.01 |
The BO performance depends significantly on the objective function terrain; if a continuous and smooth benchmark function is of concern, the GPR works better; on the other hand, if a benchmark function has a nonsmooth terrain with abrupt transitions, then the GRP would not be appropriate as a surrogate function. Very recently, Lei et al.59 reported a much more improved BO performance using different surrogate functions rather than GPR, so-called Bayesian multivariate adaptive regression splines (BMARS)60,61 and Bayesian additive regression trees (BART).62 However, Lei et al.59 adopted a truncated search space in the decision variable range [−2, 2] with 4D (for Rosenbrock) and 10D (for Rastrigin) search spaces. The decision variable ranges that Lei et al.59 adopted are far narrower than those generally recommended. We adopted the decision variable range [−5, 5] with a 6D search space for six benchmark functions: the Ackley, Egg Crate, Griewank, Rastrigin, Rosenbrock, and Sphere functions. More importantly, since the metaheuristics, such as GA, PSO, HA, and CS, greatly outperformed BO, the improved BO algorithms would make no sense in the present investigation. However, we also adopted alternative surrogate functions, which are four ensemble-learning algorithms such as random forest, AdaBoost, gradient Boost, and extreme gradient (XG) Boost.
We altered the operation method to make it possible to use the DQN algorithm for a simple optimization task. A so-called one-episode training scheme was developed to save computational cost. It is unnecessary to fully train the DQN, but instead it would be satisfactory as far as we can manage to reach (or pass through) the global optimum (or one of promising local optima) in the course of the one-episode training. Since it is highly likely to find the global optimum (or one of promising local optima) during an episode at the expense of minimum computational costs, we do not need to learn the complete terrain of the objective function.
The state was defined as decision variable vectors in a six-dimensional decision variable space. The action was defined as small-step movement from the current state. A remarkable measure that we newly took for our modified DQN approach is the inclusion of the step length in the output layer of the deep neural network, such that it can be adjustable during the training, and depending on the state of concern (i.e., the step length differed for every position in the decision variable space, presumably, the step length would be shorter if we get closer to the optimum, otherwise longer). The reward was defined as two fixed numbers; if the action from the current state to the next state gives rise to an improvement in the objective function, then the reward is +0.1; otherwise, it is −100. More details for the one-episode DQN can be recognized from the code available at https://github.com/socoolblue/optimization.
Coulomb energy is assigned to the objective function to be minimized. The Coulomb energy could be a rough measure of the total internal energy, although a complete coincidence would not be viable. A plot of Coulomb energy vs. DFT-calculated total internal energy is shown in Fig. 2b, wherein some randomly chosen configuration data were presented along with the ideal linear fit. It is reasonable to recommend Coulomb energy as a rough measure of total energy and thereby to use it as an objective function for the optimization process, wherein the appropriate ground state configuration can be found. The use of Coulomb energy has an economical merit in comparison to the use of DFT-calculated total energy, since it takes less than a second to complete the Coulomb energy calculation for a single configuration. We adopted Okhotnikov et al.'s Coulomb energy calculation code.72
There have been well-established codes for crystal structure prediction such as USPEX73,74 and CALYPSO.75,76 These codes exhibit either the similarity or the dissimilarity in comparison to our approach. In view of the optimization algorithm, an evolutional algorithm is adopted for USPEX73,74 and a PSO algorithm for CALYPSO.75,76 Presumably, both the well-established codes would give a similar result to what we produced. It should be noted, however, that we employed six optimization algorithms even including both the PSO and evolutionary algorithm that the USPEX and CALYPSO adopted.
In advance of their use in the argyrodite configuration optimization task, we tested all the optimization algorithms using several well-known benchmark test functions, such as the Ackley,78 Egg Crate,79 Griewank,80 Rastrigin,81 Rosenbrock,82 and Sphere83 functions, each of which is schematically described in Fig. S1, ESI.† The search space constituted by the 6-dimensional decision variable vector span with each vector component was restricted in the range [−5, 5]. This preliminary benchmark function test was executed in hopes that the same codes could work for the argyrodite configuration optimization as well. Table S1, ESI,† shows the final optimized objective function (benchmark function) values that were reached by each of the metaheuristic (GA, PSO, HS, and CS), BO, and DQN algorithms. All the algorithms worked out, although it is not clear whether the minimum objective function value at the last generation was close to the global optimum. Although the PSO algorithm is conspicuously superior to others in terms of the final optimized objective function value in the last generation for most benchmark functions, it is not possible to find a definite relative dominance among those optimization algorithms by considering all the benchmark function test results on average, as evidenced in Table S1.† In principle, the performance of optimization algorithms strongly depends on the benchmark test function. Although we can roughly predict the terrain of all the benchmark functions, no one knows what the Coulomb energy terrain in the argyrodite configuration space looks like. That is why we have to pretest more than several benchmark functions that dramatically differ from one another. Moreover, we have to employ as many optimization algorithms as possible because it would be thoroughly impossible to know which algorithm works for which benchmark function. In this context, we employed the six optimization algorithms for Coulomb energy minimization over the large argyrodite configuration space, and these six optimization algorithms were pretested using six benchmark functions. Thus, the integrated optimization strategy we propose here would be more promising than a typical single-algorithm strategy.
For a systematic comparison among those optimization algorithms, the same number of objective function evaluations was applied for every algorithm, that is, the population size was 25 and the generation number was 100, leading to the total number of evaluations fixed at 2500. In the exceptional case of the DQN algorithm, however, a higher number of objective function evaluations was required to obtain a similar level of optimization in comparison to the other optimization algorithms. Although it appears that a much higher number of evaluations is required for the DQN algorithm, the required number of evaluations varies on a case-by-case basis. The initial decision variable vector (the starting point) that is to be determined randomly had a great influence on the performance of the DQN-based optimization, since we adopted a single episode training, as discussed in the Computational methods section. Fig. S1, ESI,† shows the instantaneous objective function (benchmark function) value versus the generation number (which equals the number of objective function evaluations multiplied by the population size of 25) for every optimization algorithm. Fig. S1† also shows that there is no obvious superiority or inferiority among the optimization algorithms for all the benchmark functions, and every algorithm ultimately leads to an acceptable convergence close to the global optimum. For some optimization algorithms (in particular, for BO and DQN), neither the global optimum nor one of promising local optima was reached, but this sort of incomplete optimization is customary for the heuristic nature of the algorithm. The benchmark function test results shown in Table S1 and Fig. S1† imply that some of the aforementioned algorithms with the same hyperparameters refined from these benchmark function tests should definitely work out for the argyrodite configuration optimization problem as well. The hyperparameter screening results are also given in Table S2.†
Rather than the nomination of minimum Coulomb energy from a single random sampling, the 100 extreme (minimum) values were extracted from the 100 random samples, each of which consists of 2500 configurations. This means that we reached the averaged minimum of −247.20 eV f.u.−1 from the 100 samples (250000 configurations in total), but a conventional single random sampling (even if we enhanced the number of configurations in the sample) would never make it possible to reach this level. The optimization algorithm should be employed to reach as close to the global minimum as possible. The use of the optimization algorithm would be more appropriate in finding energetically stable configurations than the present random sampling that is even more reasonable than the conventional single random sampling. On this ground, Table 1 shows the Coulomb energy values that were minimized by every optimization algorithm and includes a random sampling result at the same expense of computational costs (that is, the same number of objective function evaluations). All the optimization algorithms gave rise to lower Coulomb energy than random sampling. In particular, GA exhibits an outstanding record (−249.27 eV f.u.−1), which might be presumed to be close to a global optimum. Despite the 100 times greater number of evaluations, random sampling could never reach this level.
Fig. 3b shows the argyrodite configuration optimization execution results for all the algorithms, namely, the plot of Coulomb energy (objective function) as a function of the generation number. The Coulomb energy gradually decreases as the algorithm proceeds toward later generations. The term ‘generation’ originated from the GA, but we used it in common for every algorithm for convenience since all the algorithms are population-based. Of course, both the BO and DQN algorithms are not population-based. The generation can be, however, regarded simply as the 25 evaluations of the objective function in these cases. The plot of Coulomb energy versus the generation number is in a typically decreasing form, similar to those for the benchmark function-based optimization shown in Fig. S1, ESI.† The initial drastic drop was followed by a retarded decrease for all the metaheuristics, as shown in Fig. 3b. In particular, the GA appears to outperform the other metaheuristics, but it was proven that every metaheuristic worked out to a certain extent and thereby gave acceptable Coulomb energy values that are lower than those from random sampling. While PSO outperforms the other metaheuristics for most benchmark function tests, as shown in Fig. S1, ESI,† the GA is the best for the argyrodite configuration optimization, as evidenced in Fig. 3a and b, wherein the objective function value converged earlier for the GA and the converged value is lower than any other algorithms. In contrast to the benchmark function test, PSO gave a disappointing result, namely, the PSO-nominated minimum Coulomb energy is relatively high in comparison to the GA-nominated, and the PSO data distribution, shown in Fig. 3a, also looks to be the worst, which is almost identical to the random data distribution. Although the minimum objective function values obtained from the PSO, BO, and DQN algorithms are still lower than the minimum from random sampling, the overall data distribution for the PSO, BO, and DQN data never appears to be improved from the random data distribution, as shown in Fig. 3a. This sort of disappointing performance for the PSO algorithm seems unusual when referring to the benchmark function-based optimization shown in Fig. S1, ESI.† By invoking the fact that the optimization performance is strongly dependent on the shape of the objective function, the successful benchmark function test never guarantees success for a real-world optimization task since the objective function terrain for the real-world optimization task always remains unknown. That is why we had a discrepancy between the benchmark function test result and the real-world argyrodite configuration optimization result.
Rather than using the comparison between the optimization algorithms adopted in the present investigation, it is more important to refer to the result from the conventional random sampling and to mention the superiority of the optimization algorithm to the random sampling. We eventually sampled 250000 configurations (2500 configurations/sample × 100 samples) in total through the random process, and the minimum Coulomb energy among all those samples was −248.18 eV f.u.−1. As described above, the average of minimum values that were extracted from 100 random samples, each of which includes 2500 configurations, was −247.20 eV f.u.−1. A comparison should be made between the optimization algorithm result and the averaged minimum from the 100 random samples, including 250000 configurations, to secure a more consistent rationale for using the optimization algorithms. Note that all the optimization algorithms were equally executed based on 2500 evaluations. The random sampling results, at −247.20 eV f.u.−1 (the average of the minimums from 100 samples), are higher than those recommended by any of the optimization algorithms, although the PSO and DQN algorithms yielded a minimized Coulomb energy that is only slightly improved from the random sampling result.
It should be noted that the total number of objective function evaluations for all the adopted optimization algorithms was equally 2500 (population sizes of 25 and 100 generations). Even DQN reached −248.1 eV f.u.−1 with less than 500 evaluations (Fig. 3b). The BO algorithm yielded a slightly worse performance in comparison to the other optimization algorithms. As shown in Fig. 3b, the minimum Coulomb energy obtained from the BO execution was slightly higher than those obtained from the other optimization algorithms, as shown in Table 1 and Fig. 3b, although it is still better than the random sampling result. There is no doubt that all the optimization algorithms used in the present investigation led to argyrodite configurations that exhibit lower Coulomb energy than the minimum obtained from the random sampling at the expense of the same computational cost, which implies that an equal number of evaluations on average led to a better outcome when adopting the integrated optimization strategy rather than the conventional random sampling.
We do not blindly trust the capabilities of BO, since it is our opinion that BO capability has been slightly exaggerated due to its outstanding merit that BO significantly reduces the computational cost for some expensive optimization problems. The expensive optimization problem represents a time- and cost-consuming objective function evaluation. Although the BO execution yielded an acceptable result and remarkably outperformed the random sampling when using the simple benchmark functions (Table S1†), it showed no such conspicuous improvement in comparison to the random sampling for the real-world argyrodite configuration optimization. The use of GPR seems inappropriate to simulate the Coulomb energy terrain over the configurational space. GPR is only suitable in smooth, continuous objective function terrain. However, the Coulomb energy terrain over the argyrodite configuration is supposed to be neither smooth nor continuous. The introduction of other surrogate functions would improve the BO performance for real-world argyrodite configuration optimization, as suggested by Lei et al.59 However, the alternative surrogate functions that we adopted did not outperform the existing GPR-based BO for the Coulomb energy optimization for the argyrodite configuration determination, as shown in Fig. S4.† More importantly, since the metaheuristics, such as GA, PSO, HA, and CS, greatly outperformed BO, the improved BO algorithms would make no sense in the present investigation.
Fig. 4 Ten configurations in the simplified schematic manner, nominated by the integrated optimization strategy. |
The ten finally nominated argyrodite configurations are utilized for the ensuing DFT and AIMD calculations, and thereby, we validated that the calculated result is more convincing in comparison with those from the random sampling. The AIMD calculations for the ten selected argyrodite configurations enabled us to estimate the Li+ ionic conductivity. Seven temperatures were selected for the calculation of mean-squared displacement (MSD) as a function of time interval (Δt). The diffusivity was obtained from the slope of the MSD over Δt by referring to the Einstein relation. The Li+ ionic conductivity was also obtained from the diffusivity calculated according to the Nernst–Einstein relation. The activation energy was obtained from the Arrhenius plot of diffusivity vs. temperature, and thereby, the Li+ ionic conductivity at room temperature was finally evaluated through the extrapolation process. Fig. 5 shows the Arrhenius plot of diffusivity vs. temperature for the ten nominated configurations. We used pymatgen34,84 and mostly followed the protocol for the AIMD-based diffusivity calculation suggested by Ong et al.34 and He et al.84 The extrapolated room temperature diffusivity is marked in red, and the blue pentacle with an error bar stands for the average room temperature diffusivity obtained from the ten extrapolated data points (the mean and standard deviation for all ten red dots in Fig. 5). Table 2 shows the calculated room-temperature Li+ ionic conductivity values for the ten low-Coulomb-energy configurations along with some other calculated and experimental results reported in the literature. We suggest that the average conductivity value from the ten Li+ ionic conductivity values should be a more representative value for argyrodite. Since the previously reported values from the literature are lacking and even the calculated and experimental values are scattered, a legitimate Li+ ionic conductivity value for argyrodite is not clearly defined. However, the Li+ ionic conductivity values for some of the suggested configurations (4.67 and 4.28 mS cm−1) were similar to the experimental values (2.7 ± 1.26 mS cm−1), despite the overall overestimation of the calculated conductivity, which is a general trend. The proposed strategy would be a starting guideline for future computational Li+ ionic conductivity.
Fig. 5 Arrhenius plot for ten selected low-Coulomb-energy configurations. The error bar stands for the uncertainty for the diffusivity at each temperature. |
Material | Ea (meV) | Diffusivity (cm2 s−1) | Conductivity (mS cm−1) | Source |
---|---|---|---|---|
a a: ref. 4, b: ref. 5, c: ref. 6, d–i: ref. 10–15. | ||||
Li6PS5Cl (GA-1) | 243 | 4.63 × 10−8 | 7.08 | Ab initio (MSD) |
Li6PS5Cl (GA-2) | 235 | 6.33 × 10−8 | 9.68 | Ab initio (MSD) |
Li6PS5Cl (GA-3) | 263 | 3.06 × 10−8 | 4.67 | Ab initio (MSD) |
Li6PS5Cl (GA-4) | 202 | 1.36 × 10−7 | 20.87 | Ab initio (MSD) |
Li6PS5Cl (GA-5) | 253 | 3.93 × 10−8 | 6.01 | Ab initio (MSD) |
Li6PS5Cl (GA-6) | 247 | 4.41 × 10−8 | 6.75 | Ab initio (MSD) |
Li6PS5Cl (GA-7) | 234 | 6.00 × 10−8 | 9.19 | Ab initio (MSD) |
Li6PS5Cl (GA-8) | 230 | 6.71 × 10−8 | 10.27 | Ab initio (MSD) |
Li6PS5Cl (HS-1) | 262 | 2.80 × 10−8 | 4.28 | Ab initio (MSD) |
Li6PS5Cl (CS-1) | 225 | 7.64 × 10−8 | 11.69 | Ab initio (MSD) |
Li6PS5Cla | 524 | — | 2 × 10−3 | Ab initio (MSD) |
Li6PS5Clb | 130–270 | — | 40 | Ab initio (MSD) |
Li6PS5Clc | 190 | — | — | Ab initio (MSD) |
Li6PS5Cld | 350 | — | 3.1 | Experimental |
Li6PS5Cle | 340 | — | 2.5 | Experimental |
Li6PS5Clf | 330 | — | 1.33 | Experimental |
Li6PS5Clg | 160 | — | 4.96 | Experimental |
Li6PS5Clh | — | — | 2.4 | Experimental |
Li6PS5Cli | 380 | — | 1.9 | Experimental |
Although the benchmark function test resulted in an acceptable result for all the optimization algorithms, the real-world argyrodite configuration optimization resulted in a disappointing result for some optimization algorithms. The GA, HS, and CS exhibited promising optimization performances both for the benchmark function test and the real-world argyrodite configuration optimization. However, PSO gave a disappointing performance for the real-world argyrodite configuration optimization, while an excellent performance was observed for the benchmark function test. In particular, the BO algorithm was worst among all the optimization algorithms for real-world argyrodite configuration optimization. It was found that the optimization performance is extremely dependent upon the objective function terrain, but no one method can surmise the objective function terrain in advance of the adoption of optimization algorithms. In this regard, the integrated optimization strategy that we suggested in the present investigation would work out rather than the use of a single specific optimization algorithm only.
Ten final argyrodite configuration candidates exhibiting the lowest Coulomb energies were identified by the GA, CS, and HS (eight from the GA and two from CS and HS). Although the lowest-Coulomb-energy entry could not directly lead to a ground state configuration corresponding to the lowest DFT-calculated total internal energy, several lowest-Coulomb-energy entries were recommended, and their averaged DFT-calculated total internal energy could represent a real-world ground state.
The diffusivity at several temperatures was calculated via AIMD calculations to constitute the Arrhenius plot, from which the activation energy and eventually the room temperature Li+ ionic conductivity were evaluated. A representative room temperature Li+ ionic conductivity value was ultimately estimated to be 9.05 ± 4.82 mS cm−1 by averaging the ten final argyrodite configurations. Although it is not possible to pinpoint a definite true conductivity value since all the reported values are diverse irrespective of whether they are experimental or computational, it is worth appreciating our averaged conductivity value thanks to the concrete rationale behind the evaluation procedures.
Footnotes |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2ra05889h |
‡ These authors contributed equally. |
This journal is © The Royal Society of Chemistry 2022 |