Daniel
Barter‡
a,
Evan Walter
Clark Spotte-Smith‡
bc,
Nikita S.
Redkar
cd,
Aniruddh
Khanwale
bce,
Shyam
Dwaraknath
f,
Kristin A.
Persson
bg and
Samuel M.
Blau
*a
aEnergy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA. E-mail: smblau@lbl.gov
bDepartment of Materials Science and Engineering, University of California, Berkeley, CA 94720, USA
cMaterials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
dDepartment of Chemical and Biomolecular Engineering, University of California, Berkeley, CA 94720, USA
eDepartment of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720, USA
fLuxembourg Institute of Science and Technology, Luxembourg
gMolecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
First published on 30th November 2022
Chemical reaction networks (CRNs) are powerful tools for obtaining insight into complex reactive processes. However, they are difficult to employ in domains such as electrochemistry where reaction mechanisms and outcomes are not well understood. To overcome these limitations, we report new methods to assist in CRN construction and analysis. Beginning with a known set of potentially relevant species, we enumerate and then filter all stoichiometrically valid reactions, constructing CRNs without reaction templates. By applying efficient stochastic algorithms, we can then interrogate CRNs to predict network products and reveal reaction pathways to species of interest. We apply this methodology to study solid-electrolyte interphase (SEI) formation in Li-ion batteries, automatically recovering products from the literature and predicting previously unknown species. We validate these results by combining CRN-predicted pathways with first-principles mechanistic analysis, discovering novel mechanisms which could realistically contribute to SEI formation. This methodology enables the exploration of vast chemical spaces, with the potential for applications throughout electrochemistry.
CRNs are often generated3 by applying quantum chemical methods to explore a potential energy surface (PES).4 PES exploration techniques – including ab initio molecular dynamics,5 artificial force-induced reactions,6 and stochastic surface walking,7 among others – are useful for exploring a chemical space. PES exploration requires minimal initial information (e.g. a set of initial species) and allows for the identification of intermediates, reactive products, and elementary reaction steps (including energy barriers). Unfortunately, PES exploration techniques typically suffer from prohibitively high cost, limiting their application to simple systems involving only small molecules or exploring reactivity over very short (∼10 ps) time scales. While applications of semi-empirical methods8,9 and machine learning10,11 could soon alleviate this limitation in some domains, the ongoing challenges in simulating electrochemical dynamics even for simple systems (e.g. the hydrogen evolution reaction)12 suggest that PES exploration remains unsuitable for studies of complex electrochemistry.
When PES exploration is not used, CRNs are most commonly constructed based on human chemical intuition. By applying reaction templates to include only commonly observed mechanisms13–17 or pruning by the “chemical distance” between species (the number of bonds that must change for a reaction to occur, or the number of reactions required to transform reactants to products) to focus only on starting species and known products of interest,18 it is possible to create networks capable of elucidating reaction pathways. However, chemical intuition is limited and unreliable when describing new reactive spaces.19 In electrochemistry, studies of reaction mechanisms20,21 and characterization of reaction products22 are very challenging. Additionally, the linear scaling relations (i.e. Bell–Evans–Polanyi23,24) that are widely used to predict the rates of families of similar reactions in thermochemistry have not been well established in electrochemistry. As a result, CRN methods that rely on templates or the chemical distance to known products cannot yet be used to study electrochemical reactivity.
Aiming to bypass both the cost of PES exploration and the intuition required for template-based CRN generation, we recently developed the first method to construct and analyze electrochemical CRNs, which we used to study the formation of the solid electrolyte interphase (SEI) in lithium-ion batteries. After generating graph-based CRNs containing thousands of species and millions of reactions, we used shortest-path algorithms to identify optimal pathways to two key SEI products, lithium ethylene dicarbonate (LEDC)25 and lithium ethylene monocarbonate (LEMC).26 With this approach, we recovered known and proposed reaction mechanisms and predicted pathways that had not been previously reported.
In our prior work, the networks that we studied were limited by the computational cost of network analysis, in particular due to the poor scaling of shortest-path graph algorithms. These costs constrained the number of species as well as the number and types of reactions contained in the networks. Even more critically, our prior graph-based analysis approach was limited in its predictive capacity; in order to apply shortest-path algorithms, products of interest had to be known a priori. Here we confront the more challenging problem of exploring a reactive space without significant knowledge of end products. Specifically, we seek to search for many feasible pathways under various starting conditions to a range of products, byproducts, and intermediates, including species that might not be known to be important at the time of network construction.
We present a new approach to construct and explore CRNs in electrochemistry that is capable of extracting unique insights and generating hypotheses to guide further in-depth analysis. First, we describe our method of High-Performance Reaction Generation (HiPRGen). Beginning with a set of possible species that could contribute to the chemistry of interest (Sinit), HiPRGen enumerates all stoichiometrically valid reactions and employs user-defined filters to eliminate reactions based on physical or practical criteria while aiming to retain a diverse and chemically reasonable set R. To overcome the scaling limitations of graph-based pathfinding, we explore CRNs with a stochastic approach, sampling the reactive space based without knowledge of reaction kinetics. We can then extract paths to any molecule formed in the trajectories and heuristically identify the products of the network as a function of initial conditions. The combination of HiPRGen with stochastic network analysis allows for the investigation of electrochemical reactivity without prior knowledge of reaction mechanisms or end products for the first time. We demonstrate and validate this approach with an application to SEI formation. We first identify 36 products of a HiPRGen-constructed network of roughly 5200 molecules and 86000000 reactions by analyzing the average of many stochastic simulations. The identified network products include many species reported in the SEI literature as well as a range of unreported species. To demonstrate the plausibility of these network products and their associated formation pathways, we use first-principles calculations to refine the shortest thermodynamic paths to two previously unreported products, discovering chemically plausible elementary mechanisms. Further bespoke calculations indicate that several previously-unreported network products (or related intermediates) could reasonably emerge during SEI formation and could contribute to the production of other, experimentally-identified SEI products. The methods described here serve as a starting point for predictive studies of reactivity in electrochemistry where existing knowledge is limited.
HiPRGen begins with some large dataset of species, the properties of which are known from e.g. quantum chemical calculations. We note that this work does not consider species dataset construction and assumes that an appropriate species dataset is already available (the dataset used in this work is described in Results and discussion; aims to construct electrochemical CRNs without an initial set of species are discussed in Future work). We then apply a series of filters, where each filter can remove species that are chemically unreasonable or otherwise undesirable under the conditions studied (Fig. 1-1). A list of species filters that we have designed and employed is described in Methods and is discussed in more detail in the ESI.† HiPRGen has been designed such that users can easily include additional filters, which might be necessary to apply HiPRGen to diverse chemistries.
The filtered set of species Sfiltered is then used to populate buckets that are each defined by a unique composition (Fig. 1-2). Buckets are populated by members containing either one or two species where the total composition of each member matches the composition of the bucket. This means that any pair of members in a given bucket define the reactants and products of a stoichiometrically balanced chemical reaction containing one or two reactants and one or two products. In order to reduce the number of possible reactions, we do not presently allow ternary reactions. While some elementary reactions with three products are possible, we expect them to be rare, and we do not generally believe that elementary reactions with three reactants are meaningful in electrochemistry. For each bucket, all combinations of two unique members yield unique reactions (Fig. 1-3). Note that, because we allow for electrochemical reactions, charge is not necessarily balanced in these reactions. For a system of several thousand species, there can easily be hundreds of billions or even trillions of stoichiometrically valid reactions. Reaction filters are therefore employed to remove reactions that, despite being stoichiometrically valid, are chemically implausible or otherwise undesirable (Fig. 1-4). All reaction filters employed in this work are described in Methods and are discussed in more detail in the ESI.† Finally, the reactions from each bucket that pass all filters are aggregated. The result of HiPRGen is a set of filtered species Sfiltered and filtered reactions Rfiltered, which constitute a CRN.
HiPRGen can enumerate and filter all possible reactions between up to approximately 10000 species, overcoming the scaling limitations of our previous approach.26 Further, to the best of our knowledge, HiPRGen is the first method that combines an exhaustive enumeration of stoichiometrically valid reactions with a range of chemically-motivated filters that leverage pre-computed molecular properties. In addition to improving the thoroughness and efficiency of reaction generation compared to previous methods (see ESI†), HiPRGen has the benefit that the filtering infrastructure was designed to be easily modified and extended by future users, making it facile to apply HiPRGen to new chemical domains.
It is worth briefly comparing HiPRGen to template-based methods of reaction enumeration. HiPRGen is inherently inefficient compared to template-based CRN generation. Many of the reactions generated by HiPRGen may not occur in a single step, may not be kinetically accessible (due to excessively high energy barriers), or may not ever occur in the chemical system of interest because they require a reactant that will never form. While we are continuing to improve HiPRGen's filters in order to better avoid non-elementary or inaccessible reactions, in the absence of a general method to robustly identify plausible species and reactions in electrochemistry, this inefficiency cannot presently be avoided. Templates can also produce unreasonable reactions, and it can be difficult even for experts to identify such exceptions to chemical rules.27 Nonetheless, this problem is likely more severe for filter-based than template-based reactions. Where HiPRGen excels is in the inclusion of exceptional reactions that do not follow normal trends or patterns. Moreover, HiPRGen's method of bucketing ensures that no duplicate reactions will ever be added to a CRN (a particular reactant–product pair is only considered once), while in a naive template-based approach, duplicate reactions could easily be produced if multiple templates convert a set of reactants to the same products.
From the CRN generated by HiPRGen, it becomes possible to search for diverse products and reaction pathways to those products. However, even after filtering the set of stoichiometrically valid reactions, the number of remaining reactions can be so vast that a highly scalable method of network analysis is required.
When templates are viable and accurately describe the reactivity in a system, they can be used to approximate reaction kinetics with minimal cost.13,17 In a template-free network of potentially millions of reactions, it is completely impossible to include accurate rate coefficients for all reactions. For the purposes of stochastic network exploration and analysis, we therefore assign rate coefficients by fiat. All unimolecular reactions are given the same rate coefficient k0; to ensure appropriate units, all bimolecular reactions have the rate coefficient k0/V, where V is a spatial term related to the (in this case fictitious) system volume.28,30
A critical note: most commonly, the Gillespie algorithm and kMC are used to study the time evolution of a reacting system. In such a case, the use of fiat rate coefficients would be inappropriate, as it likely would not lead to even qualitatively accurate dynamics. However, in this work we are not interested in reactive time evolution. Rather, as we discuss below, we use the Gillespie method in a somewhat unorthodox manner to obtain non-dynamic insights into reactivity, focusing mainly on which reactions proceed (without concern for how quickly they proceed or when they proceed in time) and which species form (without concern for quantitatively accurate ratios of products, which would require a notion of relative rates). Given sufficient sampling, all reactions included in the network that would occur given accurate kinetics will be observed in kMC simulations with fixed rate coefficients. In sum, because our aims do not include accurate time evolution or quantitative competition between possible products, we can employ kMC with arbitrary rate coefficients. We further note that the analysis of CRNs without kinetic information is not without precedent; for instance, Stocker et al.31 have previously used reaction thermodynamics and arbitrary energy barriers to explore a CRN describing combustion.
In addition to providing arbitrary and fixed rate coefficients, we consider only reactions with ΔG < 0 eV. This latter simplification is necessary to eliminate cycles or loops from the network. In reality, depending on temperature, reactions with ΔG above zero can occur. Furthermore, the inherent uncertainty in calculated reaction thermodynamics likely means that some number of reactions that we calculate to have ΔG slightly above zero (endergonic) in reality have ΔG slightly below zero instead (exergonic). However, the elimination of loops is practically necessary to enable CRN analysis. If all reactions have the same rate coefficient, then the presence of loops effectively ensures that any kMC simulation will be dominated by unimportant back-and-forth processes. This dramatically increases the noise in the simulations, making identification of important species and reactions difficult (see below). Beyond such practical and technical considerations, the elimination of endergonic reactions is reasonable on a physical basis within our domain of interest, electrochemistry. Electrochemical reaction cascades are often dominated by ion- and radical-driven reactions.32,33 Such cascades should, in general, be comprised entirely or almost entirely by (often rapid) exergonic steps, meaning that the elimination of endergonic steps should not significantly affect the predictions of our simulations.
To analyze a CRN, we perform a large number of kMC simulations in parallel (Fig. 2a). The result of each simulation is a series of reactions defining a trajectory of the system state. If a molecule of interest is known, we can use these trajectories to identify potential formation pathways to that molecule. We trace each trajectory; if the molecule of interest is formed at any point, we then identify the shortest sequence of reactions leading to its first formation (Fig. 2b). Performing this method of stochastic pathfinding over many trajectories, we identify a range of possible pathways to the molecule of interest. We then rank the identified paths in order to identify the “best” paths among those observed, as defined by some cost function (see Methods). The thermodynamic pathways obtained from network analysis can then be subjected to further analysis to identify complete mechanisms, including transition-states (TS) and energy barriers.
However, pathfinding is useful only if one already knows what molecule to search for. Stochastic sampling with kMC, unlike graph-based pathfinding, enables the exploration of a reactive space without a specific target. This is because, while kMC trajectories can be used to search for a specific species, they are neither produced with any species in mind, nor are they biased towards any species. As a result, a unique capability of our approach is the ability to identify products of a CRN with minimal prior knowledge. To do this, we apply a set of heuristic criteria to the collection of trajectories (Fig. 2c). In line with the common-sense notion of a reaction product, we define a network product as any species that (i) is on average formed significantly more than it is consumed; (ii) accumulates significantly in the final state of an average trajectory; and (iii) can be reached by low-cost reaction pathways (see Methods). We note that the specific products that are identified depend on threshold values for these heuristics, which are arbitrarily selected. We further emphasize that the heuristics just described essentially require the elimination of endergonic reactions and reactive loops, as described above. If a species is involved in one or many loops, then the back-and-forth reactions would make exact counts of formation and consumption reactions meaningless. In addition, with loops present, a kMC simulation can in principle proceed indefinitely, which makes definition of accumulation in a “final” state problematic.
Using this heuristic method, we are able to analyze the structure of the CRN itself. The average trajectory (Fig. 2c) satisfies a rate equation of the system.34,35 We observe (see ESI†) that the average trajectory is smooth, indicating convergence to the exact expected dynamics. Because the rates used in our simulations are arbitrary, the dynamics themselves are not meaningful, but the trajectory smoothing ensures that we have sufficiently sampled reactive trajectories. Therefore, the identified products are well defined and invariant to changes in e.g. random seeds. The products of the network are not necessarily the metastable or stable products that would be observed experimentally, nor are they necessarily exhaustive (as they depend on which species are included in the CRN). Nonetheless, the network products provide useful hypotheses regarding what might form in an actual reactive system. We can then interrogate these hypotheses and validate them by either theoretical or experimental means. We note that in addition to the choice of heuristic thresholds, the choice of initial state can affect the network products identified via this method.
Network construction resulted in a CRN containing 5193 filtered species and 86001275 filtered reactions. With this network, we conducted 100000 stochastic trajectories under four conditions, with combinations of two different applied potentials (+0.0 V vs. Li/Li+ and +0.5 V vs. Li/Li+) and two different initial states (one consisting only of Li+ and ethylene carbonate (EC) and the other consisting of Li+, EC, and CO2). Average trajectories for each condition are shown in the ESI.† We emphasize that our goal is not to compute and observe the dynamics of SEI formation, but rather to identify key species and reaction pathways. We further note that we do not consider the effect of the electrode surface in our simulations. However, since the SEI can grow to a thickness of ∼10–100 nm, the effect of the electrode on the SEI chemistry should be small after the first reactions occur. Moreover, the products of the SEI are in general insensitive to the identity of the anode.41
The utility of our approach is demonstrated through analysis of the 36 network products collected from the set of four conditions previously described (Fig. 3). We first note that our automated procedure recovers 16 species that include a majority of the experimentally observed products of SEI formation (Fig. 3 solid dark green). These include gases (H2, C2H4, CO),42 inorganic species (lithium carbonate (Li2CO3) and lithium oxalate (Li2C2O4)),43,44 and alkyl carbonates (including species closely related to LEDC43–45 and LEMC,22,44 as well as lithium methyl carbonate or LMC, lithium butylene dicarbonate or LBDC,44 and lithium vinyl carbonate).46 We emphasize that these species are recovered even though reaction kinetics are entirely ignored in network exploration.
In addition to these well-known species, there are also a number of novel products that have not previously been proposed to participate in SEI formation. Among these are six additional alkyl carbonates (Fig. 3 dotted light green) which are each very similar to known products in molecular size, composition, bonding, and contained functional groups. Due to the extreme difficulty of experimentally characterizing the SEI and the resulting limited ability to resolve small signal to noise,47 the likely spectroscopic similarity48 of these species to the known products means that they may be present in the SEI in small quantities but that they could not easily be positively identified.
Other network products include species with ester, carboxylate, and oxide functional groups, such as lithium 2-(formyloxy)ethan-1-olate, which we abbreviate as LFEO, as well as a number of cyclic species, such as 4,4′,5,5′-tetrahydro-2,2′-bi(1,3-dioxolylidene), which we abbreviate as bi-dioxolylidene. LFEO and bi-dioxolylidene (Fig. 3 dashed purple) were particularly unexpected given how distinct they are from other predicted SEI products and, in particular, the experimentally identified products. Evaluating whether or not these products will actually form in the SEI necessitates considering energy barriers, kinetics, and reactive competition. Using the shortest paths from stochastic network analysis to guide automated transition-state calculations, we identified elementary formation mechanisms to both LFEO and bi-dioxolylidene to evaluate their potential to participate in SEI formation (see CRN-derived elementary mechanisms to form unexpected network products).
On the other hand, there are some network products which do not reflect the corresponding chemical system in a real battery. Specifically, both vinylene carbonate (VC) and propylene carbonate (PC) are known to rapidly decompose when included in battery electrolytes.49,50 This contradiction indicates that there are reactions or species that are necessary to facilitate the decomposition of VC and PC must be missing from the network. The identification of this gap through the use of CRN analysis and network product prediction provides a tractable path forward to expand the CRN via selective addition of missing molecules that enable redox, decomposition, or recombination of network products with other abundant intermediate or product species.
Using our stochastic approach, we can identify the N lowest-cost reaction pathways to the network products, ranked by a cost function that we have employed previously25 (see Methods). We selected two unexpected network products – LFEO and bi-dioxolylidene – and subjected their shortest pathways in order of cost to an automated procedure to identify the TS for each step along each pathway, allowing for the construction of complete reaction mechanisms. Fig. 4 highlights two formation paths obtained using this procedure, emphasizing the utility of network-generated reaction pathways to construct elementary mechanisms.
The network pathway shown in Fig. 4a has the 12th lowest cost with only Li+ and EC as starting species (no CO2) at +0.0 V vs. Li/Li+. In this pathway, Li+ coordinates with EC, and the Li+EC reduces twice. The doubly reduced Li+EC2− then ring-opens at the shoulder bond, after which this shoulder-ring-opened species can abstract a proton from an additional Li+EC, forming LFEO with a Li+EC–H− as a byproduct. The identified elementary mechanism follows this path exactly, with two TS – one for the ring-opening of Li+EC2− with a barrier ΔG‡ = 0.10 eV and one for proton abstraction from EC to form LFEO with ΔG‡ = 0.11 eV.
A path to form bi-dioxolylidene is shown in Fig. 4b. This network pathway has the 3rd lowest cost for simulations with CO2 was present as a starting species at 0.0 V vs. Li/Li+. In the pathway, CO2 reduces twice and coordinates with Li+, forming Li+CO22−. This Li+CO22− species reacts with EC to form Li+CO32− and the 1,3-dioxolylidene carbene, which we abbreviate as dioxolylidene. Two of these carbenes can then combine to form the dimer bi-dioxolylidene. The identified elementary mechanism follows the same general steps as the network pathway – coordinate and reduce, form dioxolylidene, and then dimerize two carbenes – but differs in two main ways. First, it is more favorable to reduce EC than CO2, which changes the initial steps of the mechanism. Second, we found that the carbene formation actually occurs via an addition–elimination mechanism with two elementary steps. The addition, which results in an EC–CO2 adduct, has a barrier ΔG‡ = 0.20 eV, and the elimination to produce Li+CO32− and dioxolylidene has a barrier ΔG‡ = 0.01 eV.
The identified elementary mechanisms to LFEO and bi-dioxolylidene involve steps that are predicted to be competitive with other known SEI formation processes. Both mechanisms rely on Li+EC2−, which can form easily at low potentials.41,51 After breaking the shoulder bond, the ring-opened Li+EC2− is known to decompose unimolecularly to Li+OCH2CH2O2− and CO with a predicted energy barrier between 0.09 eV (ref. 41) and 0.22 eV (ref. 51) depending on the level of theory used. Considering that the necessary precursor to LFEO formation, Li+EC, should be present in abundance during early SEI formation, this implies that LFEO could actually be a significant product during early SEI formation at low potentials.
The formation of bi-dioxolylidene is predicted to be less kinetically favorable than that of LFEO. The dimerization reaction has an energy barrier that is considerably higher than many major SEI formation pathways,41,51–54 implying that bi-dioxolylidene should not be a significant product. The formation of the carbene monomer, on the other hand, is plausible. Using the Eyring equation,55 the addition of CO2 to Li+EC2− with a 0.20 eV barrier has a predicted rate coefficient roughly 70 times lower than that of the shoulder ring-opening of Li+EC2− with a 0.09 eV barrier. However, dioxolylidene formation could be significant if CO2 is abundant, a plausible scenario considering that CO2 can form at either the anode56 or the cathode57 in Li-ion batteries. On this basis, we predict that while LFEO, Li+OCH2CH2O2−, and CO will be favored, dioxolylidene should at least form as a short-lived minority intermediate (considering the general reactivity of carbenes).58
In the mechanism for LFEO formation identified in Fig. 4a, a byproduct is the deprotonated EC species Li+EC–H−. We suspected that this byproduct would be highly reactive and would likely decompose. Indeed, we found (Fig. 5a) that Li+EC–H− can open at the waist bond with an extremely low barrier (ΔG‡ = 0.02 eV), forming lithium vinyl carbonate. We note that lithium vinyl carbonate has previously been identified as an SEI product,46 though its formation mechanism has not been thoroughly studied. Therefore, not only is the formation of LFEO plausible on the basis of the low reaction barriers identified, but LFEO formation can potentially help to explain the formation of another SEI product.
We also considered the reactivity of the dioxolylidene carbene (Fig. 5b). In addition to the dimerization shown in Fig. 4b, we found that dioxolylidene could react in two other ways, either decomposing in a single step to form CO2 and C2H4 or decomposing to Li+CO2− and C2H4via a two-step process after coordination with Li+ and reduction. All reactions identified – dimerization and both decomposition mechanisms – involve relatively high energy barriers. Our understanding of the role of dioxolylidene in SEI formation remains incomplete, and further work must be done to elucidate its decomposition routes. However, if the barriers to dioxolylidene decomposition were lowered by e.g. a solvent effect59 or a reactive surface,60 the possibility exists for a catalytic loop in which dioxolylidene is repeatedly reformed via the reaction of CO2 with Li+EC2− or Li+CO2− with EC−.
When PES exploration can be used to generate CRNs, new intermediate and product species can be identified automatically via reactions starting from known species. Likewise, when templates can be used, new species are generated by repeatedly applying reaction motifs to provided or previously generated reactants. As we have explained, neither PES exploration nor templates can presently be applied to electrochemistry. Instead, HiPRGen currently requires an input set of species containing all molecules to be included in the CRN and their calculated properties. For applications to Li-ion SEI formation, where the LIBE dataset of molecular thermochemistry already exists,36 this is not a significant limitation. More generally, the reliance on a pre-computed dataset of species is problematic, as it potentially creates a high computational barrier to entry for the user and limits predictive discovery of novel species and pathways they participate in. We are actively working to overcome this requirement, developing a framework to automatically generate CRNs from only a set of known starting molecules by leveraging calculated network products to strategically guide iterative network expansion and machine learning (ML) to reduce the number of required high-throughput DFT calculations.
An additional improvement could be made by incorporating reaction kinetics into CRN generation and stochastic analysis, as is common in the study of CRNs. We have shown that reaction thermodynamics alone can be sufficient to predict reasonable reaction pathways as well as network products, yet a network based on only thermodynamics certainly contains many reactions that are kinetically limited and therefore not important in practice. Although there are not currently any methods capable of predicting energy barriers or rate coefficients for solution-phase electrochemical reactions without relying on expensive electronic structure methods (which cannot be applied to millions of reactions), we believe that ML could eventually provide sufficiently accurate estimates to realistically constrain network construction at minimal cost given sufficient training data.61
To correct for insufficient metal ion stabilization, we optimized Li+ECn clusters at the ωB97X-D/def2-SVPD/PCM//ωB97X-V/def2-TZVPPD/SMD level of theory, with n ∈ 0, 1, 2, 3, 4 to estimate the stabilizing effect of each solvent molecule on Li+. The lower level of theory (ωB97X-D/def2-SVPD/PCM, ε = 18.5) was used for optimization due to the considerable computational cost of optimizing large clusters. We found (see ESI†) that each EC stabilized Li+ by ∼0.7 eV.
During reaction network construction, we consider two free energies for each species: one uncorrected, and one solvent-corrected. The uncorrected free energy is taken directly from LIBE. For the solvent-corrected free energy, we count the total number of coordinate bonds to all Li+ ions (see the ESI† for a description of the method for deducing metal coordination) and compare this to the maximum expected number of coordinate bonds (assuming that each Li+ would prefer to be coordinated by four neighbors).64 If any Li+ are undercoordinated, then the free energy is lowered by 0.68 eV for each “missing” coordinate bond. When calculating redox free energies, the uncorrected free energy is used; otherwise, the corrected free energy is used (see the ESI†).
• Molecules composed of two or more disconnected fragments.
• Metal-centric complexes, where two or more non-metal fragments are connected only by coordinate bonds to Li+.
• Molecules containing neutral or negative metal ions, where the charges are calculating by applying the natural bonding orbital (NBO) program version 5.0 (ref. 65 and 66) to a single-point energy calculation at the ωB97X-V/def2-TZVPPD/SMD level of theory in Q-Chem.
• Molecules with charge less than −1 or greater than 1.
In addition to these filters, which define types of molecules to be excluded from the final network, we further reduce the molecules in the network by removing redundant species. In LIBE, all molecules are unique based on the combination of their charge, spin multiplicity, and molecular connectivity. This means that there could be several molecules that differ only by spin multiplicity, or that differ only by the coordination environment of Li+ ions (what we call “coordimers”). When this occurs (when there are multiple molecules with the same covalent connectivity and charge but potentially with different coordination environments or spin multiplicities), we include only that species with the lowest solvation-corrected free energy in the final filtered set of species Sfiltered.
These filters are explained further in the ESI.† We emphasize that these filters are particular to the chemistry being studied in this work, but that HiPRGen has been engineered to enable straightforward addition, removal, or modification of filters in order to be easily applied across diverse chemical applications.
• Endergonic reactions with ΔG > 0 eV. The reaction free energies for non-redox reactions were taken as , where GC is the solvation-corrected free energy, m is the length of the product collection, and n is the length of the reactant collection. For redox reactions, ΔG = G0product − G0reactant + Δq(Ge), where G0 is the uncorrected free energy, Δq is the change in charge of the reaction, and Ge is the free energy of the electron (for +0.0 V vs. Li/Li+, Ge = −1.4 eV; for +0.5 V vs. Li/Li+, Ge = −1.9 eV).
• Reactions involving a charge change |Δq| > 1.
• Redox reactions involving more than one reactant or more than one product.
• Unimolecular dissociative redox reactions in which |Δq| > 0 and covalent bonds form or break.
• Reactions involving more than two reactants or products (this is actually enforced by the species bucketing procedure of HiPRGen and is not a separate filter).
• Reactions involving spectators (species that do not change as a result of the reaction) e.g. A + B to A + C.
• Reactions involving more than two bond changes.
• Reactions in which two bonds form simultaneously or two bonds break simultaneously.
• Reactions in which covalent bonds change and metal ions coordinate/decoordinate (note that reactions in which metal ions remain coordinated but change their coordinate bonds are allowed).
Motivations for each of these filters, along with examples, are provided in the ESI.† Like species filters, the reaction filters can be easily modified and extended by end users to suit a broad range of chemical applications. Of course, removing filters will yield a larger final collection of reactions. We note that for the size of the species collection presented in this work, some filters are necessary to obtain a tractable number of reactions in the final collection. For thousands of species, it is further necessary to filter reactions in parallel and for each filter to be computationally efficient in order to allow filtering to complete in a reasonable amount of time (hours to days).
Using RNMC, we performed 100000 simulations under each of the four chosen conditions (+0.0 V without CO2, +0.0 V with CO2, +0.5 V without CO2, and +0.5 V with CO2). For simulations without CO2, the initial state consisted of 30 Li+ and 30 EC; for those with CO2, the initial state also included 30 CO2. Because all reactions were exergonic and no energy barriers were considered, all rate coefficients were constant and equal (discussed in Stochastic network analysis). Each simulation was conducted to “completion” – that is, until there were no further reactions available for further simulation. Due to the relatively small number of initial species, most simulations took between roughly 200 and 500 steps. We reiterate that simulating to completion – especially with so few simulation steps – is only possible because the system contains only exergonic reactions and therefore contains no loops. The elimination of loops is critical to adequately sample the reactive space in a tractable number of simulations (see discussion in Stochastic network analysis).
In general, we are not interested in a single reaction pathway but rather the myriad pathways to the species of interest. Therefore, for each species of interest, we repeat the pathway identification procedure above for each trajectory, collecting all unique pathways. We then rank these pathways by some cost function. Here, the cost Φ of a given reaction is defined as Φ = exp(ΔG/kBT) + 1, where ΔG is the reaction free energy (uncorrected for a redox reaction, and solvation-corrected otherwise).25 The total cost of a reaction pathway is the sum of the costs of the individual reactions. We note that, because all reactions included in our network are exergonic, the constant term tends to dominate, though this cost function retains a preference for highly exergonic reactions over those that are only slightly exergonic.
To determine the ratio of formation and consumption, each trajectory was interrogated to find all reactions involving each species. If a given species is a reactant of an identified reaction, then that means it was consumed; if it is a product of the reaction, then that means it was formed. If the ratio of the total number of instances of formation across all trajectories to the total number of instances of consumption across all trajectories is greater than some threshold (here chosen as 1.5, meaning that three of the species are produced for every two consumed), then it could be a network product.
For relative accumulation, we take the average of all trajectories. The expected value of a species is the average of the final state – how many of the molecule will persist once the average simulation has completed. If this expected value is greater than some threshold (here 0.1, meaning that one of this species is produced and is present in the final state for every ten simulations), then that species could be a product.
Finally, for those species with formation/consumption ratios and expected values that pass the chosen thresholds, we perform pathfinding analysis. If the pathway with the lowest cost has a cost less than some threshold (here 10.0), then we consider the species to be a product of the network.
The species reported in Fig. 3 are network products in at least one – but not necessarily all – of the four conditions considered (see ESI† for details). We note that we add one additional constraint to the products reported here: spin multiplicity. While open-shell species can be products of the network, they are highly unlikely to be stable or meta-stable (long-lived radicals are generally rare). In the hopes of extracting useful chemical insights from network products, we therefore only consider network products that are singlets.
In cases where AutoTS was unable to find a TS for a given reaction, we searched using the single-ended growing string method (SE-GSM), as implemented in the pyGSM code.73 SE-GSM calculations were conducted with a Q-Chem backend (version 5.3.2) at the ωB97X-D/def2-SVPD/PCM level of theory.74 To be as consistent as possible, TS found using SE-GSM in Q-Chem were re-optimized in Jaguar at the ωB97X-D/def2-SVPD(-f)/PCM level of theory.
For each TS, we confirmed that the optimized structure possessed one imaginary frequency and confirmed that it connected the expected endpoints. For cases where the endpoints consist of two molecules that are not covalently bound (typically bound only by coordination to Li+), we allow small imaginary frequencies (less than 75i cm−1). These small imaginary modes can prove extremely difficult to remove using conventional geometry optimization methods, especially when they involve the motion of Li+ ions, and typically do not significantly affect the free energy. We note that in some cases, the barriers that we report are based on the difference between the TS and the reactants or products at infinite separation, rather than the entrance or exit complex. The electronic energies of all optimized structures (TS and endpoints) were corrected using a single-point calculation at a higher level of theory (ωB97X-V/def2-TZVPPD/SMD) in Q-Chem. The SMD parameters used were the same used for the construction of the LIBE dataset.36 We note that we used Q-Chem for these calculations, rather than Jaguar, because the SMD implicit solvent model is not implemented in Jaguar at the time of this writing.
All AutoTS and pyGSM calculations were automated using workflows that we have implemented in the MPcat code (see Code availability). These workflows are designed for high-throughput transition-state searches and reaction pathway analysis. Note that we use a fork of the original pyGSM code for SE-GSM (see Code availability).
Footnotes |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2dd00117a |
‡ These authors contributed equally. |
This journal is © The Royal Society of Chemistry 2023 |