Zisheng
Zhang
*abc,
Winston
Gee
a,
Robert H.
Lavroff
a and
Anastassia N.
Alexandrova
*a
aDepartment of Chemistry and Biochemistry, University of California, Los Angeles, California 90095-1569, USA. E-mail: ana@chem.ucla.edu
bSUNCAT Center for Interface Science and Catalysis, Department of Chemical Engineering, Stanford University, Stanford, California 94305, USA. E-mail: zishengz@stanford.edu
cSUNCAT Center for Interface Science and Catalysis, SLAC National Accelerator Laboratory, Menlo Park, California 94025, USA
First published on 9th December 2024
Restructuring of surfaces and interfaces plays a key role in the activation and/or deactivation of a wide spectrum of heterogeneous catalysts and functional materials. The statistical ensemble representation can provide unique atomistic insights into this fluxional and metastable realm, but constructing the ensemble is very challenging, especially for the systems with off-stoichiometric reconstruction and varying coverage of mixed adsorbates. Here, we report GOCIA, a versatile global optimizer for exploring the chemical space of these systems. It features the grand canonical genetic algorithm (GCGA), which bases the target function on the grand potential and evolves across the compositional space, as well as many useful functionalities, with implementation details explained. GOCIA has been applied to various systems in catalysis, from clusters to surfaces and from thermal to electrocatalysis.
Molecular dynamics (MD) based methods, when combined with enhanced sampling techniques3 and/or machine learning interatomic potentials,4,5 have become a powerful tool to model many dynamical behaviors in catalysis. However, they typically focus on the potential energy landscape of a fixed-composition system and hence are often insufficient for exploring the chemical space of off-stoichiometric restructuring systems with a fluctuating composition and without any well-defined collective variable.
Another approach is to revise the representation of a catalyst as a statistical ensemble of catalyst states instead of a single or a few selected structures.6,7 By extending to a grand canonical (GC) ensemble representation, all reaction-relevant global minimum (GM) and local minimum (LM) catalyst states with varying geometry and composition (including both the surface itself and adsorbate/adatom coverage) can be included in the representation, with their individual contributions to reactivity or spectroscopic signals properly evaluated.8 By probing the response of GC free energetics of the states to external factors (i.e., reaction conditions), the ensemble becomes condition-dependent in nature and can be used to understand and predict structural evolution during operation9 or to better simulate properties or spectra by ensemble averaging.10
Despite the simplicity of the ensemble representation theory, obtaining such an ensemble – including the ab initio thermodynamics of all relevant surface phases – is rather computationally challenging.11 The difficulty lies in exponentially growing chemical space of off-stoichiometric restructuring versus the system size and number of elements. Indeed, constructing a realistic ensemble requires inclusion of all relevant states, which means searching extensively the global and local minima on the potential energy surface (PES), for all relevant stoichiometries. Note that the global optimization minima search at the density functional theory (DFT) level, even for small clusters with a fixed composition, is highly nontrivial.12,13
A recently emerging family of GO techniques is to directly use the grand canonical free energy (Ω, also named grand potential), which is a function of the system's composition at a given set of chemical potentials, as the target function of the minima search. This allows for GC global optimization, in which the stoichiometry is also treated as a set of discrete variables to optimize. In this way, we do not need to extensively sample each possible stoichiometry in a grid-search fashion, but can efficiently sample into relevant stoichiometries on the grand canonical free energy surface (FES) and produce a distribution of stoichiometries in the resultant states. Due to the discrete nature of the compositional degrees of freedom, the fluctuating system size (not on a single PES), and the unavailability of analytical Hessians in plane-wave electronic structure codes, it is not straightforward to adapt many global optimization algorithms that are previously successful for canonical minima search of clusters and crystal structures,14–18 for ab intio GC global optimization of surface systems under periodic boundary conditions (PBC). In recent years, there have been some successful applications of GC treatments to cluster or surface systems, within algorithms such as GC basin hopping (BH),19,20 GC Monte Carlo (MC),21 and the GC genetic algorithm (GA).22,23 However, in the context of PBC surface systems and ab initio minima searches, the available algorithms are usually tailored for a specific set of systems or components, considering either cluster isomerization, surface reconstruction, or adsorbate configurations, but not all of them. A derivative-free and PBC-compatible GC global optimizer that addresses all mentioned aspects of interfacial complexities has been lacking.
This article is aimed to introduce our recent efforts in developing a global optimizer of clusters, interfaces, and adsorbates (GOCIA)24 – a versatile Python package featuring GC global optimization of off-stoichiometric restructuring surface systems at the ab initio level – with a detailed dissection of its components, and to showcase its previous successful applications, applicability, and a roadmap to future developments.
The main feature of GOCIA is the grand canonical genetic algorithm (GCGA) which can efficiently explore the relevant regions in the chemical space of varying compositions, by using grand canonical free energy as the search target, and it eliminates the need for grid search for every possible composition. Built on the basis of a gradient-embedded GA,12,13 the GCGA can achieve extremely efficient exploration of geometric and compositional space, as compared to MD- or Monte Carlo (MC)-based approaches, and yield a GC ensemble of exclusively minima states.
GOCIA was initially built to handle amorphous layers without directional or well-defined bonding modes, where every atom in the sampling region was allowed to form any type of bond (as individual adatoms). A recent update enabled our implementation of the GCGA to handle the coverage of polyatomic and mixed adsorbates while maintaining their intactness, which is rather relevant to study the reaction intermediate-relevant surface phenomenon and the complex interplay between surface atoms and multiple types of surface species.
The random structure generator of GOCIA, whose primary role is to make the initial population for the GCGA, can also work as a good one-shot sampler for smaller systems such as smaller subnanometer clusters supported on surfaces and adsorbate configurations at low coverage.25
GOCIA also provides a toolkit and a streamlined workflow for grand canonical density functional theory (GCDFT) calculations using the surface charging approach. This is useful for sampling of electrified interfaces, such as those used in electrocatalysis.
Every mentioned component of GOCIA is highly versatile and can be customized to meet a broadness of needs in the areas of catalysis, materials science, surface science, and so on.
GOCIA has been applied to study the structure, reactivity, and spectroscopy of many surface systems ranging from clusters to amorphous over-layers and to reconstruction of crystalline metal electrodes, in thermal- and electro-catalysis.26 A few representative systems shown in Fig. 1a–c are: fluorine-doped tin oxide (FTO) supported Ptn (n = 1–8) clusters under varying H coverage during the electrochemical hydrogen evolution reaction (HER);10 partial boron oxide/hydroxide over-layer formed on hexagonal boron nitride (hBN) under conditions of oxidative dehydrogenation of propane (ODHP);27,28 restructuring of crystalline Cu facets induced by H and CO coverage under CO2 reduction reaction (CO2RR) conditions.29 Other notable applications include restructuring of Cu under acidic HER conditions,9,30 metal–support contact angle of small nanoparticles (NPs),25 and the structure of amorphous nickel oxide/hydroxide on the Pt surface.31
The interface class is based on the atom class (from the ASE module32) with some additional structure-related metadata as is illustrated in Fig. 2. There are two atomistic parts within an interface object, a constrained region and a relaxed region. The constrained region is usually the bottom few layers of the slab and can mimic the behaviour of the bulk. The relaxed region is the part of the surface that can interact with the external environment but cannot change its own composition, usually the top few layers of the slab or supported surface species such as subnanometer clusters or adatoms.
The user would also need to define a rectangular sampling box (by the coordinates of its vertices) which intersects with the top few layers of the relaxed region. Compositional changes are only allowed within the sampling box.
In the case of sampling polyatomic adsorbates, one would also need to supply a list of atomic indices for each adsorbate, so that GOCIA can keep track of the connectivity and make sure that every adsorbate is intact during the local and global optimizations, with a similar practice to ref. 13.
A number of useful functions are built-in under the interface class for easy access, modification, and geometric analysis of each individual component.
The project database file (a SQL database in the ASE format) stores all optimized structures along with their metadata (calculator, energy, magnetic moments, fragment lists, labels, population information, etc.), to allow for easy query and manipulation.
All structures in a global optimization search share the same definition of the constrained region, the relaxed region, and the sampling box. These information are stored in a substrate.vasp file (it can be in any format that supports periodic structures with constraints) which is one of the required input files.
The other variables needed to set up a global optimization run, such as the dictionary of chemical potentials, control parameters of GA, and paths/commands to initiate software, can be provided as a separate input.py file in the main directory or included in the main “manager” script (vide infra).
Depending on the job requirements and queuing policy of the high performance computer (HPC), GOCIA users can choose from two different workflows: (i) if the HPC allows submission of a large number of small jobs from a single user: submit a manager job of long wall time, as a single-core process on the login node or interactive session (the manager sleeps periodically and is not resource intensive at all). The manager job will automatically make and submit many worker jobs, each performing a series of local optimization calculations on a structure to which the worker is assigned. The manager will check the queue constantly and resubmit a new worker job if an old one has finished (Fig. 3a). (ii) If the HPC strongly encourages large jobs by measures such as limiting the wall time of smaller jobs: use the multiprocessing module of Python to maintain a pool of many worker processes. The main script will automatically spawn a new worker process to the idle nodes whenever an old one has finished. This should be submitted as a single large bundled job (Fig. 3b).
Abstraction, such as treating surface adsorption configurations as lattices or graphs, and describing adsorbate coverages in a mean-field manner, could help reduce the dimensionality of the problem. However, these abstraction will only hold when the surface itself is relatively rigid regardless of the adsorbate/adatoms on it. In other words, the coupling between surface species coverage/configuration and the arrangement of surface atoms is negligible. This might be actually the case for some systems, but it is quite dangerous to assume so universally, with the growing collection of reports on non-trivial restructurings of surfaces and clusters.34 For the latter, there exists no shortcut.
The picture further complicates when we allow the composition to vary—the system becomes a collection of many constant-composition potential energy hyper-surfaces, each with different dependence on external factors/conditions. Let us consider a system where the number of X and Y atoms, nX and nY, can vary. In the discrete compositional space, each grid point defined by (nX,nY) entails a full PES.
The most straightforward approach to explore this chemical space is the grid search (Fig. 4a)—performing a canonical global optimization on the corresponding constant-composition PES of each (nX,nY). This approach would in theory yields the most uniform sampling distribution over the whole chemical space; however, it is extremely inefficient as the vast majority of the (nX,nY) grids are in the irrelevant regime to the ambient or operating conditions of the catalyst. In addition, the compositional space is infinite, and the initial definition of the grid (i.e., the upper and lower bounds) is arbitrary.
Stochastic sampling into random compositional grids, using techniques such as the bond length distribution algorithm (BLDA),35 can provide a bird's-eye view of the GC free energy landscape at a very low cost (Fig. 4b). For smaller systems, the one-shot stochastic samples may sometime suffice as a (sub-)ensemble.36 However, for larger systems, it is as inefficient as the grid search approach because, again, the majority of the compositional space is catalytically irrelevant.
To steer the search towards the relevant regions in the compositional space, one can adopt the GC free energy Ω, within the GC ensemble (μVT), as the basis of the target function. The composition is then treated as an additional set of variables to optimize. In a typical iterative GC global optimization search, the initial stochastic samples inform the searcher about the “promising” regions, and the search direction is adaptively updated throughout the search to sample denser and denser into the relevant minima regions (Fig. 4c). We illustrate the power of the GCGA with the example of amorphous non-stoichiometric BxOyHz over-layer on borides (Fig. 1b). The GCGA, even if starting from a bad initial guess of compositions, can evolve self-adaptively and progressively toward the final GM region and discover many low-lying regions along the way (Fig. 5a). As a result, the GCGA significantly outperforms BLDA in sampling both GM and low-lying LMs for this complex ternary B–O–H system (Fig. 5b).
![]() | ||
Fig. 5 Results from ref. 27 showing (a) the progressive evolution of the population across the B–O–H compositional space and (b) the comparison between the GCGA and BLDA in sampling low Ω structures. |
![]() | (1) |
![]() | (2) |
![]() | (3) |
μ is a function of reaction conditions such as temperature, partial pressure, concentration, pH, and electrode potential. For example, the corrected chemical potential of H, , can be expressed as follows:
![]() | (4) |
It is noted that the calculation of μ for some elements or species can be less straightforward for a lack of appropriate reference state and/or the limitation of the electronic structure method. The calculated μ can be off by up to a few hundred meV from the realistic condition, and in some cases, one may only be able to estimate a relevant range of μ for a specific species. In these cases, it is advised to perform multiple searches at various μ values in the relevant range, so as to gain a broader distribution of stoichiometry. If there are prior experimental information on the surface composition or adsorbate coverage, one may also vary the μ on a sub-ensemble (from one-shot sampling or an unfinished search) and probe the response of the GM stoichiometry by using the ensemble analysis functions provided by GOCIA (vide infra). This will help narrow down the μ window relevant to the experiment.
Each GC global optimization run would yield likely a multi-modal distribution of stoichiometries (Fig. 6, left panel). The number of modes and the width of the distribution can be highly system dependent, so it is recommended to always check the stoichiometric distribution in the final ensemble merged from multiple searches—they should ideally join and have a more or less uniform density over the stoichiometric space of interest (Fig. 6, right panel). If there is any discontinuity, then more sampling is deserved at its corresponding μ values. After sufficient sampling, the final merged ensemble can be used for further analysis at any μ within the interpolated range among the sampled μ values.
![]() | ||
Fig. 7 The workflow of the GCGA evolution process and the iterative multi-stage local optimization process implemented in GOCIA. |
GOCIA offers three types of structure generation methods to construct the over-layer or place adsorbates on the base surface: (i) growth sampling: it first randomly selects an existing atom from the relaxed region. A random unit vector will be generated to be the direction of the “growth”. The adatom/adsorbate is then aligned to the “growth” direction and placed along it, with the selected surface atom as the starting point. The distance between the adatom/adsorbate and the selected surface atom is then sampled from the bond length distribution algorithm (BLDA),35 based on the covalent bond radii of the two atoms that should form the surface-adsorbate bond. This methods can generate new structures with the most reasonable interatomic distances with high efficiency, but it may fail for some corner cases where the growth direction is ambiguous, such as the interface between a large cluster and the surface, or when the surface is already quite crowded with adsorbates. (ii) Box sampling: it directly makes attempts to place adatoms/adsorbates into the sampling box with random positions and orientations. Since it is less dependent on the surface structure, it works well on cases with irregular shapes and morphology, non-directional and multi-center bonds, and very crowded surfaces. It should be noted that this method can also be used to generate molecular packing structures, such as the micro-solvation slab,37 by applying connectivity constraints and expanding the sampling box. (iii) Graph sampling: this method constructs a connectivity graph of the top surface layer, and then identify the atop, bridge, and hollow sites using the NetworkX module.38 Adsorbates are then added to the identified sites with random rotations. It should be noted that this method expects well-defined lattices and works the best for exploring adsorption on unrestructured surfaces or just to enrich the initial population.
In all three methods, the interatomic distances of attempted geometry are checked to avoid bad contacts. The user may also opt to check the similarity of a new structure with already generated structures to prevent duplicates in the very beginning. GOCIA also offers many user-defined constraints such as bonds that must (or must not) form, upper and lower limits of the coordination number, and whether the added adatom/adsorbate can incorporate into the relaxed region or must stay above. If multiple types of adatoms/adsorbates are to be added, the list can be randomly shuffled before addition to prevent biases from the original ordering. The process iterates until all adatoms/adsorbates have been added to the sampling box while satisfying all geometric and connectivity constraints.
It should be noted that although the GA is not very sensitive to the initial population, the number of sample evaluations before locating the GM or, in other words, the success rate to locate the GM within a certain number of sample evaluations, can depend on the initialization. If some knowledge of the structure is available, starting from a close enough putative structure can save a lot of node-hours. But even without any prior knowledge, the GCGA can still evolve to the ground truth structure if the initial population is diverse enough, although at a higher cost.9
If the user wishes to more extensively sample the LMs, it is recommended to run multiple GCGA searches with different initial populations in terms of geometry and composition. Hereby, the multiple searches would start from different regions in the chemical space and the sample along different paths on its way to the GM.
To ensure an aggressive and progressive sampling, which underlies extensive and delocalized exploration of the chemical space, oftentimes one would allow some unphysical connectivity or interatomic distances to form in the random structure generation. For electronic structure calculators, this may cause slow-down (or even failure) of the self-consistent field (SCF) cycle or force convergence to the initial steps in the local optimization.
A remedy to this problem is to perform a pre-optimization at a lower level of theory before the structure is fed to the electronic structure calculator. GOCIA currently supports Hookean and Lennard-Jones potential as the calculator for the preoptimization. Any code for the geometric adjustment can be interfaced to GOCIA as the pre-optimizer via the ASE calculator class.
To reduce the overall computational expense, we adopt a multi-stage local optimization strategy (Fig. 7), where each stage has a different level of precision and convergence criteria, from computationally cheaper to more expensive. In this way, we can rationalize the structure in earlier and cheaper stages and bring the structure closer to its local optimum, before the final stage of higher precision for production. The flowchart illustrates a 3-stage scheme, but the user may reduce or increase the number of stages if needed.
Since electronic structure calculators do not intrinsically constrain connectivity (bonds are determined quantum mechanically), some unwanted motifs or bonds may form during the local optimizations. GOCIA also offers an iterative local optimization scheme which checks the geometry for undesired connectivity after each stage. If any unwanted substructure is detected, GOCIA would modify the structure to meet the constraint and call for another multi-stage local optimization. This again goes iteratively until convergence of the connectivity (Fig. 7, right). Currently, GOCIA supports the following connectivity constraints: (1) make sure where is no desorbed species that is not connected to the slab, (2) remove any atom that is outside the sampling region, (3) force all adsorbates to directly form bonds with the surface, (4) remove fragments that are not intact, (5) prevent bonds between fragments, and (6) remove atoms that are not involved in a specific type of bonds. Each connectivity constrained can be switched on and off or modified easily. Users can also define their own constraints (geometric or compositional) inside the worker script to archive the unwanted structure, terminate the job, or modify the structure and send it back for re-optimization. This iterative local optimization scheme is one of the main features of GOCIA enabling investigation of a complex adsorption system, and it can be easily interfaced with other GC global optimizers.
![]() | ||
Fig. 8 Crossover of two parent structures to produce a child structure, with an illustration of possible mutation operations. |
In the GC crossover process, the parent structures are split-and-spliced along the same cutting plane. In the case of any bad atomic contact, the one whose center is closer to the cutting plane would be preserved, while the farther one would be removed. As a result, the composition of the child structure can be naturally different from its parents (analogous to chromosomes). In the case of polyatomic adsorbate, the bridle atom (via which the adsorbate is supposed to bind to the substrate) is viewed as the center of the adsorbate.
In the GC mutation process, GOCIA offers the following operators: (i) adding an atom/fragment, (ii) deleting an atom/fragment, (iii) moving a random atom/fragment to a random empty site, (iv) rattling the surface atoms along random vectors drawn from a normal distribution, (v) translating the buffer slab along a x or y axis by a fraction of the cell length, and (vi) permuting a random fraction of the buffer slab. If an offspring is too similar to its parent, its mutation rate is increased to 100% to avoid recalculating the same structure.
In the selection process, an over-mating penalty factor of 1 + (Nmate)−3/4 is multiplied to the grand canonical free energy-based fitness factor. Here, Nmate is the mating counts, and it penalizes the candidates that have mated too many times to diversify the population. Similarity checks against the current population are performed before adding any new candidate to remove duplicates. Adopted mutation operations include: upon the addition of each offspring to the population, the candidate with the lowest fitness is archived to maintain the population size.
We note that the theoretical framework of GC ensemble representation and many of the features of GOCIA presented herein (Sections 4.1–4.6) are universal and not confined to usage with GA. We chose GA to be the weight lifter in our past works due to: (i) its well-benchmarked capability in locating GM for a wide spectrum of systems with little hyper-parameter tuning,42 (ii) its algorithmic simplicity and hence high extensibility and customizability, (iii) GA naturally handles variable-length configurations (analogous to chromosomes) which well suits the GC ensemble sampling tasks, (iv) it avoids the use of Hessians which is very expensive to compute numerically with ab initio methods, and (v) it allows for “unphysical” structural operations (Fig. 8) to take very aggressive leaps across chemical space instead of small local steps bound by finite temperature criteria and inertia. We are open for other more sophisticated alternatives to GA, but, for now, our focus is on improving the GC capabilities, by designing new target functions and mutation operations, within the GA scheme.
After duplicate removal, the unique structures in the ensemble would be sorted by GC free energy and written to a new database which contains all essential metadata from the search. The database file can be used for statistical analysis or computing ensemble-average properties. GOCIA would also report an oversampling ratio which reflects how extensively the chemical space has been sampled. A low oversampling ratio suggests that the sampling is far from extensive, while a high oversampling ratio often means that the search is extensive enough.
The evolution trajectory of a GCGA run, although bearing no physical meaning in a strict sense, contains many useful information. GOCIA offers scripts for tracking the progress of the GCGA by plotting Ω versus the number of samples on-the-fly. It can easily visualize the key new GM's in the search history and if there is a good sign of convergence. It can also inform if there is any sign of significantly restructuring, usually characterized by an apparent dive of the population's Ω to a much lower value and remains there, without the need to inspect each structure in the trajectory.
GOCIA also stores the inheritance information of each candidate in the database. To be specific, the identity of each candidate's parents and the type of mutation (if any) that it went through. GOCIA offers scripts that can track the lineage of any candidate and plot its family tree. This can inform putative pathways via which the restructured GM may arise from pristine structures, and which mutation operations are the most effective for the system of study.
GOCIA offers a GCE class for ab initio thermodynamic analysis of the GC ensemble database. But before anything, an important thing to check is the distribution of stoichiometries. The GCE class offer functions that can cluster the minima into separate groups of the same stoichiometry. By plotting statistical histograms, one can learn about the density (counts) of samples for each stoichiometry, which informs whether the samples cover a continuous range in the chemical space which is the prerequisite for further analysis with interpolated μ values. By calculating the structural similarity metric (the same as in Section 4.5) with respect to a few reference structures, one can also group the samples by their restructuring patterns and check their sampling density.
Within each group, it is straight forward to extract the low-energy local minima (LELMs) as a relevant sub-ensemble, which can be used for further refinement at a higher level of theory or with additional treatments such as solvation and GCDFT. A recommended energy cutoff relative to the GM of each group is 10 kBT; however, one should always check if the relative energies of the LELMs would reorder at a different level of theory, and there may be a need to use a higher cutoff.
The GCE class offers functions for the easy calculation of Ω and Boltzmann population, p, of any states within the ensemble at a specific μ or a series of μ values (Fig. 9a–c) by:
![]() | (5) |
![]() | (6) |
In the cases where the Boltzmann statistics fail, the ensemble can still serve as an ab initio thermodynamics database for kinetics simulations, as it well covers the relevant LELMs. The combination of global optimization and quasi-kinetic MC simulation has been used to study the off-equilibrium structural evolution such as Ostwald ripening of sub-nano clusters36 and surface roughening of Cu electrodes during the CO2 RR.29
![]() | (7) |
We can then replace the electronic energy terms (EAB and EB in eqn (3)) with the resulted Ωel(ϕ). In this way, we can eventually obtain the potential-dependent total GC free energy, Ωtot, with respect to all adatoms/adsorbates as well as electrons:
![]() | (8) |
We note that the described treatment relies on many assumptions including (i) a constant interfacial capacitance, (ii) no dramatic potential-dependent geometric changes, (iii) the minima structures are obtained under constant-charge conditions, and the electronic degree of freedom is added a posteriori. A rigorous GCDFT treatment would require all samples to be locally optimized under constant-potential conditions, which is technically doable37,45–48 but computationally too expensive for a very large number of samples in a typical GC global optimization. Our a posteriori approach has been a successful compromise for metallic systems with simple adsorbates such as H and CO.9,10,29 For surfaces with high polarity and larger flexible adsorbates,49 the constant-potential treatment may be necessary during the search, and we are working on making it more affordable.
Future developments of the GOCIA would include: (i) varying the chemical potentials (corresponding to reaction conditions) during the search. The “scan rate” can be adaptive and depend on how extensive the local chemical potential regime has been sampled. This can be useful in identifying the critical conditions where there is a switch in thermodynamic GM. (ii) Symmetry-based operations and substructure representations, which may accelerate the convergence for some systems where the bonding is more directional and coordination patterns, are more well-defined.17 (iii) Motif-based operations, which can keep track of energetically favorable structural motifs during the search and include them in later structure generation steps, similar to ref. 13 but covering periodic and multi-component cases. (iv) Sampling of explicit solvation layers. Some key goals are determining electrolyte hydration structures and building micro-solvation models for surface species in a more adaptive and efficient way. (v) Metrics for the extensiveness of LM exploration. Two promising options are conformational entropy50 and similarity descriptors checked against the search trajectory. They can serve as additional target functions to control exploitation (finding GM) versus exploration (discover new LMs). (vi) Incorporating Hessian-based techniques into GC schemes, such as sparse methods for efficient harmonic vibrational density calculations.19,51,52 They can boost the search efficiency when the calculators support analytical Hessians. (vii) Including the electronic degree of freedom in the GC search for electrochemical systems. This is paramount for surfaces with high polarity and/or flexible adsorbates whose geometry responds dramatically to varying potentials.
Machine learning (ML) models, especially the interatomic potentials, have undergone impressive development over the recent decade.53–56 However, in our opinion, there are still two obstacles in applying them to global optimizations. (i) Overall cost: the computational cost for generating the training data for making a good model that well covers the corner cases would be comparable to, if not larger than, that of a direct global optimization approach. (ii) Force accuracy: unlike the case of MD, global optimizations require very accurate forces (at a magnitude of a few meV Å−1) to ensure that the final ensemble contains only minima states and exclude saddle points or other structures on flat local regions of the PES. (iii) Differentiability and description of non-local effects: we look forward to further advances in ML model architectures that can enable more accurate force predictions and new features to surpass the limitations discussed before. At this time, GOCIA would still serve as an excellent generator of diverse and off-equilibrium training datasets – or it can be incorporated as an on-the-fly component into active learning workflows.
This manuscript covers the main features of GOCIA, with detailed descriptions of its code structure and the grand canonical genetic algorithm. The relevant theories are explained, and other key functionalities are introduced.
GOCIA is a highly versatile and extendable code, and it can be potentially customized to study many other systems beyond heterogeneous catalysis, such as plasma chemistry, metallurgy, batteries, environmental chemistry, surface molecular assemblies, and other functional materials. GOCIA is an ongoing effort and is open to comments and contributions from researchers in all aforementioned areas, and we hope to continue the development and implementation of community-needed features in the future.
This journal is © the Owner Societies 2025 |