Panagiotis Krokidas *a, Michael Kainourgiakis b, Theodore Steriotis c and George Giannakopoulos a
a Institute of Informatics & Telecommunications, National Center for Scientific Research “Demokritos”, 15341 Agia Paraskevi Attikis, Greece. E-mail: p.krokidas@iit.demokritos.gr
b Institute of Nuclear & Radiological Sciences & Technology, Energy & Safety, NCSR ‘Demokritos’, 15341 Agia Paraskevi Attikis, Greece
c Institute of Nanoscience and Nanotechnology, National Center for Scientific Research “Demokritos”, 15341 Agia Paraskevi Attikis, Greece
First published on 16th September 2024
We report a tool that combines a biologically inspired evolutionary algorithm with machine learning to design fine-tuned zeolitic-imidazolate frameworks (ZIFs), a sub-family of MOFs, for desired diffusivities of species i (Di) and diffusivity ratios Di/Dj in any given mixture of species i and j. We demonstrate the efficacy and validity of our tool by designing ZIFs that meet industrial permeability and selectivity criteria for CO2/CH4, O2/N2 and C3H6/C3H8 mixtures.
Massive high-throughput MOF screenings in conjunction with artificial intelligence (AI) techniques, such as machine learning (ML), have proved to be a very powerful toolset for extracting complex correlations between the structure of a nanoporous solid family and its properties.3–6 However, even with an accurate MOF-performance correlation, designing materials for specific separations remains a trial-and-error process, albeit one aided by better chemical intuition. In essence, while a predictive model can evaluate existing designs effectively, it cannot by itself suggest new ones on the basis of a desired performance. The design of new materials therefore calls for the inverse direction: predicting the MOF structure from the target property.7 Only a very small number of recent works report the successful automated design of nanoporous solids driven by a desired target performance, and these are limited to sorption properties;8–17 diffusivity, which governs the permeability and overall selectivity of nanoporous membranes,18 has been omitted. Only one recently published work reports the inverse design of MOFs for kinetic-based separations.19 The scarcity of studies on the inverse problem, combined with their focus on sorption, indicates a scientific gap in the field, since the design of materials with pre-chosen, on-demand properties is regarded as a next frontier in materials science.20 In this work we present an inverse-design toolkit for zeolitic-imidazolate frameworks (ZIFs), a sub-family of MOFs, that suggests material designs for requested target diffusivity values (Di, Di/Dj) for any i/j mixture.
ZIFs were chosen as the focus of this study because of their unique structural characteristics, particularly their tight pore openings, which make them well suited to very challenging kinetic-based separations in which the size difference between the mixture species is below 1 Å.21,22 In contrast, many other MOFs in existing databases have limited diffusion-based separation capability, as their cavities are usually connected by larger openings. Additionally, ZIFs are underrepresented in high-throughput screenings and ML applications compared with other MOF sub-families. By focusing on ZIFs, we aim to address this gap in the literature and to provide insights and data on a material class that could significantly improve separation performance for challenging gas mixtures.
Our method pipeline is as follows. First, we developed and assessed various ML models that predict the diffusivity of guest molecules in any ZIF of sodalite (SOD) topology, using as input readily available information on the ZIF's building units and on the guests, and taking the flexibility of the frameworks into account. Then, based on the best-performing ML model, we developed a genetic algorithm that suggests new ZIFs for a user-defined separation performance for a gas mixture. Our design approach treats the ZIF building units as components that can be replaced to change the aperture size, which in turn modulates the kinetics of gas penetrants in the ZIF pores (Fig. 1). We used a manually constructed structure database of 69 ZIFs of fixed SOD topology, in which the building units are varied. Our dataset comprises the diffusivities of 14 gas molecules ranging in size from He (2.66 Å) to iso-butane (4.8 Å) in all the ZIFs of our database. The diffusivities were calculated with dynamically corrected transition state theory (dcTST), accounting for framework flexibility. The simulations were carried out with in-house force fields for the ZIFs and TraPPE force fields for the gas molecules. Information about the force fields and the dcTST calculations can be found in the ESI.†
The ML regressors examined for the development of efficient predictive models were linear regression (LR), decision trees (DT), random forests (RF), neural networks (NN), gradient boosted regression trees (GBRT)23 and extreme gradient boosting regression (XGBR).24 The descriptors were based on readily available information about the linkers, functional groups and metal center of each ZIF, as well as on information about the guest molecules. Unlike most approaches, in which building-unit descriptors are categorical, our method uses numerical descriptors based on properties such as size and mass, for both the ZIF design space and the gases. As a result, our method has the potential to extrapolate beyond the training set, exploring unvisited regions of the design space and predicting properties for new ZIFs and gases that were not part of the original dataset. The comparison of models (Fig. 1(b)) shows that XGBR, followed by GBRT, performs notably better than the other regressors; XGBR was therefore the ML model employed in this work. Additional information about the dataset, the ML models and the ZIF and gas descriptors, together with a table of ML-model performance metrics, can be found in the ESI.†
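To illustrate why numerical descriptors allow extrapolation, the sketch below contrasts a numerical feature vector with a one-hot (categorical) encoding. It is a minimal sketch under stated assumptions: the class names `ZIFUnits` and `Guest`, the descriptor fields and all numbers are hypothetical placeholders, not the descriptors listed in the ESI. A gas that was absent from training (here Kr) still maps to a valid numerical input, whereas a one-hot scheme has no column for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZIFUnits:
    """Hypothetical numerical descriptors of a ZIF's building units."""
    metal_radius: float      # ionic radius of the metal center (Å), placeholder
    linker_mass: float       # mass of the imidazolate linker (g/mol), placeholder
    group_volume: float      # volume of the functional group (Å^3), placeholder


@dataclass(frozen=True)
class Guest:
    """Hypothetical numerical descriptors of a guest molecule."""
    kinetic_diameter: float  # Å
    molar_mass: float        # g/mol


def feature_vector(zif: ZIFUnits, gas: Guest) -> list:
    """Concatenate ZIF and guest descriptors into a single regressor input."""
    return [zif.metal_radius, zif.linker_mass, zif.group_volume,
            gas.kinetic_diameter, gas.molar_mass]


def one_hot(name: str, vocabulary: list) -> list:
    """Categorical encoding: only names seen during training have a column."""
    if name not in vocabulary:
        raise KeyError(f"{name!r} has no column in the one-hot encoding")
    return [int(name == v) for v in vocabulary]


# Illustrative values only, not taken from the paper's dataset.
zif = ZIFUnits(metal_radius=0.60, linker_mass=67.0, group_volume=25.0)
trained_gases = ["He", "CO2", "CH4"]
krypton = Guest(kinetic_diameter=3.6, molar_mass=83.8)  # gas absent from training

x_new = feature_vector(zif, krypton)  # numerical encoding still works
try:
    one_hot("Kr", trained_gases)
    categorical_ok = True
except KeyError:
    categorical_ok = False            # categorical encoding cannot represent Kr
```

Whether a regressor actually extrapolates well to such inputs depends on the model; tree ensembles such as XGBR, for instance, return piecewise-constant predictions outside the sampled range.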
Our approach is based on the premise that a trained ML model is a function whose inputs are the ZIF and diffusant-i descriptors (features) and whose output is an estimate of log Di for species i. Such a function can therefore guide an optimization process that searches for the descriptors yielding a desired target value of log Di. To test this hypothesis, we first tried a conventional optimization algorithm, L-BFGS-B,25 a local quasi-Newton method that approximates the Hessian matrix (second-order derivatives) from gradient information. The algorithm worked sufficiently well when used to optimize the output of a simple linear regression model, but performed poorly with more elaborate ML models. For this reason, we turned to genetic algorithms (GA), optimizers inspired by the Darwinian theory of evolution.26 In our GA, each structural descriptor (metal, functional group and organic linker, shown in Fig. 1(a)) is represented as a gene (Fig. 1(c)). A unique set of genes constitutes a chromosome, which corresponds to a unique ZIF. At each iteration, the population is evolved through genetic operations on the chromosomes, such as crossover, mutation, replication and selection, and a new set of ZIFs is assembled whose properties are potentially closer to the requested performance (Fig. 1(c)). Details on the implementation of our GA can be found in the ESI.†
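The GA loop described above can be sketched as follows. This is a minimal, generic sketch and not our implementation: the gene pools, the chromosome layout [metal, linker_1..3, group_1..3], and the toy function `surrogate_log_d` are hypothetical, and in the actual tool the trained XGBR model plays the surrogate's role. Tournament selection, one-point crossover, per-gene mutation and elitist replication are common textbook choices assumed here. The optional `symmetric` flag mimics the restriction to symmetrical ZIFs used for CO2/CH4.

```python
import random

# Hypothetical gene pools; the actual building-unit libraries are described in the ESI.
METALS = ["Zn", "Co", "Cd", "Be"]
LINKERS = ["im", "mim", "bim"]
GROUPS = ["H", "CH3", "Cl", "Br", "I"]
# Assumed chromosome layout: [metal, linker_1, linker_2, linker_3, group_1, group_2, group_3]
POOLS = [METALS] + [LINKERS] * 3 + [GROUPS] * 3


def surrogate_log_d(chrom):
    """Toy stand-in for the trained regressor (NOT the paper's model):
    heavier metals and bulkier linkers or groups lower log10 D."""
    metal, links, groups = chrom[0], chrom[1:4], chrom[4:7]
    return (-9.0 - 0.4 * METALS.index(metal)
            - 0.2 * sum(LINKERS.index(lk) for lk in links)
            - 0.3 * sum(GROUPS.index(g) for g in groups))


def fitness(chrom, target):
    # Higher is better: negative distance to the target log10 D
    return -abs(surrogate_log_d(chrom) - target)


def symmetrize(chrom):
    # Enforce linker_1 = linker_2 = linker_3 and group_1 = group_2 = group_3
    return [chrom[0]] + [chrom[1]] * 3 + [chrom[4]] * 3


def evolve(target, pop_size=40, generations=60, elite=4,
           mutation_rate=0.1, symmetric=False, seed=0):
    rng = random.Random(seed)
    fix = symmetrize if symmetric else (lambda c: c)

    def tournament(pop, scores, k=3):
        # Selection: the best of k randomly drawn individuals
        picks = rng.sample(range(len(pop)), k)
        return pop[max(picks, key=lambda i: scores[i])]

    pop = [fix([rng.choice(p) for p in POOLS]) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(c, target) for c in pop]
        order = sorted(range(pop_size), key=lambda i: scores[i], reverse=True)
        nxt = [pop[i] for i in order[:elite]]       # replication of the elite
        while len(nxt) < pop_size:
            a, b = tournament(pop, scores), tournament(pop, scores)
            cut = rng.randrange(1, len(a))           # one-point crossover
            child = a[:cut] + b[cut:]
            child = [rng.choice(p) if rng.random() < mutation_rate else g
                     for g, p in zip(child, POOLS)]  # per-gene mutation
            nxt.append(fix(child))
        pop = nxt
    return max(pop, key=lambda c: fitness(c, target))


best = evolve(target=-11.5)                      # target D of about 3e-12 m^2/s
best_sym = evolve(target=-11.5, symmetric=True)
```

Because the GA only queries the surrogate's output, it works equally well with non-differentiable models such as tree ensembles, whose piecewise-constant predictions give gradient-based methods little to follow.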
We combined the ML regressor and the genetic algorithm into a unified tool, which we employed to design optimum ZIFs for the separation of i/j mixtures that satisfy target criteria for Di and Di/Dj in three challenging cases: CO2/CH4, O2/N2 and C3H6/C3H8. Our goal in all three cases was to design ZIFs whose performance exceeds the industrial thresholds for permeability, Pi (barrer), and selectivity, Pi/Pj. These (Pi, Pi/Pj) thresholds are (33.7, 35.1),27 (0.83, 8.2)28 and (1.2, 35.6)29 for CO2/CH4, O2/N2 and C3H6/C3H8, respectively.
O2/N2 is the most studied separation30 and one of the toughest for membranes: hardly any membranes reach a performance level within the region of industrial interest. Our goal was to design ZIFs with DO2/DN2 in the range 10–50. We set 10⁻¹³ m² s⁻¹ < DO2 < 5 × 10⁻¹² m² s⁻¹, since according to our previous findings31,32 PO2 > 1 barrer corresponds roughly to DO2 > 10⁻¹³–10⁻¹² m² s⁻¹ in ZIFs.
CO2/CH4 is the second most investigated mixture in membrane-separation research,30 owing to the urgency of reducing CO2 emissions and the need to remove CO2 from natural-gas and biogas streams.33 According to our recent works,31,32 ZIFs that exhibit permeabilities beyond the lower industrial limit (∼30 barrer) correspond to DCO2 > 10⁻¹³ m² s⁻¹. We thus set the targets 10⁻¹⁰ m² s⁻¹ < DCO2 < 5 × 10⁻⁹ m² s⁻¹ and 10⁴ < DCO2/DCH4 < 10⁵. In this case, we also restricted the ZIF-generation routine of our GA to symmetrical ZIFs (linker1 = linker2 = linker3; functional_group_1 = functional_group_2 = functional_group_3), because otherwise the search space quickly becomes crowded with optimized structures for the given performance boundaries.
Finally, C3H6/C3H8 is a separation of great industrial interest, as it involves two of the most in-demand commodity chemicals. It is also a highly energy-intensive process: together with C2H4/C2H6, it accounts for 0.3% of the total energy consumption,34 and no membrane has yet demonstrated a performance promising enough to replace the existing cryogenic distillation methods. The top performer in this setting is ZIF-67, which has been synthesized and measured for this separation.29,35 Because ZIF-67 was present in our data, we removed all ZIF-67-related data and retrained our XGBR model, so that the tool had no prior knowledge of this framework. We then set target values close to the ZIF-67 performance, to see whether our tool would design it. This serves as an additional level of validation, against experiments, besides the simulations used in the first two cases. The target values set in our design tool were thus 10⁻¹³ m² s⁻¹ < DC3H6 < 10⁻¹² m² s⁻¹ and 50 < DC3H6/DC3H8 < 200 (ref. 35).
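The three target windows stated above can be turned into a single objective. The sketch below is an assumption for illustration, not necessarily the objective used in our GA: it scores a candidate by its distance to each window in decades (log10 units), which is zero inside the window and grows smoothly outside it, so near-misses are penalized only slightly. It is applied here to the tool's predictions reported in Table 1; the helper names `TARGETS`, `log_window_distance` and `target_penalty` are hypothetical.

```python
import math

# Target windows from the text: ((D_i min, D_i max) in m^2 s^-1, (D_i/D_j min, max))
TARGETS = {
    "O2/N2": ((1e-13, 5e-12), (10, 50)),
    "CO2/CH4": ((1e-10, 5e-9), (1e4, 1e5)),
    "C3H6/C3H8": ((1e-13, 1e-12), (50, 200)),
}


def log_window_distance(value, lo, hi):
    """0 inside [lo, hi]; otherwise the gap to the nearest edge in decades."""
    if lo <= value <= hi:
        return 0.0
    edge = lo if value < lo else hi
    return abs(math.log10(value) - math.log10(edge))


def target_penalty(mixture, d_i, selectivity):
    """Sum of the log-space distances to the D_i and D_i/D_j windows."""
    (d_lo, d_hi), (s_lo, s_hi) = TARGETS[mixture]
    return (log_window_distance(d_i, d_lo, d_hi)
            + log_window_distance(selectivity, s_lo, s_hi))


# Tool predictions (D_i, D_i/D_j) as listed in Table 1
penalties = {
    "Cd-I-ZIF-7-8": target_penalty("O2/N2", 1.6e-12, 28),
    "dFm_Be": target_penalty("CO2/CH4", 8.3e-10, 1.7e4),
    "ZIF-67": target_penalty("C3H6/C3H8", 1.02e-12, 156),
}
```

With these numbers, the first two designs lie inside their windows (penalty 0), while the predicted DC3H6 for ZIF-67 sits about 0.009 decades above the upper bound, a case a soft penalty tolerates but a hard filter would reject.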
Fig. 2(a)–(c) show the best ZIFs produced by our GA tool for the target performance windows of the three separations.
From the wide collection of new ZIF designs produced by our tool (Fig. 2(a)–(c)), we chose one for each case. Table 1 shows the composition of each of these ZIFs. The third ZIF, as stated above, is the well-known ZIF-67, whereas the other two are previously unreported ZIFs, which we named Cd-I-ZIF-7-8 and dFm_Be (more information about the three ZIFs can be found in the ESI,† Section 3.4.3). In all three cases, the goal was to select a high-performing ZIF with both high selectivity and a high diffusion rate for the faster-permeating species. In the case of Fig. 2(a), Cd-I-ZIF-7-8 consistently appeared among the top performers in multiple runs of the GA procedure. While other ZIFs occasionally outperformed it, these were almost never the same between runs, reflecting the randomness inherent in the GA process. We chose Cd-I-ZIF-7-8 for this consistency, even though it was not on the efficient (Pareto) frontier in this specific case. We reconstructed these ZIFs, developed their force fields and equilibrated the structures with MD simulations in the NPT ensemble (308 K, 1 bar). We then ran fully flexible TST simulations to validate their performance. Considering the complexity of the systems, the AI tool's predictions agree surprisingly well with the simulations. For ZIF-67 in particular, the tool is further validated by comparison with experiments reported in the literature.35
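The two selection criteria contrasted in this paragraph, membership of the efficient (Pareto) frontier within one run and recurrence among the top performers across independent stochastic GA runs, can be sketched as below. This is an illustrative sketch only: the function names, the top-k cut-off and all candidate names and values are hypothetical, not outputs of our tool.

```python
from collections import Counter


def pareto_front(candidates):
    """Names of non-dominated candidates when maximizing both D_i and D_i/D_j.

    candidates: list of (name, d_i, selectivity) tuples.
    """
    front = []
    for name, d, s in candidates:
        dominated = any(d2 >= d and s2 >= s and (d2 > d or s2 > s)
                        for _, d2, s2 in candidates)
        if not dominated:
            front.append(name)
    return front


def recurrence(runs, top_k):
    """Count how often each design appears in the top_k of independent GA runs.

    runs: list of name lists, each ranked best-first.
    """
    counts = Counter()
    for ranked_names in runs:
        counts.update(ranked_names[:top_k])
    return counts


# Hypothetical single-run candidates: C is dominated by A, so the front is A and B
front = pareto_front([("A", 2e-12, 30), ("B", 1e-12, 45), ("C", 1.5e-12, 25)])

# Hypothetical rankings from three GA runs: "Y" recurs in the top 2 of every run
counts = recurrence([["X", "Y", "Z"], ["Y", "W", "X"], ["Y", "V", "Q"]], top_k=2)
most_reliable = counts.most_common(1)
```

A design can rank first by recurrence without lying on the frontier of any single run, which is the situation described for Cd-I-ZIF-7-8.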
ZIF name | Species (i, j) | Performance Di (m² s⁻¹) | Performance Di/Dj | Validation (sims) Di (m² s⁻¹) | Validation (sims) Di/Dj | Validation (exp.) Di (m² s⁻¹) | Validation (exp.) Di/Dj
---|---|---|---|---|---|---|---
Cd-I-ZIF-7-8 | O2, N2 | 1.6 × 10⁻¹² | 28 | 1.2 × 10⁻¹² | 50 | – | –
dFm_Be | CO2, CH4 | 8.3 × 10⁻¹⁰ | 1.7 × 10⁴ | 2.5 × 10⁻¹⁰ | 1.0 × 10⁴ | – | –
ZIF-67 | C3H6, C3H8 | 1.02 × 10⁻¹² | 156 | 2.0 × 10⁻¹³ | 200 | 1.5 × 10⁻¹² (ref. 35) | 200 (ref. 35)
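One way to quantify the agreement between the tool's predictions and the flexible-framework simulations in Table 1 is the deviation in decades, log10(predicted/simulated). The short calculation below uses only the diffusivities reported in the table; the choice of this metric is ours, for illustration.

```python
import math

# (predicted D_i, simulated D_i) in m^2 s^-1, as listed in Table 1
table1 = {
    "Cd-I-ZIF-7-8": (1.6e-12, 1.2e-12),
    "dFm_Be": (8.3e-10, 2.5e-10),
    "ZIF-67": (1.02e-12, 2.0e-13),
}

# Deviation in decades: positive means the tool overestimates the simulated value
deviation = {name: math.log10(pred / sim) for name, (pred, sim) in table1.items()}
```

The deviations are roughly 0.12, 0.52 and 0.71 decades, respectively: every prediction lies within one order of magnitude of the simulated value, and all three overestimate it.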
Moreover, we calculated the solubilities, Si, of the mixtures' species in the corresponding ZIFs at infinite dilution, estimated the permeabilities through Pi = Di × Si, and derived the resulting ideal selectivities, Pi/Pj. We plotted the results against literature data for various membranes (Fig. 3). The Robeson plots of Fig. 3 show that the performance of each new ZIF not only lies within the desired industrial region for the corresponding separation but also exceeds that of the competing materials. Fig. 3(c) additionally validates our computational approach, since the experimental ZIF-67 performance29 (PC3H6 = 11.7 barrer; PC3H6/PC3H8 = 84.8) is in reasonable agreement with our predictions (PC3H6 = 4.9 barrer; PC3H6/PC3H8 = 172). Information about the permeability estimation can be found in the ESI.†
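The permeability estimate above, Pi = Di × Si, and the ideal selectivity αij = Pi/Pj = (Di/Dj) × (Si/Sj) can be sketched as follows, including the conversion to barrer (1 barrer = 10⁻¹⁰ cm³(STP) cm cm⁻² s⁻¹ cmHg⁻¹ ≈ 3.348 × 10⁻¹⁶ mol m m⁻² s⁻¹ Pa⁻¹). This is a minimal sketch assuming solubilities expressed in mol m⁻³ Pa⁻¹; the function names and the example solubility values are hypothetical, and the actual procedure is described in the ESI.

```python
import math

# 1 barrer = 1e-10 cm^3(STP) cm cm^-2 s^-1 cmHg^-1 ≈ 3.348e-16 mol m m^-2 s^-1 Pa^-1
BARRER_SI = 3.348e-16


def permeability_barrer(d_m2_s, s_mol_m3_pa):
    """P = D × S with D in m^2 s^-1 and S in mol m^-3 Pa^-1, returned in barrer."""
    return d_m2_s * s_mol_m3_pa / BARRER_SI


def ideal_selectivity(d_i, s_i, d_j, s_j):
    """alpha_ij = P_i / P_j = (D_i / D_j) × (S_i / S_j); units cancel."""
    return (d_i * s_i) / (d_j * s_j)


# Hypothetical inputs, for illustration only
p_i = permeability_barrer(1e-12, 1e-3)                  # about 3 barrer
alpha = ideal_selectivity(1e-12, 1e-3, 5e-15, 1.2e-3)   # 200 × (1/1.2)
```

The factorized form makes explicit that a diffusion selectivity of 200 is reduced to about 167 when the faster diffuser is slightly less soluble, which is why the permeability selectivities plotted in Fig. 3 need not equal the Di/Dj values of Table 1.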
Fig. 3 Comparison of the performance of the new designs with membranes from the literature for (a) CO2/CH4, (b) O2/N2 and (c) C3H6/C3H8. Blue data were taken from Robeson's seminal work.30 Green data for (a) and (b) were gathered from an extended literature survey and can be found in the ESI, while those for (c) were taken from Kwon et al.29
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4cp02488e
This journal is © the Owner Societies 2024