Emmanuel Ren,ab Philippe Guilbaud b and François-Xavier Coudert *a
aChimie ParisTech, PSL University, CNRS, Institut de Recherche de Chimie Paris, 75005 Paris, France. E-mail: fx.coudert@chimieparistech.psl.eu
bCEA, DES, ISEC, DMRC, Univ. Montpellier, Marcoule, France
First published on 15th June 2022
Due to their chemical and structural diversity, nanoporous materials can be used in a wide variety of applications, including fluid separation, gas storage, heterogeneous catalysis and drug delivery. Given the large and rapidly increasing number of known nanoporous materials, and the even bigger number of hypothetical structures, computational screening is an efficient method to find the current best-performing materials and to guide the design of future materials. This review highlights the potential of high-throughput computational screenings in various applications. The achievements and the challenges associated with the screening of several material properties are discussed to give a broader perspective on the future of the field.
Nanoporous materials can be used in a very wide range of applications, but systematically identifying the best material may seem like searching for a needle in a haystack. In fact, hundreds of thousands of structures have been synthesised and possibly millions of materials are yet to be studied. A purely experimental approach, in addition to being expensive and time-consuming, could never exhaustively screen all these structurally and chemically diverse materials. Beyond this experimental limitation, large-scale computational screening studies can enable a more in-depth exploration of the existing materials, as well as generate novel hypothetical structures with potentially better performance. Even though the idea of this thorough exploration and the required databases of computationally-generated or experimentally-sourced structures have been known for a very long time,7–9 research interest in computational screening applied to nanoporous materials has experienced rapid growth only in the last decade (see Fig. 1). Several factors can explain this recent expansion: (1) the emergence of open databases of material structures and properties has opened access to a growing number of scientists;10–14 (2) advances in the in silico construction of hypothetical nanoporous materials have created new datasets to explore;15–17 (3) efficiently implemented open-source software has granted access to simulation tools to a much larger research community;18,19 (4) increasingly efficient supercomputers are now more and more available;20 (5) text and data mining have generated new databases of unreported properties from existing literature;21,22 and (6) the size of screenable databases has been increased by several orders of magnitude thanks to artificial intelligence techniques.23–26
Given the aforementioned scientific advances, computational screening, which was commonly used on small series of materials, began to be used on larger databases to identify top-performing candidates, to better understand the main explanatory factors at the origin of the performance, and to objectively set theoretical performance limits for a given application. Borrowing some techniques from the new field of data science, screening techniques are now applied to predict key performance indicators. These figures of merit are related to a variety of material properties such as electronic structure,27–29 chemical and catalytic activity,30–32 thermal properties,33–35 mechanical properties,36,37 and transport and thermodynamic properties for adsorption.38–41
The present work is by no means an exhaustive review of all the works on the subject, but it aims at giving nonspecialist readers a high-level overview of the potential of computational screening in a large variety of applications, and of the diversity of the approaches used in this field of research. First, a brief survey of the development of materials databases and screening methodologies is given along with some examples illustrating the major milestones. Then, the thermodynamic properties linked to the adsorption processes are thoroughly reviewed, before moving to kinetic effects and the prediction of transport properties. Finally, other aspects that differ from the adsorption process, such as the computational screening of mechanical, thermal and catalytic properties, are described. We conclude by outlining some of the perspectives of the field.
The International Zeolite Association (IZA) provides a standardised set of 244 zeolites (in their idealised all-silica form) that can be used for screening purposes. To generate a dataset of structures, existing experimental databases like the Cambridge Structural Database can be exploited. However, the raw structures determined experimentally by X-ray diffraction cannot be used directly as they are. To obtain a computation-ready dataset, Chung et al. used algorithmic cleaning procedures to build the publicly available Computation-Ready Experimental MOF (CoRE MOF) database.45,46 CoRE MOF 2019 contains about 14000 MOF structures, which makes it the biggest experimental database. A similar approach applied to covalent organic frameworks led to the construction of a set of 187 COFs with disorder-free and solvent-free structures.47,48
These experiment-based databases can already be used in computational screenings to retrieve valuable information, but unknown structures that are yet to be discovered are not represented. To overcome the limits and biases of experimental synthesis, artificial ways of generating nanoporous material datasets can be used, and these have proved to be extremely efficient. The first in silico generated database of about 130000 MOFs used a recursion-based assembly (or tinkertoy-like) algorithm to combine 102 building blocks.41 Martin and Haranczyk then proposed a topology-specific structure assembly algorithm that leverages the topological information of the structures.49 Inspired by this algorithm, topology-based databases emerged a few years later with the set of 13000 MOF structures generated using the Topologically Based Crystal Constructor (ToBaCCo) algorithm developed by Colon, Gómez-Gualdrón and Snurr.50 Later, Boyd and Woo proposed another topology-based algorithm using a graph theoretical approach and generated a database of 300000 structures (BW-DB) based on 46 different network topologies.51 Similar approaches are used for other classes of materials: Deem and coworkers proposed a dataset of nearly 2.6 million hypothetical zeolite structures.52–54 However, one could wonder whether these hypothetical structures are synthesisable and can remain stable under operational conditions (e.g. thermal, mechanical, radioactive constraints). To discuss their synthetic likelihood, Anderson and Gómez-Gualdrón computed the free energies of 8500 hypothetical structures and compared them to experimentally observed MOF structures.55 This type of prediction makes it possible to gauge the relative stability of each material and to consider only the stable structures. Later, Nandy et al. performed a meta-analysis of thousands of articles associated with the CoRE MOF 2019 database to extract their experimental solvent-removal stability and thermal decomposition temperature.150 These data were then leveraged in the training of multiple ML models to predict stability; such predictions can help restrict screening to structures expected to be experimentally stable. Other types of materials have also been explored: Turcani et al. published 60000 organic cage structures and used machine learning to predict their stability based on the shape persistence metric.56
The Materials Genome Initiative, a 100 million dollar effort from the White House that aims to “discover, develop, and deploy new materials twice as fast”, led to the creation of the “Materials Project”, a centralised database containing all the above mentioned structures.57–59 The fast development of this nanoporous materials genome motivated Boyd et al. to write a comprehensive review on all the initiatives on generating new data for computational analysis.60
Yet, the sole increase in size of the databases is not enough. One needs to add diversity to gain more general knowledge on the maximum performance and the explanatory features of such performance. Moreover, the diversity of structures ensures the quality of the predicted best materials for a given application. To qualitatively or quantitatively assess the diversity of a database, inventive methodologies have been developed. For instance, Martin, Smit and Haranczyk proposed a Voronoi hologram representation as a way of measuring similarities between structures to generate geometrically diverse subsets of a database.61 Moosavi et al. made a comparative study of the diversity of three well-known databases, CoRE MOF 2019,46 BW-DB51 and ToBaCCo,50,62 using geometrical and chemical descriptors to design a theoretical strategy for generating the most diverse set of materials.63 Another approach consists in searching for similarities instead of differences in the materials by studying topological patterns in the data.64 These investigations of the data structures give solid ground for developing novel materials by objectively defining similarity, diversity and novelty. From the analysis gathered so far, one would need to radically change the approach by proposing materials with new chemistry, topology or mechanism (e.g. flexibility) in order to significantly improve the diversity of the current databases.
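To make the idea of a "geometrically diverse subset" more concrete, the sketch below uses a generic greedy max-min (farthest-point) selection on descriptor vectors. This is not the Voronoi hologram method of ref. 61, only a simple stand-in technique; the function name `farthest_point_subset` and the use of plain Euclidean distance between (hypothetical) descriptor vectors are assumptions made for illustration.

```python
import math


def farthest_point_subset(descriptors, k, start=0):
    """Greedy max-min selection of k mutually distant structures.

    descriptors: list of equal-length numeric tuples (one per structure).
    Returns the indices of the selected structures, in selection order.
    """
    if not 0 < k <= len(descriptors):
        raise ValueError("k must be between 1 and the number of structures")
    selected = [start]
    # Distance from every structure to its nearest already-selected structure.
    nearest = [math.dist(d, descriptors[start]) for d in descriptors]
    while len(selected) < k:
        # Pick the structure that is farthest from the current subset.
        nxt = max(range(len(descriptors)), key=lambda i: nearest[i])
        selected.append(nxt)
        for i, d in enumerate(descriptors):
            nearest[i] = min(nearest[i], math.dist(d, descriptors[nxt]))
    return selected
```

In practice the descriptors would need to be normalised first, since an unscaled descriptor with a large numerical range would otherwise dominate the distance.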
With the development of a nanoporous materials genome, several articles proposed methods to screen thousands of structures. Other challenges arose, such as the design of methods more efficient than brute-force screening, or the analysis of big data. Two research groups led by R. Snurr and J. Hupp began to address those questions: they used a “funnel-like” approach to efficiently screen about 130000 hypothetical MOF structures.41 To do so, they performed a first screening involving fewer simulation steps on the whole dataset, then extracted a subset of top-performing structures on which to perform a second round with more simulation steps. This procedure is repeated until a few materials are selected by a final round of simulations with reasonable accuracy. Similar “funnel-like” procedures have since been used in other fields of application, as described in Fig. 2. This type of screening saves precious computation time by balancing the complexity of the calculation with the amount of data to be screened. The most demanding simulations or experiments are only applied to the few most promising structures. This method can rather efficiently identify top candidates, but it cannot draw quantitative structure–property relationships (QSPR), besides facing scalability issues above a critical dataset size.
Fig. 2 Simplified representation of typical funnel-type screening procedures, exemplified on three different applications from the published literature. (a) Wilmer et al.41 used a series of bi-component Grand Canonical Monte Carlo (GCMC) calculations at different levels of complexity to screen a large dataset of hypothetical MOFs for methane storage application. (b) Yang et al.42 used simulations at infinite dilution to pre-screen the dataset before using computationally demanding simulations and multiple metrics to find the most promising ZIFs for carbon capture. (c) In Qiao et al.,43 transport properties were screened along standard adsorption properties to find the best materials for the targeted CO2/N2/CH4 ternary separation; similarly, cheaper calculations at infinite dilution were carried out in a first step, before using more expensive calculations at working pressure and temperature.
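The logic of a funnel-type screening can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the workflow of any of the cited studies: `funnel_screen`, `make_noisy_evaluator` and the stage sizes are hypothetical, and the "simulation" is replaced by a toy evaluator whose statistical noise decreases as the inverse square root of the number of steps.

```python
import random


def funnel_screen(candidates, evaluate, stages):
    """Run a funnel-type screening.

    candidates: iterable of structure identifiers.
    evaluate:   callable (candidate, n_steps) -> estimated score (higher is better).
    stages:     list of (n_steps, n_keep) pairs, cheapest stage first.
    Returns the surviving candidates and the total number of simulation steps spent.
    """
    pool = list(candidates)
    total_steps = 0
    for n_steps, n_keep in stages:
        scored = [(evaluate(c, n_steps), c) for c in pool]
        total_steps += n_steps * len(pool)
        # Keep only the best n_keep structures for the next, costlier stage.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        pool = [c for _, c in scored[:n_keep]]
    return pool, total_steps


def make_noisy_evaluator(true_scores, seed=0):
    """Toy stand-in for a simulation: noise shrinks as 1/sqrt(n_steps)."""
    rng = random.Random(seed)

    def evaluate(candidate, n_steps):
        return true_scores[candidate] + rng.gauss(0.0, 1.0) / n_steps ** 0.5

    return evaluate
```

With 100 candidates and stages `[(10, 20), (100, 5), (1000, 1)]`, the funnel spends 8000 steps in total, against 100000 if every candidate were run at the most expensive level.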
To overcome these new challenges, researchers are increasingly turning towards transferable models trained by a machine learning (ML) algorithm on a diverse and size-limited sub-sample. Ideally, such a model is transferable to potentially millions of structures and can provide valuable QSPR. For instance, Fernandez et al.65 used multiple linear regression analysis, decision tree regression, and nonlinear support-vector machine models to extract QSPR and establish rules for designing well-performing MOFs for methane storage, while identifying promising structures. In this first work they only used geometrical descriptors to describe methane storage,65 but realising the importance of chemical descriptors, they proposed the atomic property weighted radial distribution function as a powerful descriptor to predict CO2 uptakes.66 More importantly, they proved that ML can be used as a pre-screening tool to avoid running time-costly simulations, by correctly identifying around 95% of the top 1000 best-performing materials. Recently, the same group used similar techniques to predict CO2 working capacity as well as CO2/H2 selectivity in MOFs for precombustion carbon capture.67
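The notion of "correctly identifying a fraction of the top-performing materials" can be illustrated with a small sketch. The code below is not the model of Fernandez et al.: it swaps in a plain k-nearest-neighbour regressor as the surrogate, and the names `knn_predict` and `top_n_recall` are hypothetical. It only shows how a surrogate trained on a sub-sample can be scored as a pre-screening tool.

```python
import math


def knn_predict(train_x, train_y, x, k=3):
    """Predict a property as the mean over the k nearest training structures."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def top_n_recall(true_scores, predicted_scores, n_top, n_shortlist):
    """Fraction of the true top-n structures found in the predicted shortlist."""
    by_true = sorted(range(len(true_scores)),
                     key=lambda i: true_scores[i], reverse=True)
    by_pred = sorted(range(len(predicted_scores)),
                     key=lambda i: predicted_scores[i], reverse=True)
    true_top = set(by_true[:n_top])
    shortlist = set(by_pred[:n_shortlist])
    return len(true_top & shortlist) / n_top
```

A recall close to 1 for a shortlist much smaller than the database means that expensive simulations can be restricted to the shortlist with little risk of missing the best materials.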
Energy-based descriptors can be used alongside the most basic geometrical ones. For instance, Simon et al. introduced the Voronoi energy and combined it with structural descriptors to predict the Xe/Kr selectivity of over 600000 structures using a random forest model.70 Bucior et al. also used an energy-based descriptor, the energy histogram, to predict the cryogenic storage capacity of hydrogen three times faster than traditional simulations.71
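As an illustration of what an energy-based descriptor can look like, the sketch below bins guest–framework interaction energies sampled on a grid into a normalised histogram that can serve as a fixed-length feature vector. It is a simplified reading of the idea, not the implementation of ref. 71: the function name, the bin edges and the clamping of out-of-range values are all assumptions.

```python
import bisect


def energy_histogram(energies, bin_edges):
    """Normalised histogram of grid-point energies.

    energies:  interaction energies sampled on a grid inside the unit cell.
    bin_edges: increasing list of edges; values outside the range are
               clamped into the first or last bin (an illustrative choice).
    Returns one fraction per bin; the fractions sum to 1.
    """
    n_bins = len(bin_edges) - 1
    counts = [0] * n_bins
    for e in energies:
        # Index of the bin whose left edge is the last one <= e.
        idx = bisect.bisect_right(bin_edges, e) - 1
        idx = min(max(idx, 0), n_bins - 1)
        counts[idx] += 1
    total = len(energies)
    return [c / total for c in counts]
```

Because the vector length depends only on the number of bins, structures with very different unit cells can be fed to the same ML model.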
Descriptors based on the analysis of data have also been studied and make it possible to find similarly performing materials. Drawing on advanced concepts from mathematics and topology, Lee et al. used a topological data analysis-based descriptor, called persistent homology and resembling barcodes, to screen a zeolite database for methane storage and carbon capture applications.72 Later, the group of Yongjin Lee proposed an ML prediction method using the same pore geometry barcodes.29 More recently, Moosavi et al. built geometric landscapes, a representation for energy-structure–function maps based on geometric similarity, quantified by persistent homology.73
To model the chemical behaviour of materials, several chemical descriptors have been developed. In particular, Borboudakis et al. introduced the chemical building block as a feature or descriptor of their ML models. In their study, they integrated all the models into a unified algorithm called “Just Add Data” and concluded that random forest and support vector machine outperformed the other algorithms they tested.74 Recently, the same group worked towards a universal ML algorithm (transferable between different materials) by using the type of atom instead of the previous building block description, which led to an increased performance in the prediction of methane and carbon dioxide adsorption capacities.75 Anderson et al. used the chemical building blocks of the MOF and the Lennard–Jones parameters of existing or “alchemical” adsorbates to train a neural network model for the prediction of adsorption isotherms.76
Through the lens of different types of descriptors, we have introduced some ML-assisted approaches to computational screening. Fig. 3 gives a higher-level view on how machine learning is practically applied. A more comprehensive review on big-data science applied to porous materials was written by Jablonka et al.77 The authors go through the selection of diverse data, the design of meaningful descriptors, ML algorithms, the best practices in the training process of an ML model, the measurement of its performance and the interpretation of the model to avoid the “black box” effect.
Beyond the reluctance to apply data science to fundamental sciences, one should not associate machine learning with the “end of theory”; physicochemical theories can guide the development of the descriptors at the base of any ML model, and the interpretation of these models is impossible without scientific insights. Since the laws of physics are not explicitly included in an ML model, interpretability and exploitability methods can help cover these flaws by identifying potential nonphysical behaviours, confirming the model's consistency in describing known physical behaviours, or unveiling unexpected scientific insights.68 If the model fails to meet some standards, further developments are needed for the descriptors to contain all relevant information, or to draw a more consistent relationship between the descriptors and the desired metric. Without a well-designed set of descriptors (one containing all the relevant physical information), an ML approach cannot make reliable predictions. The recent developments presented here confirm this close interplay between data science and theory.
One of the pioneering works in computational screening was published in 2011 by Wilmer et al.41 They performed a large-scale screening of 137953 hypothetical MOF structures to estimate the methane storage capacity of each MOF at 35 bar and 298 K based on the US DOE standards. Back then, the US DOE set a target methane capacity value of 180 vol(STP) vol−1 (which has since been achieved by several materials reported in the literature). In their large-scale analysis, Wilmer et al. found over 300 hypothetical MOFs that meet the targeted requirements, the best of which can store up to 267 vol(STP) vol−1, surpassing the state of the art of the time. From their large dataset, a preliminary structure–property relationship analysis revealed that void fraction values of approximately 0.8 and gravimetric surface areas in the range 2500–3000 m2 g−1 resulted in the highest methane capacities. Optimal pore sizes were also shown to be around the size of one or two methane molecules. Maximisation of gravimetric surface area was a common strategy in MOF design for storage applications, but this study showed the existence of an optimal range of surface area values. Computational screenings can thus draw clear relationships between structural descriptors and performance. Later, a more quantitative relationship was drawn by Fernandez et al. using ML models, as illustrated in Fig. 4. One should be careful not to over-interpret the relation given by the response surface, since the identified maxima do not always have a physical reality, especially where there is no training data, as in the area pointed to by the red arrows. However, it highlights promising unexplored regions of feature space and shows potential research directions.
Fig. 4 Two-dimensional response surfaces of the support vector machine (SVM) models trained by Fernandez et al. for methane storage at (A) 35 bar and (B) 100 bar using void fraction and dominant pore size. The blue dots represent the GCMC simulated uptake values. The color of the surface represents the methane storage value, from blue (lowest values) to red (highest values). Blue and red arrows indicate maxima on the response surface. Reprinted with permission from ref. 65. Copyright 2013 American Chemical Society.
Since then, new materials above the target have been found and the US DOE decided to set a higher target of 315 vol(STP) vol−1. This new target has not yet been reached. This is why recent developments have focused on assessing the feasibility of such a target by accelerating the screening methods so that more data can be screened, and by interpreting the QSPR models to extract important knowledge for the design of novel materials. For instance, Gómez-Gualdrón et al. showed that even by artificially quadrupling the Lennard–Jones interaction factor ε and by increasing the delivery temperature by 100 K, the newly set target is only reached by a handful of MOFs.78 This study suggests that the DOE target cannot be reached with any material conceived so far, whether experimentally or theoretically, for methane storage. However, this theoretical limitation can be overcome by increasing the surface density of sites with high affinity for methane and by increasing the delivery temperature.
Later, a larger-scale screening for methane storage was carried out by Simon et al. on 650000 experimental and hypothetical structures of zeolites, MOFs, and PPNs. This study confirmed that the classes of materials currently being investigated were unlikely to meet the new target. The authors suggested that this was not surprising, since the target was based on economic arguments, while the screening is based on thermodynamic arguments.79 This example illustrates the power of large-scale screening to settle questions of physical feasibility (provided the simulations are accurate) and hence to avoid spending experimental effort on impossible tasks.
More recently, a dataset containing trillions of hypothetical MOFs has been screened for methane storage.80 Lee et al. developed a methodology combining machine learning with a genetic algorithm to perform the largest screening to date. In addition to confirming most of the results (theoretical limits and QSPR) found by previous screenings, 96 MOFs were found to outperform the current world record. This study shows the scaling potential of ML-assisted screenings in handling “Big data”.
Similarly, computational high-throughput screenings have been applied to other storage applications such as hydrogen storage. Computational screenings showed that cryogenic storage of hydrogen can meet the DOE target of 50 g L−1.62,81,82 Anderson et al. performed a large-scale screening based on neural networks, testing multiple pressure/temperature swing conditions, and found that the maximal deliverable capacity cannot exceed 62 g L−1.83 Compared to the density of liquid hydrogen (72 g L−1), this upper limit seems reasonable since the adsorbent material takes up at least 10–20% of the tank. Here, we have only shown some flagship results of the field. For a more detailed meta-analysis, Bobbitt and Snurr wrote a very complete review on computational high-throughput screening of MOFs for hydrogen storage.84
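The plausibility argument above can be written as a one-line estimate. As a rough sketch, assume that the adsorbent occupies a fraction φ of the tank volume (φ is notation introduced here, not in the cited work) and that the remaining volume holds hydrogen at no more than the liquid density quoted in the text:

```latex
\rho_{\max} \approx (1-\phi)\,\rho_{\mathrm{liq}}, \qquad
\phi = 0.10 \;\Rightarrow\; \rho_{\max} \approx 0.90 \times 72 = 64.8~\mathrm{g\,L^{-1}}, \qquad
\phi = 0.20 \;\Rightarrow\; \rho_{\max} \approx 0.80 \times 72 = 57.6~\mathrm{g\,L^{-1}}
```

The predicted limit of 62 g L−1 falls inside the resulting 57.6–64.8 g L−1 window, which is the sense in which it "seems reasonable".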
The first large-scale computational screening of adsorption-based Xe/Kr separation was performed by Sikora et al., based on the approach previously developed for methane storage by their group at Northwestern University.91 This study was based on the same 137000 structures of hypothetical MOFs.41 They calculated the Xe/Kr selectivity using Monte Carlo molecular simulations on the whole database by iteratively increasing the number of steps and selecting the best materials, similar to the approach in Fig. 2. By analysing the relationships between pore sizes and selectivity, they confirmed a hypothesis from a smaller-scale study that the pores should be between one and two xenon atoms in size.92 Tube-like channels were also found to favour better selectivity. Moreover, they found that top-performing materials could have selectivities around 500; however, considering the statistical uncertainty of the simulations, only the order of magnitude of the theoretical limit of the Xe/Kr selectivity can be concluded.
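For readers unfamiliar with the figure of merit, the standard definition of adsorption selectivity can be sketched as follows. The function names are hypothetical and the numerical values used in the usage note are purely illustrative; the cited studies may have used different compositions and conventions.

```python
def adsorption_selectivity(q_a, q_b, y_a, y_b):
    """Selectivity of A over B from mixture loadings.

    q_a, q_b: adsorbed amounts of A and B (same units).
    y_a, y_b: mole fractions of A and B in the bulk gas phase.
    """
    return (q_a / q_b) / (y_a / y_b)


def henry_selectivity(k_a, k_b):
    """Infinite-dilution (Henry regime) selectivity: ratio of Henry coefficients."""
    return k_a / k_b
```

For instance, loadings of 2.0 and 0.5 (arbitrary units) for Xe and Kr from a 20:80 gas mixture give a selectivity of 16, which shows why the minority component can still be strongly enriched in the adsorbed phase.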
Seizing the opportunity of a formidable expansion of the nanoporous materials database triggered by the Materials Genome Initiative, Simon et al. screened 670000 experimental and hypothetical nanoporous material structures for Xe/Kr separation.70 It is one of the largest-scale screenings performed in this area. Inspired by the work of Fernandez and co-workers,65 they used ML algorithms to train a model on a diverse subset of 15000 structures. This method allowed them to run time-consuming molecular simulations only on this training set, before applying the ML model to predict the selectivity values on the larger set of structures. On top of analysing the links between pore descriptors and selectivity, they rationalised them using theoretical pore models of spherical and cylindrical geometries, confirming the findings of Snurr and co-workers.91,92 By comparing the structural descriptors of good-performing and bad-performing structures, they concluded that geometrical descriptors were not enough to explain the performance (see Fig. 5). The analysis of a few top candidates suggests that different chemical insights could explain their good performance. For SBMOF-1 (also referred to as KAXQIL),93 an experimental MOF, the higher performance was explained by a tube-like 1D channel with a very favourable binding site formed by aromatic carbon rings. This nanoporous material was later tested using breakthrough experiments and proved to be one of the most promising candidates.94 This close collaboration between computation and experimentation is a testimony to the potential of computational screenings to find nanoporous materials for any targeted application.
Fig. 5 Statistical analysis of the adsorptive separation of xenon/krypton mixtures by nanoporous materials. The graphs represent the distributions of structural descriptors explored by highly selective (green) and poorly selective (red) materials separately. Reprinted with permission from ref. 70. Copyright 2015 American Chemical Society.
The experimental work on Xe/Kr separation on SBMOF-1 revealed discrepancies between the selectivity values obtained experimentally and computationally.94 The assumption of rigid crystal structures in the molecular simulations could partially explain the difference observed. Witman et al. proposed that the flexibility of the materials, which was not considered in the screening of Simon et al., could explain the lower selectivity observed experimentally.95 In this study, they screened the Henry regime separation of about 4000 MOF structures of the CoRE MOF 2014 database,45 and found that intrinsic flexibility, i.e. the thermal vibration of the material, can make the pore size deviate from the ideal value for the separation and hence lower the selectivity. This study further confirms the importance of the pore size by highlighting the effect of its evolution over time.
In 2019, Chung et al. screened the most extensive set of simulation-ready and experimentally synthesised MOF structures for Xe/Kr separation.46 This study pointed out the potential of coordinated solvent molecules to fine-tune the selectivity for any separation application, since their presence can enhance selectivity in some cases. The results of their screening confirm the potential of structures such as SBMOF-1 found by Simon et al., but they also described a few structures with similar selectivity and better xenon uptake. The authors emphasise the importance of considering other figures of merit such as the adsorption capacity. Other factors should be taken into account to find the best trade-off between all the relevant figures of merit: the kinetics of such a separation, the effect of flexibility on the performance, the stability of the materials (especially in a radioactive environment), the financial aspects, and more. Some of these aspects will be tackled in the following sections of this review.
Besides noble gas separation, carbon capture could benefit greatly from the use of nanoporous materials, and there is extensive literature on computational screening targeting this application.42,96–100 Findley and Sholl performed a screening of CoRE MOF 2014 to find the best structures for CO2 capture in humid conditions.101 After finding candidates, they performed quantum calculations but found that the classical methods with generic force fields overestimated the performance, highlighting the limits of the methodology. For a more in-depth review on separation, Daglar and Keskin described the recent development of high-throughput screening, focusing mainly on CO2 separation from methane or nitrogen.102
There are two approaches to estimate diffusion inside a porous material: the first one relies on molecular dynamics (MD) and the second one on transition state theories. In the first approach, one analyses the mean squared displacement of the adsorbed molecule moving in the material. In the second, one identifies minimum energy paths through the material to locate transition states (TS) and calculate diffusion energy barriers. The MD-based method requires fewer assumptions and is therefore more reliable than the TS-based method, but the latter is computationally more efficient in the case of low diffusion rates (diffusivity lower than 10−11 m2 s−1).
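The MD route rests on the Einstein relation, MSD(t) ≈ 2dDt at long times, where d is the dimensionality. The sketch below illustrates the analysis step only: instead of an MD trajectory it uses a toy unbiased lattice random walk (for which D = a²/2τ in one dimension is known exactly), and the function names and fitting choice (least-squares line through the origin) are assumptions.

```python
import random


def simulate_walkers(n_walkers, n_steps, step_length, rng):
    """MSD after each step for independent 1D unbiased lattice random walks."""
    positions = [0.0] * n_walkers
    msd = []
    for _ in range(n_steps):
        for i in range(n_walkers):
            positions[i] += step_length if rng.random() < 0.5 else -step_length
        msd.append(sum(x * x for x in positions) / n_walkers)
    return msd


def diffusion_from_msd(msd, dt, dim=1):
    """Einstein relation: D = slope of MSD(t) / (2 * dim).

    The slope is a least-squares fit of a line through the origin;
    msd[i] is taken at time (i + 1) * dt.
    """
    times = [(i + 1) * dt for i in range(len(msd))]
    slope = sum(t * m for t, m in zip(times, msd)) / sum(t * t for t in times)
    return slope / (2 * dim)
```

With unit step length and unit time step, the estimate should approach 0.5 as the number of walkers grows. In slowly diffusing systems the guest rarely leaves its cage during the trajectory, so the linear regime is never sampled, which is where TS-based methods become attractive.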
State-of-the-art MD simulations can calculate rather accurate diffusion coefficients, but the computational cost scales quickly with the number of structures. To use this method on a large dataset without spending too much computation time, Watanabe and Sholl pre-screened the pore sizes of 1163 MOFs to select only the structures within a certain range of pore limiting diameter (PLD).38 A restricted list of 359 MOFs was then used to carry out MD simulations to calculate diffusion coefficients. The results of this final screening were then used to extract the most promising structures for further experimental or computational investigation. Similarly, Qiao et al. used a multi-stage screening to find the best membrane material among about 130000 hypothetical MOFs for a CO2/N2/CH4 separation.43 They first selected materials based on pore geometry analysis; then they calculated Henry's coefficients and diffusion coefficients at infinite dilution; finally they compared the binary permselectivities to extract 24 promising MOFs for ternary adsorption and diffusion calculations at the desired pressure and temperature conditions.
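One common way to combine the two infinite-dilution quantities mentioned above is the solution-diffusion picture, in which the permeability is the product of a solubility term (here the Henry coefficient) and the diffusivity. The sketch below assumes this model; whether it matches the exact definitions used in ref. 43 is not stated in the text, and the function names are hypothetical.

```python
def permeability(henry_coefficient, diffusivity):
    """Infinite-dilution permeability in the solution-diffusion picture: P = K * D."""
    return henry_coefficient * diffusivity


def permselectivity(k_i, d_i, k_j, d_j):
    """Membrane selectivity of i over j: (K_i / K_j) * (D_i / D_j)."""
    return permeability(k_i, d_i) / permeability(k_j, d_j)
```

The factorisation makes the trade-off explicit: a strongly adsorbed species (large K) can still permeate poorly if its diffusion is slow, so adsorption selectivity and diffusion selectivity may work against each other.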
Another approach replaces MD simulations with more computationally efficient TS-based methods to determine diffusion coefficients. Haldoupis et al. developed an algorithm to identify diffusion paths by exploiting an energy grid with a clustering algorithm. The diffusion paths are then analysed to identify the pores and the channels, and to calculate key geometric (PLD, largest cavity diameter) and energetic (Henry's constant, diffusion activation energy) features.104 As represented in Fig. 6, they found a clear dependence of the diffusion energy barrier on the PLD. As one of the first TS-based screenings, it leaves room for many further developments. For instance, the approach is limited to spherical adsorbates and rigid frameworks. Moreover, the diffusion coefficients are approximated using a simplistic hopping model for a qualitative analysis. This method is highly efficient, but the accumulation of approximations puts a quantitative systematic analysis of diffusion coefficients out of reach.
Fig. 6 Calculated energy barrier for the diffusion of CH4 in 216 metal–organic frameworks (MOFs), shown as a function of the pore-limiting diameter. The solid lines represent statistical upper and lower bounds on the energy barrier, in a transition state theory approach. Reprinted with permission from ref. 104. Copyright 2010 American Chemical Society.
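To show how an energy barrier translates into a diffusion coefficient in a simple hopping picture, the sketch below uses an Arrhenius hop rate and the random-walk relation D = k a²/(2d). This is a generic textbook model, not necessarily the one used in ref. 104; the attempt frequency, hop length and function names are hypothetical inputs.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def hop_rate(barrier_kj_mol, temperature_k, attempt_frequency_hz):
    """Arrhenius hop rate k = nu * exp(-Ea / (R T))."""
    return attempt_frequency_hz * math.exp(
        -barrier_kj_mol / (R_KJ_PER_MOL_K * temperature_k))


def hopping_diffusivity(barrier_kj_mol, temperature_k,
                        attempt_frequency_hz, hop_length_m, dim=3):
    """Random-walk estimate D = k * a^2 / (2 * dim), in m^2 s^-1."""
    k = hop_rate(barrier_kj_mol, temperature_k, attempt_frequency_hz)
    return k * hop_length_m ** 2 / (2 * dim)
```

Because the barrier enters through an exponential, an error of a few kJ mol−1 changes the predicted diffusivity severalfold at room temperature, which is one reason such estimates are best read qualitatively.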
Later, Kim et al. introduced a flood fill algorithm to obtain all the points within a given energy.105 These points are then identified as channels or blocked regions. Along the channels, local energy minima are defined as lattice sites and transition states are defined perpendicular to the diffusion direction. A random walk is then computed along the lattice sites with hop rates defined according to the activation energy. A diffusion coefficient is then calculated in each of the three directions of space, and an average diffusion coefficient is finally determined. A comparison with the MD method on the IZA zeolite structures shows good agreement, but there are still some discrepancies, explained by correlated hops in the case of rapid diffusion or by the presence of complicated channel profiles. Inspired by this work, Mace et al. developed a similar method that progressively fills the energy grid to detect transition states, hence removing the previous restriction to orthogonal cells only.106 The diffusion coefficient is now computed using a kinetic Monte Carlo simulation allowing the adsorbate to jump freely in all directions instead of restricting it to a single dimension. This new method, called TuTraSt, handles very complex diffusion paths (as in the AEI zeolite). The approach seems promising, as it is in good agreement with MD simulations while being 2–3 orders of magnitude faster. Moreover, the time performance could improve tremendously by translating it from Matlab to C++ and by implementing parallelisation procedures.
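The flood fill step can be illustrated on a toy two-dimensional energy grid. The sketch below is a deliberately simplified stand-in, not the algorithm of ref. 105 or 106: it ignores periodic boundary conditions, works in 2D, and labels a connected low-energy region a "channel" only if it reaches both the first and the last column of the grid. All names and the classification rule are assumptions.

```python
from collections import deque


def flood_fill(grid, start, cutoff):
    """Cells connected to `start` (4-neighbour) with energy <= cutoff."""
    n_rows, n_cols = len(grid), len(grid[0])
    if grid[start[0]][start[1]] > cutoff:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if (0 <= ni < n_rows and 0 <= nj < n_cols
                    and (ni, nj) not in seen and grid[ni][nj] <= cutoff):
                seen.add((ni, nj))
                queue.append((ni, nj))
    return seen


def classify_regions(grid, cutoff):
    """List (kind, size) for each low-energy region, in row-major discovery order.

    A region spanning the grid from column 0 to the last column is called a
    'channel'; any other region is a 'pocket' (a blocked region in this toy model).
    """
    n_rows, n_cols = len(grid), len(grid[0])
    visited = set()
    regions = []
    for i in range(n_rows):
        for j in range(n_cols):
            if (i, j) in visited or grid[i][j] > cutoff:
                continue
            cells = flood_fill(grid, (i, j), cutoff)
            visited |= cells
            columns = {c for _, c in cells}
            kind = "channel" if 0 in columns and n_cols - 1 in columns else "pocket"
            regions.append((kind, len(cells)))
    return regions
```

A production implementation would have to track periodic images to decide whether a region truly percolates through the crystal, and would raise the cutoff progressively to locate the energy at which separate basins first merge.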
Very recently, a massively parallel GPU-accelerated string method has been implemented and shared publicly to compute diffusion coefficients very efficiently based on transition state theory.107 The recent developments in the prediction of diffusion coefficients in nanoporous materials point towards a promising future for the screening of transport properties applied to even larger databases. Going further, Bukowski et al. thoroughly reviewed diffusion in nanoporous solids in an attempt to connect theory to experiments.108
To give an overview of the potential of computational screenings to predict transport properties, we now focus on membrane separation applied to natural gas upgrading. The separation of CH4 from N2 and CO2 is a crucial step of this upgrading process. In 2016, a large-scale high-throughput screening (see Fig. 2 for the approach) of hypothetical MOF membranes for upgrading natural gas was performed using MD simulations.43 In that work, Qiao et al. confirmed the existence of MOF materials with performances beyond the upper bound for N2/CH4 and CO2/CH4 separations previously determined by Robeson on a large set of polymeric membranes.112 This Robeson upper bound is systematically crossed by MOF materials in computational screenings (see Fig. 7 for an example). This can be explained either by MOFs performing better than polymeric membranes or by the approximations made in simulations at this level of theory. They also identified 24 MOFs suitable for the ternary CO2/N2/CH4 separation using a multi-stage screening described in the previous section.
Fig. 7 Selectivity and permeability of metal–organic framework (MOF) membranes for CO2/CH4 separation, computed at infinite dilution by combining Grand Canonical Monte Carlo and molecular dynamics simulations.114 The black solid line represents Robeson's upper bound.112,117 MOFs that can exceed the bound are shown in blue, and the 8 top-performing MOF membranes are shown with red symbols. Reprinted with permission from ref. 114. Copyright 2018 American Chemical Society.
Two years later, Qiao et al. used the same approach to study this ternary separation on a database of synthesised structures.113 Applying machine learning techniques to their data, they performed a QSPR analysis. Using a principal component analysis, they notably found that the permeability is higher when materials have high PLD and void fraction coupled with low density and percentage of pores within a characteristic range. The opposite was found to be true for high membrane selectivity for the CO2/CH4 separation. Using decision tree algorithms, they gave objective procedures for selecting the best separation membranes based on some key descriptors. Finally, they studied in detail some of the best-performing materials found by a support vector machine algorithm.
Altintas and Keskin later performed a screening on the same database for CO2/CH4 membrane separation to identify the best-performing materials and perform more computationally demanding simulations.114 The simulations in rigid structures at infinite dilution show a large number of structures above Robeson's upper bound, as shown in Fig. 7. This crossing of the upper bound can be explained either by a better performance of MOF membranes compared to the polymeric membranes used by Robeson, or by an overestimation due to oversimplified assumptions (infinite dilution, rigidity). When higher pressures and flexibility are considered, however, the selectivity values drop closer to the upper bound, hence confirming the overestimation of the performance in screenings based on rigid approximations at infinite dilution. Budhathoki et al. developed a screening methodology for MOFs in mixed matrix membranes for carbon capture applications by estimating permeation values in these composite materials using a Maxwell model.115 The authors even proposed a price for each material relative to its performance. Similar studies have been carried out on different materials: Yan et al. showed the influence of decorating COFs with different chemical compounds on the membrane selectivity.116
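The comparison with an upper bound can be sketched as follows, assuming a solution-diffusion picture at infinite dilution in which the permeability of each gas is the product of its Henry's constant (from adsorption simulations) and its self-diffusivity (from MD), and assuming a Robeson-type bound written as P = k·α^n with n < 0. The function names are hypothetical, and the values of k and n in the usage example are placeholders rather than the published parameters, which should be taken from ref. 112 and 117.

```python
def permeability(henry_constant, diffusivity):
    """Infinite-dilution permeability as solubility times diffusivity.

    Units must be chosen consistently by the caller.
    """
    return henry_constant * diffusivity


def selectivity(perm_a, perm_b):
    """Ideal membrane selectivity of gas a over gas b."""
    return perm_a / perm_b


def exceeds_upper_bound(perm_fast_gas, alpha, k, n):
    """True if (alpha, P) lies above a bound of the form P = k * alpha**n."""
    return perm_fast_gas > k * alpha ** n


if __name__ == "__main__":
    # Hypothetical Henry's constants and diffusivities in arbitrary consistent units.
    p_co2 = permeability(henry_constant=4.0, diffusivity=250.0)
    p_ch4 = permeability(henry_constant=1.0, diffusivity=100.0)
    alpha = selectivity(p_co2, p_ch4)
    # k and n below are placeholders, not Robeson's fitted values.
    print(alpha, exceeds_upper_bound(p_co2, alpha, k=1.0e4, n=-2.0))
```

Since both factors are evaluated for a rigid framework at infinite dilution, a point above the bound in such a calculation is a reason for more demanding follow-up simulations rather than a final verdict.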
The screening of transport properties is based on the calculation of diffusion coefficients at infinite dilution and in rigid frameworks. There are different methods to calculate them (mainly MD and TS-based methods). Flexibility and pressure dependence are very hard to incorporate directly in the screening procedures. Researchers usually consider these factors at the end of the screening, on the most promising structures only, because of the computational complexity of the corresponding simulations. Accounting for pressure dependence requires an MD simulation of several adsorbates, which takes much more time than running single-component simulations,118,119 and is therefore harder to include in a high-throughput screening. Flexibility could be taken into account by calculating snapshots and running multiple MD simulations, or by using flexible force fields, which in both cases means an increase in computational run-time. Some faster methods of quantitatively predicting the impact of flexibility on diffusion are being investigated in ZIFs and could offer an interesting alternative to these expensive methodologies.120
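For the MD route, a self-diffusion coefficient is commonly extracted from the mean squared displacement (MSD) through the Einstein relation, MSD(t) ≈ 2·d·D·t at long times in d dimensions. The sketch below is a minimal illustration under simplifying assumptions (unwrapped coordinates, a single time origin average, a plain least-squares slope); the function names and the trajectory layout are hypothetical.

```python
def mean_squared_displacement(traj, lag):
    """MSD at a given lag, averaged over particles and time origins.

    traj[t][p] is the unwrapped position (a tuple) of particle p at frame t.
    """
    total, count = 0.0, 0
    for t in range(len(traj) - lag):
        for start, end in zip(traj[t], traj[t + lag]):
            total += sum((b - a) ** 2 for a, b in zip(start, end))
            count += 1
    return total / count


def diffusion_coefficient(times, msd, dim=3):
    """Einstein relation: least-squares slope of MSD versus time over 2 * dim."""
    n = len(times)
    t_mean = sum(times) / n
    m_mean = sum(msd) / n
    slope = (sum((t - t_mean) * (m - m_mean) for t, m in zip(times, msd))
             / sum((t - t_mean) ** 2 for t in times))
    return slope / (2 * dim)


if __name__ == "__main__":
    # Synthetic MSD data that is exactly linear in time (slope 3, so D = 0.5 in 3D).
    times = [1.0, 2.0, 3.0, 4.0]
    msd = [3.0, 6.0, 9.0, 12.0]
    print(diffusion_coefficient(times, msd))
```

In practice the fit should be restricted to the diffusive regime, where the MSD is linear in time; reaching that regime for slowly diffusing adsorbates is part of what makes MD expensive in a screening context.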
Although the vast majority of computational screenings have been done on small series, there are a few systematic screenings of larger datasets. The scarcity of the latter can be explained by the high computational cost required. Here, we present some such attempts, focusing on C–H bond activation for the conversion of alkanes into alcohols in the presence of nitrous oxide.
Inspired by enzymatic catalysis of the reaction of small alkanes with N2O into alcohols, Vogiatzis et al. identified 7 iron-containing MOF structures out of 5000 structures from the CoRE MOF database.126 They found two descriptors that govern the catalytic activity: (1) the N–O dissociation energy of N2O on the adsorption site and (2) the energy difference between two spin states of the intermediate. Using a screening on these descriptors, three structures were identified as promising for further experimental studies. The best one has been computationally demonstrated to catalytically and selectively oxidise ethane to ethanol in the presence of N2O. Moreover, the authors found that defects played a major role in the observed catalytic activity.
Later, Rosen et al. enlarged the scope of materials screened to other metals.127 From a subset of 838 DFT-optimised MOFs from CoRE MOF 2014, the authors selected 168 MOFs that were likely to have open metal sites and pore-limiting diameters that allow the diffusion of the reactants. They then used a fully automated workflow to place the reactants in the adsorption site and relaxed the system using periodic DFT calculations. As shown in Fig. 8, using the bond activation energy Ea,C–H and the metal–oxo formation energy ΔEO as key parameters, they classified the materials according to their relative stability and reactivity to find the best materials for the application. These energies were then analysed using physicochemical descriptors such as the spin density on the oxygen and the metal–oxygen distance.
Fig. 8 Analysis of a diverse set of experimentally derived metal–organic frameworks (MOFs) with accessible metal sites for the oxidative activation of methane. The graph shows the predicted barrier for the C–H bond activation of methane, Ea, as a function of the metal–oxo formation energy, ΔEO. For each material, the symbol colour refers to the group number of the metal in the periodic table. The best-fit line is plotted in black, and has a mean absolute error (MAE) of 0.09 eV. MOFs with Ea < 1 eV are classified as being reactive toward C–H bond activation and MOFs with ΔEO < 0 as having thermodynamically favoured active sites when using O2 as the reference state. Reprinted with permission from ref. 127. Copyright 2019 American Chemical Society.
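The classification and the linear trend described in the caption of Fig. 8 can be sketched as follows. The thresholds (Ea < 1 eV and ΔEO < 0) come from the caption; the data points are invented for illustration only, and the function names are hypothetical.

```python
def classify(e_a, delta_e_o, e_a_max=1.0):
    """Flag a site as reactive (Ea below threshold) and favoured (Delta E_O < 0), in eV."""
    return {"reactive": e_a < e_a_max, "favoured": delta_e_o < 0.0}


def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    slope = (sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
             / sum((a - x_mean) ** 2 for a in x))
    return slope, y_mean - slope * x_mean


def mean_absolute_error(x, y, slope, intercept):
    """Mean absolute deviation of the points from the fitted line."""
    return sum(abs(b - (slope * a + intercept)) for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    # Invented (Delta E_O, Ea) pairs in eV, for illustration only.
    delta_e_o = [-1.0, -0.2, 0.5, 1.4]
    e_a = [1.3, 1.0, 0.8, 0.4]
    slope, intercept = linear_fit(delta_e_o, e_a)
    print(slope, intercept, mean_absolute_error(delta_e_o, e_a, slope, intercept))
    print([classify(ea, de) for ea, de in zip(e_a, delta_e_o)])
```

A descriptor such as ΔEO is useful for screening precisely because it is cheaper to compute than the barrier it correlates with.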
This type of brute-force screening can quickly become cumbersome; as a result, many researchers in the field are trying to find key structure–activity relationships to accelerate future computational screenings. Several descriptors have been developed for high-throughput screenings: Butler et al. used electron removal energies to explain photocatalytic behaviours of MOFs;128 Rosen et al. showed that the energy required to form the metal–oxide intermediate was a key descriptor of the thermal catalysis of alkane oxidation by N2O;129 and Fumanal et al. proposed a screening protocol based on two energy-based descriptors to predict photocatalytic properties of MOFs.130 Lately, Rosen et al. screened thousands of MOF structures to compare different DFT functionals and leveraged the calculated data to train machine learning models that can rapidly predict MOF band gaps.131
The development of ML methods is also critical in the field,132 but the lack of a centralised database with high-precision descriptors is a challenge for the future of these methods. The influence of defects, the different ways of modelling MOFs as periodic structures or clusters, the diversity of structures and the stability of such structures remain open problems. Yet, these issues do not threaten the major role of high-throughput screenings in the early design process of any nanoporous material for catalysis. To conclude this brief overview, we point the readers to a more exhaustive presentation of the matter.133
One of the first studies that investigated systematically the elastic properties of a family of materials was a 2013 study of all-silica zeolites,137 i.e., crystalline and porous SiO2 polymorphs. While this dealt with only 121 zeolitic frameworks out of 244 known structures, it showed that systematic studies at the DFT level were computationally tractable, and that they provided physical insight into the link between microscopic structure and macroscopic physical properties. This study demonstrated, among other things, that a small number of zeolites presented large negative linear compressibility (NLC), which could be linked to the wine-rack motif of their frameworks.
Looking outside of the specific case of zeolites, other groups have applied DFT calculations of elastic constants in a high-throughput manner. de Jong et al. leveraged the structures of the Materials Project,58,59 trying to chart the diversity of elastic properties across the whole space of inorganic crystalline compounds.138 As shown in Fig. 9, they provided a database initially containing the full elastic information of 1181 inorganic compounds, which has grown steadily since then and contains almost 14000 records to date.139 This dataset has been used in two different ways by researchers in the field.
Fig. 9 Statistical analysis of the calculated volume per atom, Poisson's ratio, bulk modulus KVRH and shear modulus GVRH of 1181 compounds in the Materials Project database. In the vector-field plot, arrows pointing at 12 o'clock correspond to minimum volume-per-atom and move anti-clockwise in the direction of maximum volume-per-atom, which is located at 6 o'clock. Reprinted from ref. 138 under CC-BY license. Copyright 2015 de Jong et al.
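The Voigt-Reuss-Hill (VRH) averages named in Fig. 9 reduce a full elastic tensor to polycrystalline bulk and shear moduli. As a minimal sketch, the closed-form expressions for a cubic crystal (three independent constants C11, C12, C44) are used below; the restriction to cubic symmetry is an assumption made to keep the example short, the general case requires the full 6×6 stiffness matrix and its inverse, and the function name and input values are hypothetical.

```python
def cubic_vrh(c11, c12, c44):
    """Voigt-Reuss-Hill moduli and Poisson's ratio for a cubic crystal.

    Inputs and returned moduli share the same unit (e.g. GPa).
    """
    # For cubic symmetry the Voigt and Reuss bulk moduli coincide.
    k_vrh = (c11 + 2.0 * c12) / 3.0
    g_voigt = (c11 - c12 + 3.0 * c44) / 5.0
    g_reuss = 5.0 * (c11 - c12) * c44 / (4.0 * c44 + 3.0 * (c11 - c12))
    g_vrh = 0.5 * (g_voigt + g_reuss)
    # Isotropic Poisson's ratio from the averaged moduli.
    poisson = (3.0 * k_vrh - 2.0 * g_vrh) / (2.0 * (3.0 * k_vrh + g_vrh))
    return k_vrh, g_vrh, poisson


if __name__ == "__main__":
    # Hypothetical elastic constants in GPa.
    print(cubic_vrh(c11=200.0, c12=100.0, c44=80.0))
```

When C44 = (C11 − C12)/2 the crystal is elastically isotropic and the Voigt and Reuss shear moduli coincide, which provides a simple sanity check.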
Firstly, the exploration of the database of elastic properties by tensorial analysis has made it possible to study quantitatively the occurrence of certain “anomalous” or rare mechanical behaviour, including negative linear compressibility, very high anisotropy, or negative Poisson's ratio (also called auxeticity). Indeed, such properties are considered rare and are usually sought after; the materials exhibiting these anomalous behaviours are mechanical metamaterials.140 In addition to their fundamental interest, such materials have applications in materials engineering: for example in energy dissipation (as shock absorbers and for bulletproofing), energy storage, as well as acoustics.141 However, it had not previously been possible to quantify exactly “how rare” they are. Chibani et al. showed through a systematic exploration of available mechanical properties of crystalline materials that general mechanical trends, which hold for isotropic (noncrystalline) materials at the macroscopic scale, also apply on average for crystals. Moreover, they could quantify the presence of materials with rare anomalous mechanical properties: 3% of the crystals were found to feature negative linear compressibility, and only 0.3% to exhibit complete auxeticity (negative Poisson's ratio in all directions of space).
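As an illustration of how such an anomaly can be detected from tabulated elastic data, the sketch below computes the linear compressibility along the three crystal axes from a compliance matrix in Voigt notation, β_i = S_i1 + S_i2 + S_i3, and flags a negative value. It assumes orthorhombic or higher symmetry, for which the extreme values lie along the crystal axes; lower symmetries require a full directional analysis. The compliance values and function names are hypothetical.

```python
def axial_linear_compressibilities(compliance):
    """Linear compressibility along x, y and z.

    compliance is the Voigt compliance matrix S (its upper-left 3x3 block
    is sufficient), for example in 1/GPa.
    """
    return [sum(row[:3]) for row in compliance[:3]]


def has_negative_linear_compressibility(compliance):
    """True if the crystal expands along at least one axis under hydrostatic pressure."""
    return any(beta < 0.0 for beta in axial_linear_compressibilities(compliance))


if __name__ == "__main__":
    # Invented compliance block: strong negative coupling to the z axis.
    s = [
        [1.0, -0.2, -0.3],
        [-0.2, 1.0, -0.3],
        [-0.3, -0.3, 0.5],
    ]
    print(axial_linear_compressibilities(s), has_negative_linear_compressibility(s))
```

Applying a check of this kind across a whole database is what turns a qualitative notion of rarity into a measured frequency.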
Secondly, the datasets of mechanical properties were used as a basis to accelerate the discovery of novel materials with targeted behaviour. Dagdelen et al. used search algorithms to identify 38 candidate materials exhibiting features correlating with auxetic behaviour, from more than 67000 materials in the Materials Project database.142 Performing DFT calculations on these 38 structures, they could identify 7 new auxetic compounds. In a more complex setup, Gaillac et al.37 used a multi-scale modelling strategy for the fast exploration and identification of novel auxetic materials. They combined classical force-field MD simulations with DFT calculations on candidate materials, and then used this reference DFT data to train an ML algorithm. They found that the accuracy of this multi-scale method exceeds that of the current low-computational-cost approaches for screening. In a similar work, Moghadam et al. used molecular simulation to train an artificial neural network (ANN) for the prediction of the bulk modulus of metal–organic frameworks.143 This shows the potential of such methodologies to treat very different (chemically as well as structurally) classes of materials.
Despite the progress made, important drawbacks of the current methodologies remain. High-throughput screenings rely too much on oversimplified assumptions such as the rigidity of the framework, the absence of defects, the use of Lennard-Jones potentials and inaccurate charges. For instance, the rigid-framework assumption takes into account only one conformation of the framework. Yet, thermal agitation induces a “breathing” movement of the framework with an amplitude dependent on its intrinsic flexibility. The pores of the framework can also change with the number of adsorbates, which depends on pressure, so as to interact more optimally with them. The issue of flexibility is rarely tackled and, when it is considered, it is only on the few most selective structures given by an inaccurate screening based on the rigid crystal approximation. One can wonder what results would be obtained if flexibility were considered on larger sets of structures. Witman et al. found that flexibility applied to top-performing materials can decrease the selectivity, because the pore no longer has an optimal size.95 In some cases, the selectivity of a well-performing material can even increase, making it a top-performing one. By accounting for flexibility, computational screenings can come closer to predicting experimental values of selectivity, diffusivity, and other key performance metrics.
Many open problems remain for the design of efficient high-throughput computational screenings. The connection between different properties for a given application is not systematically integrated in the screening procedures. For example, in methane storage, the working capacity of the material is the main property to optimise, but the kinetics of adsorption/desorption and the mechanical resistance to compaction, amongst others, also need to be considered. Designing a nanoporous material is in fact a multivariate optimisation problem with tacit constraints, such as synthesisability. Moreover, the transferability of the methodology to a broad range of materials is often achieved at the expense of accuracy in specific cases. One can rightly question the universality of relying on faster but less elaborate models, which boils down to a trade-off between prediction accuracy and computational cost (or complexity). For instance, classical force fields are broadly used in rigid materials for adsorption properties, but the switch to more costly ab initio methods or the addition of flexibility can result in a more accurate description at the expense of computational resources. The use of ML algorithms can be a way out of this apparent deadlock. They can learn sufficient information on as small a subset as possible to accurately predict the performance of other materials in a large dataset. In the future, this could reduce the size of the dataset that needs to be accurately screened by computationally expensive simulations, while maintaining the quality of the predictions.
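One generic way to handle several objectives at once, named here as an illustration and not as a method used in the works reviewed above, is to keep only the Pareto-optimal candidates, i.e. those that no other candidate beats on every objective simultaneously. The sketch below assumes that all objectives are to be maximised; the material names, the objectives and their values are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(candidates):
    """Names of the non-dominated candidates.

    candidates maps a material name to a tuple of objectives to maximise.
    """
    return sorted(
        name for name, objectives in candidates.items()
        if not any(dominates(other, objectives) for other in candidates.values())
    )


if __name__ == "__main__":
    # Hypothetical (working capacity, diffusivity) pairs in arbitrary units.
    materials = {
        "MOF-A": (10.0, 1.0),
        "MOF-B": (8.0, 3.0),
        "MOF-C": (7.0, 2.0),
    }
    print(pareto_front(materials))
```

Constraints that are hard to quantify, such as synthesisability, can then be applied to the much shorter list of non-dominated candidates.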
The development of such ML-assisted screenings is tied to advances in data science techniques and algorithms, but more importantly to the construction of descriptors tailored to the many possible applications. This construction work cannot be dissociated from the physical and chemical intuition of the scientists. Topological, chemical, electronic and other descriptors have been developed on top of the more common geometrical and thermodynamic descriptors, which shows the importance of strong physical chemistry knowledge. The discovery of novel relevant descriptors remains the main lever for increased performance of the ML models and is closely tied to rigorous theoretical work.
The development of databases is another key aspect in the promotion of data science in the field of materials science in general, and nanoporous materials chemistry in particular. The diversity of materials, the inclusion of experimental data (successful or failed), and the addition of understudied classes of materials (e.g. amorphous) are all key aspects to upgrade the existing databases. Although attempts to create a centralised database have been initiated by the Materials Project,139 this database does not contain all the existing information on each material.
In the future, computational high-throughput screening could be integrated more tightly into the design process of nanoporous materials, hence further improving its efficiency. The computational pre-screening can be coupled with automated screenings of the most promising materials to finally identify candidates for further studies. This automated design process is described by Lyu et al. in their paper on “Digital Reticular Chemistry”, which sets out promising perspectives for computational screenings in the field.147 Some studies are already pioneering this new research area by combining high-throughput characterisations, active learning algorithms and robotic synthesis.148,149 Another step towards faster industrialisation would be to integrate process modelling to enrich the purely atomistic approach.
This journal is © The Royal Society of Chemistry 2022