Yangzesheng Sun,a Robert F. DeJacoab and J. Ilja Siepmann*ab
aDepartment of Chemistry and Chemical Theory Center, University of Minnesota, 207 Pleasant Street SE, Minneapolis, Minnesota 55455-0431, USA. E-mail: siepmann@umn.edu; Fax: +1 (612) 626-7541; Tel: +1 (612) 624-1844
bDepartment of Chemical Engineering and Materials Science, University of Minnesota, 412 Washington Avenue SE, Minneapolis, Minnesota 55455-0132, USA
First published on 18th March 2019
We employed deep neural networks (NNs) as an efficient and intelligent surrogate of molecular simulations for complex sorption equilibria using probabilistic modeling. Canonical (N1N2VT) Gibbs ensemble Monte Carlo simulations were performed to model a single-stage equilibrium desorptive drying process for (1,4-butanediol or 1,5-pentanediol)/water and 1,5-pentanediol/ethanol from all-silica MFI zeolite and 1,5-pentanediol/water from all-silica LTA zeolite. A multi-task deep NN was trained on the simulation data to predict equilibrium loadings as a function of thermodynamic state variables. The NN accurately reproduces the simulation results and yields a continuous isotherm function. Its predictions can therefore be utilized to facilitate optimization of desorption conditions, which would require a laborious iterative search if undertaken by simulation alone. Furthermore, it learns information about the binary sorption equilibria as hidden-layer representations. This allows for application of transfer learning with limited data by fine-tuning a pretrained NN for a different alkanediol/solvent/zeolite system.
In addition, binary equilibria are more difficult to measure experimentally than single-component equilibria.5 Gmehling et al. estimated that less than 2% of the binary mixtures of technical interest have data available for equation-of-state and excess Gibbs energy models.6 To address this lack of experimental data, molecular simulation has been an effective tool for predicting phase and sorption equilibrium properties in complex thermodynamic systems.7–9 However, to implement these simulation-based equilibria in the modeling of an industrial process, a continuous function is necessary to describe the xi,P,T-hypersurface (where, for an adsorption system, the thermodynamic state variables xi, P, and T denote the mole fraction of component i, the pressure of the reservoir phase, and the temperature of the system).10,11
Over the past decade, machine learning has enjoyed unprecedented attention and success in modeling massively complex systems, tasks and behaviors, including image recognition,12,13 natural language processing,14,15 and action planning.16–18 By virtue of fast and accurate evaluation (inference) after being trained, machine learning models are well-suited for the prediction of thermodynamic equilibria. As a predictive thermodynamic modeling approach, machine learning methods have been applied to spin lattices,19 supercritical fluids,20 multiphase mixtures21 and separation processes.22–24 Moreover, it is noteworthy that a fair number of machine learning models are inspired by and thus closely related to thermodynamic systems.25–27
Recent achievements of machine learning are mainly attributed to the emergence of deep learning, which uses multilayered deep neural networks (NNs) to extract information from input data. Moreover, the features learned by a deep NN are transferable among similar systems or tasks.28 As a result, transfer learning can be used to tune a pre-trained deep learning model on a small amount of data for a new problem, achieving faster convergence and lower error. While transfer learning has become common practice in image recognition and natural language processing,29,30 predictive thermodynamic modeling can also benefit dramatically from the transferability of deep NNs. If a transferable NN is coupled with molecular simulations, properties of a new system can be predicted at a fraction of the computational cost, and the amount of data required to achieve accurate predictions becomes much less demanding.
One type of thermodynamic equilibrium is adsorption equilibrium, where one or more components (adsorbates) are in contact with an adsorbent phase and a reservoir phase. Adsorption isotherms are the most common closed-form functions used to describe adsorption equilibria at constant temperature.31–39 Apart from isotherms that mostly describe single-component adsorption, multicomponent adsorption theories40–44 have been developed for mixture adsorption systems. NNs have also been employed in adsorption systems as a replacement for traditional functional isotherms to fit experiments,45–48 whereas in this work, transferable deep NNs are trained on molecular simulations of adsorption equilibria to further increase the predictive power.
Here, we present a modeling workflow that combines molecular simulations with deep NNs to learn the xi,P,T-hypersurface of complex chemical systems. We consider binary sorption equilibria, where two components (adsorbates) are in contact with an adsorbent phase and a reservoir phase (see Fig. 1). The adsorbing mixtures consist of a linear alkane-α,ω-diol (referred to as alkanediol or diol hereafter) and a solvent, either water or ethanol. The adsorbents considered are zeolites, crystalline materials with size-selective pores widely used in industrial applications,49–52 in the (hydrophobic) all-silica form. Knowledge of these sorption equilibria is necessary for heterogeneous catalysis53–56 and separation57–61 applications, and all-silica zeolites can allow for highly selective separation of diols over water. Predicting the equilibria of these highly non-ideal mixtures from single-component measurements alone is challenging.62,63 Previously, we have shown that molecular simulations of alkanediol adsorption are in excellent agreement with experiments.60 Therefore, the simulations can be trusted to provide accurate equilibria at conditions that are difficult to probe experimentally, such as desorption conditions, where a large amount of pressure and temperature data is required but typically unavailable.64
To model the xi,P,T-hypersurface, a machine learning formalism was developed based on underlying principles of statistical thermodynamics, and a deep multi-task NN was trained on the molecular simulation results (see Fig. 1). The NN was then utilized to optimize the temperature for maximum sorbate enrichment in a single-stage equilibrium desorption operation. Furthermore, the transferability of the deep NN was investigated. The information on sorption equilibria for one specific sorbate/framework system learned by the deep NN can be generalized to chemically similar systems through transfer learning, yielding lower test-set errors than retraining the network on the new system.
$$Y = [y]; \quad y \sim P_\phi(y \mid N, V, T)$$
In particular, we focus on the loading of each component in a zeolite adsorption system, namely the amount of each component adsorbed in the zeolite phase when it equilibrates with a reservoir phase. In a multicomponent Gibbs ensemble Monte Carlo (GEMC) simulation, the ensemble-averaged adsorption loading of the ith component, qi, is directly measured from the number of molecules in the zeolite phase,
$$q_i(N, V, T) = \mathbb{E}_{z_i \sim P_\phi}\left[\, z_i \mid N, V, T \,\right]$$
At a given input state (N,V,T), the conditional KL divergence can be evaluated using the binomial form of Pθ(y|N,V,T), which yields the per-component loss
$$\ell(q_i, \hat{y}_i) = -\, q_i \log \hat{y}_i - (N_i - q_i) \log\left(1 - \hat{y}_i\right)$$
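For concreteness, a minimal PyTorch sketch of this per-component loss follows; the tensor names (q, n_total, y_hat) and the clamping constant for numerical stability are our own choices and not taken from the original implementation.

```python
import torch

def binomial_loss(q, n_total, y_hat, eps=1e-8):
    """Per-component loss: -q_i*log(y_hat_i) - (N_i - q_i)*log(1 - y_hat_i).

    q       : ensemble-averaged loading of component i (molecules in the zeolite phase)
    n_total : total number N_i of molecules of component i in the simulation box
    y_hat   : NN-predicted normalized loading in (0, 1)
    """
    y_hat = y_hat.clamp(eps, 1.0 - eps)  # guard against log(0)
    return (-(q * torch.log(y_hat) + (n_total - q) * torch.log(1.0 - y_hat))).mean()
```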
The GEMC set-up allows one to probe desorption of a zeolite loaded with a specific number of molecules, whereas many grand canonical Monte Carlo simulations would be needed to find the set of chemical potentials that corresponds to this specific desorptive drying scenario. The loadings of both components in the zeolite phase are measured after equilibration of the desorption process. Two diols, butane-1,4-diol (C4) and pentane-1,5-diol (C5), were used; water (W) or ethanol (E) served as the solvent; and the chosen adsorbents were all-silica zeolites of the framework types MFI and LTA. The zeolite–diol–solvent combinations for which simulation data were obtained are MFI-C5-W, MFI-C5-E, MFI-C4-W and LTA-C5-W. A multi-task learning model was employed to simultaneously predict the loadings of both diol and solvent to account for the behavior of both components,
$$\{\hat{y}_1, \hat{y}_2\} = f_\theta(n, v, T)$$
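The shared-trunk/branched-head topology implied by this multi-task mapping can be sketched in PyTorch as below; the layer widths and activation functions are illustrative assumptions, since only the overall layout (shared lower layers feeding two sorbate-specific branches), not the exact hyperparameters, is specified here.

```python
import torch.nn as nn

class BranchedSorptionNet(nn.Module):
    """Illustrative multi-task NN: shared lower layers feeding two sorbate-specific branches."""

    def __init__(self, n_inputs=4, hidden=24, branch_hidden=12):
        super().__init__()
        # shared lower layers extract system-level features from (n1, n2, v, T)
        self.shared = nn.Sequential(
            nn.Linear(n_inputs, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        # separate higher-layer branches predict the normalized loading of each sorbate
        def branch():
            return nn.Sequential(
                nn.Linear(hidden, branch_hidden), nn.Tanh(),
                nn.Linear(branch_hidden, 1), nn.Sigmoid(),
            )
        self.branch_diol, self.branch_solvent = branch(), branch()

    def forward(self, x):
        h = self.shared(x)
        return self.branch_diol(h), self.branch_solvent(h)
```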
Simulations for each desorption system were performed at 16 temperatures (343 K to 493 K in steps of 10 K), 16 logarithmically-spaced vapor-phase volumes, and 4 initial sorbate loadings, and the results were collected for 32 independent simulations at each set of thermodynamic state variables. This gives 1024 state points and 32768 simulation data entries for each system, and 131072 simulations for the four systems. The details of the molecular simulation are reported in the ESI (Section S2†).
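As a quick check of this bookkeeping, the state-point grid can be enumerated as follows; the temperatures and counts follow the text, whereas the volume bounds are placeholders because only the logarithmic spacing is stated.

```python
import numpy as np

temperatures = np.arange(343.0, 494.0, 10.0)   # 16 temperatures, 343-493 K
volumes = np.logspace(0.0, 3.0, 16)            # 16 log-spaced vapor volumes (bounds assumed)
n_loadings = 4                                 # 4 initial sorbate loadings
n_independent = 32                             # independent simulations per state point

state_points = len(temperatures) * len(volumes) * n_loadings   # 1024 per system
entries_per_system = state_points * n_independent              # 32768
total_simulations = entries_per_system * 4                     # 131072 for the four systems
print(state_points, entries_per_system, total_simulations)
```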
Multiple independent simulations were performed at each state point for two reasons: independent simulations reduce the wall-clock time needed to obtain results of the desired statistical significance, and they allow the statistical uncertainty of the simulation prediction to be estimated in a straightforward manner. Taking subsets of the independent simulations to train separate NNs can provide a path to uncertainty estimation. To capture the uncertainty among independent simulations, the bagging method for ensemble learning was used to obtain the mean and uncertainty of the NN predictions.77,78 Using 32 SorbNets, each trained against data from one independent simulation at the different state points, the mean and standard deviation of their 32 predictions can be compared with the simulation statistics. We find that the standard deviation from the ensemble learning reflects the uncertainties of the simulation data (see Fig. S4 in the ESI†). Using the data from the 32 independent simulations enables more facile training of SorbNet compared to using the mean over the independent simulations.
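A hedged sketch of this bagging-style uncertainty estimate is given below; train_sorbnet is a hypothetical helper standing in for the actual training loop, and the data are assumed to be grouped by independent-simulation index.

```python
import numpy as np

def ensemble_predict(datasets, x_query, train_sorbnet):
    """Train one surrogate per independent simulation and report mean/std of their predictions.

    datasets      : list of 32 training sets, one per independent simulation
    x_query       : state points (n1, n2, v, T) at which predictions are requested
    train_sorbnet : hypothetical callable that trains a model and returns a predict function
    """
    predictions = np.stack([train_sorbnet(d)(x_query) for d in datasets])  # (32, n_query, 2)
    return predictions.mean(axis=0), predictions.std(axis=0, ddof=1)
```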
The training–validation (test) set split of the simulation data was performed according to the temperature of the data points. In molecular simulation workflows, a whole isotherm sequence at a specific temperature and different molecule numbers (or pressures) is usually determined, rather than data at random state points, to probe the adsorption behavior. Based on this convention, all data at 4 out of the 16 temperatures were held out to construct the validation (test) set (see Fig. 2a).
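A minimal sketch of such a temperature-based hold-out split is shown below, assuming the data are stored as an array with an explicit temperature column; the choice of which 4 temperatures to hold out is arbitrary here.

```python
import numpy as np

def split_by_temperature(data, held_out, t_column=3):
    """Hold out all entries at the selected temperatures to form the validation (test) set."""
    test_mask = np.isin(data[:, t_column], np.asarray(held_out))
    return data[~test_mask], data[test_mask]  # training set, test set

# example: hold out 4 of the 16 temperatures
all_T = np.arange(343.0, 494.0, 10.0)
held = all_T[[3, 7, 11, 15]]  # arbitrary choice of 4 hold-out temperatures
```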
Transfer learning experiments were also carried out with the NN model. First, a NN model was trained using the training set described above. Then the pre-trained NN was transferred to either the training set of a different sorption system or a much smaller transfer set constructed by picking 1 out of the 16 temperatures in the simulation data.
The mean square errors (MSE) for the training and test sets of each sorption system are listed in Table 1. The simulation variance for a system is given as the average variance of the normalized equilibrium loading y over the 32 independent simulations at all 1024 state points of the system. In fitting the training-set data, SorbNet achieved an accuracy comparable to the simulation precision, with a mean square error around twice the average variance of the simulation results. When predicting the unseen test-set data, SorbNet also maintained an accuracy of the same magnitude as the simulation precision. Although the predictions from MC simulations can be made more precise by running longer simulations, the uncertainties arising from force-field parameterization79 and framework structure60 would not be affected by doing so. On the other hand, while the prediction error of a NN can be decreased by increasing the number of neurons,80 the number of NN parameters must be kept smaller than the number of data points in the training set. In the present work, we found that increasing the complexity of the NN did not dramatically improve the predictions.
Sorption system | Model | Training MSEa (×10−4) | Test MSEa (×10−4) | Simulation variance (×10−4)
---|---|---|---|---
MFI-C5-W | SorbNet | 3.8 ± 0.5 | 9.3 ± 0.9 | 2.8
 | Shallow | 4.3 ± 0.2 | 9.4 ± 0.4 | 
MFI-C4-W | SorbNet | 2.5 ± 0.4 | 4.5 ± 0.7 | 1.9
 | Shallow | 4.1 ± 0.5 | 7.2 ± 1.1 | 
MFI-C5-E | SorbNet | 1.9 ± 0.4 | 5.6 ± 0.9 | 0.7
 | Shallow | 2.7 ± 0.3 | 7.5 ± 0.7 | 
LTA-C5-W | SorbNet | 2.8 ± 0.6 | 7.6 ± 1.6 | 1.4
 | Shallow | 3.8 ± 0.2 | 9.6 ± 0.5 | 

a Mean square errors were evaluated using the averaged simulation result as the true value. Standard deviations were measured from 8 training runs.
To justify the design of the SorbNet structure, the performance of SorbNet was compared against a shallow NN with the same number of parameters, since a shallow network is already able to approximate any continuous function according to the universal approximation theorem.80 Fig. 2b shows the training curves of SorbNet and the shallow network over the first 200 epochs. The shallow network learned less efficiently on the simulation data, as evidenced by much slower convergence. The inefficiency of the shallow network is likely a result of strong correlations among the hidden-layer units, as 48 hidden-layer representations are mapped from only 4 input features. Moreover, the shallow network gave slightly higher training and test errors after convergence. These observations demonstrate that a deep NN achieves superior performance to a shallow NN, even though the latter already possesses sufficient theoretical predictive power.
Since NNs perform evaluations significantly faster than simulations, it is possible to interpolate a numerically continuous isotherm curve using NN predictions. Fig. 2d shows the isotherm curves interpolated by SorbNet at one representative sorbate composition in the test set; the NN predictions agree well with the simulation results.
To obtain an adsorption hypersurface in terms of pressure, the total vapor pressure was calculated using the vapor-phase densities from simulation, assuming an ideal gas in the vapor phase. Another NN (p–v network) was used to map the total vapor pressure p to the relative vapor-phase volume v by taking a state (n1,n2,p,T) as input. Since this is a trivial prediction task, a shallow network much smaller than SorbNet was adopted, and its output was coupled with SorbNet to predict the equilibrium loadings (Fig. 3c). Subsequently, an isobaric adsorption curve with varying temperature is produced. A 2D heatmap of the diol–solvent molar ratio in the zeolite phase is shown in Fig. 3a. For an isobaric equilibrium operation (a horizontal line), a temperature with the maximum molar ratio exists in the heatmap, and the optimal temperatures are found at the constraint boundary for the MFI-C5-W system (Fig. 3b). Using two representative initial loadings, the optimal temperature as a function of operation pressure was calculated by a maximum search in 0.2 K intervals from 343 K to 493 K (Fig. 3d). For an initial composition with a lower diol–solvent ratio, the slope of T versus log-pressure is higher, indicating that it is more difficult to maintain high recovery (i.e., 99% fractional diol loading) when conducting the isobaric desorption operation. This desorptive-drying optimization problem would be much more difficult using molecular simulations and traditional isotherm fitting. If the optimization were to be undertaken by simulations alone, a sequence of desorption simulations would need to be conducted iteratively to search for the optimal temperature following the bisection method. If adsorption isotherm modeling were to be used, the equilibrium vapor-phase composition as well as the partial pressures would not be known beforehand, so it would be difficult to fit multicomponent isotherms, as they almost always require the partial pressures of all components.
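The temperature optimization itself reduces to a one-dimensional grid search over the coupled NN predictions; a sketch under assumed helper names (pv_net and sorbnet standing in for the trained p–v network and SorbNet) is shown below, with the diol–solvent molar ratio in the zeolite phase as the objective.

```python
import numpy as np

def optimal_temperature(n1, n2, pressure, pv_net, sorbnet,
                        t_lo=343.0, t_hi=493.0, dt=0.2):
    """Scan temperature in dt steps at fixed pressure and return the T that maximizes
    the diol/solvent molar ratio in the zeolite phase (pv_net and sorbnet are assumed
    callables wrapping the trained networks)."""
    best_t, best_ratio = t_lo, -np.inf
    for t in np.arange(t_lo, t_hi + dt, dt):
        v = pv_net(n1, n2, pressure, t)            # map (n1, n2, p, T) to relative vapor volume
        q_diol, q_solvent = sorbnet(n1, n2, v, t)  # predicted equilibrium loadings
        ratio = q_diol / max(q_solvent, 1e-12)
        if ratio > best_ratio:
            best_t, best_ratio = t, ratio
    return best_t, best_ratio
```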
Since the NN always predicts the same output property while the microscopic configurations at a given state point vary between systems, the branched higher layers of SorbNet were transferred because different systems share output semantics.81 As an intuition for its multi-task design, we expect the lower shared layers of SorbNet to extract information about a particular sorption system, and the higher branched layers to calculate the adsorption loadings from the extracted information in a system-independent manner. Such information can be as simple as the fact that adsorption decreases with temperature, but can also be more complex and thus difficult to observe. This also echoes conventional thermodynamic modeling, in which the coefficients determining how state variables are related to each other contain information about the system, such as the critical point in the Peng–Robinson equation of state.82 The hypothesis that the higher-layer weights are general to different sorption systems was tested by pre-training the SorbNet structure on one adsorption system, keeping the pre-trained weights of the branched layers while reinitializing the lower layers, and retraining on another system with a different zeolite or sorbates. The transfer learning results for other sorption systems, with the branched-layer weights either fixed or trainable, are shown in Table 2.
Sorption system | Initialization | Branched layers | Training MSEa (×10−4) | Test MSEa (×10−4)
---|---|---|---|---
MFI-C4-W | Pre-trained | Fixed | 2.9 ± 0.6 | 5.0 ± 1.2
 | Pre-trained | Trainable | 2.6 ± 0.5 | 4.6 ± 1.1
 | Random | Trainable | 2.5 ± 0.4 | 4.5 ± 0.7
 | Random | Fixed | 105 ± 6 | 113 ± 8
MFI-C5-E | Pre-trained | Fixed | 2.5 ± 0.5 | 6.4 ± 1.4
 | Pre-trained | Trainable | 1.6 ± 0.6 | 4.4 ± 1.4
 | Random | Trainable | 1.9 ± 0.4 | 5.6 ± 0.9
 | Random | Fixed | 150 ± 3 | 155 ± 5
LTA-C5-W | Pre-trained | Fixed | 3.5 ± 0.3 | 9.2 ± 0.7
 | Pre-trained | Trainable | 2.8 ± 0.5 | 7.5 ± 1.5
 | Random | Trainable | 2.8 ± 0.6 | 7.6 ± 1.6
 | Random | Fixed | (2.0 ± 1.7) × 102 | (2.2 ± 1.7) × 102

a Standard deviations were measured from 8 training runs. Eight models independently pre-trained on the MFI-C5-W system were used as initialization in the transfer learning experiments.
Compared with training a network from scratch on the new system, retraining the shared lower layers with fixed branched layers resulted in a slightly higher error, yet generally of the same magnitude as that of a new network. When the branched layers are further allowed to be fine-tuned, transfer learning achieves performance statistically indistinguishable from training a new network (Table 2). However, an alternative explanation for these results is that the lower layers already have enough capacity to accomplish the entire prediction task, in which case the information in the higher layers would be irrelevant. To inspect this possibility, another SorbNet structure was trained on each sorption system with its higher layers fixed at randomly initialized weights. In machine learning practice, one way to probe whether a NN is overcomplicated for a task is to check whether it can fit the data even with random labels. If the lower layers could already fit the random outputs given by the initial weights of the higher layers, it would be irrelevant whether the branched layers had extracted any useful features. As shown in Table 2, training the lower layers against random higher-layer weights resulted in considerably higher errors. We conclude that the higher branched layers of SorbNet indeed play a role in predicting the sorption loadings from the features extracted by the lower layers, and are transferable among different sorption systems.
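In PyTorch-style code, the transfer settings in Table 2 differ only in which parameter groups are reinitialized and which receive gradients; a sketch assuming the illustrative BranchedSorptionNet layout introduced above is given below.

```python
import torch.nn as nn

def prepare_transfer(model, keep_branches_fixed=True, reinit_shared=True):
    """Configure a pre-trained branched model for transfer to a new sorption system."""
    if reinit_shared:
        # discard system-specific features learned by the shared lower layers
        for layer in model.shared:
            if isinstance(layer, nn.Linear):
                layer.reset_parameters()
    for branch in (model.branch_diol, model.branch_solvent):
        for p in branch.parameters():
            # fixed branches test whether the higher layers carry transferable information
            p.requires_grad = not keep_branches_fixed
    return model
```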
We utilized the transferability of SorbNet to predict the temperature dependence of a sorption simulation system from data at only one temperature. Since the lower layers also encode potentially useful information about the sorption system, we kept the lower-layer weights instead of reinitializing them when performing this transfer. In this application, the transfer performance of SorbNet was compared with another deep NN with the same number of hidden layers and a very similar parameter complexity. It differs from SorbNet in that it has no branches and all units are interconnected between layers; it is referred to as the 'dense network' for convenience. Both SorbNet and the dense network were first pretrained on the MFI-C5-W system, and then their first two layers were fine-tuned on the 1-temperature transfer set for the other systems. The NNs were only pre-trained for 200 epochs to prevent overfitting. In addition, a newly initialized SorbNet structure was trained on the transfer set as a baseline for each system. The results for the temperature-dependence prediction on the transfer set are shown in Fig. 4.
Among all target sorption systems for transfer, the pre-trained SorbNet consistently outperformed the newly trained baseline in terms of errors at most testing temperatures. SorbNet did not exhibit statistically poorer performance than the dense network, indicating that full connections between every two layers are not necessary. Apart from transferring to different sorption systems, another transfer learning task was created in which the identities of the alkanediol and the solvent in the pre-training system were switched by swapping their corresponding initial and equilibrium loading variables (MFI-W-C5). The intention behind this transfer task is that the branches of SorbNet are supposed not to discriminate between stronger- and weaker-interacting adsorbates, and the split in the SorbNet design encourages the higher-layer features to be general for any sorbate. Since the two branches work independently, they only differ in recognizing which features are relevant to the first or the second sorbate in their input. Therefore, the branches are trained only to distinguish between the 'first' and the 'second' sorbate in the data supplied. In the sorbate-switched system, SorbNet maintained a mostly lower test error than the baseline, while the dense network performed substantially worse and gave much more unstable training results (Fig. 4d). This can be explained by the tendency of the dense network to 'remember' in its higher layers which sorbate binds more strongly with the zeolite, underscoring the benefit of the branched structure of SorbNet.
Another potential benefit of SorbNet's branched structure for its transferability is the prevention of complex co-adaptation by network pruning. Co-adaptation in NNs refers to the phenomenon that the responses (outputs) of different neurons to the training data are always strongly correlated, typically involving the cancellation of excessively large values produced by the neurons. When the network operates on the test set or is transferred to a different dataset, such correlations may be broken (large values can no longer cancel), leading to severe overfitting. One common way to prevent co-adaptation is to prune the network structure: with fewer connections between neurons, the neurons are less likely to be tightly correlated with each other. Random pruning of the NN is one of the key ideas in the well-known Dropout method,83 while the higher layers of SorbNet are intentionally (rather than randomly) pruned into two separate branches, which also improves the transferability of the network.
To further investigate the difference in transfer performance among new sorption systems, we evaluated the similarities between simulation systems using data-driven methods. For each simulation system and desorption initial loading, the equilibrium loadings were measured at exactly the same temperatures and vapor volumes, allowing us to directly compare the sorption patterns. Principal component analysis (PCA) was performed on the 16 q(V,T) adsorption patterns (4 loadings each for 4 systems) in the full simulation dataset (see Fig. 5a).
We also measured the distance in the 2D principal-component space between the centroid (mean over the 4 loadings) of the MFI-C5-W system and those of the other 3 systems. Interestingly, the PCA similarities between simulation systems agree well with chemical intuition (see Fig. 5b). With the alkanediol differing by only one CH2 unit, the adsorption pattern of MFI-C4-W is very similar to that of MFI-C5-W. The MFI-C5-E system uses an organic solvent instead of water and thus has a lower similarity due to fairly different solvent–zeolite interactions. For the LTA-C5-W system, the pore structure of a different zeolite also makes it less similar to MFI-C5-W among the zeolite and sorbate combinations. Comparing the system similarities with the transfer learning results in Fig. 4, SorbNet exhibits a higher generalization error when transferring to a less similar system at a temperature far from the transfer set, since the information learned on the MFI-C5-W system is applied less effectively when the zeolite or sorbate becomes more different.
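A sketch of this similarity analysis with scikit-learn is shown below, assuming the equilibrium-loading patterns have been flattened into one row per (system, initial loading) pair; the array layout and label handling are our own conventions.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_system_similarity(patterns, labels, reference="MFI-C5-W"):
    """patterns: array of shape (16, n_V * n_T) with flattened q(V, T) adsorption patterns;
    labels: list of 16 system names (4 initial loadings per system)."""
    scores = PCA(n_components=2).fit_transform(patterns)   # project onto 2 principal components
    labels = np.asarray(labels)
    centroids = {s: scores[labels == s].mean(axis=0) for s in np.unique(labels)}
    ref = centroids[reference]
    # Euclidean distance of each system centroid from the reference system centroid
    return {s: float(np.linalg.norm(c - ref)) for s, c in centroids.items() if s != reference}
```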
It should be emphasized that SorbNet operates only on the thermodynamic state variables (n1,n2,v,T) rather than on attributes of the zeolites and sorbates, yet this does not prevent it from being transferable among similar sorption systems. More specifically, 'transfer' in machine learning refers to applying already-trained features (weights) to a different dataset or task, with those weights either fixed or further tuned on the new task. In thermodynamic modeling, this is analogous to assuming that two compounds similar in shape and polarity possess similarities in their fluid-phase equilibria and properties, i.e., the principle of corresponding states. Therefore, even without supplying descriptors of the sorbates and the sorbent (e.g., kinetic diameter or limiting pore diameter), SorbNet is transferable to similar sorption systems when limited data for the new system are provided. Nevertheless, SorbNet is likely to transfer very poorly if the sorbates or sorbent are drastically different. Conversely, if descriptors of the sorption system were explicitly contained in the input data, prediction of different sorption systems would usually be considered generalization rather than transfer learning. This is because, in this hypothesized setting, the NN is intended to fit different sorption systems already during training. If it is sufficiently predictive, the exact same set of weights can still be used when the descriptor values of the sorption system vary. In this case, the NN is always performing the same task, and is essentially generalizing from the training set to the test set.
The proposed learning formalism focuses on approximating the expectation value of a thermodynamic variable. However, it is possible to extend this approach to the joint learning of multiple probabilistic metrics of a thermodynamic system, such as learning both the expectation and the variance. In the transfer learning experiments, the selection of sorption systems was limited to variations of the zeolite and the sorbate molecules. Since SorbNet employs a multi-task architecture, it would be of great interest to expand the scope of transfer learning to pre-training on the unary sorption systems of each sorbate and transferring to the corresponding binary system.
A few limitations of SorbNet are emphasized next. One major limitation is that SorbNet is trained on and predicts sorption data for one specific combination of adsorbates and porous framework. Therefore, to predict another sorption system (changing adsorbates and/or framework), some information on the new system is required to retrain the NN using transfer learning. This makes SorbNet inefficient for high-throughput screening where predictions across a large number of porous frameworks are desired for the same state point (partial pressures of adsorbates and temperature). For the same reasons, the SorbNet predictions would hold when changes in framework structure (e.g., including framework flexibility) and force field parameters do not yield significant changes in the simulation outcomes (i.e., for SorbNet, a change in force field parameters is equivalent to changing the adsorbate/framework combination) but, again, some limited new simulation data and re-training through transfer learning would improve accuracy of the SorbNet predictions. Hence, it would be interesting to include representations of diverse porous frameworks and sorbates into the machine learning system so that the NN does not need to be retrained upon changing the sorption system. Another limitation is that the predictions of SorbNet rely more heavily on training data than on the physical principles underlying adsorption. As a result, the NN is prone to yield thermodynamically inconsistent data when (derivative) properties, such as the heat of adsorption, are calculated from the NN predictions. This could be improved by embedding physical constraints as regularization of the NN.84 In addition, SorbNet cannot make predictions for the same set of adsorbates and framework for state points far outside the training set (e.g., the diol/solvent adsorption from the liquid phase at relatively low temperature is too far removed from the desorptive drying conditions).
Our work provides a new avenue into applying machine learning in conjunction with molecular simulations for modeling sorption equilibria. Machine learning has revolutionized a large number of computational disciplines, and we hope that this work will provide guidance to harness artificial intelligence power for simulation-based materials discovery.
Footnote
† Electronic supplementary information (ESI) available: Experimental details for machine learning and Monte Carlo simulation for binary sorption systems investigated, neural network prediction results for equilibrium loadings in systems other than MFI-C5-W, and additional method details for desorption operation optimization. See DOI: 10.1039/c8sc05340e