Markus J. Buehler ab
aLaboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA. E-mail: mbuehler@mit.edu
bCenter for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
First published on 24th June 2022
Deep learning holds great promise for applications in materials science, including the discovery of physical laws and materials design. However, the availability of proper data remains a challenge – often, data lacks labels, or does not contain direct pairings between the input and the output property of interest. Here we report an approach based on an adversarial neural network model – composed of four individual deep neural nets – to yield atomistic-level predictions of stress fields directly from an input atomic microstructure, illustrated here for defected graphene sheets under tension. The primary question we address is whether it is possible to predict stress fields without any microstructure-to-stress-field pairings, or indeed without the existence of any input–output pairs whatsoever, in the dataset. Using a cycle-consistent adversarial neural net with U-Net, ResNet, or hybrid U-Net-ResNet architectures, applied to a system of graphene lattices with defects, we devise an algorithmic framework that enables us to successfully train and validate a model that reliably predicts atomistic-level field data of unknown microstructures, generalizing to reproduce well-known nano- and micromechanical features such as stress concentrations, size effects, and crack shielding. In a series of validation analyses, we show that the model closely reproduces reactive molecular dynamics simulations but at significantly higher computational efficiency, and without a priori knowledge of any physical laws that govern this complex fracture problem. The model opens an avenue for upscaling, where the mechanistic insights and predictions from the model can be used to construct analyses of very large systems based on relatively small and sparse datasets. Since the model is trained to achieve cycle consistency, a trained model features both forward (microstructure to stress) and inverse (stress to microstructure) generators, offering potential applications in materials design to achieve a certain stress field. Another application is the prediction of stress fields based on experimentally acquired structural data, where knowledge of solely the positions of atoms is sufficient to predict physical quantities for augmentation or analysis processes.
Machine learning broadly, and especially deep learning,4,5 holds great promise for applications in nanoscience, including materials discovery and design,6–9 and has been applied to various nanoscale systems, including graphene, in prior research.10,11 Here we explore the analysis of graphene mechanics using deep learning, applied specifically to the nanomechanical problem of fracture mechanics,12,13 using an adversarial framework. Earlier work has shown that field data can accurately be predicted by training neural networks against paired images14,15 (that is, neural networks are given pairs of input and output fields during training).
While paired datasets (e.g. field images) may be available in some cases, there are scenarios where such pairings, or the knowledge of pairings, may not exist, such as in broad experimental data collection or when combining data from multiple sources. Importantly, we believe it is also a question of fundamental interest to assess whether correct field-to-field predictions, including combining multiple objectives such as displacement fields and stress fields simultaneously, can be made from datasets without the existence of pairs. We address this fundamental question in this study and show that it is indeed possible to solve such a problem through the use of an adversarial training approach, offering a game-theoretic route to this nanomechanics problem.
Fig. 1 shows the model setup used in this study, resembling a small piece of graphene under periodic boundary conditions and uniaxial loading.16 Fig. 1A shows the model setup and application of mechanical strain, used here for demonstration of the method, and Fig. 1B depicts sample results from molecular dynamics simulations that serve as input to train the deep learning model (left panel: input microstructure; right panel: predicted output stress field, where the numerical stress value at each atom is mapped to a color via a colormap, as predicted from the MD simulation). We solve both the forward problem (predicting stress fields from microstructure) and the inverse problem (predicting microstructure from stress field). The model does not require any pairing of input–output images, and it works even for datasets that do not feature the existence of any pairs.
Fig. 1 Atomistic simulation setup (Panel A, strain in the x-direction is exaggerated for clarity) and sample results from the MD simulations (left: microstructure, right: stress field). All MD simulations are carried out in LAMMPS.17 We solve both the forward problem (predicting stress fields from microstructure) as well as the inverse problem (predicting microstructure from stress field). The stress field images are generated by mapping a stress value (or other field data) to a color, and the process can be reversed when converting a predicted field back into numerical field data for further analysis.
Fig. 2 depicts the training set design without any pairs (see the “X” marks in Fig. 2A denoting removed images, shown for just a few sample images), and with no pairing information between input and output data. Fig. 2B depicts a sample of the collection of input microstructures and output stress fields. The von Mises stress20 is used here as an effective overall stress measure, but the method can be generalized to other field data. Two types of crack simulations are included: graphene sheets with single cracks (left half) and multiple cracks (right half). We conduct a total of 2000 MD simulations, half with single cracks and the other half with multiple cracks, yielding a total of 4000 images to begin with. Once all pairs are removed, a total of 2000 images are left in the training set (this is reduced to 1000 images for training to test the reliability of the method with even fewer images).
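As an illustration of how such a pair-free training set can be assembled from paired simulation outputs, the following sketch (hypothetical directory layout and helper names, not the exact pipeline used here) keeps either the microstructure or the stress field of each simulation, never both:

```python
import random
from pathlib import Path

def build_unpaired_pools(pairs, seed=0):
    """Split (microstructure, stress) image pairs into two disjoint pools:
    for each simulation, keep EITHER its microstructure OR its stress field,
    so that no input-output pairs exist in the resulting training data."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    micro_pool = [m for m, _ in shuffled[:half]]    # microstructures only
    stress_pool = [s for _, s in shuffled[half:]]   # stress fields only
    return micro_pool, stress_pool

# Hypothetical usage: pair up the rendered MD images by sorted file name.
pairs = list(zip(sorted(Path("data/micro").glob("*.png")),
                 sorted(Path("data/stress").glob("*.png"))))
micro_pool, stress_pool = build_unpaired_pools(pairs)
```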
We use a cycle-consistent adversarial neural net (GAN),21,22 as shown in Fig. 3, which provides an overview of the models, featuring U-Net, ResNet and hybrid U-Net-ResNet generators,23,24 as well as two PatchGAN classifiers.25,26 Generator G transforms input lattices into stress fields and generator F transforms stress fields into input lattices. Using adversarial training, G learns to generate images that resemble real stress fields, and a discriminator Dy aims to distinguish between generated stress fields x̂stress = G(xmicrostructure) and real stress fields xstress. At the same time, F learns to generate images that resemble real microstructures, and a discriminator Dx aims to distinguish between generated microstructures x̂microstructure = F(xstress) and real microstructures xmicrostructure.
With xi and x̂i denoting the real field and the approximate, predicted field, respectively,
G(xmicrostructure) = x̂stress (1)
F(xstress) = x̂microstructure (2)
F(G(xmicrostructure)) = x̂microstructure ≈ xmicrostructure (3)
G(F(xstress)) = x̂stress ≈ xstress (4)
F(xmicrostructure) = x̂microstructure ≈ xmicrostructure (5)
G(xstress) = x̂stress ≈ xstress (6)
Eqns (5) and (6) signify that if a real stress field is provided to generator G, the same stress field is produced. Similarly, if an input lattice is provided to F, the same input lattice is produced.
In other words, if we provide a stress field to generator G we will get the “same” stress field back. Similarly, if we provide a microstructure to generator F we will get the “same” microstructure back.
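As a minimal illustration, the mappings of eqns (1)–(6) can be expressed in a few lines of code, assuming two generic image-to-image generator networks G and F (sketch only; variable names are illustrative):

```python
def cycle_and_identity_maps(G, F, x_micro, x_stress):
    """Evaluate the mappings of eqns (1)-(6) for one batch of
    (unpaired) microstructure and stress-field images."""
    stress_hat = G(x_micro)     # eqn (1): microstructure -> stress
    micro_hat = F(x_stress)     # eqn (2): stress -> microstructure
    micro_cyc = F(stress_hat)   # eqn (3): should recover x_micro
    stress_cyc = G(micro_hat)   # eqn (4): should recover x_stress
    micro_id = F(x_micro)       # eqn (5): identity mapping of F
    stress_id = G(x_stress)     # eqn (6): identity mapping of G
    return stress_hat, micro_hat, micro_cyc, stress_cyc, micro_id, stress_id
```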
A λ parameter is introduced to weigh the relative contributions of the losses (discriminator loss, cycle consistency loss, and identity loss). Cycle consistency losses are weighted by λcycle, and the identity loss by λidentity. These two contributions are typically weighted at λi ≫ 1 (details in Materials and methods). This strategy ensures that the model not only learns how to generate images that “look like” the required output, but specifically requires that the mapping is satisfied in both the forward and backward directions. This is critical for a physical problem such as the one solved here.
Indeed, we hypothesize that due to the cycle consistent formulation of the model, no pairings of the input and output are necessary, and that the model can even learn how to predict multiple features at the same time.
We explore the use of two types of generator models, one based on a U-Net architecture and one based on a ResNet architecture (as well as a hybrid of the two, referred to as U-Net-ResNet). As shown in the following sections, both models can learn to predict stress fields from input microstructures well, and can also solve the inverse problem. Further details on the discriminator models and other specifics are provided in Materials and methods; see also ref. 22 for additional model aspects.
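For illustration, a minimal PyTorch sketch of the type of residual block stacked inside a ResNet-style generator is shown below (channel count, normalization, and padding choices are assumptions, not the exact architecture used in this work; the U-Net variant would additionally use encoder-decoder skip connections):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A simple residual block: two 3x3 convolutions with a skip
    connection, as typically stacked at the core of a ResNet generator."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        # Skip connection preserves input features and eases optimization.
        return x + self.block(x)
```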
We now validate the method by comparing predictions for novel microstructures (which have not been part of the training set) with MD simulation results, as shown in Fig. 5. Sample results for single cracks (top 3) and multiple cracks (bottom 3), comparing the input, stress field, cycle, and ground truth, for the U-Net architecture, confirm excellent agreement. It is evident that the model predicts the stress fields very well, generally. As anticipated from fracture mechanics theory,13 high stresses occur at crack tips. The model also adequately captures size effects, where smaller cracks lead to lower stress intensity.27 Another interesting result is that the model predicts crack shielding (e.g. bottom example in Fig. 5), and can accurately account for the orientation of the crack, following closely the prediction by Inglis.28 Specifically, the model predicts that horizontal cracks have lower stress concentration than vertically oriented cracks.
Fig. 7 shows sample results for single cracks (top 3) and multiple cracks (bottom 3), comparing the input, stress field, cycle, and ground truth, for the ResNet architecture. Like the U-Net model described in the previous section, the ResNet model predicts the stress fields very well, generally, predicting high stresses at crack tips, size effects (smaller cracks lead to lower stress intensity), crack shielding (e.g. bottom example), and the orientation of the crack (horizontal cracks have lower stress concentration than vertically oriented cracks).
It is noted that the ResNet model requires longer training to reach good predictions. The ResNet models usually take a few tens of epochs to converge, whereas the U-Net model converges within a few epochs.
Fig. 8 Validating the model against molecular dynamics results. This figure depicts a detailed comparison of stresses near the crack tip, comparing U-Net, ResNet and ground truth. Generally, the stress concentration is predicted well at the atomic level, albeit with some slight differences. The U-Net model tends to make slightly better predictions (note, Fig. S2 (ESI†) shows a direct comparison between U-Net and ResNet for the entire field; not repeated here since the images are already included in prior figures).
Fig. 9 shows examples from the model that predicts both stress field and deformation simultaneously, using a U-Net-ResNet model (featuring ResNet blocks at the bottom of the “U”, combining the two models used in the previous sections towards a more complex neural network that has the capacity to learn even more complex relationships).
Fig. 9A presents comparisons for three sample geometries. The overall shape change of the image is visible, for instance by comparing the input and output shapes. Fig. 9B shows a detailed comparison of stresses near the crack tip, comparing prediction and ground truth. It can be seen that, generally, the stresses are well reproduced. We note that the model failed to accurately learn to predict the input microstructure from an image of a deformed stress field, at least not nearly as well as the earlier models described. This may or may not be considered limiting depending on the objective of the use case; however, it deserves further investigation in future work. For instance, adjusting learning rates and/or the relative weights of the losses may help, since the forward and backward problems have different complexities associated with them. Other possible routes towards better predictions of the undeformed lattice from a deformed lattice include training against deformations alone, rather than stress fields. We anticipate that some of these questions may be addressed in future work.
We believe that what has been demonstrated here represents a remarkable feat that offers immense opportunities for many other physical phenomena for which only “observations”, but no correlation between input and output, are known, enabling mechanistic discovery of accurate pairings by the algorithm itself.
Once the neural net is trained, one of the generators (G to translate microstructures to stress data, and F to translate stress data to microstructures) is sufficient to make relevant predictions. Such predictions can easily be carried out on a CPU or GPU and take a fraction of a second, much less time than an MD simulation to solve the same problem, which can take minutes to hours depending on the size of the system. This is particularly significant when working with complex nanomaterials that require quantum or fully reactive models. It is also noted that transfer learning can be a powerful tool to adapt the model to other scenarios. For instance, a model can be trained against MD simulations as done in this paper and then adapted to learn particularities of a system for which only quantum mechanical data is available. Since transfer learning typically requires much less data, this can be done in a feasible manner, and updating a well-trained neural network only requires a few epochs.
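A sketch of what inference with a single trained generator could look like in practice is given below (file names, image resolution, and pre-processing are assumptions):

```python
import torch
from torchvision import transforms
from PIL import Image

# Load a trained forward generator G (microstructure -> stress field).
# A full pickled model is assumed here; alternatively, instantiate the
# generator class and load a saved state_dict.
G = torch.load("generator_G.pt", map_location="cpu")
G.eval()

to_tensor = transforms.Compose([
    transforms.Resize((1024, 1024)),
    transforms.ToTensor(),
])

micro = to_tensor(Image.open("new_microstructure.png").convert("RGB"))
with torch.no_grad():
    # Forward pass takes a fraction of a second on CPU or GPU.
    stress_field = G(micro.unsqueeze(0))
```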
As a more general comment, the resulting images of fields produced by the generator neural network can be converted into numerical values by using the color mapping that was used to generate the images from the MD results in the first place (whereby each color is associated with a particular numerical value). When analyzing a result, the colors predicted by the ML algorithm can be converted into numerical stress values (or any other field data) by reversing the process, using the same colormap but now mapping a color to a numerical value.
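A minimal sketch of this color-to-value round trip is shown below, assuming a matplotlib colormap ('jet' is used only as a placeholder) and an assumed stress range for normalization:

```python
import numpy as np
import matplotlib.pyplot as plt

s_min, s_max = 0.0, 1.0                          # assumed normalized stress range
cmap = plt.get_cmap("jet", 256)                  # colormap used to render the fields
lut = cmap(np.arange(256))[:, :3]                # 256 reference RGB colors
stress_levels = np.linspace(s_min, s_max, 256)   # stress value assigned to each color

def stress_to_color(sigma):
    """Forward mapping: numerical stress -> RGB color."""
    idx = np.clip((np.asarray(sigma) - s_min) / (s_max - s_min) * 255, 0, 255)
    return lut[idx.astype(int)]

def color_to_stress(rgb):
    """Reverse mapping: predicted RGB pixel -> nearest stress value."""
    idx = np.argmin(np.linalg.norm(lut - np.asarray(rgb), axis=1))
    return stress_levels[idx]
```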
Through these developments we showed that this cycle-consistent GAN model opens an avenue for upscaling – in a multiscale scheme – where the predictions from the model can be used to construct analyses of very large samples, based on relatively small and sparse datasets without any known pairing of input and output, such as by tiling overlapping images generated by smaller “patches” of data in a sliding fashion. Such sliding algorithms are used in other settings such as image segmentation or image generation,29 and can offer an effective mechanism to create very large-scale, high-resolution solutions. Moreover, another application area of the model is the prediction of stress fields based on experimental data, where knowledge of solely the positions of atoms is sufficient to predict physical quantities for augmentation or analysis processes. We note that any model has to be first trained against ground truth data, which includes information about how input and output relate. A possible strategy is to pre-train a model against synthetic data as done in this study, and then use such a model in a fine-tuning step where the model is adapted against another dataset, for instance data generated from experimental imaging.
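A possible form of such a sliding-window scheme is sketched below (patch size, stride, and simple averaging of overlaps are assumptions):

```python
import numpy as np

def predict_large_field(predict_patch, image, patch=1024, stride=512):
    """Tile a large microstructure image with overlapping patches,
    predict each patch with a trained generator, and average the
    overlapping predictions into one large stress field."""
    H, W, C = image.shape
    out = np.zeros((H, W, C))
    weight = np.zeros((H, W, 1))
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            tile = image[y:y + patch, x:x + patch]
            out[y:y + patch, x:x + patch] += predict_patch(tile)
            weight[y:y + patch, x:x + patch] += 1.0
    # Average regions covered by more than one patch.
    return out / np.maximum(weight, 1.0)
```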
The system allows for atomic deformations in the x-, y- and z-directions (albeit out-of-plane deflections are minimal due to the 2D nature of graphene, especially under tension). To realize a high-throughput LAMMPS simulation setup, we first generate an image (black is the background, and white represents void regions such as cracks or other defects; added using OpenCV image generation functions). The image is then translated into a graphene lattice, where the distribution of atoms and voids follows the image colors. The process is automated via a Python script that generates LAMMPS input files, runs LAMMPS, and analyzes the results.
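The following sketch illustrates this image-driven geometry generation, assuming OpenCV is used to draw elliptical voids on a black canvas and the resulting mask then removes atoms from a pre-built graphene coordinate array (all dimensions, counts, and function names are illustrative, not the exact script used here):

```python
import numpy as np
import cv2

def make_crack_mask(width=512, height=512, n_cracks=3, rng=None):
    """Black background = material; white ellipses = void regions (cracks)."""
    rng = rng or np.random.default_rng()
    mask = np.zeros((height, width), dtype=np.uint8)
    for _ in range(n_cracks):
        center = (int(rng.integers(0, width)), int(rng.integers(0, height)))
        axes = (int(rng.integers(10, 60)), int(rng.integers(2, 8)))  # long, thin
        angle = float(rng.uniform(0, 180))
        cv2.ellipse(mask, center, axes, angle, 0, 360, 255, -1)      # filled ellipse
    return mask

def remove_void_atoms(coords, mask, box_x, box_y):
    """Drop atoms whose (x, y) position falls inside a white (void) pixel."""
    h, w = mask.shape
    ix = np.clip((coords[:, 0] / box_x * w).astype(int), 0, w - 1)
    iy = np.clip((coords[:, 1] / box_y * h).astype(int), 0, h - 1)
    return coords[mask[iy, ix] == 0]
```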
The training set includes both cases with single cracks and cases with a larger number of randomly situated cracks, split in equal halves. All MD models are carried out using LAMMPS and feature a series of energy minimizations using the conjugate gradient (CG) method, MD runs at near-zero temperature, followed by homogeneous strain application (4.5% uniaxial tensile strain applied in the x-direction via a constant strain rate without lateral relaxation until the desired total strain is reached), and then a CG-MD-CG sequence to render an equilibrium stress field. The initial periodic system size before strain application is 170.93 Å × 173.95 Å. Each of the systems features around 10,000 carbon atoms (the specific number changes due to the existence of defects).
The von Mises stress is computed from the components of the atomic stress tensor as
σvonMises = √{½[(σ11 − σ22)² + (σ22 − σ33)² + (σ33 − σ11)²] + 3(σ12² + σ23² + σ31²)} (7)
The datasets are split 80:20 into training and testing sets. Before being fed to the neural network, all images are scaled to a resolution of 1024 × 1024 (the model was trained and tested on various resolutions and works generally well, albeit the depth of the generators and/or the number of ResNet blocks needs to be adapted).
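For completeness, a small sketch of this pre-processing step (library calls, file format, and random seed are assumptions):

```python
import random
from pathlib import Path
from PIL import Image

def prepare_images(src_dir, dst_dir, size=(1024, 1024), train_frac=0.8, seed=0):
    """Rescale all field images to 1024 x 1024 and split them 80:20
    into training and testing sets."""
    files = sorted(Path(src_dir).glob("*.png"))
    random.Random(seed).shuffle(files)
    n_train = int(train_frac * len(files))
    for i, f in enumerate(files):
        split = "train" if i < n_train else "test"
        out = Path(dst_dir, split)
        out.mkdir(parents=True, exist_ok=True)
        Image.open(f).resize(size).save(out / f.name)
```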
Note that while the results presented here focus on σvonMises predictions, models can be trained for individual σij components as well, from which all other stress measures can then be computed.
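For instance, the von Mises measure can be recovered per atom from predicted components with a few lines of numpy (sketch):

```python
import numpy as np

def von_mises(s11, s22, s33, s12, s23, s31):
    """Per-atom von Mises stress from the six stress tensor components."""
    return np.sqrt(0.5 * ((s11 - s22) ** 2 + (s22 - s33) ** 2 + (s33 - s11) ** 2)
                   + 3.0 * (s12 ** 2 + s23 ** 2 + s31 ** 2))
```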
In terms of the overall workflow of the model, the two classifiers Dx and Dy are trained to determine whether microstructures and stress fields, respectively, are real or fake. In the adversarial training of this cycle-consistent neural network, the generators get better and better at producing realistic fields that can no longer be distinguished from real ones, and vice versa.
We use the loss functions as defined in ref. 22, featuring discriminator losses and cycle-consistent generator losses. The cycle-consistent generator loss assesses the capacity of the generators to yield realistic images and to return, after an image has moved through a cycle of both generators, the identical image that the cycle started with; an identity loss is also included (eqn (1)–(6)).
We chose the weight for the cycle consistency loss as λcycle = 10, and weigh the identity loss by λidentity = 5, as suggested in ref. 22.
Due to the cycle consistent formulation of the losses, no pairings of the input and output are necessary, nor is it necessary to have actual pairs even in the dataset.
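A condensed sketch of how these loss terms can be combined for the generators is given below, following the general structure of cycle-consistent GAN training (the least-squares adversarial and L1 cycle/identity terms, and the default weights shown, are assumptions following the general formulation of ref. 22):

```python
import torch
import torch.nn.functional as F_loss

def generator_loss(G, F_net, Dx, Dy, x_micro, x_stress,
                   lam_cycle=10.0, lam_identity=5.0):
    """Adversarial + cycle-consistency + identity losses for both generators."""
    stress_hat = G(x_micro)
    micro_hat = F_net(x_stress)

    # Adversarial terms: generators try to fool the discriminators.
    pred_stress = Dy(stress_hat)
    pred_micro = Dx(micro_hat)
    adv = (F_loss.mse_loss(pred_stress, torch.ones_like(pred_stress)) +
           F_loss.mse_loss(pred_micro, torch.ones_like(pred_micro)))

    # Cycle-consistency terms, eqns (3) and (4).
    cyc = (F_loss.l1_loss(F_net(stress_hat), x_micro) +
           F_loss.l1_loss(G(micro_hat), x_stress))

    # Identity terms, eqns (5) and (6).
    idt = (F_loss.l1_loss(F_net(x_micro), x_micro) +
           F_loss.l1_loss(G(x_stress), x_stress))

    return adv + lam_cycle * cyc + lam_identity * idt
```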
In some cases, we used a variation of the learning rate (especially for the U-Net model and for the U-Net-ResNet model) after initial training for a few epochs to ensure stability during higher training epochs (following the suggestion in ref. 22).
In the study where we trained for both lattice deformation and stress field prediction simultaneously (Fig. 9), we used a multistage training process in which we first primed the model by training it against a small dataset of only 100 input and 100 output images (unpaired, and with no pairs existing, as in all the other cases) for 20 epochs, and then trained it further against the larger dataset of the same size as in the other cases. In an alternative training strategy for this problem, we first trained a model against a dataset without deformation, then used transfer learning to adapt the model to capture deformation and stress field predictions (results not shown).
A deeper PatchGAN model with a larger number of convolutional layers is used in the case where we simultaneously predict deformation and stress fields (two additional Conv2D layers are added). This enabled us to increase the effective patch size and hence capture larger-scale field features beyond the 70 × 70 pixel size in the original model.
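For reference, a minimal PatchGAN-style discriminator sketch is given below (channel counts and depth are assumptions, not the exact discriminator used here); adding further strided Conv2d layers enlarges the effective patch each real/fake score sees:

```python
import torch.nn as nn

def patchgan_discriminator(in_channels=3, base=64, n_layers=3):
    """70x70-style PatchGAN: a stack of strided convolutions whose output
    is a grid of real/fake scores, one per image patch. Increasing
    n_layers enlarges the effective patch each score sees."""
    layers = [nn.Conv2d(in_channels, base, 4, stride=2, padding=1),
              nn.LeakyReLU(0.2, inplace=True)]
    ch = base
    for _ in range(n_layers - 1):
        layers += [nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1),
                   nn.InstanceNorm2d(ch * 2),
                   nn.LeakyReLU(0.2, inplace=True)]
        ch *= 2
    layers += [nn.Conv2d(ch, 1, 4, stride=1, padding=1)]  # per-patch score map
    return nn.Sequential(*layers)
```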
Training performances are included in the main figures in the text, and the evolution of all four loss functions are depicted in Fig. S1 (ESI†). For this problem, only single crack deformation fields are used.
Footnote
† Electronic supplementary information (ESI) available: Additional figures, methods and code details. The ESI also features dataset examples for illustration (only sample images are included in the datasets attached to illustrate the type of data used for the training). See DOI: https://doi.org/10.1039/d2ma00223j