Isaac Squires*, Amir Dahari, Samuel J. Cooper and Steve Kench
Dyson School of Design Engineering, Imperial College London, London SW7 2DB, UK. E-mail: i.squires20@imperial.ac.uk; samuel.cooper@imperial.ac.uk
First published on 3rd February 2023
Imaging is critical to the characterisation of materials. However, even with careful sample preparation and microscope calibration, micrographs can contain defects and unwanted artefacts. This is particularly problematic for applications where the micrograph is to be used for simulation or feature analysis, as artefacts are likely to lead to inaccurate results. Microstructural inpainting is a method to alleviate this problem by replacing artefacts with synthetic microstructure that matches the surrounding boundaries. In this paper we introduce two methods that use generative adversarial networks to generate contiguous inpainted regions of arbitrary shape and size by learning the microstructural distribution from the unoccluded data. We find that one benefits from high speed and simplicity, whilst the other gives smoother boundaries at the inpainting border. We also describe an open-access graphical user interface that allows users to utilise these machine learning methods in a ‘no-code’ environment.
Broadly, there are two approaches to inpainting – classical statistical reconstruction6 and machine learning reconstruction. A variety of classical reconstruction techniques exist, such as diffusion-based7–9 and structure/exemplar-based.10–12 The most ubiquitous technique is exemplar-based inpainting, whereby the occluded region is filled in from the outer edge to the centre with the best matching patches, which are ‘copied-and-pasted’ from the unoccluded region.13–16 Barnes et al. proposed the PatchMatch algorithm for fast patch search using the natural coherency of the image.12 Tran et al. extended the PatchMatch algorithm to microstructural inpainting.17,18 They outline that machine learning based inpainting typically requires large, labelled datasets, whereas their classical statistical reconstruction method does not. This approach can be used to reconstruct grayscale image data; however, with patch-based approaches, the reconstructed region will contain exactly copied patches, which may be unrealistic.19
Convolutional neural networks (CNNs) form the basis of many visual machine learning tasks. The majority of generative methods that use CNNs take the form of autoencoders,20 diffusion models21 or generative adversarial networks (GANs).22 General purpose inpainting models using these methods have been developed, and have been extremely successful across many applications.23–27 However, many of the state-of-the-art (SOTA) models require large labelled datasets for training from scratch or fine-tuning pretrained models.28,29 SOTA models are also often very deep (i.e. many hidden layers in the CNN) to allow the synthesis of a wide variety of complex features, which makes training computationally expensive and, therefore, only available to those with access to high performance computers, resulting in poor accessibility to the general community. It is not only the accessibility of training that is limited; the application of trained models is also often restricted. It is possible to apply these existing large-scale image models to materials science problems, with varied success depending on the application. What is not possible is the direct integration of materials science assumptions and requirements, such as statistical homogeneity. As such, an opportunity exists for an open-source inpainting method specifically designed for materials science that is computationally inexpensive to train and also works well in scenarios where data is severely limited.
Microstructural image data has properties that can be exploited to address the issues outlined above. Often micrographs are taken of the bulk of a material, and the resulting data is homogeneous. Therefore, any large enough patch of microstructure is statistically equivalent to any other patch. This allows a single image to be batched into a statistically equivalent set of smaller images, hence forming a training dataset for a generation algorithm. This eases the requirement of collecting a large dataset consisting of many distinct images, which would be the case if each entire image were a single training example. Furthermore, once a generative model has been trained on a statistically representative dataset, it can then be used to generate arbitrarily large images that would be impractical to collect experimentally. This idea was demonstrated by Gayon-Lombardo et al., who developed a GAN framework with an adjustable input size.30 Unlike the majority of GAN models, where the output size of the generator must match that of the entire generated image, the output of microstructural generators need only be big enough to capture key features and can therefore be far smaller. The relative simplicity of microstructural features also reduces the required number of parameters, shrinking training times and reducing memory requirements. These properties of microstructural data greatly reduce the memory and compute requirements when training generation algorithms, and also mitigate the need for large training datasets. However, it also means the method is restricted to cases where the data is homogeneous (i.e. samples taken from any random point in the image are statistically equivalent).
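As a concrete illustration of this batching idea, the following minimal sketch samples random patches from a single homogeneous micrograph to form a training set. The patch size, batch size and function name are illustrative assumptions, not values or code from this work.

```python
# A minimal sketch (assumed, not the authors' code): build a training batch by
# sampling random patches from one homogeneous micrograph.
import numpy as np

def sample_patches(img: np.ndarray, patch_size: int = 64, batch_size: int = 32) -> np.ndarray:
    """Sample random square patches from a 2D (optionally multi-channel) image."""
    h, w = img.shape[:2]
    patches = []
    for _ in range(batch_size):
        y = np.random.randint(0, h - patch_size + 1)
        x = np.random.randint(0, w - patch_size + 1)
        patches.append(img[y:y + patch_size, x:x + patch_size])
    # Because the material is statistically homogeneous, each patch is treated
    # as an independent training example drawn from the same distribution.
    return np.stack(patches)
```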
GANs are a family of machine learning models characterised by the use of two networks competing in an adversarial game. They are capable of generating samples from an underlying probability distribution of an input training dataset. Mosser et al. introduced GANs as a method for reconstructing synthetic realisations of a homogeneous microstructure.31 Further methods have been developed to reconstruct 3D multi-phase microstructure, generate 3D images from 2D data, and fuse multi-modal datasets together.30,32,33 These models make use of some of the assumptions about microstructural image data outlined earlier to shrink the memory and compute requirements of training. These methods have successfully demonstrated the ability to generate synthetic volumes that are statistically indistinguishable from the training data, but they do not solve the specific problem of inpainting. GANs have emerged as the most common machine learning method for inpainting microstructure. Ma et al. developed an automatic inpainting algorithm which involves two steps: first the classification and segmentation of the occluded region, followed by inpainting.34 A U-Net performs the segmentation of the damaged region, and an EdgeConnect model performs the inpainting.35,36 This method requires a large dataset of manually labelled damaged regions, which makes it hard to generalise to all types of defect. Karamov et al. developed a GAN-based method, with an autoencoder generator, for inpainting grayscale, 3D, anisotropic micro-CT images.37 This method demonstrated moderate success but struggled to form contiguous boundaries; the resulting inpainted microstructure had an observable hard border of non-matching pixels. Although discontiguous borders do not affect some global statistics such as volume fraction or pixel value distributions, border contiguity is extremely important for microstructural scale modelling. Consider a diffusion simulation on a porous medium to extract the tortuosity factor:38 any discontiguities in the phase boundaries may have a significant impact on the resulting flow field.
This paper outlines two novel GAN-based methods for inpainting microstructural image data without the need for large datasets or labelled data. These methods are designed to be applied in different scenarios. Each approach seeks to satisfy two key requirements for successful inpainting, namely the generation of realistic features to replace the occluded region, and the matching of these features to existing microstructure at the inpainting boundary. The first method, generator optimisation (G-opt), uses a combination of a standard GAN loss (maximise realness of generated data according to the discriminator) and a content loss (minimise the pixel-wise difference between generated and ground truth boundary) to simultaneously address both goals. The resulting generator is well optimised for a specific inpaint region, but cannot be applied to other defect regions without retraining. The second method, seed optimisation (Z-opt), decouples the two requirements by first training a GAN to generate realistic microstructure, and then searching the latent space for a good boundary match. This means the generator can be applied to any occluded region in the image after training, but boundary matching can be less successful. It is important to note that these methods are stochastic, and that the inpainted region is not meant to reconstruct the ground truth. Instead these techniques aim to synthesise entirely new, but statistically equivalent regions of microstructure, whilst maintaining a contiguous border with the unoccluded region. These generated inpaintings do not represent the ‘true’ underlying microstructure, but rather one of many statistically indistinguishable synthetic possibilities. Due to the stochastic nature of these methods, there is no single solution to this problem, and a family of solutions can be synthesised.
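To make the G-opt objective concrete, the sketch below combines an adversarial term with a content loss evaluated only on the boundary frame. It is a hedged illustration rather than the authors' implementation: the Wasserstein-style critic loss, the boundary_mask tensor and the weighting lambda_content are assumptions introduced for this example.

```python
# A hedged sketch of a G-opt style objective: adversarial loss on the generated
# patch plus a pixel-wise content loss restricted to the boundary frame.
import torch

def g_opt_loss(generator, discriminator, seed, ground_truth, boundary_mask,
               lambda_content: float = 10.0) -> torch.Tensor:
    fake = generator(seed)
    # Adversarial term: encourage the discriminator to score the output as real
    # (a Wasserstein-style critic loss is assumed here).
    adv_loss = -discriminator(fake).mean()
    # Content term: pixel-wise MSE against the ground truth, but only where the
    # boundary_mask is 1 (the frame around the occluded region).
    content_loss = ((fake - ground_truth) ** 2 * boundary_mask).sum() / boundary_mask.sum()
    return adv_loss + lambda_content * content_loss
```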
Additionally, in Section 5 this paper presents a graphical user interface (GUI) through which users can easily apply these methods to their own data. The purpose of this is to provide democratic access to a tool for materials scientists from a range of disciplines. The GUI requires no coding experience and has been made open source to accompany this paper.
Interestingly, the architecture of G remains the same for all sizes of occluded region. To match the network output size to the occluded region, we change the spatial size of the random input seed, which is an established technique for controlling image dimensions when generating homogeneous textures of material micrographs. In the standard network that we use, increasing the input seed size by 1 results in an output size increase of 8. The size of the selected occluded region is thus restricted to be a multiple of 8 pixels in each dimension, allowing for an associated integer seed size. This calculated seed size is increased by four (padding of two in each direction) when passed to G in order to generate the boundary region on which the content loss is calculated.
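The sketch below illustrates how a fully convolutional generator of this kind maps seed size to output size: three stride-2 transpose convolutions each double the spatial dimensions, so increasing the seed size by 1 increases the output size by 8. The layer widths, kernel sizes and phase count are illustrative assumptions, and the exact offset between seed and output sizes in this work depends on hyperparameter choices not reproduced here.

```python
# Illustrative generator (assumed layer sizes, not the paper's exact architecture):
# the spatial size of the input seed controls the output size.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_channels: int = 16, n_phases: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_channels, 128, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, n_phases, kernel_size=3, padding=1),
            nn.Softmax(dim=1),  # one channel per phase for segmented (n-phase) data
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

g = Generator()
print(g(torch.randn(1, 16, 10, 10)).shape)  # spatial size 80
print(g(torch.randn(1, 16, 11, 11)).shape)  # spatial size 88 (+8 per extra seed element)
```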
During evaluation, the fixed seed is passed to G and the boundary region of the output is replaced with the boundary region of the original image, such that only the occluded region is replaced. As the seed is fixed, this will generate the exact same inpainting each time G is evaluated. For occluded regions larger than 64 × 64 pixels, the fixed seed can be adjusted by replacing central elements of the seed with random noise. This creates stochastically varying microstructure in the centre of the generated region, but does not alter the generated output at the boundaries.
The methods developed in this paper take a frame of width 16 pixels when calculating the content loss. Transpose convolutions propagate information outwards with each layer, meaning a single seed element affects a whole region of space in the output. In order to safely change seed elements without affecting the border matching, we ensure a buffer of 8 pixels on each side, and a minimum area of 32 pixels in the centre to change. Therefore, the minimum seed size is 10 × 10, as this generates a 64 × 64 pixel image. Above this, the number of seed elements that can be changed scales with the following formula (assuming square seeds): nΔ = 2 × (nseed − 10), where nΔ is the number of central seed elements that can be changed and nseed is the total seed size. For example, for a 12 × 12 seed, a 4 × 4 region can be changed. It is the hyperparameters of the network, specifically the transpose convolutions in the generator, that constrain the minimum size of the inpainting region to 64 × 64 pixels; this is not a fundamental limit and can be altered by adjusting the hyperparameters. For further details and a visual demonstration, the reader is referred to ESI Fig. 2.†
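This resampling rule can be expressed as a short sketch, assuming a square seed tensor of shape (1, C, n, n); the function name and tensor layout are illustrative assumptions.

```python
# Sketch of the seed-resampling rule: only the central n_delta = 2 * (n_seed - 10)
# elements of the seed are replaced with fresh noise, leaving a buffer so the
# generated boundary is unchanged.
import torch

def resample_centre(seed: torch.Tensor) -> torch.Tensor:
    """Replace the central region of a (1, C, n, n) seed with new random noise."""
    n_seed = seed.shape[-1]
    n_delta = 2 * (n_seed - 10)          # number of central seed elements that may change
    if n_delta <= 0:
        return seed                       # 10 x 10 seed: nothing can be resampled
    start = (n_seed - n_delta) // 2
    new_seed = seed.clone()
    new_seed[..., start:start + n_delta, start:start + n_delta] = torch.randn(
        *seed.shape[:-2], n_delta, n_delta)
    return new_seed

# e.g. a 12 x 12 seed has a 4 x 4 resampleable centre, as in the text.
```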
The Z-opt is performed by first calculating the MSE between the frame of the generated region and the ground truth. Then, whilst holding the weights of G constant, the MSE is backpropagated to the seed, which is treated as a learnable parameter. If the iterative updates to the seed are unconstrained, its distribution of values deviate significantly from the random normal noise distribution used during training. This is problematic, as although the resulting MSE on the boundary is potentially very low, the central features in the occluded region become unrealistic.
Initially, we attempted to address this deviation through a simple re-normalisation of the seed after each update, implemented by subtracting the mean of the seed and dividing by its standard deviation. However, after many iterations the output of the optimisation deviated from realistic microstructure and became blurry. A histogram of the seed values showed that the seed became non-normally distributed: although it retained a mean of 0 and a standard deviation of 1, it in fact became bimodal, with peaks centred around 1 and −1. To keep the seed normally distributed, a KL loss (a statistical measure of distance between two distributions) between the seed and a random normal seed was introduced, which anchored the optimised seed to the distribution of random normal seeds. This stopped the more unrealistic features being generated and enforced a normal distribution throughout the optimisation process.
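A hedged sketch of this Z-opt loop is given below: with the generator weights frozen, the seed is optimised to minimise the boundary MSE plus a KL term anchoring it to a standard normal distribution. The closed-form KL between N(μ, σ²) and N(0, 1) is used as one plausible implementation; the exact form of the KL term, the step count and the weighting beta are assumptions, not the paper's values.

```python
# Sketch of seed optimisation (Z-opt): the seed is the only learnable parameter.
import torch

def optimise_seed(generator, seed, ground_truth, boundary_mask,
                  steps: int = 1000, lr: float = 1e-2, beta: float = 1.0):
    seed = seed.clone().requires_grad_(True)
    opt = torch.optim.Adam([seed], lr=lr)
    for p in generator.parameters():
        p.requires_grad_(False)                  # hold G constant
    for _ in range(steps):
        opt.zero_grad()
        fake = generator(seed)
        # Content loss on the boundary frame only.
        mse = ((fake - ground_truth) ** 2 * boundary_mask).sum() / boundary_mask.sum()
        # KL divergence between N(mu, sigma^2) fitted to the seed and N(0, 1)
        # (one assumed way of anchoring the seed to a normal distribution).
        mu, sigma = seed.mean(), seed.std()
        kl = torch.log(1.0 / sigma) + (sigma ** 2 + mu ** 2) / 2.0 - 0.5
        (mse + beta * kl).backward()
        opt.step()
    return seed.detach()
```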
First, we calculate the difference between the pixels outside the edge of the inpainted region (which belong to the original image) and the pixels inside the edge of the inpainted region (which belong to the generated image). The squares of these differences form a distribution that describes the mean squared error of neighbouring pixels. A ground truth distribution is then calculated by taking the mean squared error between all neighbouring pixels in the original image. A Kolmogorov–Smirnov test for goodness of fit41 is then used to return the probability that the distribution calculated from the inpainted border and the distribution calculated from the ground truth are the same. For comparison, this border contiguity test was also performed on an inpainting of zeros, uniform noise and the output of a trained generator given an unoptimised random input seed (and therefore agnostic to the border) as shown in Fig. 1.
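A minimal sketch of this border contiguity test is given below, assuming a single-channel 2D image and a rectangular inpainted region lying in the interior of the image and spanning rows y0:y1 and columns x0:x1; the function and variable names are illustrative, and scipy's two-sample Kolmogorov–Smirnov test stands in for the goodness-of-fit test described above.

```python
# Sketch of the border contiguity test: compare squared differences of
# neighbouring pixels across the inpainting border against those across the
# whole original image.
import numpy as np
from scipy.stats import ks_2samp

def border_contiguity_p(image: np.ndarray, inpainted: np.ndarray,
                        y0: int, y1: int, x0: int, x1: int) -> float:
    """p-value that border-crossing pixel differences match the global distribution."""
    image = image.astype(float)
    inpainted = inpainted.astype(float)
    # Squared differences across the four edges: outer pixel (original image)
    # vs adjacent inner pixel (generated image).
    border = np.concatenate([
        (image[y0 - 1, x0:x1] - inpainted[y0, x0:x1]) ** 2,      # top edge
        (image[y1, x0:x1] - inpainted[y1 - 1, x0:x1]) ** 2,      # bottom edge
        (image[y0:y1, x0 - 1] - inpainted[y0:y1, x0]) ** 2,      # left edge
        (image[y0:y1, x1] - inpainted[y0:y1, x1 - 1]) ** 2,      # right edge
    ])
    # Ground truth: squared differences between all neighbouring pixels in the image.
    global_diffs = np.concatenate([
        ((image[1:, :] - image[:-1, :]) ** 2).ravel(),
        ((image[:, 1:] - image[:, :-1]) ** 2).ravel(),
    ])
    return ks_2samp(border, global_diffs).pvalue
```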
The p-value for the ground truth gives a reference value for what ‘perfect’ inpainting looks like for this microstructure, and the order of magnitude of the p-value can be used to compare different inpainting methods, and quantify how discontiguous the border of the inpainting is relative to the ground truth. The ground truth p-value is not necessarily 1, as the KS test is performed between the MSE distributions of neighbouring pixels across the whole image and the border of the ‘to be’ inpainted region. We expect the p-value to be closer to one the more the border region is representative of the global distribution.
Fig. 3 shows the results of volume fraction analysis on the inpainted microstructures. By enforcing the boundary of our generated volume to match, we naturally restrict the space of possible structures, and therefore we do not necessarily expect to recover exactly the same VF distribution as the ground truth data. However, we do expect our generator to be capable of producing this distribution when given a random boundary agnostic seed. Therefore, in Fig. 3, for each method, two distributions are shown: firstly, where no boundary matching has taken place, and secondly where it has.
We first consider the case with no boundary matching. KS tests were performed on each method to compare the distributions of volume fractions to the ground truth; the full results are shown in ESI Table 1.† The p-value is a measure of how probable it is that these samples were drawn from the same distribution. In the boundary agnostic case, the G-opt method produces distributions with large p-values (0.73–0.97), indicating good agreement with the ground truth distribution. The boundary agnostic Z-opt method produces smaller p-values (0.022–0.43), revealing poorer agreement with the ground truth. As these generators are identical in architecture and were trained for the same number of iterations, this indicates that the addition of the content loss during training improves the overall quality of the generator.
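For reference, a brief sketch of this volume fraction comparison is given below, assuming the segmented samples are stored one-hot as (N, phases, H, W) arrays; the array layout and function names are assumptions for illustration.

```python
# Sketch of the volume fraction comparison between generated and ground truth samples.
import numpy as np
from scipy.stats import ks_2samp

def phase_volume_fractions(samples: np.ndarray) -> np.ndarray:
    """Volume fraction of each phase in each one-hot sample: returns (N, phases)."""
    return samples.mean(axis=(2, 3))

def compare_vf(generated: np.ndarray, ground_truth: np.ndarray, phase: int) -> float:
    """p-value that generated and ground-truth VFs of one phase share a distribution."""
    return ks_2samp(phase_volume_fractions(generated)[:, phase],
                    phase_volume_fractions(ground_truth)[:, phase]).pvalue
```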
It is possible that because the content loss is introduced from the start of training, G can immediately start to learn kernels that produce realistic features, without requiring useful information from D. This inevitably speeds up the convergence of G, and also aids in training D, as ‘realness’ of the output of G will be improved earlier in training. Without this content loss, G is entirely reliant on the information from D, and therefore cannot start to learn realistic features until D has learned to discriminate them. It is possible that the benefit G-opt gains from content loss in the early stages of training may be balanced out over longer training times, and that Z-opt may reach the same overall performance, but in more iterations. However, it is important to note that the difference in loss functions means the loss landscapes each method is exploring are fundamentally different, and therefore they will never converge to the exact same solution.
Fig. 3 shows that the boundary optimised VF distributions of the G-opt method are constrained within the bounds of the distribution produced by the boundary agnostic case. This suggests that, although the VF distribution of the boundary matched seed is not similar to the ground truth distribution, the VFs of the generated microstructure are at least a subset of the underlying VF distribution. On the other hand, the boundary optimised Z-opt distribution is significantly offset from the distribution produced by the boundary agnostic case. Specifically, the metal phase shows a significant decrease in volume fraction and the ceramic phase a significant increase in volume fraction. This is also clearly visible in Fig. 2, as there appears to be an over-representation of the white phase. This can be explained by the seed optimisation process. During training, G is given seeds that are sampled from a random normal distribution. When the seed is optimised post-training, the optimisation pushes the seed into a region where the boundary is best matched, and although the seed is encouraged to retain its normality, this region of latent space may not have been well sampled during training, therefore generating samples that do not follow the same statistics as the underlying data.
To quantify the contiguity of the border, the analysis outlined in Section 2.3 was performed on the inpainted result of both methods. This analysis reveals the G-opt method produces borders that are indistinguishable from the ground truth, yielding a p-value of 1. The Z-opt method performs worse, and produces a more significant result, despite the border not being noticeably discontiguous.
Fig. 4 shows a comparison of the two methods for the grayscale case. The contiguity analysis reveals a more noticeable disparity between the p-values of the two methods, with G-opt (1.4 × 10−6) outperforming Z-opt (2.7 × 10−14) by many orders of magnitude. However, the significance value for G-opt is still significantly lower than the ground truth (0.017). This is corroborated by inspecting the inpainting visually as small discontiguities in the G-opt method are visible. The Z-opt method shows much clearer and more distinct boundaries, with some unrealistic features emerging in the bulk.
Analysis of volume fraction of phases is not possible for unsegmented data, which makes assessing the quality of the generated output challenging. Instead of comparing derived microstructural metrics, we plot the distribution of continuous pixel values and compare to the ground truth.
As evident in Fig. 5, the optimisation of the seed drives the generator to output more pixels with the value 1. This was reflected in Fig. 4, as there appeared to be an over-representation of white regions in the microstructure. However, the unoptimised Z-opt output also appears to contain more pixels with the value 1 than the ground truth. This indicates the training has not reached convergence, as the statistical properties of the ground truth have not been recreated. Similarly to the n-phase case, it appears that the content loss in G-opt offers a real advantage to the training and pushes the statistics towards the ground truth.
Fig. 5 Case 2: grayscale. A histogram of pixel values for 128 samples of size 80 × 80 pixels. The vertical axis is the frequency of occurrence of a particular bin of pixel values.
The third case is a colour image of a terracotta pot (micrograph 177) taken from DoITPoMS.43 As colour adds a further level of complexity, the model was trained for 300k iterations. A comparison of the two methods is shown in Fig. 6. For this case, the occluded region contains a material artefact. Contiguity analysis reveals a stark difference in the performance of the two methods, also corroborated by visual inspection. The p-value of the G-opt method (1.1 × 10−13) is many orders of magnitude larger than that of the Z-opt method (3.7 × 10−46), and visually its border appears much more contiguous.
The pixel distributions shown in Fig. 7 reveal that both methods fail to replicate the distribution very well. There is a notable change in the shape of the distribution when fixing the seed in G-opt, this appears to flatten the peak of the distribution. This is also observed post-optimisation of the seed in Z-opt, and it seems that this moves it further away from the ground truth. Therefore in both cases, it appears that the fixing of the seed or the seed optimisation reduces the similarity between the ground truth statistics and the statistics of the generated data.
Variation in the inpainted region when changing the random seed implies over-fitting has not occurred during training.44 This demonstrates that the proposed methods do not require large datasets for training. They do, however, rely on the assumption that the data is homogeneous. Additionally, it is important to note that the generated data will only be as statistically representative of the material as the unoccluded region.
The optimisation of the seed to minimise the content loss appears to push the generator to generate unrealistic microstructure. This was confirmed by the distribution of VFs in Fig. 3. Fig. 8 shows the inpainted micrograph during seed optimisation. This demonstrates that as the seed is optimised the boundary becomes better matched, but some unrealistic features emerge. It is interesting to note that the intermediate results after 100 and 1000 iterations are particularly unrealistic, and that the microstructure becomes more realistic again at long optimisation times. This could be because the seed first seeks to satisfy the easier MSE condition on the border, and then searches for a more normal seed distribution to satisfy the KL loss. It is clear that the seed corresponding to a perfectly matching boundary either does not lie in the space of realistic microstructures, or at least that this process is unable to satisfy both conditions, hence motivating the alternate method.
As previously mentioned, both methods were trained using the same hyperparameters (ESI Table 2), with the only difference in the training procedure being the fixing of the seed and the inclusion of the content loss. The G-opt method therefore takes longer per iteration. However, once trained the G-opt method is much faster to evaluate, with the Z-opt requiring a new optimisation for each new instance. Overall, there is a trade-off between training time, generation time and quality, meaning a method should be chosen according to the application.
Ultimately, the user determines whether or not the model or optimisation has converged. The hyperparameters in this paper are a guide, but can be tuned for different use cases. For example, for more complex materials, the number of filter layers in the networks can be increased, the training time extended and the number of optimisation iterations increased. The volume fraction and border contiguity analysis outlined in this paper are useful guides when comparing different methods and sets of hyperparameters. However, a universal, quantitative metric was not found to measure convergence across all materials, and therefore the user must still ultimately judge convergence by visual inspection.
The GUI is designed for quick and simple use of the tool. The user flow is roughly as follows:
(1) Loads in an image to inpaint from their files.
(2) Selects the image type and desired method.
(3) Draws either a rectangle or polygon around the occluded regions.
(4) Initiates training.
(5) Watches as the image is updated with the model's attempt at inpainting during training.
(6) Decides if the model has converged and stops training.
(7) Generates new instances of the inpainted region.
(8) Saves the inpainted image as a new file.
At present, the rectangle drawing shape has been implemented for the G-opt method and the polygon drawing method for the Z-opt method. This is due to the relative ease of implementation; however, there is no fundamental reason why the two methods could not be adapted in the future to solve for the alternate shape types. Further work on the GUI will include an option to save and load models, threading of the seed optimisation during training for speed, and an option to edit the hyperparameters and model architecture via the GUI. For the time being, the GUI can be built locally, allowing the user to adjust the finer details of the method. If this is not required, the GUI can be run from a downloadable executable file, requiring no coding experience or knowledge.
This work can be trivially extended to 3D inpainting. The extension to 3D microstructural GANs has been demonstrated in multiple applications.30,32 All that would be required would be to replace 2D (transpose-) convolutions with 3D (transpose-) convolutions, and add a spatial dimension to the seed. One potential challenge of extending to 3D would be identifying 3D defects through a simple visual interface.
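As a hedged illustration of this swap, the snippet below builds a small 3D generator by replacing the 2D (transpose-) convolutions with their 3D counterparts and giving the seed a third spatial dimension; the layer sizes are illustrative only.

```python
# Illustrative 3D variant of the generator (assumed layer sizes).
import torch
import torch.nn as nn

g3d = nn.Sequential(
    nn.ConvTranspose3d(16, 64, kernel_size=4, stride=2, padding=1),
    nn.ReLU(),
    nn.ConvTranspose3d(64, 32, kernel_size=4, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv3d(32, 3, kernel_size=3, padding=1),
    nn.Softmax(dim=1),
)
print(g3d(torch.randn(1, 16, 6, 6, 6)).shape)  # -> (1, 3, 24, 24, 24)
```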
The case studies explored in this paper demonstrate the success of this method, and provide a platform for applying these techniques to real materials problems. In another study by the authors, this inpainting technique was used as part of a data processing pipeline to generate 3D micrographs from 2D images, where the methods from this work inpainted scale bars from the initial 2D images, enabling more of the original image to be included in the training data.45
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2dd00120a