Daniil A. Boiko, Evgeniy O. Pentsak, Vera A. Cherepanova, Evgeniy G. Gordeev and Valentine P. Ananikov*
Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Pr. 47, Moscow 119991, Russia. E-mail: val@ioc.ac.ru
First published on 29th April 2021
Smoothness/defectiveness of the carbon material surface is a key issue for many applications, spanning from electronics to reinforced materials, adsorbents and catalysis. Several surface defects cannot be observed with conventional analytic techniques, thus requiring the development of a new imaging approach. Here, we evaluate a convenient method for mapping such “hidden” defects on the surface of carbon materials using 1–5 nm metal nanoparticles as markers. A direct relationship between the presence of defects and the ordering of nanoparticles was studied experimentally and modeled using quantum chemistry calculations and Monte Carlo simulations. An automated pipeline for analyzing microscopic images is described: the degree of smoothness of experimental images was determined by a classification neural network, and then the images were searched for specific types of defects using a segmentation neural network. An informative set of features was generated from both networks: high-dimensional embeddings of image patches and statistics of defect distribution.
Assessing the surfaces of layered materials and revealing defect areas is of key importance, since the performance of the materials strongly depends on the absence or presence of defects. High-quality defect-free surfaces are important for a number of industrial applications, such as electronics,1 LEDs,17 composites,18 and materials with high mechanical strength19 (Fig. 1). It might be supposed that materials with a large number of defects are practically useless, but this is not the case. Materials with surface defects are highly important in other areas (Fig. 1). For example, defect engineering may help to develop sensors with remarkably high selectivity,20 defect construction in electrode materials plays a crucial role in electrochemical reactions,21 defect-abundant charcoal materials are superior for water purification,22,23 and controlled surface defects govern the synthesis of industrially important catalysts.24,25 Thus, determining the number of defects and classifying their types is of paramount practical importance (see more details in the ESI†).
Fig. 1 A few representative areas of application of different smooth materials and materials with defects. Stars represent uniformly distributed point defects.
Some types of defects can be visualized by electron microscopy and traced in chemical transformations.26–29 However, the difficulty lies in obtaining sufficient microscopy contrast within a surface composed of the same elements. The challenge is that various types of defects are “hidden”, i.e. invisible with regular inspection by standard analytic tools. The defects may remain “invisible” until the material is used in a certain application, where a different chemical reactivity of defect areas comes into play (defect areas are usually much more chemically reactive than smooth areas).
For the analysis of complex patterns, deep learning has received increasing attention in a variety of chemical applications, from the analysis of materials and objects present in the chemical lab30 to the discovery of receptor inhibitors.31 The application of machine learning to the analysis of microscopy images is a high-priority area for nanoscience. As a few representative examples, the application of machine learning in scanning electron microscopy (SEM) imaging involved the creation of an SEM image dataset, which was then analyzed using convolutional neural networks.32 Machine learning was used in transmission electron microscopy (TEM) for particle recognition and tracking.33 An in-depth study was performed for particle segmentation.34 In optical spectroscopy, a neural network-based algorithm was used to autonomously search for 2D materials.35 Nanoparticle recognition was accelerated in scanning probe microscopy.36
In the present study, we utilized the concept of visualizing defects using metal nanoparticles as contrast agents, and we developed a machine learning approach for automated analysis of two parameters: the number and the type of defects. Two machine learning tasks were solved, and the corresponding results and algorithms were analyzed, yielding unique features for the quality assessment of layered carbon materials.
Thus, the appearance of ordered patterns on the surface of a material may serve as an indication of the presence of defect areas. Moreover, nanoparticles highlight chemically reactive areas, which cannot be done by direct imaging without contrast agents. The visual arrangement of metal particles underlines the topography of the defect area (Fig. 2). Particularly, using electron microscopy, a number of different defects, such as sheet borders, grain boundaries, sheet bends, and point defects, may be visualized on the marked carbon surface. The experimental procedure for nanoparticle attachment to the carbon surface is simple and straightforward. Subsequent electron microscopy analysis generates a large number of images suitable for automated analysis. Decoration by nanoparticles has been successfully demonstrated for various materials.43–47 Recently, an electron microscopy dataset with a thousand images was created to visualize defect areas marked with metal nanoparticles.48
Then, the material with deposited nanoparticles was filtered off and subjected to SEM analysis. To illustrate the difference, two carbon materials with and without ordered defects on the surface were analyzed, and a clear difference in the degree of palladium nanoparticle ordering was observed (Fig. 3). Nanoparticle ordering marked the underlying surface defects.
Fig. 3 (a) General scheme of the experiment; (b) experimental image showing ordered nanoparticle location; (c) experimental image showing random nanoparticle location.
This experiment was performed to test the reproducibility and to analyze real experimental microscopy images in the present study. A general applicability of this approach was confirmed for a number of systems.37,49–51 In the present article, we also performed an analysis of microscopy images published previously.48
These three particle movement pathways are well known and described in the literature.52 Adsorption is crucial for the deposition of nanoparticles onto the carbon material surface. Binding energies between palladium clusters and carbon material surfaces were previously calculated37 and show large (tens of kcal mol−1) differences between moderate binding to pristine surfaces and strong binding to defects such as sheet borders, grain boundaries, and Stone–Wales defects. The reverse process—nanoparticle leaching—was also studied as a part of the redeposition process of Pd nanoparticles.53 Concerning on-surface nanoparticle movement, Pd atoms are known54 to be able to move on the graphene surface until they face a defect area, where their movement stops.
To substantiate the correctness of the assumptions used for the simulation by the Monte Carlo method in this work, we performed molecular modeling of the adsorption of palladium clusters on the graphite surface using the GFN2-XTB quantum chemical method, which is based on the DFT approach. The molecular model consisted of a sheet of graphene C262 and a Pd13 cluster. Since the area of a defect-free carbon material is much larger than the area of defects, at the first stage, the cluster is most likely adsorbed on the defect-free graphite surface. Binding of the cluster to the carbon surface occurs due to the coordination of one of the Pd3 flat faces of the icosahedron with the C2 and C3 groups of carbon atoms. The energy of this process at this level of theory is −91.9 kcal mol−1; that is, simple desorption of the cluster from the graphite surface is unlikely. However, the migration of a cluster over a defect-free graphite surface apparently occurs without significant energy changes. In particular, the gradual displacement of the cluster from the center to one of the edges as a result of relaxed scan calculation leads to insignificant fluctuations in the total energy of the system: the maximum increase in the total energy did not exceed 5.0 kcal mol−1. Therefore, the activation energy of this process may be approximately in the same range.
To further confirm the possibility of low-energy cluster migration over a defect-free graphite surface, we performed a molecular dynamics (MD) simulation of the Pd13·C262 system by the GFN2-XTB method at a temperature of 350 K, which corresponds to the temperature of palladium nanoparticle formation on the graphite surface in the experiment. The initial state for MD simulation was the optimized structure of the Pd13·C262 complex, in which the cluster was adsorbed at the center of the carbon material surface. As a result of MD simulation, the cluster gradually moved to one of the edges of the carbon sheet in approximately 7 ps. The mechanism of migration of this cluster is an alternation of “sliding” and “rolling” of the particle over the graphite surface. In both cases, the energy consumption for the dissociation of one of the Pd–C bonds is immediately compensated by the energy gain due to the formation of a new Pd–C bond; as a result, the total energy change during cluster migration turns out to be rather small (close to zero).
A significant exothermic effect is achieved when the cluster is bound to the unsaturated edges of the graphite surface. For example, the adsorption energy of a cluster at the armchair edge is −191.0 kcal mol−1, while the adsorption energy at the zigzag edge is −248.9 kcal mol−1. In both cases, the cluster is bonded to edge carbon atoms by five palladium atoms. Thus, the energy profile of the process of interaction of palladium clusters with a graphite surface, which is the basis for the Monte Carlo simulation, is fully confirmed by the quantum chemistry method.
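To give a sense of scale for these adsorption energies, the following minimal sketch converts them into relative Boltzmann weights at 350 K. It is an illustration only: it assumes one representative state per site type and ignores the much larger area of the pristine surface as well as any entropic contributions. The function name `boltzmann_weights` is hypothetical and not taken from the original work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def boltzmann_weights(energies, temperature):
    """Return normalized Boltzmann populations for a dict of site energies (kcal/mol)."""
    rt = R_KCAL * temperature
    e_min = min(energies.values())
    # Shift by the lowest energy so the exponentials stay within floating-point range.
    raw = {site: math.exp(-(e - e_min) / rt) for site, e in energies.items()}
    total = sum(raw.values())
    return {site: w / total for site, w in raw.items()}


# Adsorption energies of the Pd13 cluster quoted in the text (kcal/mol).
adsorption = {"pristine": -91.9, "armchair edge": -191.0, "zigzag edge": -248.9}

# Assumption: one state per site type, no weighting by available surface area.
populations = boltzmann_weights(adsorption, temperature=350.0)
for site, p in populations.items():
    print(f"{site:>14s}: {p:.3e}")
```

Under these simplifying assumptions, essentially all of the equilibrium population sits on the most strongly binding edge, which is consistent with treating defect sites as traps in the simulation.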
To perform the simulation, probabilities were chosen in a way (Fig. 4) in which they qualitatively agree with the energy surface but also allow us to observe the dynamics of the process (see details in the “Methods” section). In the general model, for each type of carbon material surface gi, we have conditional probabilities P(a | gi), P(l | gi) and P(m | gi) for adsorption onto defect gi and for leaching and movement of particles located on gi, respectively. The conditional probability of adsorption was decomposed via the chain rule: P(a | gi) = P(a | c, gi) P(c | gi), where event c represents a close approach of a nanoparticle to the carbon material surface.
In the quasi-equilibrium assumption, the probabilities of finding the particle on a specific type of surface or in the solution can be estimated using a Boltzmann distribution. For example, in the particle leaching case we have:
To run the simulation, a number of particles (initially all of them are in the solution) were generated, and then iteratively, each type of movement was sampled and applied to each particle. The system converges at hundreds or thousands of iterations, and the exact number depends on the number of particles (Fig. 5). Corresponding snapshots from representative simulation steps are presented as images in the ESI.† Dynamic surface changes during the simulations were captured by video, and the movies are available in the ESI.†
The results clearly indicate that the presence of defects is highly likely to be responsible for the formation of ordered patterns. Simulations show strong agreement with experimental data: the distribution of final nanoparticle locations closely resembles the actual distribution, and curves of palladium content in the solution and on the material are in agreement with the first-order kinetics of the experimental processes.37 For the disordered case, particle positions are different from experimental positions; this is caused by the inherent stochasticity of the deposition process onto smooth materials (random attachment) without distinct defect areas on the surface.
The problem of determining the degree of order may be solved either by sequential decomposition of the problem or by developing an algorithm that gives the degree of order directly from the image. The first approach requires the particle locations, so that one can measure the distance between the observed distribution of particle positions and a uniform distribution (statistical tests for this are well known58). Despite its strong mathematical grounding, this method has some disadvantages: first, in some cases, particles are out of focus in SEM images, so determination of their exact positions is difficult; second, depending on the topology of the material, the density of particle positions may change (see an example in the ESI†). Therefore, we chose the second, much more robust option: to train an end-to-end pipeline based on classification neural networks (Fig. 6).
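As an illustration of the first, position-based approach, the sketch below computes a quadrat-count index of dispersion (variance-to-mean ratio of the number of particles per grid cell). This is one generic uniformity statistic chosen for the example, not necessarily one of the tests referred to above. Values near 1 are expected for completely random positions, values well above 1 for clustered positions, and values near 0 for a regular arrangement. The function names and the unit-square coordinates are assumptions.

```python
def quadrat_counts(points, n_bins):
    """Count points of the unit square falling into each cell of an n_bins x n_bins grid."""
    counts = [0] * (n_bins * n_bins)
    for x, y in points:
        col = min(int(x * n_bins), n_bins - 1)
        row = min(int(y * n_bins), n_bins - 1)
        counts[row * n_bins + col] += 1
    return counts


def index_of_dispersion(points, n_bins=4):
    """Variance-to-mean ratio of quadrat counts (sample variance, n - 1 denominator)."""
    counts = quadrat_counts(points, n_bins)
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return var / mean


# Hypothetical particle positions: one particle per cell (perfectly regular) ...
regular = [((i + 0.5) / 4, (j + 0.5) / 4) for i in range(4) for j in range(4)]
# ... versus 16 particles decorating a single vertical line defect at x = 0.5.
on_line = [(0.5, (k + 0.5) / 16) for k in range(16)]

vmr_regular = index_of_dispersion(regular)
vmr_line = index_of_dispersion(on_line)
print(vmr_regular, vmr_line)
```

Such a statistic depends on reliable particle coordinates, which is exactly the requirement that out-of-focus particles make hard to satisfy.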
Fig. 6 Two tasks for neural networks and ways for further analysis. From top to bottom: name of the task, network architecture scheme, loss function(s), the output of the network, possible ways for further analysis. The triangle represents the contracting/expanding path. Vertical rectangles represent fully connected layers. The gray shape between triangles represents a change to feature space and layer concatenation, as is done in U-Net.59
Many neural network architectures have been developed to solve image classification problems. The modern state of the field began with the development of AlexNet, the winner of the 2012 ImageNet competition.60 Like many other architectures, it consists of an encoder followed by a fully connected classifier.
We compared three popular types of networks, based on AlexNet,60 ResNet,61 and VGG80 encoders. As classifiers return a “probability”, one may obtain different accuracy, precision and recall scores depending on the threshold. It is important to note that these “probabilities” may not be interpreted as real probabilities, especially in some modern neural networks.62 Therefore, to maintain threshold invariance, classifiers are usually compared using the ROC AUC score (area under the receiver operating characteristic curve).
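The threshold dependence described above can be made concrete with a small, self-contained sketch. The labels and scores are made-up toy values, not data from this study. Accuracy, precision and recall change with the threshold, while the ROC AUC, computed here as the fraction of (positive, negative) pairs ranked correctly, does not depend on it.

```python
def confusion_metrics(labels, scores, threshold):
    """Accuracy, precision and recall of a binary classifier at a given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive is scored above a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = "ordered", 0 = "disordered"; scores are hypothetical network outputs.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

strict = confusion_metrics(labels, scores, threshold=0.5)
loose = confusion_metrics(labels, scores, threshold=0.3)
auc = roc_auc(labels, scores)
print(strict, loose, auc)
```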
These networks were trained using cross-entropy loss. The learning was extremely fast: in many cases (except for the AlexNet network), the networks were able to provide ≥90% accuracy and a ≥0.95 ROC AUC score within only 100 epochs. The ResNet-based model outperforms AlexNet in all metrics while having fewer parameters (Table 1).
Neural network | Number of parameters | Accuracy | Precision | Recall | ROC AUC score |
---|---|---|---|---|---|
AlexNet | 57M | 0.80 (0.907) | 0.71 (0.907) | 1.00 (0.980) | 0.92 (0.943) |
ResNet34 | 21M | 0.95 (0.979) | 0.91 (0.980) | 1.00 (0.993) | 0.98 (0.987) |
VGG-13 | 129M | 0.95 (0.984) | 0.91 (0.987) | 1.0 (0.993) | 1.0 (0.995) |
Human brain (all)a | Many | 0.869 | 0.943 | 0.796 | N/A |
Human brain (passed the test)b | Many | 0.896 | 0.962 | 0.837 | N/A |
Human brain (expert, one of the authors) | Many | 1.000 | 1.000 | 1.000 | N/A |
a “All” means two groups of people who passed and did not pass the test.
b Only those who passed the test (see the ESI for details).
These results were compared with manual human labeling. 245 people were asked to look at some examples of ordered and disordered images and then to classify 25 images. The first 5 images were used as a test: they contained very simple examples and let us exclude those results where the problem was misunderstood (see all poll questions in the ESI†). Human results were good, but compared to the neural networks, people were much slower (a few seconds vs. milliseconds), and the metrics were worse. This confirms the importance of automated analysis in microscopy imaging.
The saliency map can be easily computed using standard backpropagation.63 The results may be improved by further operations using deconvolution or guided backpropagation methods.64
Even using standard backpropagation (Fig. 7), we see that empty areas of the carbon material surface have the greatest impact on the neural network's decisions. The image of the disordered class, in contrast, gives a strong response in only one ordered-looking part of the image. Therefore, we can consider that the neural network indeed learned some semantically relevant features.
Fig. 7 Examples of saliency maps (right) of correctly classified experimental images (left) of both classes.
NN = C(E(I))
Therefore, during training, one learns not only the parameters of the classifier itself but also those of the encoder, which is trained to keep the information most valuable for performing the classification.
We show that these vector representations of images indeed reflect semantically meaningful information about the image: the position of the specific image patch in the feature space of the neural networks can tell a lot not only about the degree of order (two classes are separable even in two PCA coordinates) but also about the sample (Fig. 8). We see, for example, that sample S3 can be relatively easily separated from other images. Some of the samples overlap greatly, especially S1 and S3.
Fig. 8 Positions of random patches from images of different degrees of order (a) and different samples (b).
By training classifiers, we obtain feature vectors for other machine learning algorithms: we can put another model (gradient boosting, for instance) on top of the features and solve other machine learning tasks. Distances between the embeddings of image patches of different materials can be used to compare these materials.
In conclusion, it is shown that the degree of nanoparticle ordering can be easily identified using basic classification neural network architectures such as AlexNet. The networks indeed learn semantically relevant information and may give not only information about the degree of order but also information about the sample in general.
For analysis we considered four types of defects:
(1) Sheet borders
Usually sharp borders decorated with nanoparticles. They form the brightest part of the image, unless the image is overexposed.
(2) Grain boundaries
A curvy pattern of particle positions. Usually, they cannot be detected from background images alone (without particles).
(3) Topological defects
Surface bends that cause a soft gradient of brightness. Particles are located on concave defects, while fewer particles are usually observed on convex defects.
(4) Point defects
Local defects that can bind groups of at least 5 agglomerated nanoparticles.
Sheet borders and topological defects can be observed even without Pd-NP markers, but grain boundaries and point defects cannot be observed by traditional SEM techniques.
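The training data were generated by labeling 15 images. Models were compared using the intersection over union (IoU) score on the validation dataset. This metric is defined as follows:
IoU = |A ∩ B| / |A ∪ B|
where A is the set of pixels in the predicted mask and B is the set of pixels in the labeled mask.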
As the labels contain thin lines with very high variability in their positioning, the scores are not very high. For the best model (FPN with an SE-ResNet50 encoder), an IoU score of 0.31 was obtained. However, as we see (Fig. 9), the model is well suited to the determination of defect statistics. Defects present on the material surface are correctly identified. Moreover, the high generalization ability of the model helped to find defect sites that were not labelled by the expert. This also suggests that pseudolabelling of the entire dataset may improve the model's accuracy.
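The sensitivity of the IoU score to small positional shifts of thin lines can be seen in a toy example. The sketch below uses synthetic masks, not the labeled data, and computes IoU as the ratio of the number of pixels in the intersection of two binary masks to the number in their union. A one-pixel-wide line shifted by a single pixel gives an IoU of zero, whereas the same shift applied to a four-pixel-wide band still gives a reasonable score. The helper names are hypothetical.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two binary masks given as lists of rows."""
    a = {(r, c) for r, row in enumerate(mask_a) for c, v in enumerate(row) if v}
    b = {(r, c) for r, row in enumerate(mask_b) for c, v in enumerate(row) if v}
    union = a | b
    # Two empty masks are treated as a perfect match by convention.
    return len(a & b) / len(union) if union else 1.0


def horizontal_line(height, width, row, thickness=1):
    """Binary mask with a horizontal band starting at `row`."""
    return [[1 if row <= r < row + thickness else 0 for _ in range(width)]
            for r in range(height)]


thin_label = horizontal_line(8, 8, row=3)
thin_pred = horizontal_line(8, 8, row=4)                 # same line, shifted by one pixel
thick_label = horizontal_line(8, 8, row=2, thickness=4)
thick_pred = horizontal_line(8, 8, row=3, thickness=4)   # same band, shifted by one pixel

print(iou(thin_label, thin_label))    # identical masks
print(iou(thin_label, thin_pred))     # thin line, one-pixel shift
print(iou(thick_label, thick_pred))   # thick band, one-pixel shift
```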
Further analysis of line objects (sheet borders and grain boundaries) can be performed based on skeletonized images (Fig. 10). Skeletonization is a process of sequential erosion (thinning) of the mask until lines become one pixel wide. Therefore, the number of pixels in the skeletonized image gives a good estimate of the length of the corresponding patterns. This also reduces the inherent noise of the neural network outputs.
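A minimal sketch of this length estimate is given below. It uses the classical Zhang-Suen thinning algorithm as a stand-in, since the specific skeletonization routine used in this work is not stated here, and it operates on a synthetic three-pixel-wide bar rather than on a real network output. The number of remaining pixels serves as the length estimate.

```python
def _neighbours(img, r, c):
    """Return the 8 neighbours P2..P9, clockwise, starting from the pixel above."""
    return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
            img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]


def _transitions(nb):
    """Number of 0 -> 1 transitions in the circular sequence P2, P3, ..., P9, P2."""
    ring = nb + nb[:1]
    return sum(1 for a, b in zip(ring, ring[1:]) if a == 0 and b == 1)


def skeletonize(mask):
    """Zhang-Suen thinning of a binary mask given as a list of rows of 0/1."""
    h, w = len(mask), len(mask[0])
    # Pad with a zero border so that every pixel has eight neighbours.
    img = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in mask] + [[0] * (w + 2)]
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(1, h + 1):
                for c in range(1, w + 1):
                    if img[r][c] != 1:
                        continue
                    nb = _neighbours(img, r, c)
                    p2, _, p4, _, p6, _, p8, _ = nb
                    if not (2 <= sum(nb) <= 6 and _transitions(nb) == 1):
                        continue
                    if step == 0:
                        removable = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        removable = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if removable:
                        to_delete.append((r, c))
            # Pixels are removed only after the whole pass, as the algorithm requires.
            for r, c in to_delete:
                img[r][c] = 0
            changed = changed or bool(to_delete)
    return [row[1:-1] for row in img[1:-1]]


# Synthetic "defect mask": a bar 3 pixels thick and 10 pixels long.
bar = [[1] * 10 for _ in range(3)]
skeleton = skeletonize(bar)
length_estimate = sum(map(sum, skeleton))
print(length_estimate)
```

The thinned line is somewhat shorter than the original bar because the ends are eroded as well, so pixel counts of this kind are best read as estimates of relative rather than exact lengths.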
Thus, segmentation makes it possible to analyze the reasons for the nonuniform distribution of defects on the surface of the carbon materials. At the same time, segmentation was successfully performed for images of surfaces with complex morphology, as well as for images of mediocre quality. This was enabled by extensive augmentation of the images.
Machine learning analysis of experimental images is presented in two tasks: classification and segmentation. By classifying images into ordered and disordered images, one can obtain a “probability” of being ordered, which can be used as a metric of uniformity, surface smoothness and the absence of extensive defects. It is shown that image embeddings from the encoding part of the network are semantically relevant. An analysis of learned features was also carried out. The ResNet- and VGG-based networks outperformed the non-expert human respondents in accuracy and recall (Table 1).
By solving the segmentation problem, we obtained additional information about the types of defects that caused irregularities in nanoparticle positions. Further ways of analysis are described: it was found that the skeletonization operation is a crucial step for statistical analysis of linear defects.
These results can be combined into a unified approach to material development (Fig. 11). To create a new material with desired properties, we propose imaging the material, followed by automated neural network analysis (classification to obtain the degree of order and image patch embeddings, and segmentation to identify specific types of defects and compute the corresponding statistics) and a search for correlations between the observed properties and the results of the neural network analysis. The results of such machine learning analysis are indeed linked with the actual structure of the material. Knowing how the structure of a material impacts its properties, one can redesign the synthetic procedure and repeat the same process again. This procedure would be much less effective without automated analysis, as there would be no quantitative measure for structural changes.
Fig. 11 Summary of the proposed approach for material development, boosted by nanoparticle imaging and neural network processing.
In this work, we test the described approach using carbon materials as an example. We expect that this approach can also be useful for a number of other layered materials, where defect regions can react with metal nanoparticles. Further research on this topic is underway in our laboratory.
The edges of the graphene plane were unsaturated on all four sides and were models of the grain boundaries characteristic of the graphite and graphene planes.55–57 The palladium cluster is an icosahedral structure with one palladium atom in the center.
The Pd13 cluster, when coordinated with ligands, is prone to a transition to a low-spin electronic state.72 Taking into account this fact, as well as the features of the GFN2-XTB method, all molecular systems were calculated in the singlet electronic state. As shown earlier, different isomers of the Pd13 cluster differ from each other by no more than 1 eV.72 Since the adsorption energies are very significant and substantially exceed this difference, the analysis of various isomers of the Pd13 cluster by the GFN2-XTB method was not performed in this work.
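The comparison made above can be checked with simple unit arithmetic. The sketch below converts the 1 eV upper bound on the isomer energy spread into kcal mol−1 and compares it with the adsorption energies reported in this work. The conversion factor is the standard value of about 23.06 kcal mol−1 per eV, and the variable names are illustrative.

```python
EV_TO_KCAL_PER_MOL = 23.0605  # 1 eV per particle expressed in kcal/mol

isomer_spread_ev = 1.0  # upper bound on Pd13 isomer energy differences quoted in the text
isomer_spread_kcal = isomer_spread_ev * EV_TO_KCAL_PER_MOL

# Adsorption energies of Pd13 on the C262 model reported in the text (kcal/mol).
adsorption_energies = {
    "pristine surface": -91.9,
    "armchair edge": -191.0,
    "zigzag edge": -248.9,
}

# Ratio of each adsorption energy magnitude to the isomer spread.
ratios = {site: abs(e) / isomer_spread_kcal for site, e in adsorption_energies.items()}
for site, ratio in ratios.items():
    print(f"{site}: |E_ads| is {ratio:.1f} times the isomer spread")
```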
To calculate the adsorption energy of the Pd13 cluster on the C262 surface, a total optimization of an isolated cluster, an isolated plane, and a Pd13·C262 complex was performed. The adsorption energy was calculated as:
Eads = Ecomplex − Ecluster − Eplane
When performing relaxed scan calculations, the scan coordinate was chosen as the distance between one of the palladium atoms of the cluster, which is directly bonded to the graphite plane, and one of the carbon atoms at the edge of the plane. At the initial scanning point, the cluster was adsorbed at the center of the carbon surface. In the course of scanning, the selected Pd–C distance decreased from 9.943 Å to 2.544 Å in 50 scanning steps, i.e. the step size (∼0.148 Å) was small enough so that when the cluster was moved at each step, there would be no cluster rearrangements.
Molecular dynamics simulations were performed at 350 K (NVT ensemble, Berendsen thermostat), and time steps were equal to 1.0 femtosecond. The analysis of the MD modeling results was carried out using the VMD software package.73
For each simulation step, each possible type of movement was applied to every particle. For the adsorption process, solution particles were moved randomly across the field and then adsorbed with a probability of 0.1. For particles already positioned on the surface, leaching and surface movement processes were applied. Leaching from the surface was set to be impossible. For surface movement, shifts were randomly sampled within (±1, ±1) squares. Surface movement away from the defect surface was set to be impossible as well, while on the pristine surface, it occurred in every step.
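The rules above can be sketched in a few lines of plain Python. This is a simplified illustration, not the simulation code used in this work. The square lattice, the periodic boundaries, the single vertical line defect, the permission for several particles to share a site, and all parameter values other than the adsorption probability of 0.1 are assumptions made for the example.

```python
import random


def simulate(n_particles=50, size=20, defect_col=10, p_adsorb=0.1, n_steps=2000, seed=0):
    """Toy Monte Carlo model of nanoparticle deposition on a surface with one line defect."""
    rng = random.Random(seed)
    # Each particle is [x, y, state]; all particles start in the solution.
    particles = [[rng.randrange(size), rng.randrange(size), "solution"]
                 for _ in range(n_particles)]
    trapped_history = []
    for _ in range(n_steps):
        for p in particles:
            if p[2] == "solution":
                # Move randomly across the field, then try to adsorb.
                p[0], p[1] = rng.randrange(size), rng.randrange(size)
                if rng.random() < p_adsorb:
                    p[2] = "surface"
            elif p[0] != defect_col:
                # Pristine surface: shift within the (+-1, +-1) square in every step.
                p[0] = (p[0] + rng.choice((-1, 0, 1))) % size
                p[1] = (p[1] + rng.choice((-1, 0, 1))) % size
            # Particles on the defect stay put; leaching is disabled for all particles.
        trapped = sum(1 for p in particles if p[2] == "surface" and p[0] == defect_col)
        trapped_history.append(trapped)
    return particles, trapped_history


particles, trapped_history = simulate()
print(trapped_history[0], trapped_history[-1])
```

Because trapping is irreversible in this model, the number of particles on the defect can only grow, and the particles end up decorating the defect line.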
To solve segmentation problems, 15 images were manually labeled; each labeled image contained 4 different layers of labels. Labelling was performed in the GIMP image editor.74 The labeling was performed by a trained expert, and the labels were reviewed by another author.
The training images were heavily augmented. This included horizontal and vertical flips (50% probability), random shifts, scaling and rotations (100% probability, but specific values were sampled from (−0.1, 0.1) for scaling, (−0.6, 0.6) for shifts with reflection, (−45, 45) for rotations), grid distortions (50% probability), one of sharpening, blurring, or motion blur (90% probability), random contrast (90% probability), and random crops of image patches of size (192, 192). For other parameters, default values were used. Augmentations were performed using the Albumentations77 Python package. Augmentations help to artificially increase the dataset size and provide more possible examples for the neural network model. Moreover, using blurring and distortion augmentations, the network can learn how to work in complicated cases with low image quality.
Implementations from the Segmentation Models PyTorch78 package were used. U-Net59 and FPN79 neural network architectures with ResNet,61 VGG,80 Inception,81 and SE-ResNet82 encoders were compared. Both architectures use multiple convolution and pooling layers, but in FPN the encoder features are not just copied but passed through a 1 × 1 convolution and added. In our case, the FPN model worked better, but the difference was not significant. The VGG block is just a set of convolutions, ReLU (rectified linear unit) activation functions, pooling layers and (optionally) batch normalizations. The ResNet block is based on an “identity shortcut connection”, which solves the vanishing gradient problem to some extent and enables researchers to train large models without loss of performance. The squeeze-and-excitation (SE) block models the relationship between channels. In the case of the Inception-v4 encoder, residual connections were optimized and combined with the previous version of the Inception architecture, where multiple types of convolutions (with different windows) were applied to the feature map at the same time.
The network was trained using the Adam optimizer. It is a first-order optimization method that extends basic stochastic gradient descent with momentum, running averages of both the first and second moments of the gradient, and an adaptive learning rate. This method has already shown good performance in this type of task. A learning rate of 10⁻⁴ was used at the start and then decreased by a factor of 10 after the first 750 epochs.
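For reference, the step-wise learning rate schedule and a single Adam update can be written out explicitly. The sketch below is a plain-Python illustration on a one-parameter toy problem, not the training code. The default values for `beta1`, `beta2` and `eps` are the commonly used ones and are assumptions here, as is the larger learning rate used in the toy loop to make convergence visible.

```python
import math


def learning_rate(epoch, base_lr=1e-4, drop_epoch=750, factor=10.0):
    """Step schedule: base_lr for the first 750 epochs, then base_lr / 10."""
    return base_lr if epoch < drop_epoch else base_lr / factor


def adam_step(param, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; `state` holds t, m and v."""
    state["t"] += 1
    # Running averages of the first and second moments of the gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


# Toy problem: minimize f(x) = x^2, whose gradient is 2x.
x = 1.0
state = {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(1000):
    x = adam_step(x, 2 * x, state, lr=0.05)  # lr chosen for the toy problem only

print(learning_rate(0), learning_rate(749), learning_rate(750), x)
```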
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d0sc05696k
This journal is © The Royal Society of Chemistry 2021