Mikhail Yu. Kurbakov a, Valentina V. Sulimova a, Andrei V. Kopylov a, Oleg S. Seredin a, Daniil A. Boiko b, Alexey S. Galushko b, Vera A. Cherepanova b and Valentine P. Ananikov *b
a Tula State University, Lenina Ave. 92, 300012 Tula, Russia
b Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt 47, Moscow, 119991, Russia. E-mail: val@ioc.ac.ru; Web: https://AnanikovLab.ru
First published on 4th July 2024
Carbon materials have paramount importance in various fields of materials science, from electronic devices to industrial catalysts. The properties of these materials are strongly related to the distribution of defects—irregularities in electron density on their surfaces. Different materials have various distributions and quantities of these defects, which can be imaged using a procedure that involves depositing palladium nanoparticles. The resulting scanning electron microscopy (SEM) images can be characterized by a key descriptor—the ordering of nanoparticle positions. This work presents a highly interpretable machine learning approach for distinguishing between materials with ordered and disordered arrangements of defects marked by nanoparticle attachment. The influence of the degree of ordering was experimentally evaluated on the example of catalysis via chemical reactions involving carbon–carbon bond formation. This represents an important step toward automated analysis of SEM images in materials science.
Scanning electron microscopy (SEM) is one of the major methods for studying these materials.14–16 This method involves scanning the material's surface with an electron beam, providing high resolution and simplifying surface morphology analysis.17 However, several material surface features, such as defects (irregularities in the atom arrangement and, consequently, electron density), are difficult to study. Advanced methods such as atomic force microscopy can be used, but they only cover a limited sample surface area.
Previously, a new approach was developed based on the significant difference in the binding energies of palladium nanoparticles to material surfaces.18 Once deposited on the surface, the nanoparticles can be used as markers for defects. In some cases, the analysis is further simplified because objects with higher atomic numbers appear brighter in SEM images (Z-contrast).
In recent years, computational modeling in general, and machine learning algorithms in particular, have been actively employed in nanotechnology.19–21 Significant contributions have been made to optimizing the synthesis of nanomaterials,22–25 analyzing nanoscale properties,26,27 developing datasets28,29 and new algorithms,30,31 revealing correlations between structure and properties,32 and advancing the methodology applied to micro- and nanoscale dynamics33,34 and spectroscopy.35
However, analyzing electron microscopy images to gain insight into particle arrangement remains a significant problem. Much work has focused on detecting individual particles,36,37 but research on more global material features is limited, especially with regard to material ordering as it is assessed in human analysis.38,39 One major target for analysis is distinguishing between ordered and disordered materials and determining their overall order. Notably, the intuitive understanding of an ordered arrangement that each researcher may possess does not allow for reliable scientific research unless it is formalized.
Although deep learning approaches currently occupy leading positions in solving machine learning problems, especially in computer vision, the problem of interpreting the several million internal parameters of such models has not yet been solved. As such, heatmaps of various types are often used to explain the features of the neural network model (visually investigating model behavior) rather than dependencies in the source data.58 The main purpose of this study is to formalize the concept of the ordered arrangement of particles based solely on their coordinates in the image.
The project described here aims to provide researchers with intelligible and controllable attributes of orderliness that can lead to a reliable interpretation of underlying chemical processes. This work presents a novel approach for determining the order of carbon materials as a measure of their underlying structure. The method uses data from the visualization of metal nanoparticles and formalizes the concept of ordering in the relative positions of these nanoparticles, taking into account the specifics of human perception. As a result, we propose a set of characteristics that, on the one hand, is in good agreement with intuitive understanding and, on the other hand, allows us to describe data quantitatively in terms of understandable and easily interpreted physical parameters. Training classifiers on these characteristics and statistically assessing their quality allowed us to check the adequacy of the constructed features. A plausible connection between the ordering of defects on carbon materials and catalytic activity is discussed based on experimental electron microscopy analysis of catalysts before and after the reaction.
Fig. 1 illustrates the primary steps in analyzing the ordering, presented in the form of a flowchart diagram. First, nanoparticles are detected in the SEM image. Next, based on their positions, an interpretable feature description is generated, which is necessary for comprehensive explainable ordering analysis.
Fig. 1 The main stages of the ordering analysis of nanoparticles arrangement in a SEM image. The numerical labels correspond to the section numbers of the present article.
A preliminary step in forming a feature description is detecting nanoparticles in the SEM image. After that, information about the positions of the nanoparticles (coordinates of their centers) in the image becomes available. A previously proposed nanoparticle detection method based on the exponential approximation of image fragments41 was successfully applied to real SEM images of nanoparticles. Fig. 2 provides an example of the detection results.
Fig. 2 Image analysis: (a) the original SEM image; (b) the nanoparticles arrangement in the background of the SEM image; and (c) the map of nanoparticles arrangement used for order analysis.
To analyze the ordering, we used only the nanoparticle arrangement information, eliminating factors such as background, average brightness, and SEM image resolution (Fig. 2c).
An attempt to formalize the concept of orderliness within the proposed approach is based on two interrelated assumptions about the nanoparticle arrangement.
This assumption makes it possible to form an orderliness characteristic group based on orientations (O-group):
• O1 feature. General consistency of orientations. This characteristic is a general estimate of the consistency of the orientations of local nanoparticle groups, regardless of their reliability. It reflects the idea that the more consistent the orientations are, the more ordered the nanoparticle arrangement is.
• O2 feature. Partial consistency of orientations. As mentioned above, an SEM image may contain disordered regions in which the orientations of local nanoparticle groups are unreliable (Fig. 3a). Considering such orientations can distort the estimate of the order of the nanoparticle arrangement. Therefore, this characteristic estimates the consistency of local directions (as O1 does) but takes into account only those that have high reliability.
• O3 feature. The fraction of reliable orientations. This characteristic is based on estimating the number of local nanoparticle groups with a highly reliable orientation and reflects the idea that the more groups have highly reliable orientations, the more ordered the nanoparticle arrangement is.
Determining the orientations of local groups of nanoparticles and calculating features based on them are described in more detail in subsection 4.3 “Features based on orientations of nanoparticles local groups”.
It should be noted that in the ordered areas, the neighboring orientations of local nanoparticle groups (shown in red in Fig. 3a) already form some appearance of smooth lines visible by the human eye. This observation is the basis of the following assumption about orderliness.
This assumption makes it possible to form a group of orderliness characteristics based on smooth lines (L-group):
• L1 feature. Number of constructed lines. This characteristic reflects the idea that the more lines can be constructed, the more ordered the nanoparticle arrangement is.
• L2 feature. Smoothness of the constructed lines. This characteristic is a generalized estimate of all constructed lines in terms of smoothness (the integrated index of local similarity of small polyline fragments with a straight line). This reflects the idea that the smoother lines can be formed, the more ordered the nanoparticle arrangement is.
• L3 feature. Rectilinearity of the constructed lines. Like the L2 characteristic, this feature is a generalized estimate of all constructed lines but from the point of view of similarity to a straight line throughout the polyline.
• L4 feature. The fraction of connected nanoparticles. This characteristic suggests that the more nanoparticles that are connected into lines, the more ordered the nanoparticle arrangement is.
The formation of smooth polylines and the calculation of features based on them are fully described in subsection 4.4 “Features based on smooth lines”.
Combining features based on the orientations of local groups of nanoparticles and features based on smooth lines makes it possible to form a well-interpreted feature description for each of the SEM images. This description forms an important basis for further explainable analysis of nanoparticles arrangement orderliness using explainable machine learning methods,42,43 as opposed to the use of unexplainable deep neural networks.44
• (O1) General consistency of orientations;
• (O2) Partial consistency of orientations;
• (O3) The fraction of reliable orientations;
• (L1) Number of lines constructed;
• (L2) Smoothness of the constructed lines;
• (L3) Rectilinearity of the constructed lines;
• (L4) The fraction of connected nanoparticles.
Fig. 4 shows a graphical representation of each of the SEM images in the space of three features (O2, O3 and L4), the most informative for determining whether ordering occurred.
Fig. 4 Representation of SEM images in the space of the three most informative features (O2, O3, L4).
It also shows that the classes of ordered and disordered images are locally concentrated even when only a portion of the proposed features is used. This suggests that the feature description reflects the real relationship between the ordering and the nanoparticle arrangement.
The issue of determining the order of particles is a novel one, with an example of using convolutional neural networks described previously.38
The classifier quality was estimated using a 5-fold cross-validation45,46 procedure. It should be noted that the experimental datasets are typically unbalanced (for example, 750 ordered and 250 disordered images in the dataset used in the present study). Therefore, when forming cross-validation folds, stratification must be carried out.47 For more information about the quality indicators used, see subsection 4.6.
The results of applying the proposed approach were compared to the results of the previous work38 that solved the considered problem using deep neural networks. It is important to note that the literature study was focused on detecting nanoparticle ordering and therefore provides information only about the “ordered” target class (Table 1). Note that the whole image was used as the initial data for training the neural network, and not the coordinates of the particles, as in the proposed approach.
Neural network | Number of parameters | Accuracy | Precision | Recall | F | AUC |
---|---|---|---|---|---|---|
AlexNet | 57 M | 0.80 | 0.71 | 1.00 | 0.83 | 0.92 |
ResNet34 | 21 M | 0.95 | 0.91 | 1.00 | 0.95 | 0.98 |
VGG-13 | 129 M | 0.95 | 0.91 | 1.0 | 0.95 | 1.0 |
Table 1 shows the main quality indicators obtained by 5-fold cross-validation for three convolutional networks, where the target class is ordered images. Table 2 shows the main quality indicators obtained by 5-fold cross-validation for the linear support vector machine (SVM) classifier based on the proposed interpretable features, reported for each of the two target classes (ordered and disordered images).
Target class | Accuracy | Precision | Recall | F | AUC |
---|---|---|---|---|---|
Ordered (750) | 0.957 | 0.97 | 0.98 | 0.97 | 0.989 |
Disordered (250) | 0.957 | 0.93 | 0.90 | 0.91 | 0.989 |
Tables 1 and 2 demonstrate that the linear SVM classifier, which uses only 7 interpretable features, achieves a marginally higher accuracy (0.957 versus at most 0.95) than the convolutional neural networks, which rely on tens to hundreds of millions of abstract parameters.
The results obtained confirm our assumption that the proposed characteristics reflect the real relationship between the ordering and nanoparticles arrangement. From this point of view, the type of classifier used does not play a large role since interpretability is achieved at the expense of the feature space. Linear SVM was chosen because it is theoretically justified and allows visualization of the separability of objects in a system of explicable features (see ESI, section 5†).
The Suzuki–Miyaura reaction was carried out under relatively mild conditions (70 °C; see Methods section, subsection 4.7), and palladium deposited on nanoglobular carbon was chosen as the catalyst (Fig. 5a). This type of support has a random distribution of surface defects, so that the deposited nanoparticles are also distributed chaotically. Examination of the sample by electron microscopy after the catalytic reaction showed that the nanoparticles were almost completely dissolved or detached from the support (Fig. 5b). In addition, accumulation of agglomerates may indicate particle movement on the surface followed by agglomeration.
The harsher conditions of the Mizoroki–Heck reaction (140 °C; see Methods section, subsection 4.7) were chosen to demonstrate the behavior of palladium on a graphite support. As previously shown, this type of support exhibits an ordered arrangement of defects, as shown in Fig. 5c and d. However, the experiment showed that more stringent reaction conditions did not result in the pattern observed in the Suzuki–Miyaura reaction. It was found that support with an ordered defect array is more resistant to the metal leaching phenomenon.
Although the nanoparticles are still present in the images before and after the reaction, their location may change, which will help shed light on the dynamic processes in the solution. Fig. 5e shows the results before and after the reaction for a larger number of images of the ordered material. For these images, which are not included in the dataset,48 the proposed ordering parameters were calculated (see the ESI, section 6†), which allows them to be displayed in the appropriate space. Fig. 5e shows that these images are usually ordered. On average, the order after the reaction is greater than the order before the reaction.
These results confirm the importance of the effects underlying nanoparticle ordering in dynamic processes occurring during chemical reactions. The development of automated methods for nanoparticle ordering analysis will contribute to the development of new, more efficient catalytic systems in the future.
Notably, the proposed approach based on explicable data analysis allows us to explicitly interpret the classification result based on formalized ordering features, which is impossible for a neural network represented as a “black box” model. This is important because the proposed approach can form the basis of a more general indicator of orderliness—the degree of orderliness.
We also showed that nanoparticle ordering is strongly related to the dynamic processes occurring in chemical reaction mixtures. Undoubtedly, the application of these models will have a significant impact on automating SEM image analysis in carbon material research and material science in general.
These SEM images were obtained using a Hitachi SU8000 field-emission scanning electron microscope (FE-SEM). The operating conditions involved secondary electron mode at an accelerating voltage of 10–30 kV and a working distance of 6–12 mm.
In accordance with the proposed approach, the orientation of a local group of nanoparticles is understood as the prevailing direction along which the nanoparticles of this group line up.
Therefore, this section describes the following:
• The proposed method for forming local groups of nanoparticles (subsection 4.3.1),
• The proposed method for computing the prevailing direction for a local group of nanoparticles (subsection 4.3.2) and
• Three orientation-based features (O-features) (subsections 4.3.3–4.3.5).
Each of the detected nanoparticles is the starting point for the formation of a local group (so initially, the local group consists of only one nanoparticle).
A new nanoparticle for adding to a group is selected as the nanoparticle with the minimum Euclidean distance to the nearest nanoparticle of the group. The proper nanoparticle can be easily found on the basis of the neighborhood graph constructed by Prim's algorithm.49
In the simplest case, nanoparticles are added until the group reaches the given size. However, in a number of cases, the nearest nanoparticle can be located quite far from the group. This is especially true for regions of an SEM image with a low local nanoparticle density. Adding a distant nanoparticle is undesirable because it can distort the group properties.
To solve this problem, we propose the use of an early stopping criterion based on the special threshold, which represents the average local density of nanoparticles in areas with their most intense accumulation and can be estimated on the basis of k·N minimal distances between nanoparticles j = 1, …, k·N:
![]() | (1) |
Thus, the modified Prim's algorithm for forming a local group of nanoparticles can be represented as follows.
Algorithm 1. Modified Prim's algorithm

    Input:
      G = {i}              # indices of the nanoparticles of the group (initially the starting nanoparticle i)
      E = {e11, …, eNN}    # Euclidean distance matrix
      s                    # maximum number of nanoparticles in one group
      d                    # early stopping threshold (1)

    while (|G| < s):
        j = argmin eGḠ     # index of the nanoparticle closest to the group
        if (∃g∈G: egj ≤ d) then:
            G = G∪j
        else:
            break
As a result of the proposed procedure, the located groups may be completely different, partially overlap, or be exactly the same. The number of nanoparticles in each local group is upper bounded by some predefined value s, which is the parameter of the proposed method. At the same time, small groups of nanoparticles (with a size less than four) were excluded from further analysis.
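The group-forming procedure can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the function name `form_local_group` and its arguments are hypothetical, the threshold `d` is passed in directly rather than estimated from the image, and groups smaller than four nanoparticles are returned as `None` to mirror the exclusion rule described above.

```python
import numpy as np

def form_local_group(points, start, s=8, d=np.inf, min_size=4):
    """Grow a local group from `start` in a Prim-like manner with early stopping.

    points   : (N, 2) array-like of nanoparticle centres
    s        : maximum number of nanoparticles in one group
    d        : early stopping threshold (supplied by the caller in this sketch)
    min_size : groups smaller than this are discarded (returned as None)
    """
    points = np.asarray(points, dtype=float)
    # distance from every nanoparticle to the nearest member of the group
    dist_to_group = np.linalg.norm(points - points[start], axis=1)
    in_group = np.zeros(len(points), dtype=bool)
    in_group[start] = True
    group = [start]
    while len(group) < s and not in_group.all():
        # members of the group are masked out with an infinite distance
        candidates = np.where(in_group, np.inf, dist_to_group)
        j = int(np.argmin(candidates))
        if candidates[j] > d:
            break  # the nearest outside nanoparticle is too far: stop early
        group.append(j)
        in_group[j] = True
        new_dist = np.linalg.norm(points - points[j], axis=1)
        dist_to_group = np.minimum(dist_to_group, new_dist)
    return group if len(group) >= min_size else None
```

Calling the function once per detected nanoparticle (each as `start`) yields the possibly overlapping groups described above.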
In this case, the direction of the local group of nanoparticles corresponds to the maximum eigenvector of the covariance matrix. The tilt angle of this vector Θ relative to the horizontal can be calculated by the following formula:
![]() | (2) |
![]() | (3) |
This estimate takes values in the range [0, 1] and shows how much the arrangement of nanoparticles is “elongated” in the prevailing direction. The best possible value q = 1 is reached when all the nanoparticles in the local group are located on the same straight line.
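As an illustration, the prevailing direction of a local group can be obtained from the eigen-decomposition of its covariance matrix. The sketch below is hedged: the function name is hypothetical, and the elongation score `1 - lambda_min / lambda_max` is only an assumed stand-in that shares the properties stated above (values in [0, 1], equal to 1 for collinear nanoparticles); it should not be read as the paper's reliability formula.

```python
import numpy as np

def prevailing_direction(group_xy):
    """Return (tilt angle in degrees within (-90, 90], elongation score in [0, 1])."""
    xy = np.asarray(group_xy, dtype=float)
    cov = np.cov(xy, rowvar=False)            # 2x2 covariance matrix of the coordinates
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    vx, vy = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue
    theta = np.degrees(np.arctan2(vy, vx))
    # a direction has no sign, so fold the angle into (-90, 90]
    if theta > 90.0:
        theta -= 180.0
    elif theta <= -90.0:
        theta += 180.0
    lam_min, lam_max = eigvals
    # ASSUMPTION: illustrative elongation score, not the paper's definition
    q = float(np.clip(1.0 - lam_min / lam_max, 0.0, 1.0)) if lam_max > 0 else 0.0
    return float(theta), q
```

For nanoparticles lying on the line y = x, the sketch returns an angle of 45° and a score of 1; for the corners of a square, the score is 0.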
Fig. 6 shows a visual representation of the main characteristics for computing O-features for disordered and ordered arrangements of nanoparticles in SEM images.
Fig. 6 Visual representation of the main characteristics for computing O-features for disordered (left) and ordered (right) arrangements of nanoparticles.
Fig. 6a and b show the nanoparticles detected in the disordered (left) and ordered (right) SEM images that would be used for further analysis.
Fig. 6c and d show graphs of the probability distribution functions for all prevailing direction tilt angles, where the dashed black line corresponds to the case of a uniform distribution of tilt angles.
Fig. 6e and f show the prevailing direction tilt angles jointly with their reliabilities, where the dashed red line shows the threshold reliability value. It should be noted that some local groups can be characterized by the same angle and reliability values; thus, they fall into the same point on the graph. The point brightness indicates the number of local groups with the same characteristics (the more local groups there are, the brighter the color is).
Fig. 6g and h show graphs of the probability distribution functions for only high-reliability prevailing direction tilt angles (for which the reliability is higher than the threshold – only points above the red dotted line in Fig. 6e and f).
As shown in Fig. 6, the main characteristics of the ordered and disordered nanoparticle arrangements differ significantly from each other. Therefore, these features are expected to be quite informative for distinguishing between ordered and disordered nanoparticle arrangements.
A quantitative measure of the general consistency of orientations (prevailing directions) can be computed on the basis of the Shannon entropy.52
The tilt angle in the prevailing direction always takes values in the range [−90°, +90°]. To calculate the Shannon entropy, this range is divided into m intervals of equal length, and the empirical probabilities pi of the angle falling into each interval i = 1, …, m are computed.
Then, the value of the Shannon entropy H can be calculated by the following formula:
$H = -\sum_{i=1}^{m} p_i \log_2 p_i$ | (4) |
Note that the maximum possible entropy value is limited and can be reached in the case of a uniform distribution:59
H* = log2m. | (5) |
The final value of the O1 feature can be calculated as the ratio of (4) to (5) and reflects the effective value.59 The negative sign in the ratio is required to normalize the feature so that larger values correspond to a more ordered arrangement.
$O_1 = -\dfrac{H}{H^*}$ | (6) |
In this case, O1∈[−1, 0], and the highest value of 0 is achieved when all prevailing directions fall into a single interval (for example, when all the nanoparticles are arranged in a single line). The number of intervals m in this work was set to 90.
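The computation of O1 can be sketched in a few lines of Python. This is a minimal illustration of the description above (normalized Shannon entropy of the tilt angles over m equal intervals of [−90°, +90°], taken with a negative sign); the function name is hypothetical.

```python
import numpy as np

def o1_feature(tilt_angles_deg, m=90):
    """General consistency of orientations: minus the entropy H divided by log2(m)."""
    angles = np.asarray(tilt_angles_deg, dtype=float)
    # empirical probabilities of the angle falling into each of the m intervals
    counts, _ = np.histogram(angles, bins=m, range=(-90.0, 90.0))
    p = counts / counts.sum()
    p = p[p > 0]                         # empty intervals contribute nothing to H
    entropy = -np.sum(p * np.log2(p))    # Shannon entropy H
    return -entropy / np.log2(m)         # normalized by the maximum entropy log2(m)
```

Identical angles give a value of 0 (fully consistent orientations), whereas angles spread evenly over all intervals give −1.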
![]() | (7) |
The proposed O3 defines the proportion of reliable orientations taken into account at that reliability value:
![]() | (8) |
In this case, O3∈[0, 1], where the larger its value is, the more ordered the nanoparticle arrangement is.
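A minimal sketch of the O3 computation follows, based on the verbal description above. The function name is hypothetical, and treating a reliability exactly equal to the threshold as reliable is an assumption of this sketch.

```python
import numpy as np

def o3_feature(reliabilities, q_min=0.85):
    """Fraction of local groups whose orientation reliability reaches q_min."""
    q = np.asarray(reliabilities, dtype=float)
    reliable = q >= q_min          # ASSUMPTION: the comparison is non-strict
    return float(reliable.mean()), reliable
```

The returned boolean mask can also be used to select the high-reliability tilt angles on which the partial consistency of orientations (O2) is evaluated.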
The natural approach is to connect dots that are close to each other. Generally, this problem is solved by constructing the shortest unclosed path (SUP).53 However, in this case, there are two problems due to the specifics of the applied task being solved.
The first problem is related to building smooth lines. By a smooth line here, we mean a line, each small section of which is similar to a straight line. The presence of these lines is typical for images with an ordered arrangement of nanoparticles. However, using SUP together with the traditional Euclidean distance leads in most cases to the construction of strongly curved lines, the presence of which cannot serve as an indicator of nanoparticle orderliness.
To solve this problem, we propose a new adaptive metric that, in addition to the Euclidean distances between points, considers the prevailing directions of nanoparticle local groups (subsection 4.3.1) and their reliability, as well as the consistency of a new point (which is a candidate for adding to the line) with the already constructed part of the polyline to possess the smoothness property. The description of the proposed metric, named the metric of prevailing directions (MPD), is given in subsection 4.4.1 of this section.
The second problem is related to the fact that the SUP method is focused on connecting all points into a single line, while within the framework of the applied problem being solved, it is necessary to build separate long smooth lines, not necessarily using all the available points. As a result, some nanoparticles may remain not belonging to any line at all.
In this regard, subsection 4.4.2 of this section proposes a modification of the SUP method, which allows us to take into account the indicated specifics.
Subsections 4.4.3–4.4.5 contain, respectively, a description of the interpreted features L1–L4 based on the lines constructed in accordance with the proposed approach.
![]() | (9) |
Note that the difference in the tilt angles of the prevailing directions can be characterized by either of the adjacent angles at the intersection of these directions. Since adjacent angles have the same sine, the choice between them is not important for estimating the difference, and we use the sine of the difference in the tilt angles of the prevailing directions. Additionally, the use of a sine normalizes the magnitude of the angle difference so that the larger its value is, the further apart the nanoparticles are considered to be.
According to the metric, the greater the average unreliability of the prevailing directions is, the greater the distance is (the respective nanoparticles are considered more distant from each other). At the same time, if the difference in tilt angles is large, then the corresponding nanoparticles will be considered distant even if the average unreliability is small, because the metric uses a maximum operation.
On the other hand, in the process of constructing a specific line, the value of the MPD can be corrected to ensure the smoothness of the constructed lines via so-called angular coaxiality coaxijk, which is similar to the cosine similarity measure;54 however, in contrast, it is scaled to the limits of [0,1]:
![]() | (10) |
Fig. 7 illustrates the concept of angular coaxiality.
Fig. 7 Illustration of the idea of angular coaxiality, where the last point and last but one point of the line are marked in black and the yellow points are candidates for adding to the line.
The resulting corrected MPD value is defined as follows:
![]() | (11) |
It should be noted that the corrected MPD values (11), in contrast to the basic MPD values (9), are dynamically changed in the process of constructing each line and cannot be computed in advance.
Fig. 8 shows the contributions of the main parts of the corrected metric of the prevailing directions (11) for three consecutive steps of choosing the nearest nanoparticle.
Fig. 8a, d and g show the traditional Euclidean distance between points. Fig. 8b, e and h show the basic MPD distance (9), which is based on the Euclidean distance and the direction information jointly related to the reliability. Fig. 8c, f and i show the corrected MPD (11), which allows greater smoothness of the line under construction to be reached.
Each subsequent point to add to the line can be found using the shortest unclosed path (SUP) method,55 which we modify to incorporate the dynamically computed distances (11), to extend the line at both ends (by adding new points before the first point and after the last point), and to stop line formation if the corrected MPD distance (11) exceeds the adopted threshold; this threshold is computed as a special case of (9):
thr = C·d + (1 − C) × wthr, | (12) |
The description of the proposed algorithm is given in a general form, where the term “point” implies the center of the nanoparticle and the term “index” implies the ordinal number of a nanoparticle:
Algorithm 2. Construction of lines via the modified shortest unclosed path method

    Input:
      L = 〈i〉               # indices of the points forming a line (i is the index of the starting point)
      M = {m11, …, mNN}     # matrix of the basic metric of prevailing directions (9)
      N                     # number of detected nanoparticles
      thr                   # distance threshold (12)

    while (|L| ≤ N):
        l = first(L)        # index of the leftmost point in the line
        r = last(L)         # index of the rightmost point in the line

        # indices of the closest points to the line on the left
        J = {j1, …, jx}, ∀j∈J: mlj ≤ thr; J∩L = ∅
        # corrected MPD metric values (11) for the nearest points to the left of the line
        ML = {ml1, …, mlx}
        # index of the closest point to the line on the left in the corrected MPD metric
        j = argmin ML

        # indices of the closest points to the line on the right
        K = {k1, …, ky}, ∀k∈K: mrk ≤ thr; K∩L = ∅
        # corrected MPD metric values (11) for the nearest points to the right of the line
        MR = {mr1, …, mry}
        # index of the closest point to the line on the right in the corrected MPD metric
        k = argmin MR

        # if a candidate set is empty, its minimum distance is treated as +∞
        if ((J ≠ ∅) and (mlj ≤ mrk)) then:
            L = 〈j, L〉
        else if ((K ≠ ∅) and (mrk ≤ mlj)) then:
            L = 〈L, k〉
        else:
            break
One pass of this algorithm allows one line to be constructed. To construct a new line, a new starting point should be chosen, and the algorithm should be reapplied.
To avoid starting a line from an unreliable point, we set the minimum reliability value qmin, which acts as the threshold when choosing the starting point.
Let P = (p1, …, pn) be a constructed polyline consisting of n ordered nanoparticles and P* = (pa, …, pb) be a fragment. Then, the metric coaxiality for this fragment can be defined as follows:
![]() | (13) |
The rectilinearity of a single polyline expresses its similarity to a straight line along its entire length and can be calculated as the metric coaxiality (13) of the full polyline. The L3 feature characterizes the rectilinearity of all constructed lines at once and is computed by averaging the individual rectilinearity values. The smoothness of a single polyline expresses its local similarity to a straight line and can be computed as the average metric coaxiality of all polyline fragments of some size fsize.
The L2, like the L3, characterizes the smoothness of all constructed lines at once and is computed by averaging individual smoothness values.
• The proportionality coefficient for early stopping in local groups formation (subsection 4.3.1): k = 3;
• The weight coefficient for estimating the local nanoparticles density in a SEM image (subsection 4.3.1): wd = 1.5;
• The maximum number of nanoparticles in a local group (subsection 4.3.1): s = 8;
• The reliability threshold for computing the partial consistency of orientations (subsection 4.3.4, subsection 4.4.2): qmin = 0.85;
• Proportionality coefficient to adjust the degree of influence of individual parts of the proposed metric of prevailing directions (subsection 4.4.1): C = 0.025;
• Weight coefficient of the angular coaxiality in the metric of prevailing directions to ensure line smoothness (subsection 4.4.1): wcoax = 1.75;
• Minimum line length in nanoparticles (subsection 4.4.3): Lmin = 12;
• The size of a polyline local fragment is used to estimate the smoothness of the constructed lines (subsection 4.4.4): fsize = 6.
The number of lines found depends not only on the nature of the image but also on the parameters of the search algorithm, which vary from all possible lines (Fig. 9c and f) to no-lines (Fig. 9d). The optimal values of the parameters given above were chosen to guarantee a significant number of lines on the ordered images (Fig. 9b) and a small number of short lines on the disordered images (Fig. 9e). For more information about the effect of the algorithm parameters on the construction of lines, see the ESI section 1.†
Fig. 9 Illustration of the dependence of the construction of smooth lines on the parameters of the proposed approach for ordered and disordered images.
The proposed algorithms were implemented in Python.
Based on the corresponding methods from the scikit-learn56 package, the following steps were implemented: calculation of the prevailing directions of local nanoparticle groups and of the corresponding reliabilities (principal component analysis – decomposition.PCA); training of a linear SVM classifier (svm.SVC: the kernel is linear, the regularization parameter is 10); evaluation of the classifier quality (cross-validation – model_selection.StratifiedKFold: the number of folds is 5).
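The classification and evaluation steps can be sketched with the scikit-learn calls named above. The feature matrix below is synthetic random data standing in for the seven features, so the printed scores say nothing about the real dataset; shuffling the folds with a fixed seed is an assumption of this sketch.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# SYNTHETIC stand-in for the 7 interpretable features (750 ordered, 250 disordered)
X = np.vstack([rng.normal(1.0, 1.0, (750, 7)), rng.normal(-1.0, 1.0, (250, 7))])
y = np.array([1] * 750 + [0] * 250)

clf = SVC(kernel="linear", C=10)                                 # linear SVM, C = 10
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)   # stratified 5-fold CV
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print(scores.mean())
```

Stratification keeps the 3:1 class ratio in every fold, which matters for an unbalanced dataset.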
Detection of nanoparticles was performed based on a parallel algorithm proposed in our previous work.57
The authors' own implementations were used for the formation of local groups of nanoparticles and the construction of smooth lines based on the proposed modification of the shortest unclosed path method.
Depending on the number of nanoparticles in the original SEM image, the operating time (excluding the detection stage) of the proposed implementation varies from a couple of seconds for ∼1000 nanoparticles to several dozens of minutes for ∼20000 nanoparticles. In the dataset under study, the most common number of nanoparticles in an SEM image is ∼5000, which is processed in a few minutes. The indicated time costs correspond to calculations on a personal computer with the following characteristics: processor – Intel® Core™ i7-9700K (3.6/4.9 GHz); RAM – 16 GB (DDR4, 3866 MHz); SSD – 256 GB; operating system – Windows 10 ×64. Parallel computing technologies were not used in this experiment.
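$\mathrm{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$ | (14) |
where TP and TN are the numbers of correctly classified positive (ordered) and negative (disordered) objects, and FP and FN are the numbers of objects incorrectly assigned to the positive and negative classes, respectively.
At the same time, most classifiers can balance the decision rule either toward increasing the number of correctly recognized positive class objects (ordered) or toward reducing the number of incorrectly classified negative class objects (disordered) using some hyperparameters. In this regard, characteristics such as Precision and Recall are also used:
$\mathrm{Precision} = \dfrac{TP}{TP + FP}$ | (15) |
$\mathrm{Recall} = \dfrac{TP}{TP + FN}$ | (16) |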
The F-measure (F) is a widely known measure that attempts to combine these two indicators and characterize the quality of the classifier with a single number.60 It is defined as the harmonic mean of Precision and Recall:
$F = \dfrac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ | (17) |
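These indicators follow directly from the counts of a confusion matrix. The short sketch below uses their standard definitions; the function name is hypothetical and the counts in the usage example are made up for illustration, not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, Precision, Recall and F-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)          # share of predicted positives that are correct
    recall = tp / (tp + fn)             # share of actual positives that are found
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f": f_measure}

# Illustrative counts only (not from the paper)
metrics = classification_metrics(tp=8, fp=2, tn=9, fn=1)
```

With these counts, the accuracy is 17/20, the precision is 0.8, and the recall is 8/9.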
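The AUC61 is estimated as the area bounded by the ROC curve and the axis of false positive classifications (FPR). The ROC curve reflects the trade-off between the sensitivity of the algorithm (the true positive rate, TPR) and its false positive rate (FPR, equal to one minus the specificity):
$\mathrm{TPR} = \dfrac{TP}{TP + FN}$ | (18) |
$\mathrm{FPR} = \dfrac{FP}{FP + TN}$ | (19) |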
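As an illustration, the area under the ROC curve can be computed by sweeping a decision threshold over the classifier scores and integrating TPR over FPR with the trapezoidal rule. This is a minimal sketch with a hypothetical function name; in practice a library routine would normally be used.

```python
import numpy as np

def roc_auc(y_true, scores):
    """Area under the ROC curve via a threshold sweep and the trapezoidal rule."""
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    # thresholds from "nothing is positive" down to "everything is positive"
    thresholds = np.r_[np.inf, np.unique(s)[::-1]]
    tpr = np.array([(s[y] >= t).mean() for t in thresholds])    # sensitivity
    fpr = np.array([(s[~y] >= t).mean() for t in thresholds])   # false positive rate
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

A classifier that ranks every positive object above every negative one obtains an area of 1.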
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4nr00952e
This journal is © The Royal Society of Chemistry 2024