Kiran Vaddi *a, Huat Thart Chiang a and Lilo D. Pozzo *ab
a Department of Chemical Engineering, University of Washington, Seattle, WA, USA. E-mail: kiranvad@uw.edu; dpozzo@uw.edu
b Department of Material Science and Engineering, University of Washington, Seattle, WA, USA
First published on 8th June 2022
Synthesizing complex nanostructures and assemblies in experiments involves careful tuning of design factors to obtain a suitable set of reaction conditions. In this paper, we study the application of Bayesian optimization (BO) to achieve autonomous retrosynthesis of a specific nanoparticle or nano-assembly structure, shape, and size starting from a set of reagents selected a priori. We formulate the BO as a shape matching problem given target spectra as a structural proxy with a goal to minimize the shape discrepancy. The proposed framework is grounded in analyzing the spectra as belonging to function spaces and a Riemannian metric defined on them. The metric decomposes spectral similarity into amplitude and phase components. It provides a shape matching distance to optimize as opposed to purely intensity similarity obtained from the commonly used mean squared error (MSE). Applying the framework to experimental and simulated spectra, we demonstrate the advantage of shape matching over MSE and other generic functional distance measures.
Black-box optimization methods such as Bayesian optimization (BO) optimize an unknown function using a surrogate model together with a utility function that guides sequential decisions about where to evaluate next. The application of BO to structure optimization of nanoparticles has recently been studied in problems that involve optimizing a characteristic response collected through experiments, such as UV-Vis spectroscopy incorporated into high-throughput frameworks.3–5 Often, the characteristic response is collected as a spectrum (i.e., a signal over a discrete sample of a stimulus, e.g., wavelength) and is not suited for direct use in BO. Researchers have therefore defined score functions that return a scalar similarity between a query spectrum and a target spectrum (e.g., Euclidean distance,4 cosine distance,3 etc.). However, these similarity functions rely heavily on expert knowledge about analyzing the signal and only consider differences on the intensity scale using vector-space distance measures. One way to overcome these limitations (or the bottlenecks created by the need for expert knowledge) is to optimize the shape of the spectra, i.e., to match the shape of the query spectrum to that of the target spectrum so that local and global features are optimized simultaneously.
Shape matching is advantageous over Euclidean distance in many characterization techniques of interest and provides a natural way to evaluate similarities based on semantic and scientific meaning. For example, in UV-Vis spectroscopy, the shape of the spectra, which is defined by the molar attenuation coefficient, generally gives information on the intrinsic properties of the nanoparticle such as the particle shape or size,6 while the intensity of the spectra gives information on extrinsic properties, such as the concentration. In addition, the intensity of the spectra can be influenced by the particle shape and size distribution, the dielectric properties of the solvent, and the surface chemistry of the nanoparticles, which may introduce further parameters for optimization.7 Because the objective of many optimization campaigns in inorganic nanoparticle retrosynthesis is to obtain particles of desired shapes or sizes, we hypothesize that having a similarity metric that primarily accounts for the structure would be advantageous in the optimization. Given that complex spectra provide information about phenomena occurring over multiple length scales, spectral shape matching provides a viable option for simultaneously matching the relevant patterns of multi-scale phenomena without over-emphasizing variations on the y-scale.
In this paper, we study BO in combination with shape matching on function spaces and the underlying Riemannian manifold structure. We provide a generic framework that considers spectral data as points on function spaces and use differential geometry-based approaches to define similarity measures. These similarity measures are then compared with other, more commonly used approaches, such as Euclidean distance, or peak positions of UV-Vis spectra. The rest of the paper is arranged as follows: we first introduce and review various mathematical frameworks such as Bayesian optimization, function spaces, and Riemannian metrics used in this paper before applying them to a case study of simulated Gaussian spectra and a high-throughput gold nanorod synthesis experiment. We then conclude and provide directions for future usage and applications of interest.
(1) |
As an example, expected improvement10 computes utility using eqn (2) measuring the improvement given a threshold α as:
(2) |
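As an illustration of how such a utility can be evaluated, the following is a minimal sketch of the widely used closed-form expected improvement for a Gaussian surrogate posterior. It assumes a maximization setting and that the surrogate returns a mean `mu` and standard deviation `sigma` at each candidate point; the function and variable names are hypothetical and are not taken from the paper.

```python
import math
import numpy as np

def _norm_pdf(z):
    """Standard normal density."""
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)

def _norm_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    erf = np.vectorize(math.erf, otypes=[float])
    return 0.5 * (1.0 + erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, alpha):
    """Closed-form E[max(f - alpha, 0)] for f ~ N(mu, sigma^2).

    mu, sigma: surrogate posterior mean and standard deviation (assumed inputs).
    alpha: improvement threshold, e.g. the best value observed so far.
    """
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    sigma = np.atleast_1d(np.asarray(sigma, dtype=float))
    # With zero uncertainty the improvement is deterministic.
    ei = np.maximum(mu - alpha, 0.0)
    uncertain = sigma > 0
    z = (mu[uncertain] - alpha) / sigma[uncertain]
    ei[uncertain] = (mu[uncertain] - alpha) * _norm_cdf(z) + sigma[uncertain] * _norm_pdf(z)
    return ei

# Hypothetical posterior at three candidate points, threshold alpha = 0.6.
ei = expected_improvement(mu=[0.2, 0.5, 0.9], sigma=[0.3, 0.1, 0.0], alpha=0.6)
next_index = int(np.argmax(ei))  # candidate that would be evaluated next
```

In a batch setting such as the one used later in the paper, several candidates would be selected per iteration; that selection strategy is not sketched here.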
(3) |
(4) |
The shortest path between two points on a given manifold is called a geodesic, and the corresponding path length is the geodesic distance. In its generic form, the geodesic distance is thus itself the solution of an optimization problem (eqn (5)), but for particular manifolds and choices of metric there are closed-form expressions.
(5) |
Many choices for the inner product exist; the most commonly used is the L² inner product given by eqn (6), and the resulting function space is called an L² space. For any two functions, we first map their domain to the unit interval [0, 1] and then compute the inner product using eqn (6).
(6) |
The tangent space is the entire L² space, which allows us to define geodesics in closed form as the straight line τf1 + (1 − τ)f2, and the geodesic distance is a simple vector norm:
(7) |
Note that the distance in eqn (7) is different from the commonly used mean-squared error (MSE) between two functions, as it involves integrating the functions over the domain. More importantly, MSE is simply a similarity measure between the intensities (or the y-scale of a one-dimensional function), while eqn (7) provides a geodesic distance between the functions. For more details on the differential geometry of functions, readers are referred to ref. 12 and 13.
(8) |
Fig. 2 Example functions from eqn (8). The functions drawn as dotted and solid black lines are at an equal distance from the function drawn as a solid blue line when compared using the MSE and the L² metric, but not when compared using the amplitude–phase distance. Similarly, the dash-dot black curve is correctly ranked as farther from the solid blue curve than the solid black curve by the amplitude–phase distance, but not by the other measures (Table 1).
When measured using MSE, both the solid and dotted black curves in Fig. 2 are equidistant, at 3.34 units, from the solid blue curve. This is expected because the MSE only considers variation along the y-scale and therefore cannot distinguish between two functions that differ only along the x-axis (i.e., f2 can be obtained by shifting f1 linearly along the x-axis) when their y-scale variation with respect to f0 is unchanged. The same behavior occurs with the L² metric (equidistant at 0.34 units), because f0 is zero over the region where f1 and f2 differ along the x-axis. To account for this, we consider another Riemannian metric defined on function spaces, which first decouples the total variation/distance between functions into amplitude (i.e., variation along the y-axis) and phase (i.e., variation along the x-axis).
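The equidistance effect can be reproduced numerically. The sketch below uses three hypothetical Gaussian peaks (the centers, width, and grid are illustrative choices, not the functions of eqn (8)) and assumes the L² distance is the norm of the difference after mapping the domain to [0, 1]. Two peaks shifted by different amounts away from the reference come out at the same distance under both measures.

```python
import numpy as np

def gaussian(x, center, width=0.2):
    """Unnormalized Gaussian peak (hypothetical stand-in for a spectrum)."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def trapezoid(y, t):
    """Trapezoidal rule on a (possibly non-uniform) grid t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def mse(f, g):
    """Mean squared error between two sampled functions."""
    return float(np.mean((f - g) ** 2))

def l2_distance(f, g, x):
    """L2 norm of f - g after rescaling the domain to the unit interval."""
    t = (x - x[0]) / (x[-1] - x[0])
    return float(np.sqrt(trapezoid((f - g) ** 2, t)))

x = np.linspace(-5.0, 5.0, 1001)
f0 = gaussian(x, 0.0)    # reference
f1 = gaussian(x, -2.0)   # shifted copy
f2 = gaussian(x, -3.5)   # shifted further, same shape

print("MSE:", mse(f0, f1), mse(f0, f2))
print("L2 :", l2_distance(f0, f1, x), l2_distance(f0, f2, x))
```

Because the reference is essentially zero wherever the shifted peaks sit, neither measure registers how far each peak has moved.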
The amplitude and phase variations are defined by considering a warping function γ of the domain that continuously maps any given function f1 to f2 without changing the relative amplitudes. More precisely, γ: [0, 1] → [0, 1] is a boundary-preserving diffeomorphism (i.e., a smooth function with a smooth inverse) with γ(0) = 0 and γ(1) = 1. The set of warping functions, denoted Γ, forms a mathematical structure called a group (i.e., a set with inverse and identity properties under an associated operation), and it can be understood through how it acts on functions f. For example, if we define the group action of a γ ∈ Γ as simply warping the domain, (f, γ) = f◦γ, we can compute the γ that optimally warps f1 to f2 by solving the corresponding minimization over γ ∈ Γ under the L² norm. The L² norm, however, is known to suffer from the pinching effect, among other issues, so a square-root slope function (SRSF) metric is used instead,13,14 the details of which are outside the scope of this paper. The SRSF transforms a function f using:
(9) |
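For readers who want to experiment, here is a minimal numerical sketch of the transform. It assumes the standard SRSF definition from the functional data analysis literature, q(t) = sign(f′(t))·√|f′(t)|, and approximates the derivative with finite differences; the helper names are hypothetical.

```python
import numpy as np

def trapezoid(y, t):
    """Trapezoidal rule on the grid t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def srsf(f, t):
    """Square-root slope function of samples f on grid t (assumed standard form)."""
    df = np.gradient(f, t)                     # finite-difference derivative
    return np.sign(df) * np.sqrt(np.abs(df))

t = np.linspace(0.0, 1.0, 501)
f = t ** 2
q = srsf(f, t)

# The squared L2 norm of q equals the total variation of f (here f(1) - f(0) = 1).
total_variation = trapezoid(q ** 2, t)
# Adding a constant to f leaves q unchanged, since q depends only on the slope.
q_shifted = srsf(f + 3.0, t)
```

Two properties visible here are that the representation ignores vertical offsets and that its norm is finite whenever the function has finite total variation.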
The resulting function representation q is again a differential manifold and . We can once again use the differential geometry of the L² space to compute the inner product as described below. If v is a tangent vector in the tangent space , then its corresponding tangent vector using SRSF is given by . The required inner product for is defined as described in Section 2.3:
(10) |
The required warping function γ is now computed by solving for where the group action is . The warping function γ allows us to define: (i) the amplitude, which does not change under the action of γ, and (ii) the phase, which changes only under the action of γ. We can now use γ to decompose the function space into the amplitude space and the phase space and assign separate metrics to compute the relevant distances. The amplitude space comprises “orbits”, i.e., sets of functions that can be obtained from one another by warping their domain alone.** These orbits are not vector spaces, so we need to define a notion of distance between them. For any given pair of SRSFs q1, q2 in the amplitude space, we define their distance using the orbits as:
(11) |
Intuitively, the amplitude distance measures the minimum distance between two functions after alignment. The phase space of the functions is defined by the set of warping functions Γ. The phase distance between two functions f1, f2 is equivalent to the function distance between the corresponding warping function γ and the identity warping function γ(t) = t, t ∈ [0, 1]. Warping functions attain a well-known spherical geometry (the infinite-dimensional Hilbert sphere, i.e., the set of points with unit norm in the infinite dimensions required to fully represent a function space) when represented using the SRSF transformation, i.e., qγ = √γ̇ (since γ̇(t) > 0). This is because,
(12) |
Since qγ uniquely maps a given γ, we can conclude that with the geodesics given by the great circles . Thus the required phase distance between the functions f1, f2 is given by arc length of the great circle:
(13) |
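A numerical sketch of the phase distance follows. It assumes the standard closed form from the SRSF literature, in which the arc length between √γ̇ and the identity warp on the unit Hilbert sphere is arccos(∫₀¹ √γ̇(t) dt); the warping functions used below are hypothetical examples, and the optimal γ between two real spectra would have to come from a separate alignment step that is not shown.

```python
import math
import numpy as np

def trapezoid(y, t):
    """Trapezoidal rule on the grid t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def phase_distance(gamma, t):
    """Arc length between sqrt(d gamma/dt) and the identity warp (assumed form)."""
    dgamma = np.gradient(gamma, t)
    q_gamma = np.sqrt(np.clip(dgamma, 0.0, None))  # guard tiny negative slopes
    inner = trapezoid(q_gamma, t)                  # inner product with q_identity = 1
    return float(np.arccos(np.clip(inner, -1.0, 1.0)))

t = np.linspace(0.0, 1.0, 2001)
d_identity = phase_distance(t, t)        # no warping
d_quadratic = phase_distance(t ** 2, t)  # hypothetical warp gamma(t) = t^2

# Unit-norm check: the squared norm of sqrt(d gamma/dt) is gamma(1) - gamma(0) = 1.
unit_norm = trapezoid(np.gradient(t ** 2, t), t)
```

The identity warp gives a distance of essentially zero, and for γ(t) = t² the value can be checked analytically as arccos(2√2/3).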
The total (un-weighted) distance between functions f1, f2 can now be computed using:
$$d_{ap}(f_1, f_2) = d_a(f_1, f_2) + d_p(f_1, f_2) \tag{14}$$
Using the distance in eqn (14), we can now successfully differentiate between the solid and dotted black curves in Fig. 2: both curves have zero amplitude distance to the reference blue curve but are at phase distances of 0.55 and 0.62, respectively. We refer to the distance function in eqn (14) as the amplitude–phase distance to differentiate it from the SRSF distance function obtained using eqn (5) under the SRSF inner product in eqn (10) (i.e., dSRSF = ‖f1 − f2‖SRSF). The SRSF distance function suffers from a similar problem to that of the L² distance and considers both the solid and dotted black curves to be equidistant, at 1.78 units, from the solid blue curve. The effectiveness of shape matching can be inferred from the distances each metric assigns to f3 relative to f1. We observe that the functions f1 and f3 are very similar to each other, as their peaks are perfectly aligned (at λ = −2), but f1 is more similar to f0 in shape than f3 is. While all the metrics successfully identify f1 and f3 as distinct (see Table 1), the amplitude–phase distance measures the distance to f3 as 4% longer than that to f1, which agrees with our intuition, whereas the other metrics consider it to be ≈10% shorter. The amplitude–phase distance, therefore, is a suitable measure for shape matching. Note that, alternatively, we can define the total shape matching distance as a weighted combination of da and dp to better capture the retrosynthesis target. For example, if we want to prioritize matching nanoparticle size over concentration as our retrosynthesis target, we can weight the phase distance more heavily to better match the peak position.
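The weighted variant mentioned above can be written as a one-line helper. This is only a sketch: the weight names and default values are assumptions chosen so that the defaults recover the un-weighted sum of eqn (14), and the inputs are the amplitude and phase distances quoted in the text for the two shifted curves.

```python
def weighted_amplitude_phase(d_a, d_p, w_amplitude=1.0, w_phase=1.0):
    """Weighted sum of amplitude and phase distances (weights are hypothetical)."""
    return w_amplitude * d_a + w_phase * d_p

# Values quoted in the text: zero amplitude distance, phase distances 0.55 and 0.62.
unweighted = [weighted_amplitude_phase(0.0, d_p) for d_p in (0.55, 0.62)]
# Emphasizing phase (e.g. peak position) doubles the separation between the curves.
phase_heavy = [weighted_amplitude_phase(0.0, d_p, w_phase=2.0) for d_p in (0.55, 0.62)]
```

How the weights should be chosen for a given retrosynthesis target is left open here, as it is in the text.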
Measures of similarity such as MSE do not perform well on high-dimensional data sets such as the ones we are interested in (i.e., spectra or scattering profiles), and quantifying the similarity between two functions using them may not be meaningful. One reason for this is the curse of dimensionality,15 where trends in data become counter-intuitive as the number of dimensions increases. One aspect of the curse of dimensionality is that, in higher-dimensional vector spaces, almost all points are equidistant, so the notion of similarity is not well defined. Some recent works in machine learning and deep learning address this problem by formulating learning as a geometric problem defined by symmetry groups and differential manifolds.16 Our current approach for functions belongs to a similar category, where we exploit the underlying symmetries (i.e., invariance of the inner product under domain warping, encoded via the group structure on the warping functions) along with their underlying geometry (given by the infinite-dimensional Hilbert sphere).
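The distance-concentration effect is easy to observe empirically. The following sketch draws uniformly random points (an illustrative assumption, not spectral data) and reports the relative contrast, (d_max − d_min)/d_min, of Euclidean distances from a random query point as the dimension grows.

```python
import numpy as np

def relative_contrast(n_points, dim, rng):
    """(max - min) / min of Euclidean distances from a random query point."""
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    distances = np.linalg.norm(points - query, axis=1)
    return float((distances.max() - distances.min()) / distances.min())

rng = np.random.default_rng(0)
contrasts = {dim: relative_contrast(500, dim, rng) for dim in (2, 10, 100, 1000)}
for dim, value in contrasts.items():
    print(f"dim={dim:5d}  relative contrast={value:.3f}")
```

As the dimension grows, the nearest and farthest points become nearly indistinguishable by distance, which is the situation faced by a spectrum treated as a plain vector of intensities.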
Each optimization had a batch size of 4 samples and the iterative process continued until a total of 7 batches had been synthesized. The concentrations of CTAB, gold(III) chloride trihydrate, and gold seeds were kept constant and equal to those that were used to synthesize the target sample (see the column target concentration in Table 2). The search space for the autonomous retrosynthesis is then defined as the two-dimensional reaction space of the concentrations of silver nitrate ([AgNO3]) and ascorbic acid ([AA]). An OT2 liquid handling robot was used to autonomously synthesize the samples using an in-house developed control software OT2-DOE,†† and a Biotek plate reader was used to characterize the samples using UV-Vis spectroscopy with wavelengths of 400–900 nm in increments of 5 nm. The samples were made in 96-well polystyrene microplates, which were heated to around 30 °C during the synthesis using a hot plate. After the synthesis, the samples were kept at the same temperature for 50 minutes, so that the nanoparticles could fully grow, before being characterized by UV-Vis spectroscopy. All the retrosynthesis campaigns had identical initial conditions (i.e., the first batch had the same concentrations and measured spectra).
Reagent | Stock solution concentration (M) | Target concentration (M) | Concentration range (M) |
---|---|---|---|
CTAB | 2.0 × 10⁻¹ | 6.40 × 10⁻² | 6.40 × 10⁻² |
Gold(III) chloride trihydrate | 1.0 × 10⁻³ | 1.96 × 10⁻⁴ | 1.96 × 10⁻⁴ |
Silver nitrate | 6.4 × 10⁻⁴ | 6.20 × 10⁻⁵ | 0 to 7.38 × 10⁻⁵ |
Ascorbic acid | 6.3 × 10⁻³ | 3.60 × 10⁻⁴ | 0 to 7.27 × 10⁻⁴ |
Gold seeds | 1.8 × 10⁻⁵ | 1.44 × 10⁻⁶ | 1.44 × 10⁻⁶ |
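The campaign settings described above can be collected in a few lines. The sketch below encodes the two-dimensional search space and target from Table 2, the sampling budget, and the wavelength grid; the rescaling to the unit square is an assumption for illustration (a common preprocessing step for surrogate models), and the dictionary and function names are hypothetical.

```python
import numpy as np

# Concentration ranges and target values from Table 2 (in M).
BOUNDS = {"AgNO3": (0.0, 7.38e-5), "AA": (0.0, 7.27e-4)}
TARGET = {"AgNO3": 6.20e-5, "AA": 3.60e-4}
BATCH_SIZE = 4
N_BATCHES = 7

def to_unit_square(conc):
    """Rescale a composition to [0, 1]^2 (assumed preprocessing, not from the paper)."""
    return np.array([(conc[k] - lo) / (hi - lo) for k, (lo, hi) in BOUNDS.items()])

def from_unit_square(u):
    """Map a point in [0, 1]^2 back to concentrations in M."""
    return {k: lo + ui * (hi - lo) for ui, (k, (lo, hi)) in zip(u, BOUNDS.items())}

wavelengths_nm = np.arange(400, 901, 5)   # 400-900 nm in 5 nm steps
total_samples = BATCH_SIZE * N_BATCHES    # experiments per campaign
target_unit = to_unit_square(TARGET)

print(len(wavelengths_nm), "wavelengths per spectrum;", total_samples, "samples per campaign")
print("target in unit square:", target_unit)
```

Each measured spectrum is therefore a vector with one intensity per wavelength on this grid, and each campaign has a fixed budget of batch size times number of batches.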
Results from the retrosynthesis of gold nanorods using the amplitude–phase distance are shown in Fig. 5. Each panel in Fig. 5 represents the surrogate at a particular stage of the optimization (annotated by the iteration) along with the data the model has ‘seen’ or been trained on. The surrogate is plotted as continuous contours using the colorbar shown on the far right in Fig. 5. We obtain a best composition estimate (i.e., the location in the design space whose spectrum best matches the target spectrum ut; shown as an aqua-colored star in Fig. 5) by querying for the maximum of the surrogate.
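To make the "query the maximum of the surrogate" step concrete, the sketch below fits a bare-bones Gaussian process posterior mean with an RBF kernel and takes its argmax over a dense grid. Everything here is an assumption for illustration: the kernel, length scale, noise level, training grid, and the synthetic similarity scores (a smooth bump around a hidden optimum in a unit-square design space) are hypothetical and are not the surrogate or data used in the paper.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.25):
    """Squared-exponential kernel between two sets of 2D points."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_posterior_mean(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean of a zero-mean GP with an RBF kernel (minimal sketch)."""
    k_train = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    weights = np.linalg.solve(k_train, y_train)
    return rbf_kernel(x_query, x_train) @ weights

# Hypothetical observations: similarity scores on a 5 x 5 grid of compositions.
axis = np.linspace(0.0, 1.0, 5)
x_train = np.array([(a, b) for a in axis for b in axis])
hidden_optimum = np.array([0.8, 0.5])
y_train = np.exp(-np.sum((x_train - hidden_optimum) ** 2, axis=1) / 0.1)

# Query the surrogate on a dense grid and take its maximizer as the best estimate.
grid_axis = np.linspace(0.0, 1.0, 41)
x_query = np.array([(a, b) for a in grid_axis for b in grid_axis])
mean = gp_posterior_mean(x_train, y_train, x_query)
best_estimate = x_query[np.argmax(mean)]
print("best composition estimate (unit square):", best_estimate)
```

Note that the best estimate is a property of the fitted surrogate rather than of any single measured sample, which is why the shape of the surrogate matters when judging how trustworthy the estimate is.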
Fig. 5 Optimization trace for a gold nanorod target using the amplitude–phase distance. Each panel shows the surrogate model as a contour plot, the data points collected/queried from the experiment as circles, the current best estimate as an aqua-colored star, and the retrosynthesis target as a green-colored star. The x-axis of each plot represents the concentration of silver nitrate (× 10⁻⁵ M) and the y-axis represents the concentration of ascorbic acid (× 10⁻⁴ M). All the compositions are annotated with the respective spectra obtained from the experiment. We observe gradual changes to the surrogate approximation as more data are collected, and the optimization mainly focuses on improving the region containing many ‘target-like’ spectra. As argued in the text, the surrogates obtained around iterations 6 and 7 appear closer to the underlying phase diagram of nano-structural geometry obtained from a coarse grid sampling of the design space, shown in Fig. 7.
In Fig. 6, we visualize, in the same format as Fig. 5, the optimization campaign for a gold nanorod target structure using the Euclidean distance as the spectral similarity.
Fig. 6 Optimization trace for a gold nanorod target using a Euclidean distance, presented in the same format as Fig. 5. As argued in the text, the surrogate obtained from the optimization does not reflect the underlying phase diagram in Fig. 7, although the target approximation is similar to that obtained with the amplitude–phase distance.
As can be seen from Fig. 5 and 6, both retrosynthesis campaigns result in a fairly similar approximation to the target spectrum but do so with distinctly different surrogate models at the end of the respective campaigns (see Fig. S2 and S3 in the ESI† for other metrics). To understand which surrogate model better captures the true shape-based phase diagram of nano-structural geometries, we performed a coarse grid sampling of the two-dimensional design space, shown in Fig. 7. We observe three broad classes of nanostructures in the design space: (a) nanorods, spanning the upper right corner of the design space, in orange; (b) nanospheres, spanning the left-most part of the design space, in blue; and (c) a region with no nanostructures at the bottom, in red. Based on the classes we observe in Fig. 7, we hypothesize that the underlying phase diagram is a function (assumed continuous, mapping concentrations to the nanostructure classes mentioned above) with nearly flat regions representing the three classes. The surrogate model learned during the optimization campaign should at least identify the critical points/regions of the phase diagram function in order to provide trustworthy approximations of the retro-synthesized target. Based on our observations in Fig. 7, for a nanorod target, we note that the surrogate obtained from the amplitude–phase distance is a better representation of the underlying phase diagram, as it clearly identifies that the top right corner of the design space contains nanorods with minimal changes in similarity with respect to the target spectrum. In contrast, the surrogate from the Euclidean distance has a sharp peak near the best estimate followed by a sharp decrease within the region comprising only nanorods, effectively capturing similarity only close to the target and nowhere else.
This does not capture the underlying phase diagram structure in terms of flat regions and the nature of function transitions, but it may indicate that the Euclidean distance metric is suitable for differentiating between structures of the same class (e.g., nanorods). We also observe that BO with the Euclidean distance metric prioritizes exploitation, as seen from the high number of samples near the target in iteration 7 of Fig. 6, while BO with the amplitude–phase distance prioritizes exploration, as seen from the more dispersed samples in iteration 7 of Fig. 5. Moreover, the Euclidean distance surrogate depends strongly on samples collected during the exploration phase being close to the target, because the true function has a sharp peak that the surrogate must model for the optimization to find the true global maximum.
Fig. 7 Spectra obtained from a coarse grid sampling of the two-dimensional design space in Table 2. Observe that the space is continuous in terms of nano-structural geometries with three broad classes: no nano-structures (red), nanospheres (blue), and nanorods (green). Retrosynthesis target spectrum location is highlighted with a black cross mark. |
Footnotes |
† Electronic supplementary information (ESI) available. See https://doi.org/10.1039/d2dd00025c |
‡ ReLU is a rectified linear unit non-linearity. |
§ This notion is more generic than the commonly known low-dimensional manifold concept in the (applied) machine learning community. |
¶ Because given , where . |
|| Gaussians are selected both for their simplicity and for their relevance to polydispersity-related effects on spectral characterizations of nanoparticles. |
** Orbits with respect to the warping function group and its associated action. |
†† https://github.com/pozzo-research-group/OT2-DOE/tree/Shape_Matching_Paper. |
This journal is © The Royal Society of Chemistry 2022 |