Benjamin R.
Smith
ab,
Bharat
Pant
c,
Yongtao
Liu
b,
Yu-Chen
Liu
d,
Jan-Chi
Yang
d,
Stephen
Jesse
b,
Anahita
Khojandi
e,
Sergei V.
Kalinin
f,
Ye
Cao
c and
Rama K.
Vasudevan
*b
aBredesen Center for Interdisciplinary Research, University of Tennessee, Knoxville, TN 37996, USA
bCenter for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA. E-mail: vasudevanrk@ornl.gov
cDepartment of Materials Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA
dDepartment of Physics, National Cheng Kung University, Tainan 70101, Taiwan
eDepartment of Industrial and Systems Engineering, University of Tennessee, Knoxville, TN 37996, USA
fDepartment of Materials Science and Engineering, University of Tennessee, Knoxville, TN 37996, USA
First published on 7th February 2024
Understanding the dynamics of domain walls in ferroelectrics is critical both for fundamental reasons of studying interfacial dynamics in disordered media, as well as practical engineering of metastable states with enhanced properties. Piezo response force microscopy (PFM) enables both imaging and writing of ferroelectric domain walls via a biased scanning probe. However, control over positioning of individual domain wall segments to engineer domain wall structures over large areas reproducibly, and particularly, quantification of associated mechanisms remains challenging. Here, we present a reinforcement learning based experimental workflow deployed on an autonomous PFM platform that enables automated data collection of domain walls interacting with pinning sites. The autonomous experiment is used to construct a physics-informed surrogate model of local domain wall response in response to applied electric fields by the PFM tip in prototypical (110) PbTiO3 thin films, and the results are further verified using phase-field simulations. The surrogate enables generation of ‘phase diagrams’ of the domain wall, conditional on initial structure. Subsequently, reinforcement learning is used to optimize tip-modification trajectories for obtaining desired domain wall structures in simulated environments utilizing the surrogate model for the environment dynamics. This workflow shows how automated data collection and autonomous agents can be orchestrated towards realizing domain wall manipulations with precision in scanning probe studies, and how such surrogates can aid in understanding domain wall interactions in ferroelectrics.
The engineering of domain walls and topological defects with the scanning probe microscope (SPM) tip has been ongoing for more than two decades, with early work exploring simple domain poling and observations of events such as the pinning of written ferroelectric domain walls by interfaces,21,22 as well as the occurrence of faceting23 and aging effects.24 The ability to pattern more in-plane polarization orientations was opened up by several studies that showed how the tip-induced electric field's radial symmetry could be broken. This was performed either by motion, as demonstrated for instance by Balke et al.18 to write more complex topological patterns with the SPM tip, or by pre-existing defects, e.g. by Vasudevan et al.,25 who used pulsing at domain walls to produce novel topological defect states. Device-level methods were also discovered, including strategies to “inject” domain walls into ferroelectric devices for memory applications which were explored by Whyte et al.,19 while McGilly et al. utilized resistive Pt electrodes to enable writing of domain walls into capacitor structures,26 with the goal of forming circuitry. Many more studies through the years have explored other methods of tip-based writing of ferroelectric domain walls and more complex domain configurations.27–31
Typically, when domain walls are written electrically with the SPM tip, the tip follows a pre-determined trajectory that is either implicit or explicitly defined, with voltage values that are set once and not amenable to feedback during the writing process. However, this procedure requires human interventions to determine the correct locations to position the tip, and the right values of the excitation pulse to facilitate the desired domain structure modification. This comprises a sequential decision-making task that is conditional on the pre-existing domain structure, the existence of defects, and the current state of the SPM tip, along with the type of domain wall itself, all of which can change spatially, temporally, or both.
It should be noted that the response of domain walls to applied electric potentials has been well studied not only experimentally, but also through a variety of different theoretical means. Numerous works have explored the problem in both pristine32 and defect settings,4,33 for example analytically with Landau theory, or through numerical solutions via phase-field methods34 and even reactive force fields,35 exploring everything from the impact of pre-existing domain walls on switching,36 switching in the vicinity of grain boundaries37 and individual defects,38 to ferroelastic wall deformations39,40 and switching of hierarchical domain patterns.41 However, to date, the models used are generally compared semi-quantitatively (at best) with experimental observations in PFM, largely due to unknowns in tip condition, presence of defects, etc. To facilitate appropriate predictions to guide decision-making for autonomous domain wall engineering in ferroelectrics via SPM, what is needed is a reliable model that can predict changes to domain wall structure for arbitrary applications of applied voltages. Such a model could then be used within existing machine learning algorithms for sequential planning and decision-making tasks, such as reinforcement learning (RL), to create agents that can autonomously create and measure novel domain wall patterns in ferroelectrics. In the process, the surrogate model generated can be interrogated for yielding insights into the dynamics of domain walls within the sample. Note that training of such a model requires significant data that can only realistically be captured in automated settings, thus requiring the use of automated experiments on SPM platforms.42 Moreover, given that this data will require capturing of domain wall perturbations after pulsing at domain walls, it requires us to use computer vision methods to reliably locate domain walls in an automated manner, i.e., the automated setup also requires some basic computer vision or machine learning to capture the necessary dataset.
Here, we present such a workflow towards achieving autonomous domain wall manipulations, showcasing the ability to automatically manipulate domain walls in a prototypical (110) PbTiO3 thin film, develop a physics-informed dynamics model from the ensuing data, and then use this model to both better understand wall displacements in the sample, as well as train RL agents to develop simulated strategies for wall manipulation to achieve desired outcomes. We verify our physics-informed model with phase-field simulations, and find qualitative agreement on wall displacement profiles, and further note the ability to construct functional ‘phase diagrams’ of wall structure as a function of input actions.
We first performed structural and ferroelectric characterization of the thin film. X-ray diffraction confirms the growth of (110) PbTiO3 on SrTiO3 (see ESI S1†) with no secondary phases detected. Next, to confirm the polarization orientations we performed lateral band-excitation piezo response force microscopy measurements (BE-PFM), by poling a region with +8 V applied to the tip to pole one rectangle and −10 V to pole another rectangle immediately below, creating two domains, and then imaging after rotation by 45° and 90° counterclockwise, with the results shown in Fig. 1(a–c) for all three rotation angles. For this (110) film, it is convenient to first transform the coordinate system as shown in the inset in Fig. 1(d), where the principal directions of [001], [−110] and [−1−10] are (in the new coordinate system) [100]T, [010]T and [001]T respectively with the subscript denoting the transformed coordinates.
The initial lateral scan (Fig. 1(a)) appears to show lateral contrast in both domains, but one of the domains (the one poled by +8 V) displays significant charging effects, precluding easy identification. We note in particular that the cross-coupling between the torsional and vertical cantilever modes can make accurate polarization assignment challenging.43 On rotation of the sample 45° counter-clockwise, the amplitudes of both the domains appear equal – strongly implying the existence of two lateral polarization orientations, but with opposite phases (directions). Therefore, we tentatively assign the polarization orientations as per the red and black arrows in Fig. 1(b). Rotating further in the counter-clockwise direction results in the arrow denoted by the red domain to gain contrast, whereas the arrow denoted by the black arrow reduces amplitude. Based on this data we conclude the in-plane orientations are as shown in Fig. 1(a)–(c) by the red and black arrows. We again suggest the reason for the discrepancy in the high lateral amplitude in Fig. 1(a) in the domain poled with positive bias to be related to the charging and cross-coupling effects (and possibly also shear effects).
Since the two polarization orientations are at 90° to each other, the two variants that are responsible are (in the new coordinate system) [−100]T and [011]T, noting that the second variant has a vertical component. This is also consistent with the phase-field simulations (discussed later, shown in Fig. 1(d)) of the initial film structure. The ferroelectricity is further confirmed by a band-excitation switching spectroscopy measurement, shown in Fig. 1(e), which shows that the film is capable of being switched locally with about −3 V of bias at a single location. Therefore, in this film, creating domain walls involves the creation of these 90° ferroelastic domain walls. We note additionally that needle domains are present in the sample.
Our overall experimental plan is shown in Fig. 2. We begin by imaging a virgin region and poling a domain wall by applying alternately −8 V and +8 V to the AFM tip while scanning, on the left and right side. The sample is then imaged again, either with single frequency or band-excitation PFM (always at 1 V AC), and then the position of the written domain wall in the image is extracted from the PFM phase image. This generates the ‘initial’ state of the system which consists of a single written domain wall through the middle of image. As explained above, this generates a ferroelastic domain wall. The tip is then moved to a random location along this domain wall, and then a bias pulse is applied. The bias parameters are chosen to be uniformly distributed in the interval [−10, +10] V for the voltage amplitude, and [50, 500] ms for the pulse width. The sample is then imaged again with PFM. This sequence (image → action → image) is repeated a set number of times until the wall is reset, i.e., until the wall is rewritten in the original configuration. Note that in our dataset, we collected one dataset where the reset frequency was 10, i.e., every ten actions, the wall would be rewritten, as well as a second dataset where there was no resetting applied. These transitions are stored and then used subsequently to train a dynamics model, which is used as the basis for a RL environment that enables agents to learn how to modify domain walls in the system towards desired morphologies.
This experimental procedure is coded with python into a Jupyter notebook, which communicates to an FPGA device (Fig. 2(b)) that then executes the workflow steps on the microscope. Details of the FPGA device operation are provided elsewhere,44 but essentially, this acts as an alternate controller for the microscope enabling customized scan paths and excitations to be applied and controlled via python scripts. An example of the initial image, the bias pulse location and parameters, and the result of the bias pulse, are shown in Fig. 2(c). It is observed that applying voltages above a certain threshold produces wall displacements, as expected. The full dataset and a video showing all captured transitions is provided in the ESI S2.† For simplicity, we probe only the vertical PFM signal. Note also that numerous ‘needle’ domains can be seen in the vicinity of the written wall. Some of the data was captured using single frequency PFM, whereas other data were captured with band-excitation PFM measurements, utilizing 1 VAC excitations in both cases. For the BE-PFM measurements, a frequency band of 333 kHz to 413 kHz was utilized.
Additionally, we add an additional ‘global’ physics loss which is based on the expectation that the magnitude of displacement of the surrogate model should be monotonic for both the bias voltage and pulse width values. It should be noted that the increased displacement with increased voltage is consistently observed in the raw data. However, the same was not true for pulse width, potentially due to changes in tip condition or other exogeneous variables. Moreover, we also note that the application of the voltage could cause changes to the underlying defect structure, such as by injecting or redistributing oxygen vacancies or other mobile ions,46 or creating other types of defects (for example, see work by Evans et al.47). We do not rule out this possibility, but we can control the degree to which we enforce this inductive bias by adjusting the strength of this term in the final loss function if desired.
To account for this global physics loss, we implement an additional term into the loss function that acts to promote the monotonicity of the dynamics model with respect to pulse width. During training, for each transition, we sample across the action space of pulse widths to make additional predictions. We then compare the predicted area displaced for each pulse width and add a penalty when the area does not either remain constant or increase for longer pulse widths i.e., when monotonicity is not observed. This loss function acts as a regularization term to help ensure that known physical trends are implemented into the dynamics (surrogate) model. In total, the loss function can be written as
The surrogate model is trained with the Adam optimizer for 3000 epochs, and we use an 80/20 training/test split with random splitting of the data. Results after training of the predictions of the model along with real wall displacements are shown in ESI S4.† On the test data, the mean absolute error of the predictions is ∼13.2 nm.
However, this diagram clearly still shows some seemingly unphysical behaviors, in particular the dark diagonal that cuts to the top-right of the diagram. One of the reasons for this may be that because we do not consider the surrounding domain structure, and there is limited data, then if a few lower voltage pulses were applied at a domain wall situated next to a strong pinning site (for instance, one of the needle domains), then there would be limited to no motion. The model would fit to these instances and lead to this type of seemingly unphysical result. We attempted to counter this via the addition of physics-based loss regularization, but this is still a ‘soft’ regularization and thus these features could not be eliminated entirely. Alternatively, as mentioned earlier, it is possible that certain defects, e.g. oxygen vacancies, could be injected48 or moved by the application of bias pulses. Such a circumstance would lead to anomalous features on the calculated diagrams based on threshold fields required to initiate such electrochemical processes. We cannot entirely rule out this possibility.
At the same time, the dynamics model allows us to explore the reaction of the domain wall in arbitrary configurations, i.e., to investigate the state dependence. As a simple example, consider the circumstance when the domain wall is already bulged: one would expect a very different response to repeated pulsing in this state, given that the wall configuration is likely already very energetically unfavorable given it will prefer to be flat and reduce elastic energy. This circumstance is modeled in the phase-field simulations in Fig. 3(b) and indicates that when starting with a positive displacement, increasing the positive voltage still does increase the wall displacement in the positive direction, but critically, applying negative bias to this situation very quickly returns the wall to a flat position, even for very low voltages, as seen in Fig. 3(b). That is, there is an asymmetry where returning to a flat profile is more favorable and occurs for lower voltages. This is also seen in the data-driven dynamics model, in Fig. 4(c), and particularly in the ‘phase diagram’ in Fig. 4(d) with brighter colors on the left side of the diagram indicating more propensity to quickly shift the wall to the negative direction, and darker colors on the right indicating resistance to further bulging the wall in the positive direction. These insights suggest that this method can be effectively used to explore the dependence of domain wall dynamics on the existing domain wall configuration, providing another method for domain wall investigations via PFM and automated experiments.
Fig. 5 Switching dynamics. (a) Switched areas as a function of voltage and pulse width, for an initial flat wall (a) and a positively bulged wall (b). (c) Velocities of the domain wall calculated for different voltages and 200 ms pulse width (blue) with linear fits in this log vs. 1/E plot shown as a blue dashed line. The linear fit indicates a creep regime. Compared to data by [1] Tybell et al.49 on PZT films in a different geometry, the calculated slope is significantly lower. |
To obtain more insight, we calculated the elastic energy density of the domain wall in the different configurations and find there is considerably larger elastic energy at these ferroelastic walls than a typical 180° domain wall in e.g., (001) PZT thin films. We find the energy associated with the wall is about 8.2 × 106 J m−3. For comparison, previous calculations in (001) PZT thin films show that the energy density of a ferroelastic wall in that system is about 2.0 × 106 J m−3, and for a ferroelectric wall, it is about 1 × 106 J m−3.49
The elastic energy densities for the different configurations as a function of bias and time are shown in Fig. 3(c, f and i). When a bias is applied, the elastic energy density as a function of time after the bias is turned on varies in a complex manner depending on the initial state of the wall (straight or bulged) and whether positive or negative polarity is applied. Interestingly, the elastic energy density reduces, for ‘bulging’ when positive bias is applied as shown in Fig. 3(c). When a positive bias is applied to an already bulged wall, the overall elastic energy density does not change significantly, when looking at longer time frames (Fig. 3(f)). Conversely, applying negative potential appears to change the elastic energy density more so. However, the change in elastic energy density for small voltages (e.g., −0.65 V), which is sufficient to erase the ‘bulge’, is negligeable. Given that the elastic energy density does not appear to be significantly greater in the bulged state, we believe that the asymmetry that we observe in the surrogate model can be best explained via a straightforward surface energy argument: since the domain walls are obviously more energetically costly than the surrounding domains, eliminating the bulge will be favorable in most circumstances.
Next, we utilized the surrogate model to obtain estimates for the domain wall velocity in this system, similar to the seminal work by Tybell et al.50 We used the model to predict the domain wall velocities for different voltages assuming a pulse width of 200 ms. For this we assume that the electric field E = V/d, where we assume a value of d = 20 nm. The real electric field is likely to be quite complex in such structures,51 but this estimate serves as a reasonable upper bound. We thus computed the domain wall velocities extracted under this approximation and plot them against those of Tybell et al. for PZT domain growth. Accordingly, we plot the log of the velocity against 1/E in Fig. 5(c). The data fits well to a linear slope, i.e., evidence of a creep regime, however it is very evident that the slope is significantly less than those of ref. 49. Note that the two scenarios are not directly comparable, since in the case of the PZT films the experiments by Tybell et al. were performed with nucleation and growth of domains directly underneath the tip, whereas here we are dealing with extension or contraction of pre-existing domain walls in a different orientation. The slope of the velocity is ∼6.5 times lower than that of (001) PZT films. Although the direct comparison cannot be made, we can conclude that (i) the wall appears to be governed by creep dynamics, and (ii) the mobility is significantly reduced compared to 180° walls in PZT, by two orders of magnitude. Further investigations are required to better understand the nature of the pinning potential in this sample.
To learn policies to control the domain wall structure, we employed the deep deterministic policy gradient (DDPG) algorithm proposed by Lillicrap et al.,52 and implemented within d3rlpy.53 To show the potential use of RL, we simplify the scenario by reducing the action space to just one action – the pulse width is fixed, and the agent is able to modify only the voltage applied. The position where the pulse is applied is varied linearly in a pre-selected manner, and the agent is able to apply 10 pulses to achieve a domain wall structure close to a pre-determined target structure. The RL policy used is a simple multi-layer perceptron with two layers each with 256 units. The reward given to the agent is the negative of the mean absolute error between the domain wall structure and the targeted structure. As a function of training, the RL agent learns to manipulate the wall structure closer to the target, as shown in the learning curve in Fig. 6(a).
The environment begins with a relatively flat wall and using an RL policy, actions are taken to move the wall towards the target wall structure, indicated by a red dashed line in Fig. 6(b). An example of the resulting domain wall structure after the trained agent has performed actions are shown in Fig. 6(b) (blue line). We observe the agent learns to take actions in regions where the ideal wall is further away from the original wall (i.e., where rewards can increase most significantly). It should again be noted that these are only performed in the simulated environment, not the real experiment. However, training this policy only takes ∼1 hour on a laptop, and even with increasing the action dimensions, it appears possible that it can in future be deployed on operational AFMs.
There still exist challenges to fully integrate RL automation for domain wall alteration using PFM. Currently, the dynamics model only predicts the 1-D structure of the domain wall. While we can encode information for the areas surrounding the domain wall from PFM images when making predictions, the model itself would have to predict the entire 2-D image for more accurate policies, given the dependence of the actions on the local domain environment. This type of prediction is not feasible due to the limited data currently available. There is a substantial impact of the surrounding domain structure on the underlying dynamics at that position. Given this is a heterogeneous sample, there are many possible domain configurations in the vicinity of a domain wall, and the additional complicating factor of the wall structure itself (for example, bulged or not) will further impact how it responds to electric fields (for instance, see Aravind et al.32). This would require, at minimum, tens of thousands of transitions to be acquired to adequately sample this configurational space, which can only realistically be done through high-throughput scanning methods. This may be possible with methods such as fast scanning via compressive sensing approaches.55
Interestingly, our data was captured on two different instruments (albeit of the same make and model) with different tips on different days, but this did not seem to introduce a level of distribution shift that severely impacted the learning process of the dynamics model. Should this be a problem, one strategy to counter it may be to learn a simple linear model that maps the applied potentials in previous data to applied potentials in the new data that minimizes the discrepancy between the predictions. This is under the assumption that the major tip change will be to affect the applied potential, but not the shape of the domains themselves.
Moreover, ideally one would use the phase-field simulations directly as the physics regularizer, using methods such as structured Gaussian processes, rather than incorporating additional loss terms that are ad hoc defined and may not always be suitable depending on the circumstance. At the same time, the dynamics model learned provides significant insight into the dynamics of domain walls in a state-dependent way that is difficult to recover from more traditional PFM spectroscopic methods, and as such can be convenient for investigating ‘domain wall phase diagrams’. Finally, this approach could be used to optimize for specific properties of the material rather than the specific structure of the domain wall itself, i.e., to solve the inverse design problem of maximizing e.g., photoconductance of domain walls by trialing different wall configurations that maximize the photoconductance reward given to the agent, as opposed to rewarding the generation of a specific structure per se.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3dd00126a |
This journal is © The Royal Society of Chemistry 2024 |