Maximilian Robert Bailey,* Fabio Grillo and Lucio Isa*
Laboratory for Soft Materials and Interfaces, Department of Materials, ETH Zürich, Vladimir-Prelog-Weg 5, 8093 Zürich, Switzerland. E-mail: maximilian.bailey@mat.ethz.ch; lucio.isa@mat.ethz.ch
First published on 5th September 2022
Advancements in artificial active matter systems heavily rely on our ability to characterise their motion. Yet, the most widely used tool to analyse the latter is standard wide-field microscopy, which is largely limited to the study of two-dimensional motion. In contrast, real-world applications often require the navigation of complex three-dimensional environments. Here, we present a Machine Learning (ML) approach to track Janus microswimmers in three dimensions, using Z-stacks as labelled training data. We demonstrate several examples of ML algorithms using freely available and well-documented software, and find that an ensemble Decision Tree-based model (Extremely Randomised Decision Trees) performs the best at tracking the particles over a volume spanning more than 40 μm. With this model, we are able to localise Janus particles with a significant optical asymmetry from standard wide-field microscopy images, bypassing the need for specialised equipment and expertise such as that required for digital holographic microscopy. We expect that ML algorithms will become increasingly prevalent by necessity in the study of active matter systems, and encourage experimentalists to take advantage of this powerful tool to address the various challenges within the field.
Perhaps the most popular model system to study the dynamics of active matter is that of Janus microswimmers,4 which are also arguably the simplest class of synthetic active agents. These (typically micron-sized) particles possess patches with different chemical or physical properties (e.g. catalytic activity) that can generate asymmetric gradients around the particle under certain experimental conditions, leading to motion by self-phoresis.5–8 The dynamics of these Janus microswimmers are typically confined to 2D because of their density and interactions with the substrate, whether they move by self-diffusiophoresis9 or self-dielectrophoresis.10 However, from an applications perspective, the ability to navigate in 3D is highly desirable.11 As such, there is a growing interest in the synthesis of microswimmers that display 3D motion and in developing a greater understanding of their swimming behaviour.12–16
Unfortunately, tracking the motion of microswimmers in the third dimension presents new experimental challenges that must be overcome. Conventional light-microscopy techniques are ill-suited to track the diffusion of micron-sized objects in three dimensions, especially when the particles exhibit enhanced mobility. A widespread approach to studying (fluorescent) colloids in 3D is confocal microscopy.17 This involves scanning the volume of interest via a series of Z-stacks, using a pinhole to exclude signal outside the imaged Z-slice. The resultant intensity distributions of the fluorescent particles are then used to obtain a 3D reconstruction from which the particle centres can be identified. This methodology, although accurate, is limited in temporal resolution by the time taken between subsequent Z-stacks.18 In the case of active Janus colloids, which can move several body-lengths per second, this leads to a smearing of the traced particle and thus difficulties in accurate centre finding. Furthermore, the surface patches of Janus microswimmers, typically realized with thin metal films, lead to an asymmetry in their optical properties, hindering the proper reconstruction of the intensity distribution and thus jeopardising the tracking.
To overcome these issues, Campbell et al. used the method previously proposed by Speidel et al. for passive particles to track the vertical position of fluorescent gravitactic Janus microswimmers from 2D microscopy images, using the outermost radius of the concentric rings of the fluorescent particles as they swim out of focus.19,20 Unlike confocal microscopy, this method does not require a rigorous model and knowledge of the optical properties of the microscope. A single Janus particle was fixed in gellan gum, from which the “bright ring radius” of the fluorescent particle was extracted at different heights, and a calibration curve was fitted to these values using a cubic polynomial. The heights of the Janus microswimmers were then predicted from the evolution of the bright ring radius of the fluorescent Janus microswimmers taken from a single 2D plane. The described method effectively extracted 3D trajectories from 2D videos using a fluorescent microscope and a basic piezo Z-stage, allowing standard microscopy image acquisition conditions. However, it is noteworthy that the radius of the bright rings can be on the order of hundreds of pixels, and as the rings cannot overlap, this limits 3D tracking to very dilute suspensions. Zhang et al. proposed a similar approach to track thermally-diffusing non-fluorescent particles by radially projecting the diffraction pattern of a particle and comparing it to a reference model obtained with a Z-stack.21 Least-squares fitting of the radial profile is then used to determine the Z-height which minimises the difference between the radial projection of the imaged particle and the reference model. This method is highly sensitive to asymmetries in the particle, and is therefore not viable for a freely-rotating Janus particle. Likewise, the detailed methodology presented by Kovari et al., solving for the vertical position by minimising the difference between the interpolated, continuous look-up table (LUT) of radial profiles taken with Z-stacks and the radial profile of the imaged particle, relies on the Mie scattering of symmetric particles,22 and therefore also cannot be applied effectively to microswimmers with significant optical asymmetries.
To track non-fluorescent particles in 3D, digital holographic microscopy has emerged as a powerful technique capable of digitally reconstructing 3D images from holograms obtained from the interference pattern between the sample and a reference laser beam.18 The methodology has been demonstrated to track the motion of photo-gravitactic microswimmers in 3D,13 albeit ones that principally swim directly upwards, which minimises the extent of visible cap asymmetry. The optical asymmetry of a Janus particle, and the particle-wise variations present in their surface patches, would otherwise further complicate holographic reconstruction.23 Midtvedt et al. used Machine Learning (ML) U-Nets, a convolutional encoder–decoder neural network architecture, to interpret in-line holographic data, obtaining localisation accuracies comparable to those of off-axis holographic microscopy.24 They were thus able to reduce the computational cost of holographic microscopy-based 3D tracking. Their DeepTrack 2.0 software provides a fascinating test case for the applicability of ML to the 3D (and 2D) tracking of particles. Despite this potential decrease in computational cost, holographic microscopy still requires specialised imaging configurations, limiting the general accessibility of this technique. Furthermore, the U-Net is trained on simulated Mie scattering spectra of spherical particles, which does not account for the asymmetries present in Janus particles, with implications for accuracy as discussed above. Therefore, a general approach enabling the 3D tracking of spherical non-fluorescent microswimmers with asymmetries in their optical properties is still lacking.
For example, photocatalytic TiO2–SiO2 Janus microswimmers synthesised in large quantities via Toposelective Nanoparticle Attachment (TNA) were recently shown to possess unusual 3D mobility, characterised by predominantly 2D motion interspersed with ballistic out-of-plane segments.25 Unsurprisingly, the asymmetric optical properties from the TiO2 nanoparticle cap introduced significant difficulties in accurate 3D tracking by standard methods. Therefore, a simple approach using the particle's grayscale moment of inertia, which varied as a function of its vertical position, was developed. Similar to the strategy in ref. 20, an LUT of the microswimmers' evolving moment of inertia was extracted from a series of particle Z-stacks, to which a cubic polynomial was fitted (see the ESI of ref. 25). However, the over-reliance on a singular image property averaged over a few particles is undesirable, and motivates a more robust approach to the 3D tracking of non-fluorescent Janus microswimmers with conventional wide-field microscopy techniques.
We thus return to ML, previously discussed in reference to the DeepTrack 2.0 software, as a promising approach to 3D microswimmer tracking. In contrast to conventional statistical approaches that assume an appropriate data model, ML models algorithmically learn the relationship between a target response and its predictors.26 ML can thus be more generally described as a way that machines can learn to perform tasks without specific programming or a set of rules to follow. Inspired by the ability of ML models to detect underlying patterns in data, we here investigate the suitability of traditional ML techniques for 3D tracking from standard 2D brightfield microscopy images. Relevant features are extracted from light-microscopy videos, and the underlying structure characterising the particles' vertical position is learned using the freely accessible Scikit-learn package.27 To evaluate our method, we study the classic non-fluorescent Pt–SiO2 particle system (R = 1.06 μm) with a clear optical asymmetry, as a challenging but common example of Janus microswimmers whose vertical position would not be easily identifiable with conventional methods (see Fig. 1). We take Z-stacks of the Pt sputter-coated particles freely diffusing in pure water (i.e. passive due to the absence of a fuel), rather than “sticking” them to the glass slide. This likely increases the labelling error of the Z-stacks, but better represents the experimental conditions of the mobile states and the ability of the active colloids to rotate in 3D as they swim.25 We conclude by studying a different TiO2–SiO2 particle system obtained by TNA with a modified microscope configuration, and demonstrate that traditional ML models can provide a viable and generalisable approach to track non-fluorescent Janus microswimmers moving in 3D.
As the model was trained from experimental Z-stacks (rather than simulated data), it was important to ensure that the particle masks are labelled as consistently as possible. We observed that the diffusing particles occupy different vertical positions above the objective, based on their appearances in the microscopy images. Furthermore, it is difficult to reproducibly assign the same focal point (Z = 0 μm, see Fig. 1) of particles at different regions of the glass cell across the Z-stacks taken. By visual inspection of many particle Z-stacks, we noted that each particle defocuses into a black sphere approximately 5 μm above the slice where it is in focus (see Fig. 1). This provides us with a reference height that can be compared between the different Z-stacks acquired, to ensure consistent labelling between the Z-stacks and thus the data observations. To use this feature at Z = 5 μm as the reference plane, we examine 5 different Z-stacks, and select for each a representative particle mask where the particle first appears as a black sphere. We then average these 5 different particle masks to obtain a reference, Z = 5 μm image. We take the radial profile of this created reference mask, which we define as the average pixel value of the region spanned by a circle with its origin at the centre of the reference image, for different values of the circle's radius (from 1 pixel to 40 pixels, inclusive). We then scan through the Z-stacks, and for each Z-stack we find the 10 Z-slices that are the most similar to the created reference image. This is achieved by finding the 10 particle masks in each Z-stack which have the lowest squared difference between their radial profile and that of the reference radial profile. Of these 10 Z-slices, we select the one with the lowest Z value, and re-scale all Z values in the stack such that the selected particle mask is now labelled as Z = 5 μm. We nevertheless note that this could lead to certain mislabelling of data points, and therefore increase the model error determined during its validation.
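A minimal Python sketch of this alignment procedure is given below. The variable names (`stack`, a list of 111 × 111 particle masks; `z_values`, their provisional heights; `reference`, the averaged Z = 5 μm mask) are assumptions for illustration and not part of the published code.

```python
import numpy as np

def radial_profile(mask, max_radius=40):
    """Mean pixel intensity within a disc of radius r (1 to max_radius pixels),
    centred on the particle mask."""
    h, w = mask.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    return np.array([mask[dist <= r].mean() for r in range(1, max_radius + 1)])

def align_z_labels(stack, z_values, reference, n_candidates=10, z_ref=5.0):
    """Shift a Z-stack's labels so that the slice most similar to the reference
    image (lowest squared difference of radial profiles, lowest Z among the
    10 best candidates) is re-labelled as Z = z_ref (in micrometres)."""
    ref_profile = radial_profile(reference)
    errors = [np.sum((radial_profile(m) - ref_profile) ** 2) for m in stack]
    candidates = np.argsort(errors)[:n_candidates]           # 10 most reference-like slices
    anchor = candidates[np.argmin(np.asarray(z_values)[candidates])]  # lowest Z among them
    return np.asarray(z_values) - z_values[anchor] + z_ref
```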
From each labelled particle mask, we then extract the following features:
1. The radial profile of the particle mask
2. The difference between the maximum and minimum values of the radial and the horizontal profiles of the particle masks
3. The number of local minima and maxima in the horizontal profile of the particle masks
4. The image moment of inertia from the particle mask
The radial profile of a particle mask from its centre (as described above) can be used to detect the presence of the interference pattern observed in the images.22 To evaluate the radial profile (up to a radius of 40 pixels), we take the mean of the pixel intensities of the entire circle spanned by the specified radius from the centre, rather than of individual concentric rings as often performed in the literature. This smooths the noise arising from the wide-field optical configuration used, but, more importantly, dampens the significant asymmetry in the optical properties arising from the Janus structure of the microswimmers. From the radial profile of each Z-slice particle mask, we also extract the difference between the maximum and minimum value of the radial profile as an additional, potentially important feature. We likewise extract the difference between the maximum and minimum value of the horizontal profile of each particle mask, obtained by averaging across all vertical pixels of the particle mask for each pixel along the horizontal axis.
The horizontal profile is noisier than the radial profile, largely because it lacks the self-similarity that comes from averaging over an increasingly growing circle, but it is also more sensitive to the fringes of the interference pattern. We therefore smooth the horizontal profile before determining the number of local maxima and minima present, using the inbuilt MATLAB functions islocalmin and islocalmax. The number of minima and maxima does not vary continuously, and therefore these features are treated as categorical variables. All other features described and used here are treated as numerical, continuous variables. As they take discrete values, the categorical features (the numbers of minima and maxima) were first One-Hot-Encoded (OHE) so that they could be properly included in the trained models. From the distribution of the number of maxima and minima, we find that observations with more than 3 maxima or minima are outliers, and we therefore cap both counts at 3. This leaves two categorical variables taking values of 0, 1, 2, or 3 (8 features in total after encoding, each with an associated binary value).
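As a sketch of this step in Python: the article uses MATLAB's islocalmin/islocalmax, so scipy.signal.find_peaks serves here only as a stand-in, and the moving-average window and the variable `masks` (the collection of particle masks) are assumptions.

```python
import numpy as np
from scipy.signal import find_peaks
from sklearn.preprocessing import OneHotEncoder

def peak_trough_counts(mask, window=5, cap=3):
    """Count local maxima and minima of the smoothed horizontal profile,
    capping both counts at `cap`."""
    profile = mask.mean(axis=0)                           # average over vertical pixels
    kernel = np.ones(window) / window
    smoothed = np.convolve(profile, kernel, mode="same")  # simple moving-average smoothing
    n_max = min(len(find_peaks(smoothed)[0]), cap)
    n_min = min(len(find_peaks(-smoothed)[0]), cap)
    return n_max, n_min

# One-Hot-Encode the two capped counts (values 0-3 -> 8 binary columns).
counts = np.array([peak_trough_counts(m) for m in masks])   # `masks` assumed as above
encoder = OneHotEncoder(categories=[[0, 1, 2, 3], [0, 1, 2, 3]], sparse_output=False)
X_cat = encoder.fit_transform(counts)
```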
Finally, the particle mask's first image moment,34 also referred to as the image's moment of inertia (as it is analogous to the moment of inertia around the image centroid, treating pixel intensities as mass), is calculated for each particle mask.25 Thus, from each initial 111 × 111 particle mask with associated Z-label, 51 predictor variables are obtained (43 numerical variables and the 8 categorical variables). Next, we perform feature engineering of the numerical variables to further reduce the dimensionality of the feature space and reduce the potential for overfitting (see Fig. 2).
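The remaining numerical features can be assembled as sketched below, under the assumption that the moment of inertia is the intensity-weighted squared distance from the mask centroid (the exact normalisation used in ref. 25 may differ); `radial_profile` and `masks` come from the earlier sketches.

```python
import numpy as np

def moment_of_inertia(mask):
    """Intensity-weighted second moment about the image centroid, treating pixel
    intensities as mass (one common definition; normalisation may differ from ref. 25)."""
    I = mask.astype(float)
    h, w = I.shape
    yy, xx = np.mgrid[:h, :w]
    total = I.sum()
    yc, xc = (yy * I).sum() / total, (xx * I).sum() / total
    return (I * ((yy - yc) ** 2 + (xx - xc) ** 2)).sum() / total

def numerical_features(mask):
    """43 numerical predictors: 40-point radial profile, the radial and horizontal
    max-min differences, and the image moment of inertia."""
    radial = radial_profile(mask)            # defined in the earlier sketch
    horizontal = mask.mean(axis=0)
    return np.concatenate([radial,
                           [radial.max() - radial.min(),
                            horizontal.max() - horizontal.min(),
                            moment_of_inertia(mask)]])

X_num = np.array([numerical_features(m) for m in masks])   # shape (n_observations, 43)
```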
Before performing feature engineering, we first randomly shuffled and split the data into Training and Test sets. Preventing the model from seeing the Test dataset during the training stage ensures that no information from the Test set can influence the outcomes of the model training. This is necessary to test the generalisability of the model predictions. We randomly shuffled all Z-slice observations of the particle masks, then split the initial dataset into 80% for training the models and 20% for validating their predictions. We then perform feature engineering on the 43 numerical variables identified previously. From visual inspection, we observe that the distributions for the sequential values of the radial profile in particular appear to show significant self-similarity (see Fig. S4, ESI‡). This can be explained by the method used to extract the radial profiles from the particle masks, as we evaluate the mean pixel intensity of a circle of increasing size centred at the origin. Therefore, sequential values of the radial profile will contain significant amounts of information from the previous values. To treat this collinearity present in the numerical feature space, we use the Python factor_analyzer package to reduce the dimensionality of our feature space (see Fig. 2). The factor_analyzer package implements a varimax rotation and can be used to identify underlying latent variables which capture the largest amount of variance amongst the original feature space. This in turn allows a significant reduction in the number of numerical parameters to be input into the model. The number of extracted features that we use depends on the regression model applied, as we explain in further detail later.
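A minimal sketch of the split and dimensionality-reduction step, assuming `X_num`, `X_cat`, and a vector of Z-labels `z` built as above; the random seed and the exact factor_analyzer settings are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from factor_analyzer import FactorAnalyzer

X = np.hstack([X_num, X_cat])
X_train, X_test, z_train, z_test = train_test_split(X, z, test_size=0.2,
                                                    shuffle=True, random_state=0)

n_numerical = X_num.shape[1]                  # the first 43 columns are numerical
fa = FactorAnalyzer(n_factors=9, rotation="varimax")
fa.fit(X_train[:, :n_numerical])              # latent factors from the numerical block only
F_train = fa.transform(X_train[:, :n_numerical])
F_test = fa.transform(X_test[:, :n_numerical])
OHE_train, OHE_test = X_train[:, n_numerical:], X_test[:, n_numerical:]
```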
After extracting the relevant feature vector from our Z-stack images as described above, we fitted the selected model on the Training dataset, or tuned its hyper-parameters to improve performance where closed-form solutions do not exist. The trained model is then used to predict the Z-labels of the hold-out Test set of observations, and these predictions are then compared to the actual labels obtained from the Z-stack. To evaluate model performance on unseen data, we use the normalised error ε = s(z_measured − z_label)/Z_total (where ε is the normalised error, s(z_measured − z_label) is the sample standard deviation of the residuals, and Z_total is the total valid range of tracking, from ref. 35). Given the diversity of traditional ML models that can be fitted, we focus here on linear models (Linear and Polynomial regression), and a non-linear ensemble Decision Tree model (Extremely Randomised Decision Trees), to investigate the suitability of ML to track the 3D motion of Janus microswimmers. Further discussions of other models that we investigated (Random Forest, XGBoost, and a Voting Ensemble combining the predictions of the Extremely Randomised Decision Trees and XGBoost models), including a brief investigation into the applicability of convolutional neural nets, can be found in the ESI.‡
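The evaluation metric amounts to a couple of lines; the function below is a direct transcription of the definition above, with the tracking range passed in as a parameter.

```python
import numpy as np

def normalised_error(z_pred, z_true, z_total):
    """epsilon = s(z_pred - z_true) / Z_total, with s the sample standard deviation
    of the residuals and Z_total the total valid range of tracking."""
    return np.std(np.asarray(z_pred) - np.asarray(z_true), ddof=1) / z_total
```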
Using the Python factor_analyzer package, we automatically extract 9 underlying variables from an initial parameter space of 43 numerical features, in doing so reducing the extent of the collinearity in the numerical parameter space. However, as we wish to reduce the dimensionality further ahead of the subsequent polynomial regression, we take the 5 variables with the highest explanatory power (with respect to the total variance) of the initial numerical variables. Between them, these capture 99.5% of the variance in the underlying initial numerical feature space. We finally append the categorical OHE features (number of troughs and peaks) to these 5 features, and fit the Scikit-learn built-in LinearRegression class on the training dataset (80% of all observations after shuffling, 31834 observations).
The regression weights are then obtained from the closed-form (normal equation) solution,

ŵ = (XᵀX)⁻¹·(XᵀY)   (1)

where X is the matrix of predictor variables and Y is the vector of Z-labels.
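A sketch of this linear baseline, using the first 5 factors plus the one-hot-encoded counts from the earlier sketches; the 40 μm range used for normalisation is taken from the tracking range quoted later and is only illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# First 5 latent factors (highest explained variance) plus the 8 OHE columns.
X_lin_train = np.hstack([F_train[:, :5], OHE_train])
X_lin_test = np.hstack([F_test[:, :5], OHE_test])

lin = LinearRegression().fit(X_lin_train, z_train)
z_pred = lin.predict(X_lin_test)
print(normalised_error(z_pred, z_test, z_total=40.0))   # ~40 um tracking range assumed
```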
From a linear regression model, we obtain a normalised error of 0.080. This translates to an unnormalized value on the order of a particle diameter, which is clearly not acceptable for particle tracking applications. More importantly, we clearly observe that the residuals of the model predictions are non-Gaussian, and that there appears to be an underlying data structure present (see Fig. 3, left column). This demonstrates that a simple linear regression is not sufficient to capture the complexities of the data structure, due to the presence of non-linearities. We therefore move to more complex models, and also do not consider regularization to further improve this simple linear model.
The next ML model that we fit is a polynomial regression model including higher order terms and interactions between the numerical features. In this manner, linear weights can be fitted to non-linear data, and thus a closed-form solution also exists for polynomial regression. The presence of higher order terms and their interactions significantly increases the number of weights to be fitted, and therefore care should be taken in reducing the dimensionality of the feature space and the order of the regression to prevent a combinatorial explosion of parameters (see eqn (2)). The Scikit-learn PolynomialFeatures class transforms the input predictor variables to include all higher order terms and interaction terms between each feature. The LinearRegression class can then be applied to this transformed data set as before. As the categorical features which are one-hot-encoded are sparse columns with binary values, only the numerical features should be transformed with PolynomialFeatures. The categorical OHE features can then be appended to this transformed dataset. From 5 numerical features, transformed to include terms up to the 3rd order, we thus obtain 56 features (and the intercept) to which the 8 categorical features are added. To minimize the possibility of overfitting, we constrain the weights by using Ridge regression. In Ridge regression, a penalty term on the size of the squared weights is added to the cost function to be minimized. This keeps the weights as small as possible, but does not provide feature selection as with LASSO or Elastic-Net regularization techniques. However, it possesses a closed-form solution, making it computationally more efficient and less erratic. Ridge regression can be simply implemented on Scikit-learn using the Ridge class, and we use the GridSearchCV class to identify the optimal penalty term.
N_terms = (n + d)!/(n!·d!)   (2)

where n is the number of input features and d is the order of the polynomial expansion, giving the total number of terms in the transformed feature space.
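A sketch of the polynomial Ridge model described above, reusing the factors and OHE columns from the earlier sketches; the grid of penalty values is an assumption.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Expand only the 5 numerical factors to 3rd-order terms, then append the OHE columns.
poly = PolynomialFeatures(degree=3)
X_poly_train = np.hstack([poly.fit_transform(F_train[:, :5]), OHE_train])
X_poly_test = np.hstack([poly.transform(F_test[:, :5]), OHE_test])

# Grid-search the Ridge penalty term (alpha values assumed for illustration).
grid = GridSearchCV(Ridge(), param_grid={"alpha": np.logspace(-3, 3, 13)},
                    scoring="neg_root_mean_squared_error", cv=10)
grid.fit(X_poly_train, z_train)
z_pred = grid.best_estimator_.predict(X_poly_test)
print(normalised_error(z_pred, z_test, z_total=40.0))
```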
We find that higher-order polynomial regression provides little to no reduction in training-validation error, and in some cases, can increase it significantly due to over-fitting. We therefore limit our model to a 3rd order polynomial Ridge regression model which we fit to our training data, and then test as previously described. The residuals of the model fitted to the test data are shown in Fig. 3, middle column, bottom row. We note a significant improvement in the normalised prediction error from the linear regression case to 0.032. Furthermore, we see that most of the systematic periodicity in the residuals present for the standard linear regression has disappeared. Nonetheless, we can still see some structure in the residuals, in particular around Z = 0 μm where the asymmetry of the Janus particles is likely the most visible.
The Extremely Randomised Trees (ERT) model is based on a modification to the well-known Random Forest (RF) model,41,42 in which randomly determined thresholds for each feature are used at the nodes of a Decision Tree to split the dataset, rather than searching for the optimal threshold for each feature. Not only does this further decrease the variance of the model compared to a standard RF (at the cost of more model bias), it also significantly reduces the training time relative to an equivalent RF model, as determining the best threshold for each feature at each node is one of the most computationally intensive tasks of training. We expect that the high variance present in the input image data makes a model that prioritises low variance at the cost of increased bias more effective at predicting the particles' vertical positions.
We train our ERT ensemble model using the Scikit-learn ExtraTreesRegressor class. Unlike the linear regressors, we input all latent features identified during the exploratory factor analysis step (9 factors), which we find reduces the prediction error. With the categorical OHE features, we therefore initially have 17 predictor variables to input into our ERT ML model. To identify potentially non-significant features, we furthermore add a “noise” feature during training, with values drawn from a normal distribution for each observation.43 After fitting the ERT, we call the feature_importances_ attribute of the fitted model and find that the indicators for the number of peaks and the number of troughs being ≥3 in the horizontal intensity profile of the particle masks are both less informative to the model than the random noise term. We therefore reassign all peak and trough counts with a value ≥2 to a value of 2 and re-encode the categorical OHE features, reducing the feature space by 2 features to a total of 15 variables. Due to the large number of hyperparameters which can be tuned when fitting an ERT (e.g. number of trees, feature sub-set size per node, depth of a tree, etc.), we use k-fold (k = 10) cross-validation with the GridSearchCV functionality of Scikit-learn to identify the optimal fitting parameters by minimizing the RMSE on the k validation sets.
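A sketch of the ERT training and feature-importance check; the hyperparameter grid and the noise seed are assumptions, and the input arrays come from the earlier sketches.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
# Append a random "noise" column to flag predictors less informative than chance.
noise = rng.normal(size=(len(F_train), 1))
X_ert_train = np.hstack([F_train, OHE_train, noise])

param_grid = {"n_estimators": [100, 300, 500],
              "max_features": ["sqrt", 1.0],
              "max_depth": [None, 10, 20]}
search = GridSearchCV(ExtraTreesRegressor(random_state=0), param_grid,
                      scoring="neg_root_mean_squared_error", cv=10)
search.fit(X_ert_train, z_train)

ert = search.best_estimator_
print(ert.feature_importances_)   # columns scoring below the noise column can be dropped
```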
After identifying the optimal hyperparameters by k-fold cross-validation, we fit an ERT model on the entire training set. We then determine the generalizability of the trained model on the Test dataset, and find an improvement in the model error to 0.021. Furthermore, the structure of the residuals is more homoscedastic than those of the polynomial regression model (see Fig. 3, right column), with only some bias at the extreme values, which was also observed when fitting 3D data with DeepTrack 2.0,24 although the explanation for their prediction errors does not apply to our case. The superior performance of the ensemble decision tree models appears to justify their selection, and indicates their suitability for the problem of tracking the 3D motion of Janus microswimmers.
For the TiO2–SiO2 microswimmers imaged with the modified optical configuration, we extract the same features from the particle masks; however, the exploratory factor analysis stage identified 12 relevant latent factors in this case rather than 9. We attribute this to a range of factors, such as the different optical configuration used, which allows us to scan a wider Z range, as well as the different optical asymmetry of the particles studied. We directly train our ERT on these 12 features extracted using factor_analyzer, and in this manner, we are able to obtain a normalised error of 0.019, with no notable underlying structure in the residuals, as shown in Fig. 4a and b.
We thus see that traditional ML models, specifically Decision Tree-based ensemble models, are an effective approach to track the vertical position of Janus particles, and therefore enable 3D tracking. As a demonstration, we show some sample 3D trajectories that we obtain using our trained ERT model (see Fig. 4c–e). Specifically, we show the Z-tracking of a highly chiral “looping” particle, whose motion would present significant difficulty to track using conventional methods. We also show the Z trajectory of a particle that moves in 2D, noting the small length scales over which positional fluctuations occur in the vertical direction.
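With a trained model, assembling a 3D trajectory reduces to combining the standard 2D tracking output with the predicted vertical positions; in this sketch, `frame_features` (per-frame feature vectors built as above) and `xy_positions` (2D centroids from conventional tracking) are assumed inputs, and `ert` is the fitted model from the earlier sketch.

```python
import numpy as np

z_positions = ert.predict(frame_features)                    # vertical position per video frame
trajectory = np.column_stack([xy_positions, z_positions])    # (x, y, z) for each frame
```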
Under these conditions, we find that the trajectories of 2D swimming particles are relatively noise-free in the Z direction, removing most of the spurious displacements which were occasionally present in previous attempts at 3D tracking of Janus microswimmers with a wide-field microscope.25 We nonetheless stress the importance of checking the model predictions as outlier swimmers will possess noisy trajectories, which should be removed from the analysis. The presence of contamination on the glass slide, or even uneven illumination fields, could all contribute to additional sources of noise in the trajectories of swimming particles. It is also critical that the trajectories analysed are limited to the bounds of the optical configuration used (around 40 μm), and we strongly advise against extrapolating to vertical positions beyond these values. The Z-range over which particles can be tracked could be extended by using objectives with a greater depth of field (lower numerical aperture). However, this comes at the cost of a lower lateral resolution, such that an optimal configuration needs to be found for the specific system of interest.
We moreover find that traditional ML techniques are effective at tracking the vertical positions of two different Janus particle systems, with videos taken using different optical conditions. We find that Decision Tree-based ensemble models are the most accurate type of traditional ML model for our defined problem. The Extremely Randomised Decision Tree model is the best stand-alone traditional ML technique for predicting the vertical positions of Janus particles, and combining its predictions with those of an Extreme Gradient Boosting (XGBoost) model in a Voting Regressor may lead to a very slight improvement in overall performance (see Fig. S5, ESI‡). We limit our discussion of Deep Learning convolutional neural networks to the ESI,‡ but we note that these models can accept particle masks directly, circumventing the time-consuming process of feature extraction. Nevertheless, traditional ML models provide better understanding and control over inputs, and we favour the use of simpler models where possible.
Using the freely-available and well-documented Scikit-learn packages, the training of ML models can be performed with a few simple lines of code. Coupled with the extensive learning resources widely available,45 this enables a low-barrier extension of existing resources to new problems. We highly encourage experimentalists in the active matter community to explore the fascinating possibilities of ML by developing their own models. At the very least, it provides an engaging learning exercise in a burgeoning field, and it can likely help address experimental challenges in a manner not possible with currently well-established protocols for particle tracking.
Footnotes
† The ML tracking code is freely available at https://github.com/maximilia164/ML_3Dtrack
‡ Electronic supplementary information (ESI) available: Further information on image feature extraction, comparison of different decision-tree ensemble models, and evaluation of a convolutional neural network architecture. See DOI: https://doi.org/10.1039/d2sm00930g