Frederik van Veen*ab, Luca Ornago a, Herre S.J. van der Zant*a and Maria El Abbassi a
a Department of Quantum Nanoscience, Delft University of Technology, Delft 2628CJ, The Netherlands. E-mail: Frederik.vanVeen@empa.ch; h.s.j.vanderzant@tudelft.nl
b Transport at Nanoscale Interfaces Laboratory, Empa, Swiss Federal Laboratories for Materials Science and Technology, Überlandstrasse 129, Dübendorf CH-8600, Switzerland
First published on 11th October 2023
Break-junction experiments are used to statistically study the electronic properties of individual molecules. The measurements consist of repeatedly breaking and merging a gold wire while measuring the conductance as a function of displacement. When a molecule is captured, a plateau is observed in the conductance trace; otherwise, an exponentially decaying tunnelling trace is measured. Clustering methods are widely used to separate these traces and identify potential sub-populations in the data corresponding to different molecular junction configurations. As these configurations are typically a priori unknown, unsupervised methods are most suitable for the classification. However, most of the unsupervised methods used for the classification perform poorly in identifying small sub-populations of molecular traces. Robust removal of tunnelling-only traces before clustering is thus of great interest. Neural networks have proven powerful in the classification of data samples with predictable behaviour, but often show large sensitivity to the underlying training data. In this study we report on a neural network method for the separation of tunnelling-only traces in conductance vs. displacement measurements that achieves excellent classification performance for complete and unseen data sets. This method is particularly useful for data sets in which the yield of molecular traces is low or which comprise a significant number of traces displaying a jump from tunnelling features to a molecular plateau.
A common approach to statistically determine the single-molecule conductance value is to construct conductance histograms from a set of conductance vs. displacement (breaking) traces and fit the prominent peaks with a log-normal distribution.10 However, this approach can lead to inaccurate data interpretation, as measurement sets may contain breaking curves of distinct molecular configurations (e.g. different injection points, participation of additional molecules). In such cases, the most probable conductance value obtained from the raw histograms cannot be attributed to a unique molecular conformation.11 Clustering algorithms can help to separate the traces into categories of distinct molecular configurations, to be analysed individually. Most conventional unsupervised learning algorithms, however, perform poorly in capturing small subpopulations from data sets with highly non-uniform cluster sizes due to the uniform effect.12,13 In particular, in the case of low-yield measurements, most of the molecular classes become visible only after several steps of over-clustering.
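The log-normal fit mentioned above can be illustrated with a minimal sketch. It assumes that the conductance values belonging to a single peak have already been isolated, in which case fitting a log-normal distribution reduces to taking the mean and standard deviation of log10(G). The function name and the synthetic values are hypothetical and are not taken from the measurements.

```python
import math
import random
import statistics


def lognormal_peak(conductances):
    """Estimate the centre and width of a log-normal conductance peak.

    A log-normal distribution in G is a normal distribution in log10(G),
    so the maximum-likelihood fit is the mean and standard deviation of
    the log-conductance values.
    """
    logs = [math.log10(g) for g in conductances if g > 0]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)
    return 10 ** mu, sigma  # peak position (units of G0), width (decades)


# Synthetic plateau conductances in units of G0 (hypothetical values).
rng = random.Random(0)
samples = [10 ** rng.gauss(-4.0, 0.3) for _ in range(5000)]
peak, width = lognormal_peak(samples)
```

If the set mixes several junction configurations, the same estimate returns a single averaged peak, which is the interpretation problem described above.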
Convolutional neural networks (CNNs) are a class of artificial feed-forward neural networks with initial convolutional layers. These layers contain filters that are optimised to identify features in the input data that are characteristic of the classes to be distinguished. CNNs have been shown to be particularly powerful for image recognition tasks, and can be applied to the analysis of break-junction experiments, as breaking traces can be considered as 1D and 2D images. While the more commonly used unsupervised methods extract the average features of groups of traces, CNNs treat each breaking trace individually, and are thus useful for the identification of small subpopulations. Here, we describe a supervised deep learning approach, using CNNs, to improve the performance of the unsupervised clustering methods by first removing tunnelling-only traces from the measurement set.
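To illustrate how a convolutional filter can respond to a feature of an individual trace, the following minimal sketch applies a single hand-picked filter, followed by a rectifier and global max pooling, to two toy traces. The filter weights, the bias and the trace values are assumptions chosen for illustration only; in a trained CNN the weights are learned from the labelled data, and this is not the architecture used in this work.

```python
def conv1d(signal, kernel, bias=0.0):
    """Valid-mode 1D cross-correlation, as applied by a convolutional layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


# log10(G/G0) at equally spaced displacements (hypothetical values).
tunnelling = [-1.0, -2.0, -3.0, -4.0, -5.0, -6.0]
plateau = [-1.0, -4.0, -4.0, -4.0, -4.0, -6.0]

# Hand-picked filter: the output is positive only where the trace drops by
# less than 0.5 decades between neighbouring points, i.e. where it is flat.
kernel, bias = [-1.0, 1.0], 0.5


def plateau_response(trace):
    feature_map = relu(conv1d(trace, kernel, bias))
    return max(feature_map)  # global max pooling over the feature map
```

Here `plateau_response(tunnelling)` evaluates to 0.0 and `plateau_response(plateau)` to 0.5, so the filter output separates the two toy traces without any averaging over a group of traces.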
A convolutional neural network was trained to distinguish between tunnelling and molecular traces on a large dataset of roughly 200 000 labeled breaking traces obtained for alkanedithiols, which display very diverse breaking traces. A schematic of the chemical structure of the alkanedithiols is shown in the ESI,† Section S6. Once trained, this network is used to label and remove the tunnelling-only traces in sets of unseen breaking traces. We show that the network fulfills the important requirement of generalization, showing excellent performance for complete and unseen experimental datasets of different molecules with different anchoring groups and breaking traces.
A single breaking trace describes the conductance of the junction at increasing electrode displacements. In the absence of bridging molecules, the junction conductance decays exponentially as the gap size between the electrodes increases, typical for direct tunnelling across a barrier of increasing length, as observed for the orange-colored traces in Fig. 1b. Target molecules can bridge the gold electrodes after rupture of the point contact, which is typically identified by the presence of a conductance plateau in the breaking trace, as seen in the green- and blue-colored traces in Fig. 1b. Fig. 1c shows a reduced feature space representation, obtained by applying principal component analysis, for the three-class measurement set recorded for hexanedithiol,14 shown in Fig. 1a. Due to the large number of tunnelling traces, the measurement set shows a high variance in the dimensions describing tunnelling features, indicated by the large spread of the tunnelling class in Fig. 1c. A zoomed-in view around the origin (Fig. 1d) displays the large overlap of traces from the different configurations, which complicates the separation of the different (molecular) configurations. The challenge is now to efficiently separate the molecular traces from the ones displaying tunnelling only, after which unsupervised clustering methods can be used to capture only the variance in the molecular set.
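The exponential decay of a tunnelling-only trace can be written as G(d) = G_c exp(−βd), so that ln G is linear in the displacement d. The sketch below generates such a trace and recovers the decay constant with a least-squares line fit. The contact conductance, the decay constant of 10 nm⁻¹ and the function names are assumptions made for illustration and are not fitted to the data of this work.

```python
import math


def tunnelling_trace(displacements_nm, g_contact=1.0, beta_per_nm=10.0):
    """Direct-tunnelling conductance G = G_c * exp(-beta * d), in units of G0."""
    return [g_contact * math.exp(-beta_per_nm * d) for d in displacements_nm]


def fit_decay_constant(displacements_nm, conductances):
    """Least-squares slope of ln(G) versus d; returns beta in 1/nm."""
    ys = [math.log(g) for g in conductances]
    n = len(ys)
    mean_x = sum(displacements_nm) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(displacements_nm, ys))
    sxx = sum((x - mean_x) ** 2 for x in displacements_nm)
    return -sxy / sxx


d = [0.05 * i for i in range(21)]  # 0 to 1 nm in 0.05 nm steps
g = tunnelling_trace(d)
beta = fit_decay_constant(d, g)  # recovers the value used to build the trace
```

A trace containing a molecular plateau deviates from this straight line on a logarithmic scale, which is the feature the classification relies on.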
Fig. 1 (a) Two-dimensional (left) and one-dimensional (right) conductance histograms built from a set of breaking traces recorded for hexanedithiol. The data is taken from ref. 14. (b) Examples of individual breaking traces showing three different configurations: tunnelling (orange), single molecule (blue) and traces with plateau lengths larger than the molecule (green). (c) Reduced feature space of all the breaking traces in the dataset, obtained by applying principal component analysis (PCA). (d) Zoomed-in region in the reduced feature space, containing a mixture of traces from the three different configurations.
To train the neural networks, large training data sets (roughly 100 000 traces per molecule) were used from a previous study of mechanically controlled break-junction (MCBJ) measurements on propanedithiol (ADT3), hexanedithiol (ADT6) and octanedithiol (ADT8), displaying a large variety of molecular traces.14 To label the data, we used an unsupervised learning algorithm to cluster the individual measurement sets into many (i.e. 100) subclasses. The classes displaying very clean tunnelling or molecular features were labeled accordingly. From these labelled sets, we constructed a training set with equal numbers of tunnelling and molecular traces and similar numbers of traces from ADT3, ADT6, and ADT8. Note that one could also collect large numbers of tunnelling traces from measurements of bare gold samples. However, as the presence of molecules can influence the tunnel barrier between the electrodes,2 the collection scheme used here might capture more diverse tunnelling behaviour.
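The construction of a balanced training set can be sketched as follows, assuming the over-clustering step has already produced lists of labelled traces for each molecule. For simplicity the sketch draws exactly the same number of traces from every (molecule, label) group, whereas the text above only requires similar numbers per molecule. The function name, the group sizes and the placeholder trace identifiers are hypothetical.

```python
import random


def balanced_training_set(labelled, per_group, seed=0):
    """Draw the same number of traces from every (molecule, label) group.

    labelled maps (molecule, label) -> list of traces.
    Returns a shuffled list of (trace, label, molecule) tuples.
    """
    rng = random.Random(seed)
    dataset = []
    for (molecule, label), traces in sorted(labelled.items()):
        if len(traces) < per_group:
            raise ValueError(f"not enough traces for {molecule}/{label}")
        for trace in rng.sample(traces, per_group):
            dataset.append((trace, label, molecule))
    rng.shuffle(dataset)
    return dataset


# Placeholder identifiers stand in for real breaking traces.
sizes = {
    ("ADT3", "tunnelling"): 50, ("ADT3", "molecular"): 20,
    ("ADT6", "tunnelling"): 60, ("ADT6", "molecular"): 25,
    ("ADT8", "tunnelling"): 40, ("ADT8", "molecular"): 30,
}
labelled = {
    key: [f"{key[0]}-{key[1]}-{i}" for i in range(n)] for key, n in sizes.items()
}

per_group = min(len(traces) for traces in labelled.values())
training_set = balanced_training_set(labelled, per_group)
```

Limiting every group to the size of the smallest one discards data, but it prevents the larger tunnelling classes from dominating the loss during training.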
For each breaking trace, the region within 0.5–1.0 × 10⁻⁶ G₀ (G₀ = 2e²/h ≈ 77 μS) and within the 0.5–3 nm displacement range was transformed into a discrete feature vector using the histogram method explained in the ESI,† Section S1. These ranges were chosen because they gave good initial results and because the electrode displacement range includes most plateau lengths observed in the experiments. To train the networks, we exposed them to the labeled breaking traces in batches of 1600 traces per iteration. The cross-entropy function was used to calculate the network loss, while the network parameters were optimized via the adaptive moment estimation (Adam) algorithm.15 Of the labeled data, 80 percent was used to train the networks, while the remaining 20 percent was used to assess the generalization of the network. Fig. 2b displays the loss and network accuracy for both the training and validation sets as a function of epochs. A single epoch denotes a complete pass of the labeled data through the neural network, while updating its parameters. In order to visualize the difference between the training and validation curves, we omitted the first 10 epochs (reducing the loss and accuracy range of the plots). The full curves are shown in the ESI,† Section S2. A constant step-size of α = 10⁻³ was used together with exponential decay rates β₁ = 0.9 and β₂ = 0.999, and ε = 10⁻⁸. All models in this study were developed with the PyTorch open source library.16
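The loss and the parameter update described above can be written out explicitly. The following is a minimal plain-Python sketch of the cross-entropy loss for one sample and of a single Adam update with the hyperparameters quoted in the text (α = 10⁻³, β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸). It is not the PyTorch code used in this study; the class and function names are hypothetical, and the one-parameter example is purely illustrative.

```python
import math


def cross_entropy(probabilities, target_index):
    """Cross-entropy loss for one sample, given predicted class probabilities."""
    return -math.log(probabilities[target_index])


class Adam:
    """Adam optimiser for a flat list of parameters."""

    def __init__(self, n_params, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.alpha, self.beta1, self.beta2, self.eps = alpha, beta1, beta2, eps
        self.m = [0.0] * n_params  # running mean of the gradients
        self.v = [0.0] * n_params  # running mean of the squared gradients
        self.t = 0                 # number of updates performed so far

    def step(self, params, grads):
        """Return the updated parameters after one Adam step."""
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.alpha * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


# One update of a single parameter w = 0 for the toy loss (w - 1)^2,
# whose gradient at w = 0 is -2.
optimiser = Adam(n_params=1)
w_new = optimiser.step([0.0], [-2.0])
loss_uninformed = cross_entropy([0.5, 0.5], 0)  # equals ln 2
```

After the bias correction, the first update has a magnitude close to α regardless of the size of the gradient, which is why a constant step size of 10⁻³ translates into small, well-controlled parameter changes early in training.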
The breaking traces in the validation set are likely to display very similar features to the ones in the training set, since they are obtained using the same parameters (i.e. same sample and molecules). However, the model should also generalize well to measurement sets of different molecules, containing breaking traces with distinct shapes. To investigate this ability, we designed the following test: first, we train the model on only two chain lengths, and second, we check the classification accuracy on the remaining compound. The classification performance after this training is summarized in Fig. 2d. The figure shows from left to right the classification accuracy of the network for ADT3 (trained on ADT6 and ADT8), ADT6 (trained on ADT3 and ADT8) and ADT8 (trained on ADT3 and ADT6). The CNN achieves excellent classification for all three molecules, with accuracies exceeding 95% in each case, indicating that it generalizes the classification task well to unseen data. The network reaches higher accuracies (bars) for the molecular traces than for the tunnelling traces. This is likely the result of the network being penalized more for false classification of molecular traces than of tunnelling ones.
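The bookkeeping of this test can be sketched as follows: the first function enumerates the three train/test splits, and the second computes the accuracy separately for each true class, as reported per molecule in Fig. 2d. The function names and the example labels are hypothetical, and the training step itself is omitted.

```python
def leave_one_molecule_out(molecules):
    """Yield (training molecules, held-out molecule) pairs."""
    for held_out in molecules:
        yield [m for m in molecules if m != held_out], held_out


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of correctly classified traces, computed per true class."""
    totals, correct = {}, {}
    for truth, prediction in zip(true_labels, predicted_labels):
        totals[truth] = totals.get(truth, 0) + 1
        correct[truth] = correct.get(truth, 0) + (truth == prediction)
    return {label: correct[label] / totals[label] for label in totals}


splits = list(leave_one_molecule_out(["ADT3", "ADT6", "ADT8"]))

# Hypothetical labels for four traces of a held-out molecule.
truth = ["molecular", "molecular", "tunnelling", "tunnelling"]
predicted = ["molecular", "molecular", "tunnelling", "molecular"]
accuracy = per_class_accuracy(truth, predicted)
```

Reporting the accuracy per class rather than overall makes an asymmetry between molecular and tunnelling traces visible, which an overall accuracy on an unbalanced set would hide.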
Additionally, the network performance was benchmarked against commonly used unsupervised techniques (K-means and Gaussian mixture model). For this benchmarking, the ratio between molecular and tunnelling traces was varied, ranging from 1:1 to 1:10. The two-class clustering results of this benchmark are shown in Fig. 2c, displaying the classification performance, averaged over the three datasets, for the different methods and ratios. The full results, without averaging, are shown in the ESI,† Section S4. Firstly, it can be seen that the unsupervised techniques work well when the numbers of molecular and tunnelling traces are similar. When the fraction of molecular traces drops, their performance decreases drastically. For all molecules, the CNN outperforms the unsupervised methods significantly. The unsupervised methods take into account all the features in the breaking traces, diluting the features that are relevant and molecule-dependent with ones that are not, while the CNN learns to capture only the ones that are relevant for the distinction between tunnelling and molecular traces.
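One possible way to build benchmark sets with a prescribed ratio of molecular to tunnelling traces is sketched below. It assumes that all tunnelling traces are kept and that the molecular traces are randomly subsampled; how the ratio was varied in this work is not specified here, so this procedure and the function name are assumptions.

```python
import random


def subsample_to_ratio(molecular, tunnelling, tunnelling_per_molecular, seed=0):
    """Return a set with molecular : tunnelling = 1 : tunnelling_per_molecular.

    All tunnelling traces are kept; molecular traces are drawn at random.
    """
    n_molecular = len(tunnelling) // tunnelling_per_molecular
    if n_molecular > len(molecular):
        raise ValueError("not enough molecular traces for the requested ratio")
    rng = random.Random(seed)
    return rng.sample(molecular, n_molecular), list(tunnelling)


# Placeholder integers stand in for real breaking traces.
molecular = list(range(1000))
tunnelling = list(range(1000, 2000))

benchmark_sets = {
    ratio: subsample_to_ratio(molecular, tunnelling, ratio)
    for ratio in range(1, 11)  # 1:1 up to 1:10
}
```

Each entry of `benchmark_sets` can then be passed to the clustering methods and to the trained network, so that all methods are compared on identical traces.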
As expected, the classification performance of the CNN also remains constant when the ratio of molecular to tunnelling traces is varied. In addition to the higher accuracy, the network, once trained, is also substantially faster than the considered unsupervised two-class clustering techniques, by more than a factor of 10. This becomes especially advantageous for large data sets.
For both molecules, the tunnelling classes show clean exponentially decaying features, while the remaining sets of traces show no clear tunnelling features, indicating that the network removes most of the tunnelling traces and only tunnelling traces. These observations are also confirmed by a more detailed evaluation (see the ESI,† Section S3.1 and S3.2); subsequent clustering with k-means of the obtained tunnelling and molecular sets into 15 (ADT6) and 10 (OPE3-diSAc) subclasses shows that for ADT6, 97.5 percent of the tunnelling traces are removed while no molecular ones are discarded, and that for OPE3-diSAc, none of the traces have been wrongly labeled by our network.
The classification results obtained for measurements on OPE3-Pyr (a) and OPE3-NH2 (b) are shown in Fig. 4. From the two-dimensional histograms it can be seen that the tunnelling sets contain very few (OPE3-Pyr) or no (OPE3-NH2) molecular features (see also the ESI,† Section S3.3 and S3.4). The detailed evaluation in the ESI† shows that for OPE3-NH2, no molecular traces are separated, while for OPE3-Pyr very few molecular traces are separated, and these only show very short jumps and do not display a significant molecular plateau. For both molecules, the histograms constructed from the remaining traces still display prominent exponentially decaying features. From the clustering evaluation, however, we find only breaking traces that contain molecular plateaus. A large percentage of these traces display a tunnelling part, followed by a jump to a molecular plateau, as reported in ref. 17. Individual example traces of this type have been added to the molecular histograms (Mol) of both molecules as a black-lined overlay in Fig. 4. These results indicate the robustness of the model, which also achieves excellent classification, without additional training, for breaking traces with different anchoring groups and even for molecular traces that initially display a tunnelling signal.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3tc02346j
This journal is © The Royal Society of Chemistry 2023