Yuki Komoto (a, b)*,
Takahito Ohshiro (a, b),
Yuno Notsu (c) and
Masateru Taniguchi (a)*
(a) SANKEN, Osaka University, 8-1, Mihogaoka Ibaraki, Osaka 567-0047, Japan. E-mail: komoto@sanken.osaka-u.ac.jp
(b) Artificial Intelligence Research Center, Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka 567-0047, Japan
(c) Kakogawa Higashi High School, 232-2 Kakogawachoawazu, Kakogawa, Hyogo 675-0039, Japan
First published on 7th October 2024
Acetylation of lysine residues in histones regulates transcriptional activity. Simple methods for detecting acetyl lysine are essential for the early diagnosis of diseases and for understanding its physiological effects. We detected and recognized acetyl lysine at the single-molecule level by combining mechanically controllable break junction (MCBJ) measurements with machine learning.
Single-molecule measurement has emerged as a promising method for detecting modified amino acids.11 One of the most widely used single-molecule measurement methods is the mechanically controllable break junction (MCBJ). In the MCBJ method, a narrow metal wire fabricated on a flexible substrate is broken by bending the substrate, forming a nanometer-scale gap.11–13 Because molecules are measured directly within the gap, single-molecule measurement offers rapid, sensitive detection without preprocessing.11 Previous studies have demonstrated the detection of amino acids by single-molecule measurement,14–16 including the discrimination of tyrosine from phosphorylated tyrosine, a notable example of a modified amino acid.14 Although some of the 20 amino acids are difficult to discriminate from single-molecule currents alone,14,15 recent applications of machine learning to single-molecule measurement data have improved molecular discrimination.17–21 Machine learning analysis is therefore promising for the detection and discrimination of modified amino acids. In this study, we aimed to detect acetylated lysine and distinguish it from lysine at the single-molecule level.
We conducted single-molecule measurements using the MCBJ method, as illustrated in Fig. 1(A).17,18,22,23 The MCBJ substrates shown in Fig. 1(B) and (C) were fabricated by nanofabrication techniques;17,18,22 details of the fabrication process are given in the ESI.† We measured 1 μM aqueous solutions of L-lysine (Lys) and Nε-acetyl-L-lysine (AcLys).
Fig. 1 (A) Schematic image of MCBJ measurement. (B) Schematic image of the MCBJ substrate. (C) SEM image of the narrow gold bridge of the MCBJ substrate. (D) Molecular structures of Lys and AcLys.
The current–time profiles of Lys and AcLys measured with a nanogap width of 0.56 nm are presented in Fig. 2(A) and (B). Pulsed signals were observed during the measurement of both molecules. Fig. 2(C) and (D) show enlarged views of these pulses, which are attributed to the passage of a single molecule through the nanogap.17,18 Thus, AcLys was detected at the single-molecule level. Because the conductance of single-molecule signals varies from event to event, individual signals cannot be analyzed reliably on their own, and a statistical treatment is required.24,25 We therefore generated histograms of the maximum current Ip of each signal, as depicted in Fig. 2(E) and (F), a common analytical approach in single-molecule measurements.12,24 The histogram for Lys shows a prominent peak at approximately 38 pA, whereas the maximum currents of the AcLys signals tend to be lower and the AcLys histogram lacks a discernible peak at approximately 38 pA. The average maximum currents for Lys and AcLys were 38 and 23 pA, respectively. Measurements on blank solutions and at low concentrations (ESI†) indicate that misdetection of spurious low-current signals has an insignificant effect on these histograms. Because the two Ip distributions overlap considerably, an individual molecule cannot be identified from its Ip alone; nevertheless, statistical analysis indicated a clear difference between the Ip distributions of Lys and AcLys.
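The Ip histogram analysis described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that a pulse is any contiguous run of samples above a fixed current threshold, and the threshold, baseline noise level, and 38 pA pulse height of the demo trace are hypothetical values chosen for illustration.

```python
import numpy as np


def extract_pulses(current, threshold):
    """Return (start, stop) index pairs of contiguous runs with current > threshold.

    `stop` is exclusive, so current[start:stop] is the pulse.
    """
    above = np.asarray(current) > threshold
    # Pad with False on both sides so every run has a rising and a falling edge.
    edges = np.diff(np.concatenate(([False], above, [False])).astype(int))
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    return list(zip(starts, stops))


def peak_currents(current, threshold):
    """Maximum current Ip of every pulse found by extract_pulses."""
    current = np.asarray(current, dtype=float)
    return np.array([current[a:b].max() for a, b in extract_pulses(current, threshold)])


# Hypothetical demo trace: 2 pA Gaussian baseline noise plus ten 38 pA rectangular pulses.
rng = np.random.default_rng(0)
trace = rng.normal(0.0, 2.0, 10_000)
for start in range(500, 10_000, 1_000):
    trace[start:start + 20] += 38.0

ip = peak_currents(trace, threshold=10.0)  # the 10 pA threshold is an assumed value
counts, bin_edges = np.histogram(ip, bins=np.arange(0.0, 80.0, 2.0))  # Ip histogram
print(len(ip), round(float(ip.mean()), 1))
```

The fixed threshold is the simplest possible event detector; a real analysis would also need a baseline correction and a rule for merging closely spaced events.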
First-principles calculations were performed to investigate why acetylation reduces the current. The transmission probability τ through a single-molecule junction is described by the Breit–Wigner formula:25–27

$$\tau = \frac{4\Gamma_L \Gamma_R}{(\varepsilon - E_F)^2 + (\Gamma_L + \Gamma_R)^2}$$

Here, $E_F$ is the Fermi level of the electrodes, $\varepsilon$ is the energy of the conduction orbital, and $\Gamma_L$ and $\Gamma_R$ are the couplings to the left and right electrodes, respectively. The coupling Γ describes the interaction between the molecule and the electrodes: overlap between the orbitals of the electrode metal atoms and the molecular orbital broadens the molecular level, and Γ is the resulting level broadening of the transmission. The Breit–Wigner model assumes resonant tunneling through a molecular conduction level that is separated from the two electrodes by a double barrier, whose transparency is determined by $\Gamma_L$ and $\Gamma_R$. Transmission is maximal at resonance, when the conduction orbital is aligned with the electrode Fermi level ($\varepsilon = E_F$), and it decreases as the offset $|\varepsilon - E_F|$ increases. Molecular states near the Fermi level of the electrodes therefore dominate the conductance, and in single-molecule junctions the conduction orbital is predominantly the HOMO.22,26,27 We therefore performed density functional theory (DFT) calculations with Gaussian28 to compute the HOMOs of isolated Lys and AcLys molecules; details of the calculations are provided in ESI1.† The calculated HOMOs are shown in Fig. 3(A) and (B). The HOMOs lie 3.8 eV (Lys) and 4.0 eV (AcLys) below the Fermi energy of Au(111), as illustrated in Fig. 3(C).29,30 The HOMO of AcLys is thus 0.2 eV farther from the Fermi energy of the gold electrode than that of Lys, and according to the Breit–Wigner formula this larger offset lowers the conductance.
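As a rough numerical check of this argument, the Breit–Wigner formula can be evaluated at the two calculated HOMO offsets. The sketch assumes equal couplings ΓL = ΓR = 0.05 eV for both molecules, a hypothetical value that is not reported here, and assumes acetylation leaves the coupling unchanged. It only shows the direction of the effect and is not meant to reproduce the measured currents.

```python
def breit_wigner(eps_minus_ef, gamma_l, gamma_r):
    """Transmission tau = 4*GL*GR / ((eps - EF)**2 + (GL + GR)**2); energies in eV."""
    return 4.0 * gamma_l * gamma_r / (eps_minus_ef ** 2 + (gamma_l + gamma_r) ** 2)


GAMMA = 0.05  # hypothetical coupling (eV), assumed identical for Lys and AcLys

tau_lys = breit_wigner(-3.8, GAMMA, GAMMA)    # HOMO 3.8 eV below EF (Lys)
tau_aclys = breit_wigner(-4.0, GAMMA, GAMMA)  # HOMO 4.0 eV below EF (AcLys)
print(f"tau(AcLys) / tau(Lys) = {tau_aclys / tau_lys:.3f}")  # prints 0.903
```

Because Γ is much smaller than |ε − EF| here, τ scales approximately as 1/(ε − EF)², so the ratio stays close to (3.8/4.0)² ≈ 0.90 almost regardless of the assumed Γ.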
The orbital shapes of the two molecules are similar, and in neither molecule is the HOMO extensively distributed over the acetylated amino group. This suggests that acetylation does not significantly alter the molecule–electrode coupling Γ. Our analysis therefore attributes the reduction in conductance upon acetylation mainly to the lower HOMO energy, that is, to the larger offset between ε and EF. The DFT results are thus consistent with the measured decrease in current.
Fig. 3 (A and B) Isosurface of HOMO of Lys (A) and AcLys (B) calculated by DFT B3LYP/6-31G. Isovalue is 0.02. (C) Schematic energy diagram of HOMO of Lys and AcLys and Au Fermi level.
As discussed above, Lys and AcLys exhibit distinct currents owing to their different electronic structures, which leads to discernible differences in their single-molecule signals. However, the current histograms in Fig. 2(E) and (F) overlap considerably, making it difficult to assign individual signals on the basis of current alone. To address this issue, we applied machine learning to identify signals from statistical training data.17,18 The signal identification process is outlined in Fig. 4(A). First, signals were extracted from the measured current profiles and converted into features suitable for machine learning classification. These features are the maximum current Ip, the average current Iave, the signal duration td, and a 10-dimensional shape vector (S1, S2, S3, …, S10) of the signal, as depicted in Fig. 4(B). The shape vector was obtained by dividing the signal into 10 sections along the time axis, calculating the average current of each section, and normalizing these values by the maximum current of the signal. The feature vector of each signal was thus the 13-dimensional vector (Ip, Iave, td, S1, S2, S3, …, S10), and each element was standardized to a dimensionless quantity with mean 0 and standard deviation 1. The feature vectors were then divided into training and test data in a 9:1 ratio. To avoid bias in the training data, undersampling was performed so that each class contributed the same number of training signals. A random forest classifier was trained with 30000 training signals and used to predict each test signal individually.31 The evaluation used 10-fold cross-validation to obtain an unbiased assessment, and the discrimination results for Lys and AcLys are summarized in the confusion matrix in Fig. 4(C), which presents the mean and standard deviation of the ten discrimination results.
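The feature extraction and classification steps above can be sketched with scikit-learn as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the random forest settings (100 trees), the interpolation rule for pulses shorter than 10 samples, the sampling interval, and the synthetic half-sine pulses (mean peaks of 38 and 23 pA, echoing the averages reported above) are all hypothetical. Standardization and undersampling are fitted inside each training fold so that test signals never influence them, and 10-fold stratified cross-validation gives the 9:1 train/test ratio in every fold.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

N_SECTIONS = 10


def signal_features(pulse, dt):
    """13-dimensional feature vector (Ip, Iave, td, S1, ..., S10) of one pulse."""
    pulse = np.asarray(pulse, dtype=float)
    ip, iave, td = pulse.max(), pulse.mean(), len(pulse) * dt
    if len(pulse) < N_SECTIONS:
        # Assumption: very short pulses are linearly interpolated onto 10 points.
        pulse = np.interp(np.linspace(0, len(pulse) - 1, N_SECTIONS),
                          np.arange(len(pulse)), pulse)
    # Mean current of each of 10 time sections, normalized by the maximum current.
    shape = [section.mean() / ip for section in np.array_split(pulse, N_SECTIONS)]
    return np.array([ip, iave, td, *shape])


def undersample(X, y, rng):
    """Randomly drop signals so every class has as many as the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    keep = np.concatenate([rng.choice(np.flatnonzero(y == c), counts.min(), replace=False)
                           for c in classes])
    return X[keep], y[keep]


def cross_validated_confusion(X, y, n_splits=10, seed=0):
    """Per-fold confusion matrices (rows normalized by true class) from k-fold CV."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    matrices = []
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train, test in folds.split(X, y):
        X_tr, y_tr = undersample(X[train], y[train], rng)
        scaler = StandardScaler().fit(X_tr)  # mean 0, std 1, fitted on training data only
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        clf.fit(scaler.transform(X_tr), y_tr)
        y_pred = clf.predict(scaler.transform(X[test]))  # each test signal predicted individually
        matrices.append(confusion_matrix(y[test], y_pred, labels=classes, normalize="true"))
    return np.array(matrices)


def fake_pulse(mean_peak, rng):
    """Hypothetical noisy half-sine pulse standing in for a measured signal."""
    n = int(rng.integers(10, 60))                # duration in samples
    peak = max(rng.normal(mean_peak, 8.0), 5.0)  # pA, clipped to stay positive
    return peak * np.sin(np.linspace(0, np.pi, n)) + rng.normal(0.0, 1.0, n)


rng = np.random.default_rng(1)
y = np.array([0] * 400 + [1] * 250)  # 0 = "Lys", 1 = "AcLys"; deliberately imbalanced
X = np.array([signal_features(fake_pulse(38.0 if c == 0 else 23.0, rng), dt=1e-5)
              for c in y])
cms = cross_validated_confusion(X, y)
print(np.round(cms.mean(axis=0), 2))  # mean confusion matrix over the 10 folds
print(np.round(cms.std(axis=0), 2))   # standard deviation over the 10 folds
```

Fitting the scaler and the undersampling only on each training fold is a design choice that keeps the cross-validated estimate free of information leaking from the test signals.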
Lys and AcLys were discriminated with an F-measure of 0.72, where the F-measure, used as the performance metric, is the harmonic mean of precision and recall (sensitivity). With a discrimination accuracy of 0.72 per single molecule, majority voting over multiple signals makes it feasible to identify the target with an accuracy of 90% using 9 signals and 99% using 25 signals (ESI2†). Lys and AcLys can therefore be distinguished from single-molecule signals. Because proteins contain amino acids other than lysine, we further extended the analysis to three molecules by adding glycine as an example of a different amino acid. The classification results for the three amino acids are shown in Fig. 4(D); for all three molecules the correct class was predicted most frequently, with an F-measure of 0.56. This result underscores the potential of our approach for analyzing post-translational modifications in peptides and proteins.
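The majority-voting estimate can be reproduced with a simple binomial model. This sketch assumes a two-class task in which each signal is classified independently with the same accuracy p, and the molecule is assigned to the class that receives a strict majority of votes (an odd number of signals avoids ties); the procedure in the ESI may differ.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """Probability that a strict majority of n independent binary calls is correct."""
    if n % 2 == 0:
        raise ValueError("use an odd number of signals so that votes cannot tie")
    need = n // 2 + 1  # smallest winning number of correct votes
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


for n in (1, 9, 25):
    print(n, round(majority_vote_accuracy(0.72, n), 3))
# 1 0.72
# 9 0.924
# 25 0.99
```

Under these assumptions, 9 signals give about 92% and 25 signals about 99%, in line with the figures quoted above.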
Machine learning can thus identify the single-molecule signals of Lys and AcLys, but classification accuracy depends on the size of the training data.21 The relationship between accuracy and the number of training signals is examined in Fig. 5. On a linear scale (Fig. 5(a) and (b)), the discrimination accuracy increased rapidly until the number of training signals reached approximately 3000; with approximately 3000 training signals, the accuracy was about 0.7 for two molecules and 0.5 for three molecules. When the accuracy is plotted against the logarithm of the number of signals (Fig. 5(c) and (d)), it increases nearly linearly and shows no sign of saturation. Because each tenfold increase in training data continues to improve accuracy, this behavior suggests that the single-molecule signals in the training data are highly diverse and spread over a broad region of the feature space. More accurate identification can therefore be expected from enlarging the training data so that it covers this wide distribution of signals. This analysis offers an effective strategy for achieving high-accuracy single-molecule identification and highlights the potential of machine learning in single-molecule measurements.
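The accuracy-versus-training-size analysis can be sketched with scikit-learn's `learning_curve`, fitting accuracy against log10 of the number of training signals. This is only an illustration: the data come from `make_classification` rather than from single-molecule signals, and the dataset size, class separation, forest size, and training-set sizes are hypothetical choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, learning_curve

# Hypothetical stand-in for 13-dimensional signal features with overlapping classes.
X, y = make_classification(n_samples=4000, n_features=13, n_informative=5,
                           class_sep=0.5, flip_y=0.1, random_state=0)

# Logarithmically spaced training-set sizes: 100, 199, ..., 3162 signals.
sizes = np.unique(np.logspace(2, 3.5, 6).astype(int))
sizes_used, _, test_scores = learning_curve(
    RandomForestClassifier(n_estimators=50, random_state=0), X, y,
    train_sizes=sizes, cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    shuffle=True, random_state=0)
accuracy = test_scores.mean(axis=1)  # mean test accuracy over the 5 folds

# Linear fit of accuracy against log10(N); a positive slope that persists at the
# largest N indicates that accuracy has not yet saturated with training-set size.
slope, intercept = np.polyfit(np.log10(sizes_used), accuracy, 1)
for n, acc in zip(sizes_used, accuracy):
    print(int(n), round(float(acc), 3))
print(f"accuracy gain per tenfold increase in N: {slope:.3f}")
```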
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4ra05488a
This journal is © The Royal Society of Chemistry 2024