Jun Li (a), JingFei Xie (b, c), Aljaž Godec (d), Keith R. Weninger (e), Cong Liu (b), Jeremy C. Smith (f, g) and Liang Hong* (a, h)
(a) School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai 200240, China
(b) Interdisciplinary Research Center on Biology and Chemistry, Center for Excellence in Molecular Synthesis, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 201203, China
(c) University of the Chinese Academy of Sciences, Beijing 100049, China
(d) Mathematical BioPhysics Group, Max Planck Institute for Biophysical Chemistry, Göttingen 37077, Germany
(e) Department of Physics, North Carolina State University, Raleigh, NC 27695, USA
(f) UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
(g) Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee 37996, USA
(h) Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China. E-mail: hongl3liang@sjtu.edu.cn
First published on 4th August 2022
Internal motions of folded proteins have been assumed to be ergodic, i.e., the dynamics of a single protein molecule averaged over a very long time resembles that of an ensemble. Here, by performing single-molecule fluorescence resonance energy transfer (smFRET) experiments and molecular dynamics (MD) simulations of a multi-domain globular protein, cytoplasmic protein-tyrosine phosphatase (SHP2), we demonstrate that the functional inter-domain motion is observationally non-ergodic over the time spans 10^−12 to 10^−7 s and 10^−1 to 10^2 s. The difference between observational non-ergodicity and simple non-convergence is discussed. In comparison, a single-stranded DNA of similar size behaves ergodically, with an energy landscape resembling that of a one-dimensional linear chain. The observed non-ergodicity results from the hierarchical connectivity of the high-dimensional energy landscape of the protein molecule. As the characteristic time for the protein to perform its dephosphorylation function is ∼10 s, our findings suggest that, owing to the non-ergodicity, individual, seemingly identical protein molecules can be dynamically and functionally different.
Various experiments have shown that the internal dynamics of ensembles of folded proteins under physiological conditions are non-exponential in time.3,6,7 However, this non-exponential (or ‘anomalous’) behavior has been described using either ergodic models (such as fractional Brownian motion, in which successive displacements of the particle are anti-correlated6,8–10) or non-ergodic models (such as a subdiffusive continuous-time random walk, in which the particle is trapped in energy basins with a power-law distribution of waiting times that has no finite mean3,11). Whereas non-exponential behavior has been found in numerous single-molecule fluorescence experiments and molecular dynamics (MD) simulations,6,8–10 non-ergodic interpretations have remained relatively unexplored.3
The present work focuses on the observational non-ergodicity of a protein in the time windows probed by smFRET experiments (10^−1 to 10^2 s) and MD simulations (10^−12 to 10^−7 s). Although a systematic experimental exploration of the non-ergodicity of protein molecules is lacking, its existence is consistent with, and indirectly supported by, experimental observations of static disorder in enzymatic behavior,12–18 in which the reaction rates of individual enzyme molecules differ many-fold, with the differences sustained over the entire experimental time window (∼hours). Nevertheless, the vast majority of single-molecule and ensemble experiments have described protein internal motions using ergodic frameworks.6,19–24 Whether protein internal motion is non-ergodic on any given timescale remains actively debated among theoretical and computational researchers,11,25,26 and resolving this question requires thorough experimental tests.
Here, to examine the ergodicity of protein internal dynamics over a range of timescales, we conduct single-molecule fluorescence resonance energy transfer (smFRET) experiments and all-atom molecular dynamics (MD) simulations on the cytoplasmic protein-tyrosine phosphatase (SHP2). SHP2 is a multi-domain protein (Fig. 1a) that participates in multiple cellular signaling processes, including the Ras/MAPK and Hippo/YAP pathways.27 As reported recently, SHP2 is prone to liquid–liquid phase separation (LLPS),2 in which the proteins coalesce into condensed droplets distinct from the surrounding cytoplasmic environment.2,28 LLPS of SHP2 has been shown to play a crucial role in Noonan syndrome (NS),29 juvenile myelomonocytic leukemia (JMML),30 and cancers.2,27 Although SHP2 is used here mainly as a model system for characterizing dynamical heterogeneity in a typical globular protein, the results may have some implications for LLPS formation, as discussed later.
Fig. 1 Internal dynamics of protein SHP2 and single-stranded DNA (ssDNA) revealed by smFRET. (a) Schematic diagram of the experimental setup used for the protein single-molecule measurements. SHP2 contains two Src homology-2 domains (N-SH2, grey; C-SH2, blue) and a PTP domain (gold). The FRET dye pair Cy3 and Cy5 (green and red spheres) was attached to residues 87 (N-SH2 domain) and 266 (PTP domain). The 1D4-tagged protein was immobilized on PEG-passivated coverslips through a biotinylated antibody (fab-biotin) and imaged by TIRF microscopy. (b–f) Five representative single-molecule (sm) fluorescence trajectories of the protein: the upper panels show the acceptor and donor intensities, IA and ID, and the bottom panels show the resulting FRET efficiency, EFRET = IA/(IA + ID). Photobleaching events are indicated by arrows. (g) Overall distribution of FRET efficiency P(EFRET) (top) obtained from 254 protein smFRET trajectories, together with the EFRET distributions of each of the five single-molecule trajectories shown in (b–f) (bottom). The histogram reveals a shoulder centered at 0.2 (ultra-low FRET, state I) and three major low/mid/high FRET states centered at 0.45 (II), 0.65 (III), and 0.8 (IV). The total EFRET histogram was fitted with four Gaussian peaks (blue line in top panel). (h) Control experiment on single-stranded DNA (ssDNA) dynamics. The ssDNA was labeled with a Cy3/Cy5 FRET pair separated by 50 monomers. (i and j) Two experimental ssDNA single-molecule traces. (k) Ensemble-averaged distribution of FRET efficiency P(EFRET) (green bars and area) and the FRET histogram of one trajectory (green lines) for the ssDNA. Unlike SHP2 (g), which can adopt several FRET states, the ssDNA shows a single FRET state centered at 0.25 ± 0.05, similar to what was previously reported.32
The present MD simulations and smFRET experiments demonstrate that functional inter-domain motions in the protein are heterogeneous over two wide time windows: 10^−12 to 10^−7 s and 10^−1 to 10^2 s. Moreover, control simulations and experiments on a single DNA chain of similar size, which behaves ergodically, show how the anomalous dynamics of the protein arises from its characteristic energy landscape, which has a much higher dimensionality and a distinctive hierarchical structure. Importantly, biochemical studies have determined that the timescale associated with SHP2 phosphatase activity is tens of seconds.2 As the observed non-ergodicity extends beyond this timescale, it could affect the function of this enzyme in its native biochemical signaling network.
We obtained 254 single-molecule FRET trajectories of SHP2 in which the fluorescence intensities of Cy3 and Cy5 are anti-correlated over time; the trajectories used for analysis were truncated before photobleaching. Five representative single-molecule FRET trajectories are plotted in Fig. 1b–f. Over the time window probed (0–200 s), some protein molecules stay in one FRET state (Fig. 1b and c), while others transition between two (Fig. 1d and e) or three (Fig. 1f) distinct states. This behavior indicates that any single protein molecule explores only a portion of the conformational space sampled by the ensemble over the observation time window. To further illustrate this heterogeneity, Fig. 1g compares P(EFRET), the overall histogram of EFRET averaged over the ensemble of 254 trajectories (blue, top panel), with the histograms derived from each of the five individual trajectories in Fig. 1b–f. The ensemble-averaged P(EFRET) exhibits three major peaks, at 0.45 (II), 0.65 (III), and 0.8 (IV), and a small shoulder at 0.2 (I), indicating that at least four conformational states are observed. In contrast, two of the five single trajectories (sm1, sm2) remain in one state, whereas the other three (sm3 to sm5) transition between two or three states within the observed time window.
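The comparison between pooled and single-trajectory histograms in Fig. 1g can be sketched in a few lines of Python. This is a minimal illustration, not the analysis pipeline used here: it assumes background-corrected acceptor and donor intensities are already available frame by frame, and the bin count and the helper names (`fret_efficiency`, `efret_histogram`, `occupied_bins`) are hypothetical. Pooling all frames gives the ensemble P(EFRET); histogramming one trajectory gives its single-molecule counterpart.

```python
def fret_efficiency(i_acceptor, i_donor):
    """Frame-by-frame E_FRET = I_A / (I_A + I_D); intensities assumed background-corrected."""
    return [a / (a + d) for a, d in zip(i_acceptor, i_donor)]


def efret_histogram(values, n_bins=20):
    """Normalized histogram of E_FRET on [0, 1]; values outside the range are ignored."""
    counts = [0] * n_bins
    for v in values:
        if 0.0 <= v <= 1.0:
            # E = 1.0 is clamped into the last bin.
            counts[min(int(v * n_bins), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def occupied_bins(hist):
    """Indices of bins with non-zero probability (a crude proxy for states explored)."""
    return [i for i, p in enumerate(hist) if p > 0]


# Two toy trajectories (hypothetical intensities): one stuck near E = 0.46, one near E = 0.81.
traj_a = fret_efficiency([46] * 100, [54] * 100)
traj_b = fret_efficiency([81] * 100, [19] * 100)

single_a = efret_histogram(traj_a)
pooled = efret_histogram(traj_a + traj_b)  # the ensemble histogram pools all frames
print(occupied_bins(single_a), occupied_bins(pooled))  # [9] [9, 16]
```

A single trajectory that occupies only a subset of the bins populated in the pooled histogram mirrors the behavior of sm1 and sm2; binning alone, however, does not assign states.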
We note that the differences in FRET values between the four states are significantly larger than the fluctuations within one state, and also larger than the fluctuations of fluorescence intensity when the protein is labeled with only one dye molecule (see Fig. S1†). Moreover, for comparison, we also provide smFRET results for a single-stranded DNA, denoted ssDNA (Fig. 1h–k; experimental details in the ESI†), whose radius of gyration (Rg ∼ 3.4 nm) is close to that of the SHP2 protein (Rg ∼ 2.7 nm). The ssDNA presents only one FRET state (Fig. 1k; EFRET = 0.25 ± 0.05, mean ± s.d.), similar to previous reports.32,33 Together, these comparisons demonstrate that the four observed FRET states of SHP2 result from different conformations of the protein molecule rather than from photobleaching, blinking, fluctuations of laser intensity, or any other instrumental or environmental factors.
To quantitatively characterize how each single-molecule FRET trajectory explores the four conformational states of the protein, we applied a four-state hidden Markov model (HMM).34 Details of the model can be found in the ESI.† The analysis was conducted on 127 of the 254 trajectories, selected because each lasted at least 100 s before photobleaching. Only the first 100 s of each selected trajectory was analyzed, so that all trajectories were compared over the same length of time. The 127 trajectories were categorized into nine subgroups (Fig. 2): in subgroups I to IV the protein molecule stays in a single state over the entire 100 s (Fig. 2a–d), in subgroups V to VIII it transitions between two states (Fig. 2e–h), and in subgroup IX it transitions among three states (Fig. 2i). The relative populations of these nine subgroups are shown in Fig. 2j. Again, a significant degree of dynamical or conformational heterogeneity is present among different trajectories. This heterogeneity, i.e., molecules of the same species adopting distinct conformations that remain different over the 0.1–200 s time window, is a manifestation of observational non-ergodicity.35 The transitions between different FRET states are further analyzed in a transition density plot (Fig. S3†), in which transitions between states I and II and between states III and IV are the most evident.
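To make the state-assignment step concrete, the following is a minimal Viterbi decoder for a four-state HMM with Gaussian emissions, followed by the grouping of a decoded trajectory by the set of states it visits. It is only a sketch under stated assumptions: the emission means are taken from the Fig. 1g peak centers, while the common width `SIGMA` and the self-transition probability `P_STAY` are hypothetical values, not parameters fitted by the HMM software of ref. 34, and no parameter estimation (e.g., Baum–Welch) is performed.

```python
import math

# Hypothetical HMM parameters (NOT the fitted values from the paper).
MEANS = [0.2, 0.45, 0.65, 0.8]   # FRET peak centers of states I-IV (Fig. 1g)
SIGMA = 0.05                     # assumed common emission width
P_STAY = 0.99                    # assumed per-frame self-transition probability
LABELS = ["I", "II", "III", "IV"]


def log_gauss(x, mu, sigma):
    """Log density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def viterbi(efret, means=MEANS, sigma=SIGMA, p_stay=P_STAY):
    """Most likely hidden-state path (indices 0..K-1) for an E_FRET trace."""
    k = len(means)
    p_move = (1.0 - p_stay) / (k - 1)
    log_a = [[math.log(p_stay if i == j else p_move) for j in range(k)] for i in range(k)]
    # Uniform initial distribution over states.
    score = [math.log(1.0 / k) + log_gauss(efret[0], means[j], sigma) for j in range(k)]
    back = []
    for x in efret[1:]:
        ptr, new = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: score[i] + log_a[i][j])
            ptr.append(best_i)
            new.append(score[best_i] + log_a[best_i][j] + log_gauss(x, means[j], sigma))
        back.append(ptr)
        score = new
    # Trace back from the best final state.
    state = max(range(k), key=lambda j: score[j])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def visited_states(path):
    """Roman-numeral labels of the distinct states visited, e.g. ('II', 'III')."""
    return tuple(LABELS[s] for s in sorted(set(path)))


trace = [0.45] * 50 + [0.65] * 50   # toy noise-free trace: state II, then state III
print(visited_states(viterbi(trace)))  # ('II', 'III')
```

With real data the decoded paths would be noisy and the parameters would be learned from the trajectories; the classification by `len(visited_states(path))` (one, two, or three states) corresponds to the single-state, two-state, and three-state subgroups.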
Fig. 3 Subdiffusive, non-ergodic, and aging dynamics in SHP2 derived from molecular dynamics (MD) simulations. (a) Time-averaged mean-squared displacement (TA-MSD) as a function of lag time Δ for each of 100 independent MD trajectories. (b) Comparison of the MD-derived ensemble-averaged MSD without time averaging (EA-MSD, green squares) and the time-ensemble-averaged MSD (TEA-MSD, blue circles). The dashed lines indicate asymptotes of power-law fits. The difference between EA-MSD and TEA-MSD directly demonstrates ergodicity breaking on the MD time window.36,37 The oscillations of the EA-MSD are caused by the limited number of simulation trajectories used in the analysis. (c) The scatter distribution ϕ(ξk) at Δ = 1 ns is skewed, with the primary peak located well below 1. (d) TEA-MSD as a function of observation time t at three fixed lag times Δ (2 ns, 4 ns, and 8 ns). The dashed line in (d) guides the trend of decay.
Another standard test for ergodicity is the scatter distribution,36 ϕ(ξk), where ξk = TA-MSDk(Δ)/TEA-MSD(Δ) is a dimensionless ergodicity-breaking parameter, i.e., the TA-MSD of trajectory k normalized by the TEA-MSD at the same lag time. ϕ(ξk) thus describes how the TA-MSD is distributed among trajectories at a given lag time Δ. For an ergodic or homogeneous system, ϕ(ξk) shows a narrow peak at ξk = 1, whereas a non-ergodic process yields a skewed distribution with its peak located away from 1.36 As shown in Fig. 3c, the MD-derived ϕ(ξk) is rather broad, with the most intense peak located well below 1, indicating that the protein molecules in many MD trajectories are highly restrained and display below-average flexibility. Hence, again, the inter-domain motion of the protein is non-ergodic and heterogeneous over the 1 ps to 100 ns time window probed by MD.
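The MSD-based quantities of Fig. 3 and the scatter parameter ξk can be computed from discretized trajectories of the inter-domain distance x(t) roughly as follows. This is a hedged sketch: it assumes equally spaced frames, integer lag times in frame units, the standard discrete estimators, and EA-MSD(Δ) = ⟨[x(Δ) − x(0)]²⟩; the function names are illustrative and not taken from the authors' analysis code.

```python
def ta_msd(x, lag):
    """Time-averaged MSD of one trajectory at an integer lag (frames); needs lag < len(x)."""
    n = len(x) - lag
    return sum((x[i + lag] - x[i]) ** 2 for i in range(n)) / n


def tea_msd(trajs, lag):
    """Time-ensemble-averaged MSD: the mean of the TA-MSDs over all trajectories."""
    return sum(ta_msd(x, lag) for x in trajs) / len(trajs)


def ea_msd(trajs, lag):
    """Ensemble-averaged MSD without time averaging: <[x(lag) - x(0)]^2>."""
    return sum((x[lag] - x[0]) ** 2 for x in trajs) / len(trajs)


def scatter_xi(trajs, lag):
    """xi_k = TA-MSD_k / TEA-MSD for every trajectory k (TEA-MSD must be non-zero)."""
    mean = tea_msd(trajs, lag)
    return [ta_msd(x, lag) / mean for x in trajs]


# Toy data (hypothetical): one mobile trajectory and one frozen trajectory.
trajs = [[0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 0.0, 0.0]]
print(tea_msd(trajs, 1), ea_msd(trajs, 1), scatter_xi(trajs, 1))  # 0.5 0.5 [2.0, 0.0]
```

Histogramming `scatter_xi` over many trajectories gives ϕ(ξk): identical trajectories all return ξk = 1, whereas a population containing many restrained trajectories piles up below 1, as in Fig. 3c.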
Non-ergodic phenomena have been reported in various complex biological systems, such as the diffusion of a nanoparticle in an actin filament network,38,39 the lateral movement of protein molecules in the cell membrane,40,41 and the transport of protein granules in the cytoplasm of living cells.42,43 Accompanying the non-ergodicity, these systems often show striking aging phenomena, in which the effective mobility of the studied particle decreases with increasing observation time,36,37 manifested as a decay of the TEA-MSD with t at a given Δ. Non-ergodicity is related to the aging properties of the processes involved, that is, the dependence of physical observables on the time span between the initialization of the system and the start of the measurement. Fig. 3d shows the TEA-MSD vs. trajectory length, obtained by truncating the data at an observation time t and performing a temporal (moving) average. The TEA-MSD decays with t as a power law, TEA-MSD ∼ t^−0.2; that is, the internal dynamics of SHP2 ages with observation time. Because such aging behavior is often interpreted within the continuous-time random walk (CTRW) framework,38–43 we derived the waiting time distribution (Fig. S7†). The waiting times are broadly distributed as τ^−(1+α), with a CTRW power-law exponent α = 0.8, indicating that CTRW contributes to the complex internal dynamics of the protein.
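The two estimates in this paragraph, the aging exponent of the TEA-MSD and the waiting-time exponent α, can be sketched as below. This is an illustration under assumptions, not the authors' procedure: trajectories are truncated to their first t frames before time averaging, exponents are taken as least-squares slopes in log-log space, and α is obtained with the standard maximum-likelihood estimator for a continuous power-law tail ψ(τ) ∼ τ^−(1+α) above a chosen cutoff `tau_min` (the cutoff choice is itself an assumption).

```python
import math


def ta_msd(x, lag):
    """Time-averaged MSD of one trajectory at an integer lag (frames)."""
    n = len(x) - lag
    return sum((x[i + lag] - x[i]) ** 2 for i in range(n)) / n


def tea_msd_vs_observation_time(trajs, lag, t_values):
    """TEA-MSD at a fixed lag after truncating every trajectory to its first t frames."""
    return [sum(ta_msd(x[:t], lag) for x in trajs) / len(trajs) for t in t_values]


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) versus log(x), i.e. a power-law exponent."""
    lx = [math.log(v) for v in xs]
    ly = [math.log(v) for v in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return num / sum((a - mx) ** 2 for a in lx)


def waiting_time_alpha(taus, tau_min):
    """Maximum-likelihood alpha for a tail psi(tau) ~ tau^-(1+alpha), tau >= tau_min."""
    tail = [t for t in taus if t >= tau_min]
    return len(tail) / sum(math.log(t / tau_min) for t in tail)


# An exact power law recovers its exponent (here the -0.2 aging exponent of Fig. 3d).
ts = [1e1, 1e2, 1e3]
print(round(loglog_slope(ts, [5.0 * t ** -0.2 for t in ts]), 6))  # -0.2
```

If one assumes the standard CTRW aging law TEA-MSD ∼ t^(α−1), an aging exponent of −0.2 corresponds to α = 1 − 0.2 = 0.8, the same value as obtained from the waiting-time distribution.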
The protein's TEA-MSD shows aging and subdiffusion (αt < 1), which suggests a combination of non-ergodic CTRW and ergodic models as the underlying mechanism of protein internal dynamics.36 The Gaussian-distributed step sizes (Fig. S8c†) and the anti-persistent velocity correlation function (Fig. S8d†) of the protein inter-domain distance x(t) suggest ergodic FBM subordinated to the CTRW. If we assume free diffusion with such a mixed CTRW–FBM origin, the exponents obey αe = αβ and αt = 1 − α + αβ, where αe and αt are the power-law exponents of the EA-MSD and the TEA-MSD, α is the power-law exponent of the waiting time distribution, and β is twice the Hurst exponent. In the present work, α = 0.8 is smaller than 1 (see Fig. 3d), so αt should be larger than αe, contradicting the results in Fig. 3b. We note that αe > αt has also been found in various biological systems,44–47 and these works often attribute it to confinement.44,45 Confinement is unambiguously present here, as the studied quantity is the distance between two domains of SHP2, which is structurally constrained. Moreover, as revealed in ref. 11, the underlying energy landscape is self-similar and fractal. All of these factors could lead to the observation of αe > αt.
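The expected ordering follows from one line of algebra on the two relations quoted above; notably, β cancels, so the conclusion does not depend on the Hurst exponent:

```latex
\alpha_t - \alpha_e = (1 - \alpha + \alpha\beta) - \alpha\beta = 1 - \alpha
  \;\overset{\alpha = 0.8}{=}\; 0.2 > 0
\quad\Longrightarrow\quad \alpha_t > \alpha_e \ \text{for any } \beta .
```

The measured ordering αe > αt in Fig. 3b therefore points to a departure from the free CTRW–FBM assumption itself, such as the confinement discussed above, rather than to a particular value of β.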
Non-converged dynamics can result from two phenomena. One is memory in the dynamics extending beyond the observation time: fractional Brownian motion with infinitely long memory, for example, never converges but is itself ergodic.50 The other is the existence of too many distinct conformational states for a single protein molecule to sample over the observation time, i.e., observational non-ergodicity.11,51 Hence, by comparison with the ssDNA, which is of similar size and is examined over the same time windows but behaves ergodically, one can unambiguously conclude that non-convergence alone cannot cause the non-ergodicity observed in the SHP2 protein.
An example of a CCTN obtained from the MD trajectory of the protein is presented in Fig. 5a. The network is highly complex and inhomogeneous, forming loosely connected hubs (see the regions enclosed by dashed lines): connections between hubs are rather limited, whereas the nodes inside each hub are densely connected with each other. We also performed the same network analysis on the MD trajectory of the ssDNA (Fig. 5b). Compared with the protein, the CCTN of the ssDNA is much simpler: most nodes have only two neighbors and are connected linearly, without forming many hubs.
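A transition network of this type can be built from any trajectory that has already been discretized into conformational clusters. The sketch below assumes that clustering step has been done (how MD frames are clustered is not shown and is left open), takes each cluster label as a node, and adds an undirected edge whenever the trajectory moves between two clusters in consecutive frames. It also computes the node degree d and node strength s analyzed in the next paragraph; defining s as the number of frames spent in a node is an assumption, and counting entries into the node would be an equally reasonable choice.

```python
from collections import Counter


def transition_network(cluster_traj):
    """Undirected weighted network from a sequence of cluster labels.

    An edge joins two clusters whenever the trajectory jumps between them in
    consecutive frames; its weight is the number of such jumps.
    """
    edges = Counter()
    for a, b in zip(cluster_traj, cluster_traj[1:]):
        if a != b:  # staying in the same cluster adds no edge
            edges[frozenset((a, b))] += 1
    return edges


def node_degrees(edges):
    """Degree d of each node: the number of distinct neighbors."""
    deg = Counter()
    for pair in edges:
        for node in pair:
            deg[node] += 1
    return deg


def node_strengths(cluster_traj):
    """Node strength s, taken here as the number of frames assigned to each node."""
    return Counter(cluster_traj)


# Toy trajectories (hypothetical): a chain-like walk and a hub-like walk.
chain = [0, 1, 2, 3, 4]
hub = [0, 1, 0, 2, 0, 3, 0, 4, 0]
print(sorted(node_degrees(transition_network(chain)).values()))  # [1, 1, 2, 2, 2]
print(node_degrees(transition_network(hub))[0], node_strengths(hub)[0])  # 4 5
```

The chain-like walk gives the near-uniform low degrees expected of a linear, ssDNA-like network, whereas the hub-like walk concentrates both degree and strength on one central node.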
To quantitatively examine the connections in the networks of the two systems, we analyzed the degree distribution P(d), i.e., the probability distribution of the number of connections per node.55 As shown in Fig. 5c, P(d) of the protein is much broader than that of the ssDNA: its width (s.d.) is 9.0 for the protein, about five times that for the ssDNA (s.d. = 1.7). Moreover, the majority of nodes in the protein CCTN have more than 10 connected neighbors, far more than in the ssDNA (mean = 3.5). One can also examine the topological structure of the networks. Here, we applied a box-covering method (see details in the ESI†) to estimate the fractal dimension.55,56 The fractal dimension determined for the protein network is 1.7 (Fig. 5d, blue); a non-integer fractal dimension was likewise found in an earlier study of another protein, phosphoglycerate kinase, where the value was 2.4.11 In contrast, the fractal dimension of the ssDNA network is about 1.0, indicating that it resembles a one-dimensional linear network (Fig. 5d, purple). We also compared the node strength s, i.e., the frequency of visiting each node in the network.57 As shown in Fig. 5e, the CCTN of the protein has many heavily visited nodes (s > 100), which are the center nodes of the hubs (highlighted in green in Fig. 5a), whereas such heavily visited nodes are absent for the ssDNA. This results from the hierarchical structure of the protein's energy landscape: the protein frequently visits the nodes inside a given hub but takes a long time to escape, as relatively few transition paths connect to external hubs. Long-lived metastable conformations of the protein result (see Fig. 1 and 2).
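The box-covering estimate of the fractal dimension can be illustrated with the commonly used greedy-coloring variant of box covering. The ESI describes the procedure actually used, so treat this as an illustrative sketch: two nodes whose shortest-path distance is at least l_B must fall in different boxes, a greedy coloring under that constraint gives an approximate minimal box number N_B, and d_B follows from N_B ∼ l_B^(−d_B). The graph is assumed to be given as an adjacency dictionary; the function names are hypothetical.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def box_count(adj, l_b):
    """Approximate minimal number of boxes N_B of size l_B via greedy coloring."""
    nodes = sorted(adj)
    dist = {u: bfs_distances(adj, u) for u in nodes}
    color = {}
    for u in nodes:
        # Nodes at distance >= l_B (or disconnected) cannot share u's box.
        forbidden = {color[v] for v in color if dist[u].get(v, math.inf) >= l_b}
        c = 0
        while c in forbidden:
            c += 1
        color[u] = c
    return len(set(color.values()))


def fractal_dimension(adj, box_sizes):
    """d_B from N_B ~ l_B^(-d_B): minus the least-squares slope of log N_B vs log l_B."""
    xs = [math.log(l) for l in box_sizes]
    ys = [math.log(box_count(adj, l)) for l in box_sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope


# Sanity check: a 64-node linear chain (the ssDNA-like limit) is one-dimensional.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 64] for i in range(64)}
print(round(fractal_dimension(chain, [1, 2, 4, 8]), 6))  # 1.0
```

For the linear chain the greedy coloring finds N_B = 64, 32, 16, 8 boxes for l_B = 1, 2, 4, 8, giving d_B = 1; a hub-rich network such as the protein CCTN would instead be expected to yield a non-integer value.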
The above comparative analysis reveals that the SHP2 protein has a much more complex energy landscape than the ssDNA, with a higher dimensionality and a much more hierarchical structure, and its conformational states have many more connected neighbors. We note that many local structures and constraints exist within the protein (e.g., α-helices and β-sheets, stabilized by intrachain hydrogen bonding, disulfide linkages, ionic bonding, etc.), which limit protein conformational changes. Together, these features give the protein molecule many different pathways between any two distant states, while also causing it to stay in single metastable states, the hub centers, for long times. This network structure leads to heterogeneous dynamics among individual protein molecules observed over a long period of time, i.e., observational non-ergodicity.
Finally, we note that the timescales explored by MD simulations (10^−12 to 10^−7 s) and by single-molecule FRET experiments (0.1 to 200 s) differ by six orders of magnitude. However, as shown in Fig. S11,† both the topological structure and the degree distribution of the protein CCTN are scale-free, i.e., independent of whether the simulation is 100 ns or 1 μs long. This scale-free, self-similar character of the energy landscape was shown earlier for several different proteins over many decades in time.11 Hence, we attribute the non-ergodic dynamics of the protein to its characteristic high-dimensional, hierarchical, self-similar energy landscape. We note, however, that unambiguously confirming that the non-ergodicity observed in simulation extends to the experimental time window, which cannot be accessed by all-atom MD simulations, would require other approaches, such as coarse-grained simulations (e.g., ref. 58); this is beyond the scope of the present work and could be carried out in the future.
Dynamical heterogeneity on functional timescales, arising from relaxation processes on these timescales or longer, is theoretically expected to lead to functional differences. The observed dynamical heterogeneity in the protein is thus likely to split the population into individual enzyme molecules with different catalytic rates.4,5 This is consistent with the experimental observation of “static disorder” in enzymatic rates, in which the catalytic rates of individual enzyme molecules can differ many-fold, with the differences sustained for hours.12–16,18 Moreover, Fig. 1g shows that individual protein molecules are trapped in very different conformational states for tens or hundreds of seconds. Such long-lived, diverse conformational states could hold SHP2 molecules in particular conformations long enough for them to diffusively find a partner with complementary shape and electrostatic interactions, leading to association and, in turn, triggering the liquid–liquid phase separation (LLPS) for which this protein is known.2
A final, intriguing question is whether observational non-ergodicity among individual protein molecules will disappear when the observation time extends beyond the hundreds of seconds probed here. For a single protein in aqueous solution, the folding–unfolding equilibrium will at some point be well sampled, and, if degrading chemical reactions are ignored, one would then expect ergodicity to be reached. However, this question cannot be addressed in the present work. Furthermore, experimental work on another multi-domain protein, Hsp90, using plasmon rulers has revealed extremely long-lived (∼12 hours) open and closed configurations.3 The extent of non-ergodicity in the internal motions of proteins with different structures and functions, and its biological implications, will be a topic for future research.
Footnote
† Electronic supplementary information (ESI) available. See https://doi.org/10.1039/d2sc03069a
This journal is © The Royal Society of Chemistry 2022