Sequence determinants of protein phase separation and recognition by protein phase-separated condensates through molecular dynamics and active learning

Abstract

Elucidating how protein sequence determines the properties of disordered proteins and their phase-separated condensates is a major challenge in computational chemistry, biology, and biophysics. Quantitative molecular dynamics simulations and the free energy values derived from them can in principle capture how a sequence encodes the chemical and biological properties of a protein. These calculations are, however, computationally demanding, even after reducing the representation by coarse-graining, and exploring the large spaces of potentially relevant sequences remains a formidable task. To reduce the number of labelled examples needed from simulations, we employ an “active learning” scheme introduced by Yang et al. (bioRxiv, 2022, https://doi.org/10.1101/2022.08.05.502972), in which a neural network-based model suggests the most useful examples for the next training cycle. Applying this Bayesian optimisation framework, we determine properties of protein sequences with coarse-grained molecular dynamics, which enables the network to establish sequence–property relationships for disordered proteins, their self-interactions, and their interactions in phase-separated condensates. We show how iterative training with second virial coefficients derived from simulations of disordered protein sequences leads to a rapid improvement in predicting peptide self-interactions. We employ this Bayesian approach to efficiently search for new sequences that bind to condensates of the disordered C-terminal domain (CTD) of RNA Polymerase II, by simulating the molecular recognition of peptides by phase-separated condensates in coarse-grained molecular dynamics. By searching for protein sequences that prefer to self-interact rather than interact with another protein sequence, we are able to shape the morphology of protein condensates and design multiphasic protein condensates.
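
As an illustration of the second virial coefficient used as a training label, the following minimal sketch evaluates the textbook relation B2 = -2π ∫ (g(r) - 1) r² dr numerically from a radial distribution function. It is not taken from the article: the function name `second_virial_coefficient`, the grid, and the hard-sphere test case are assumptions chosen for illustration, and the article may derive its coefficients differently (for example from a potential of mean force between two chains).

```python
import numpy as np


def second_virial_coefficient(r, g):
    """Estimate B2 = -2*pi * integral of (g(r) - 1) * r**2 dr.

    r : 1D array of separations (any length unit), increasing.
    g : 1D array with the radial distribution function sampled at r.
    Returns B2 in units of length cubed.
    """
    r = np.asarray(r, dtype=float)
    g = np.asarray(g, dtype=float)
    integrand = (g - 1.0) * r**2
    # Trapezoidal rule written out explicitly so the sketch does not
    # depend on which NumPy version provides trapz/trapezoid.
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))
    return -2.0 * np.pi * integral


# Hypothetical check: hard spheres of diameter sigma, where g(r) = 0
# inside the core and 1 outside, so B2 should approach 2*pi*sigma**3/3.
sigma = 1.0
r = np.linspace(0.0, 5.0, 50001)
g_hard_sphere = np.where(r < sigma, 0.0, 1.0)
b2 = second_virial_coefficient(r, g_hard_sphere)
print(f"B2 (numerical) = {b2:.4f}, analytical = {2 * np.pi * sigma**3 / 3:.4f}")
```

With this sign convention a negative B2 indicates net attraction between two chains and a positive B2 net repulsion, which is why the quantity can serve as a compact measure of peptide self-interaction.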

Article information

Article type: Paper
Submitted: 10 May 2024
Accepted: 26 Jul 2024
First published: 03 Aug 2024
This article is Open Access
Creative Commons BY license

Faraday Discuss., 2025, Advance Article

A. Changiarath, A. Arya, V. A. Xenidis, J. Padeken and L. S. Stelzl, Faraday Discuss., 2025, Advance Article, DOI: 10.1039/D4FD00099D

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
