Unifying sequence-structure coding for advanced protein engineering via a multimodal diffusion transformer

Abstract

Modern protein engineering demands integrated sequence–structure representations to tackle key challenges in designing, modifying, and evolving proteins for specific functions. While sequence-based methods are promising for generating novel proteins, incorporating structure-oriented information improves the success rate and helps target the intended functions. A consensus strategy that draws on both is therefore essential, rather than reliance on sequence-based or structure-based approaches alone. Here, we introduce ProTokens, machine-learned “amino acids” derived from structural databases via self-supervised learning, which provide a compact yet information-rich representation that bridges the sequence and structure modalities. Instead of treating sequences and structures separately, we build PT-DiT, a multimodal diffusion transformer-based model that integrates both into a unified representation and enables protein engineering in a joint sequence–structure space. This streamlines the design process and facilitates efficient encoding of 3D folds, contextual protein design, sampling of metastable states, and directed evolution for diverse objectives. As a unified solution for in silico protein engineering, PT-DiT thus leverages sequence and structure insights to realize functional protein design.

Article information

Article type: Edge Article
Submitted: 16 Mar 2025
Accepted: 14 May 2025
First published: 15 May 2025
This article is Open Access

All publication charges for this article have been paid for by the Royal Society of Chemistry

Chem. Sci., 2025, Advance Article

X. Lin, Z. Chen, Y. Li, Z. Ma, C. Fan, Z. Cao, S. Feng, J. Zhang and Y. Q. Gao, Chem. Sci., 2025, Advance Article, DOI: 10.1039/D5SC02055G

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
