Pre-trained language models for protein and molecular design
Abstract
Pre-trained language models (PLMs) have recently emerged as a powerful tool, achieving exceptional performance not only in natural language understanding but also in biological research. Their strength lies in exploiting the structural similarity between biological sequences and natural language: by pre-training on large corpora of unlabeled biological sequences and then fine-tuning on specific downstream tasks, PLMs deliver remarkable results and offer new solutions for protein research and drug design. To map this growing landscape, this paper surveys representative PLMs and commonly used datasets, and demonstrates the potential and application prospects of PLMs in both prediction and generation tasks.
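As a concrete illustration of the pre-train-then-fine-tune paradigm summarized above, the following minimal sketch assumes the open-source HuggingFace transformers library and the public ESM-2 protein language model checkpoint facebook/esm2_t6_8M_UR50D; neither is prescribed by this paper, and the sequences and labels below are illustrative placeholders rather than real data. Stage 1 queries the pre-trained masked language model directly; stage 2 attaches a classification head and runs a few toy fine-tuning steps.

# Minimal sketch of the pre-train -> fine-tune workflow, assuming the
# HuggingFace `transformers` library and the public ESM-2 checkpoint
# `facebook/esm2_t6_8M_UR50D`; sequences and labels are placeholders.
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
)

checkpoint = "facebook/esm2_t6_8M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# --- Stage 1: query the pre-trained masked language model directly. ---
# The backbone was pre-trained on unlabeled protein sequences; here it
# fills in a masked residue, analogous to masked-word prediction in NLP.
mlm = AutoModelForMaskedLM.from_pretrained(checkpoint)
seq = "MKTAYIAKQR<mask>DLGV"  # hypothetical peptide with one masked residue
inputs = tokenizer(seq, return_tensors="pt")
with torch.no_grad():
    logits = mlm(**inputs).logits
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
predicted = tokenizer.decode(logits[0, mask_pos].argmax())
print("Predicted residue at masked position:", predicted)

# --- Stage 2: fine-tune the same backbone on a labeled downstream task. ---
# A sequence-level classification head is attached; the binary labels
# (e.g., soluble vs. insoluble) stand in for a real annotated dataset.
clf = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
train_seqs = ["MKTAYIAKQRQISFVK", "MLELLPTAVEGVSQAQ"]
train_labels = torch.tensor([1, 0])
batch = tokenizer(train_seqs, return_tensors="pt", padding=True)
optimizer = torch.optim.AdamW(clf.parameters(), lr=1e-4)
clf.train()
for _ in range(3):  # a few toy gradient steps, not a real training run
    out = clf(**batch, labels=train_labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
print("Toy fine-tuning loss:", out.loss.item())

The same pattern generalizes across the tasks surveyed in this paper: the pre-trained backbone stays fixed in form, while the task-specific head and labeled dataset change with the application.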