Sequence-based peptide identification, generation, and property prediction with deep learning: a review
Abstract
Over the past few years, deep learning has proven to be a powerful tool in many areas, especially bioinformatics. Following its success in DNA- and protein-related studies, deep learning has now been brought to peptide science as well, where it has been widely used for sequence-based peptide identification, generation, and property prediction. This review summarizes the publications on this subject from the past two years. The deep learning models reported are mainly convolutional neural networks, recurrent neural networks, hybrid models, transformers, and other generative models such as variational autoencoders and generative adversarial networks, together with algorithms such as input optimization. Application areas include antimicrobial peptides, signal peptides, and major histocompatibility complex binding peptides, among others. The review is organized around the general workflow of deep learning and illustrates adaptations and techniques specific to certain example problems. Open issues and future directions are also discussed, including approaches for model interpretation, benchmark datasets, automation in deep learning, and rational peptide design techniques.