Latent spaces for antimicrobial peptide design†
Abstract
Current antibacterial treatments cannot overcome the rapidly growing resistance of bacteria to antibiotic drugs, and novel treatment methods are required. One option is the development of new antimicrobial peptides (AMPs), to which bacteria build up resistance comparatively slowly. Deep generative models have recently emerged as a powerful method for generating novel therapeutic candidates from existing datasets; however, less research has focused on evaluating the search spaces from which these generators sample new data points. In this research we employ five deep learning model architectures for de novo generation of antimicrobial peptide sequences and assess the properties of their associated latent spaces. We train an RNN, an RNN with attention, a WAE, an AAE, and a Transformer model and compare their abilities to construct desirable latent spaces in 32, 64, and 128 dimensions. We assess reconstruction accuracy, generative capability, and model interpretability, and demonstrate that while most models are able to partition their latent spaces into regions of low and high AMP sampling probability, they do so in different ways and by appealing to different underlying physicochemical properties. In this way we demonstrate several benchmarks that must be considered for such models and suggest that, to optimize search space properties, an ensemble methodology is most appropriate for the design of new AMPs. We design an AMP discovery pipeline and present candidate sequences and properties from three models that achieved high benchmark scores. Overall, by properly tuning models and their accompanying latent spaces, targeted sampling of new antimicrobial peptides with ideal characteristics is achievable.