Molecular Transformer unifies reaction prediction and retrosynthesis across pharma chemical space
Abstract
Predicting how a complex molecule reacts with different reagents, and how to synthesise complex molecules from simpler starting materials, are fundamental to organic chemistry. We show that an attention-based machine translation model – Molecular Transformer – tackles both reaction prediction and retrosynthesis by learning from the same dataset. Reagents, reactants and products are represented as SMILES text strings. For reaction prediction, the model “translates” the SMILES of reactants and reagents to product SMILES, and the converse for retrosynthesis. Moreover, a model trained on publicly available data is able to make accurate predictions on proprietary molecules extracted from pharma electronic lab notebooks, demonstrating generalisability across chemical space. We expect our versatile framework to be broadly applicable to problems such as reaction condition prediction, reagent prediction and yield prediction.