Transmol: repurposing a language model for molecular generation

Rustam Zhumagambetov; Ferdinand Molnár; Vsevolod A. Peshkov; Siamac Fazli

doi:10.1039/D1RA03086H

Rustam Zhumagambetov,^a Ferdinand Molnár,^b Vsevolod A. Peshkov*^c and Siamac Fazli*^a

Author affiliations

* Corresponding authors

^a Department of Computer Science, School of Engineering and Digital Sciences, Nazarbayev University, Nur-Sultan, Kazakhstan
E-mail: siamac.fazli@nu.edu.kz

^b Department of Biology, School of Sciences and Humanities, Nazarbayev University, Nur-Sultan, Kazakhstan

^c Department of Chemistry, School of Sciences and Humanities, Nazarbayev University, Nur-Sultan, Kazakhstan
E-mail: vsevolod.peshkov@nu.edu.kz

Abstract

Recent advances in convolutional neural networks have inspired the application of deep learning to other disciplines. Although image processing and natural language processing have turned out to be the most successful, many other domains have also benefited, among them the life sciences in general and chemistry and drug design in particular. In line with this observation, since 2018 the scientific community has seen a surge of methodologies for generating diverse molecular libraries using machine learning. However, to date, attention mechanisms have not been employed for the problem of de novo molecular generation. Here we employ a variant of transformers, an architecture recently developed for natural language processing, for this purpose. Our results indicate that the adapted Transmol model is indeed applicable to the task of generating molecular libraries and leads to statistically significant increases in some of the core metrics of the MOSES benchmark. The presented model can be tuned to either input-guided or diversity-driven generation modes by applying a standard one-seed approach or a novel two-seed approach, respectively. Accordingly, the one-seed approach is best suited for the targeted generation of focused libraries composed of close analogues of the seed structure, while the two-seed approach allows us to dive deeper into under-explored regions of chemical space by attempting to generate molecules that resemble both seeds. To gain more insight into the scope of the one-seed approach, we devised a new validation workflow that involves the recreation of known ligands for an important biological target, the vitamin D receptor. To further benefit the chemical community, the Transmol algorithm has been incorporated into our cheML.io web database of ML-generated molecules as a second-generation on-demand methodology.

Article information

DOI: https://doi.org/10.1039/D1RA03086H
Article type: Paper
Submitted: 20 Apr 2021
Accepted: 22 Jul 2021
First published: 27 Jul 2021
This article is Open Access

RSC Adv., 2021, 11, 25921-25932

R. Zhumagambetov, F. Molnár, V. A. Peshkov and S. Fazli, RSC Adv., 2021, 11, 25921 DOI: 10.1039/D1RA03086H

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.
