Zetian Mao a, Yoshiki Matsuda bc, Ryo Tamura *ade and Koji Tsuda *ade
aGraduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8561, Japan. E-mail: tamura.ryo@nims.go.jp; tsuda@k.u-tokyo.ac.jp
bFixstars Corporation, 3-1-1 Shibaura, Minato-ku, Tokyo 108-0023, Japan
cGreen Computing System Research Organization, Waseda University, 27 Wasedacho, Shinjuku-ku, Tokyo 162-0042, Japan
dCenter for Basic Research on Materials, National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan
eRIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
First published on 23rd June 2023
Ising machines are hardware-assisted discrete optimizers that often outperform purely software-based optimization. They are implemented, e.g., with superconducting qubits, ASICs or GPUs. In this paper, we show how Ising machines can be leveraged to improve the efficiency of automatic molecule design. To this end, we construct a graph-based binary variational autoencoder to obtain discrete latent vectors, train a factorization machine as a surrogate model, and optimize it with an Ising machine. In three benchmarking problems, our method performed better than Bayesian optimization in a continuous latent space. Two types of Ising machines, a qubit-based D-Wave quantum annealer and GPU-based Fixstars Amplify, are compared; the GPU-based machine scales better and is more suitable for molecule generation. Our results show that GPU-based Ising machines have the potential to empower deep-learning-based materials design.
These methods employ a continuous-valued vector as the latent representation to enable smooth navigation with gradient information. However, the fitness functions in the latent space are rich in local minima and may not be fully optimized.12 Although the optimization performance is hard to measure, we suspect that the performance of latent space-based models is adversely affected by local minima. A dilemma here is that increasing the latent space dimensionality improves expressivity but makes optimization harder. If a powerful optimizer is available, it may be exploited to resolve this dilemma.
Ising machines are specialized hardware that can solve quadratic unconstrained binary optimization (QUBO).13 QUBO with M bits is described as
$$E(\mathbf{x}) = \sum_{i=1}^{M}\sum_{j=1}^{M} Q_{ij}\, x_i x_j, \qquad x_i \in \{0, 1\}, \tag{1}$$

where $\mathbf{x}$ is a vector of M binary variables and $Q \in \mathbb{R}^{M \times M}$ is a matrix of real-valued coefficients.
Highly efficient global optimization of QUBO can be achieved by various physical platforms other than superconducting qubits. Examples include an ASIC-based digital annealer,17 a coherent Ising machine18 and the quantum approximate optimization algorithm (QAOA) implemented on gate-based quantum computers.19 Among them, we focus on a GPU-based Ising machine called the Fixstars Amplify Annealing Engine20 (referred to as Amplify hereafter). It is based on simulated annealing and uses multi-level parallel processing on multiple GPUs to find optimal solutions. Amplify relies on conventional semiconductor technologies, but can handle large-scale, fully connected problems of up to 130000 bits.
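Amplify's internals are proprietary, but the principle it builds on, simulated annealing applied to eqn (1), can be illustrated with a minimal single-chain NumPy sketch (not the library's API; the GPU implementation runs many such chains in parallel with tuned schedules):

```python
import numpy as np

def simulated_annealing_qubo(Q, n_sweeps=1000, T_start=10.0, T_end=0.01, rng=None):
    """Minimize E(x) = x^T Q x over x in {0,1}^M by single-bit-flip annealing.

    A minimal single-chain sketch of the principle behind GPU-based Ising
    machines, which run many such chains in parallel.
    """
    rng = rng if rng is not None else np.random.default_rng()
    M = Q.shape[0]
    x = rng.integers(0, 2, size=M)
    Qs = Q + Q.T  # symmetrized couplings for fast energy-difference updates
    for T in np.geomspace(T_start, T_end, n_sweeps):
        for i in rng.permutation(M):
            # Energy change of flipping bit i:
            # dE = (1 - 2 x_i) * (Q_ii + sum_{j != i} (Q_ij + Q_ji) x_j)
            dE = (1 - 2 * x[i]) * (Q[i, i] + Qs[i] @ x - Qs[i, i] * x[i])
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                x[i] ^= 1  # accept the flip (Metropolis criterion)
    return x, x @ Q @ x
```

In practice, the annealing schedule and the degree of parallelism dominate solution quality; the hardware platforms cited above differ mainly in how aggressively they can parallelize this search.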
In this paper, we construct a discrete latent space-based molecule generation framework by combining an adaptation of JTVAE and Amplify. We evaluate the expressivity of our discrete latent space in comparison to the continuous counterpart and elucidate how many bits are necessary to accurately represent the fitness functions and optimize them. It turns out that the number of required bits is around 300, which is beyond the limit of the D-Wave Advantage. Our approach was tested in three different benchmarking problems, where it outperformed the combination of a continuous VAE and Bayesian optimization. Our results show that the combination of a discrete latent space and a powerful discrete optimizer is a viable approach to solving various molecular design problems.
The workflow of bVAE-IM is detailed in Fig. 1. First, a binary junction tree VAE (bJTVAE) is trained with the unlabeled examples to construct a latent space. We obtain bJTVAE by modifying the junction tree VAE4 with the Gumbel softmax reparametrization16 such that the latent representation is a bit vector. We selected the junction tree VAE as our base model because it can deal with graph representations of molecules without relying on string representations like SMILES.23 As a result, we obtain a decoder that translates a bit vector into a molecular graph.
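The bJTVAE encoder and decoder are substantial models; the sketch below only illustrates the Gumbel softmax step that makes bit-valued latents trainable by backpropagation, assuming a hypothetical encoder head that outputs two logits (off/on) per latent bit. With hard=True, PyTorch's gumbel_softmax returns discrete one-hot samples in the forward pass while gradients flow through the soft relaxation (straight-through estimator):

```python
import torch
import torch.nn.functional as F

def sample_binary_latent(logits, tau=1.0):
    """Differentiably sample a d-dimensional bit vector from per-bit logits.

    logits: (batch, d, 2) tensor of (off, on) logits per latent bit; this
    interface is illustrative, not the actual bJTVAE code.
    Returns a (batch, d) tensor of {0, 1} values.
    """
    # hard=True: one-hot in the forward pass, soft gradients in the backward pass
    one_hot = F.gumbel_softmax(logits, tau=tau, hard=True, dim=-1)
    return one_hot[..., 1]  # take the "on" component as the bit value

# Usage sketch: bits = sample_binary_latent(encoder(mol_graph)); decoder(bits)
```

At inference time, the bit vector can instead be read off deterministically, e.g. by an argmax over the two logits.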
After the latent space is obtained, our task of generating a molecule boils down to finding the best bit vector in the latent space. In a continuous latent space, a commonly used method is to employ a Gaussian process surrogate model and choose the best latent representation by optimizing an acquisition function.1 In our case, we employ a factorization machine (FM)24 as the surrogate model, and the best bit vector is selected by optimizing the trained FM with an Ising machine. The functional form of the FM is described as
$$f(\mathbf{x}) = w_0 + \sum_{i=1}^{M} w_i x_i + \sum_{i=1}^{M}\sum_{j=i+1}^{M} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j, \tag{2}$$

where $w_0 \in \mathbb{R}$, $w_i \in \mathbb{R}$ and $\mathbf{v}_i \in \mathbb{R}^k$ are learned parameters and $\langle\cdot,\cdot\rangle$ denotes the inner product. Because eqn (2) is quadratic in the binary variables, minimizing the trained FM is itself a QUBO problem of the form of eqn (1).
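Because of this correspondence, converting a trained FM into the QUBO matrix consumed by the Ising machine is a few lines of bookkeeping. A minimal sketch under illustrative parameter names (w for the linear weights, V for the factor matrix with rows v_i):

```python
import numpy as np

def fm_to_qubo(w, V):
    """Map trained FM parameters (eqn (2)) to a QUBO matrix Q (eqn (1)).

    w: (M,) linear weights w_i; V: (M, k) matrix whose rows are the factor
    vectors v_i. The bias w_0 is a constant offset and does not affect the
    argmin, so it is dropped. Since x_i^2 = x_i for binary variables, the
    linear terms can be placed on the diagonal of Q.
    """
    Q = np.triu(V @ V.T, k=1)        # pairwise coefficients <v_i, v_j>, i < j
    Q[np.diag_indices_from(Q)] = w   # linear coefficients on the diagonal
    return Q

# To *maximize* the predicted property, minimize -f(x) instead, i.e. pass
# -fm_to_qubo(w, V) to the QUBO solver.
```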
In Fig. 2, the average reconstruction accuracies over 10 runs are shown. The accuracy of bJTVAE grows monotonically up to d = 300 and saturates beyond that point. At d = 300, the accuracy of bJTVAE is roughly on par with that of JTVAE, indicating that the binary latent space is capable of encoding molecules without much information loss. It also implies that molecular generators need a high-dimensional latent space beyond the limit of D-Wave annealers.
As the unlabeled data, we employed nu = 250000 randomly selected molecules from ZINC. The labeled data were created by computing the properties of nl = 10000 randomly selected molecules from ZINC with the computational oracles. To investigate extrapolative performance, we intentionally limited our labeled data to molecules with poor properties: penalized logP ∈ [−3, 3], TPSA ∈ [0, 100] and GSK3β + JNK3 + QED + SA ∈ [0, 0.5]. As a baseline method, we employ a standard approach called VAE-BO, consisting of JTVAE and Gaussian process-based Bayesian optimization as implemented in Kusner et al.2 Note that the Gaussian process is initially trained with the same set of labeled data and retrained whenever a molecule is generated. Additionally, we compared our method with random sampling in discrete and continuous latent spaces, denoted as bVAE-Random and VAE-Random, respectively. See Section 2 in the ESI† for details about training.
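For concreteness, two of the oracles can be evaluated with RDKit as sketched below; penalized logP additionally subtracts synthetic-accessibility and ring penalties, and GSK3β/JNK3 require pretrained activity predictors, both omitted here:

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, QED

def tpsa_oracle(smiles):
    """Topological polar surface area of a molecule given as SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    return Descriptors.TPSA(mol) if mol is not None else None

def qed_oracle(smiles):
    """Quantitative estimate of drug-likeness (QED)."""
    mol = Chem.MolFromSmiles(smiles)
    return QED.qed(mol) if mol is not None else None

# e.g. tpsa_oracle("c1ccccc1O") -> 20.23 for phenol
```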
VAE-BO and bVAE-IM are run until 300 molecules have been generated. This is repeated five times with different random seeds. bVAE-IM employed Amplify, and the dimensionality of the latent space is set to 300. The property distributions of the generated molecules are shown in Fig. 3a. In all three cases, the top molecules of bVAE-IM were better than those of VAE-BO and the initial labeled data. See Table 1 for detailed statistics and Fig. S2 in the ESI† for examples of generated molecules. Note that the experimental results for three additional properties are shown in Section 4 in the ESI.† This result implies that, using a high-performance discrete optimizer, it is possible to construct a competitive molecular generator. The performance improvement of bVAE-IM over bVAE-Random is larger than that of VAE-BO over VAE-Random. This may serve as evidence that the superiority of bVAE-IM results from better optimization, not from better expressivity of the latent space. However, the adverse effect of local minima in the continuous latent space of VAE-BO is hard to measure precisely, because guaranteed global optimization in a high-dimensional space is extremely hard.
| Property | Method | Maximum | Mean of top 5% | Mean of top 10% | Runtime (h) |
|---|---|---|---|---|---|
| logP | bVAE-IM | 5.606 ± 0.263 | 5.144 ± 0.089 | 4.948 ± 0.067 | 4.96 ± 0.44 |
| | bVAE-Random | 3.263 ± 0.245 | 2.359 ± 0.091 | 2.061 ± 0.093 | — |
| | VAE-BO | 3.699 ± 0.138 | 3.303 ± 0.185 | 3.120 ± 0.235 | 9.46 ± 3.86 |
| | VAE-Random | 2.888 ± 0.331 | 2.135 ± 0.173 | 1.836 ± 0.130 | — |
| | Initial data | 2.999 | 2.640 | 2.445 | — |
| TPSA | bVAE-IM | 221.552 ± 5.430 | 192.030 ± 4.433 | 181.276 ± 3.541 | 5.32 ± 0.39 |
| | bVAE-Random | 149.338 ± 10.748 | 125.319 ± 2.758 | 116.168 ± 2.175 | — |
| | VAE-BO | 148.908 ± 7.452 | 127.947 ± 3.351 | 119.713 ± 2.574 | 8.38 ± 1.67 |
| | VAE-Random | 149.320 ± 11.146 | 122.744 ± 1.973 | 114.105 ± 1.736 | — |
| | Initial data | 99.990 | 95.832 | 92.691 | — |
| GSK3β + JNK3 + QED + SA | bVAE-IM | 0.637 ± 0.028 | 0.444 ± 0.018 | 0.386 ± 0.021 | 7.90 ± 0.32 |
| | bVAE-Random | 0.264 ± 0.017 | 0.203 ± 0.008 | 0.179 ± 0.006 | — |
| | VAE-BO | 0.388 ± 0.029 | 0.265 ± 0.012 | 0.230 ± 0.007 | 10.25 ± 4.37 |
| | VAE-Random | 0.272 ± 0.028 | 0.198 ± 0.007 | 0.177 ± 0.006 | — |
| | Initial data | 99.990 | 95.832 | 92.691 | — |
The runtime for generating a molecule involves an oracle call, retraining of the surrogate model, optimization of the surrogate model, and decoding. Ising machines are expected to accelerate the optimization part, but the impact on the total runtime is limited, because the other parts take substantially more time. Table 1 shows the total runtime for bVAE-IM and VAE-BO. bVAE-IM was more efficient, but the difference is less pronounced when the oracle takes more time (i.e., for GSK3β + JNK3 + QED + SA).
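Schematically, the per-molecule loop enumerated above can be written as follows, where train_fm, decode, oracle and solve_qubo are hypothetical placeholders for the FM trainer, the bJTVAE decoder, the property oracle and the Ising-machine call, and fm_to_qubo is the conversion sketched earlier:

```python
def generate_molecules(X, y, train_fm, decode, oracle, solve_qubo, n_generate=300):
    """Sketch of the bVAE-IM generation loop with placeholder interfaces.

    X: list of labeled bit vectors; y: their property values (to be maximized).
    """
    for _ in range(n_generate):
        w, V = train_fm(X, y)        # retrain the surrogate model
        Q = -fm_to_qubo(w, V)        # maximize f  <=>  minimize -f
        x_new, _ = solve_qubo(Q)     # Ising-machine optimization
        smiles = decode(x_new)       # bit vector -> molecular graph
        y_new = oracle(smiles)       # the (possibly slow) oracle call
        X.append(x_new)              # grow the labeled set and repeat
        y.append(y_new)
    return X, y
```

As discussed above, the oracle call and surrogate retraining dominate the wall-clock time in this loop; the Ising-machine call itself is comparatively cheap.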
Fig. 3b shows the property distributions under different latent space dimensions. At 50 and 100 dimensions, the D-Wave quantum annealer (Advantage 4.1) was applied as well, but Amplify outperformed D-Wave in these cases. This is probably due to limitations of current qubit technologies, such as overly fast annealing and Hamiltonian control errors.29 Although the GPU-based Ising machine is clearly the better choice at present, the situation might change as qubit technologies improve.
Latent spaces provided by deep learning models have revolutionized how molecules are generated. Complex valence rules can now be learned from data and need not be implemented explicitly. Nevertheless, modeling molecular properties in the latent space and optimizing them is not straightforward. We have shown how molecule generators can be improved by powerful optimizers such as Ising machines. The development of deep learning methods is rapid, and the variational autoencoders employed in this paper may no longer be among the best methods. However, our approach can, in principle, be applied to newer latent-space models such as transformers.10
Current quantum-based optimizers have a scalability problem, as pointed out in this paper. In addition, environmental noise often prevents quantum-based optimizers from reaching the global minimum. Nevertheless, the rapidly developing technologies behind Ising machines may solve these problems to the point that quantum-based optimizers become preferable to GPU-based ones.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3dd00047h