Kan Hatakeyama-Sato *a, Yasuhiko Igarashi b, Takahiro Kashikawa c, Koichi Kimura c and Kenichi Oyaizu *a
aDepartment of Applied Chemistry, Waseda University, Tokyo 169-8555, Japan. E-mail: satokan@toki.waseda.jp; oyaizu@waseda.jp
bFaculty of Engineering, Information and Systems, University of Tsukuba, 1-1-1 Tennodai, Tsukuba 305-8573, Japan
cFujitsu Ltd, Kanagawa 211-8588, Japan
First published on 2nd December 2022
We introduce quantum circuit learning (QCL) as an emerging regression algorithm for chemo- and materials-informatics. The supervised model, functioning according to the rules of quantum mechanics, can process linear and smooth non-linear functions from small datasets (<100 records). Compared with conventional algorithms, such as random forest, support vector machine, and linear regression, QCL can offer better predictions for some one-dimensional functions and experimental chemical databases. QCL will potentially make the virtual exploration of new molecules and materials more efficient through its superior prediction performance.
Evaluating molecular and material properties is essential in materials and chemo-informatics.4,5 Various supervised models, trained to predict specific properties from explanatory variables by learning their statistical relationships, have been developed in machine learning.5 There are many supervised models, represented by linear algorithms, support vector machines, and decision tree-based ensembles.5 Recent deep learning with neural networks has also pushed the limits of prediction accuracy by drastically increasing model complexity.6 In chemical and material fields, appropriate use of such supervised models has afforded the prediction of versatile material and chemical properties, such as conductivity,7,8 energy level,9 photoconversion efficiency,10 and toxicity.4 Their prediction accuracy can exceed that of human experts and traditional computational simulations, gradually forming a solid platform for data-oriented science.4,5,8–10
On the other hand, most data science projects still have difficulty with the reliable prediction of experimental properties. The main challenges are (a) the lack of trainable records and (b) complex molecular interactions. Due to the high cost of actual experiments, typical database sizes in material projects are around 10^1–10^2 records,7 whereas deep learning mainly targets databases with over 10^4 records.4–6 Although recent deep learning approaches, represented by fine-tuning (transfer learning) and multipurpose learning, may offer an opportunity for small datasets, they may not be complete solutions because the material databases available for learning remain insufficient in size and diversity.2,7,11
As a promising approach, sparse modeling aims to extract linearity between explanatory and target variables.12,13 The method is effective for small datasets owing to its simple linear assumption, and successful data analyses have been reported.12,14 Still, material and chemical properties arise from complex, non-linear atomic interactions.5,8 A linear approximation would struggle to express such non-linear systems.
Here, we introduce quantum circuit learning (QCL), an emerging algorithm for supervised learning.15–18 QCL operates according to the rules of quantum mechanics and can predict various parameters in the same manner as classical models. Current quantum systems (noisy intermediate-scale quantum computers: NISQ)19,20 face the problems of calculation noise and a limited processable number of information units (qubits). Nevertheless, the advantage of their quantum nature can appear with the support of classical computers.19,20
Quantum machines and simulators have been examined in several fields, including quantum chemistry, combinatorial optimization, and machine learning.21–26 Quantum neural networks, such as autoencoders and generative adversarial networks, are the main prototypes of quantum machine learning.21–24 They offer new potential as supervised or unsupervised algorithms.
Since QCL is at the frontier of machine learning, few reports have been published on authentic regression tasks.15,17,27 Successful prediction with a toxicity dataset of organic molecules was reported,17 but the study was still conceptual. The prediction processes and advantages have remained unclear, especially from chemical and material viewpoints.
Here, we conducted a more comprehensive study on QCL with standard datasets of one-dimensional functions and chemical properties. Both simulated and actual quantum computing were undertaken to clarify the challenges of QCL. Various hyperparameters were optimized for higher accuracy. Comparison with conventional models contrasted the benefit of the new approach: the capability of learning both linear and non-linear functions even from small datasets. This property was also favorable for predicting the so-called extrapolating data region, which is essential for chemical and material research.
In the case of an n-qubit system, a 2^n-dimensional complex vector is needed to express the qubit states.28 This seems confusing from the viewpoint of standard Euclidean space, but it is required to express complex interactions of qubits, called quantum entanglement. Quantum systems grow exponentially more complicated with n, affording massively parallel computing that is hard to simulate on classical computers.
In a similar way to conventional machine learning, QCL treats a task of ŷ = fθ(x),15 where ŷ is the predicted value for a target parameter y, x is an explanatory variable, and θ is a trainable parameter. Generating fθ only with a current quantum system (NISQ) is not feasible due to the limited computation power. Currently, only the prediction part, fθ(x), is assigned to the quantum system (or simulator), and the other parts, such as loss calculation (e.g., (ŷ − y)^2) and parameter optimization, are done by classical computers.15
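As a rough illustration of this hybrid scheme, the following Python sketch runs a classical, gradient-free optimizer over a stand-in predictor; `predict_stub` is a hypothetical placeholder for the quantum part fθ(x), not the circuit used in this work.

```python
import numpy as np
from scipy.optimize import minimize

def mse_loss(theta, X, y, predict):
    """Classical part of the hybrid loop: evaluate the loss over the dataset."""
    y_hat = np.array([predict(x, theta) for x in X])
    return np.mean((y_hat - y) ** 2)

def predict_stub(x, theta):
    """Hypothetical stand-in for the quantum prediction f_theta(x); in QCL,
    this would run the circuit of eqn (1) and (2) on a simulator or device."""
    return np.tanh(theta[0] * x + theta[1])

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 6)          # ~6 training records, as in the 1D experiments
y = np.sin(X)
result = minimize(mse_loss, x0=np.zeros(2), args=(X, y, predict_stub),
                  method="Nelder-Mead")  # gradient-free, tolerant of sampling noise
print(result.x, result.fun)
```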
A mathematical expression of QCL for regression is not so complex (Fig. 1, see the Experimental section for derivation). The initial 2^n-dimensional quantum state vector, expressed as (1, 0, …, 0)^T, is transformed into another state w = (w1, w2, …, w_{2^n})^T by multiplication with two complex operational matrices, V(x) and U(θ) (eqn (1)). Then, ŷ is calculated from the squared magnitudes |w1|^2, |w2|^2, …, |w_{2^n}|^2 (eqn (2)).
$$ \mathbf{w} = U(\theta)\, V(\mathbf{x})\, (1, 0, \ldots, 0)^{\mathrm{T}} \tag{1} $$

$$ \hat{y} = \sum_{i=1}^{2^{n-1}} |w_i|^2 \;-\; \sum_{i=2^{n-1}+1}^{2^n} |w_i|^2 \tag{2} $$
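A minimal numpy sketch of eqns (1) and (2); random unitaries stand in for V(x) and U(θ) here, whereas real QCL builds them from the parameterized gates described below.

```python
import numpy as np

n = 2
dim = 2 ** n

def random_unitary(rng, dim):
    """Random unitary via QR decomposition; a stand-in for V(x) or U(theta)."""
    q, _ = np.linalg.qr(rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim)))
    return q

rng = np.random.default_rng(0)
V, U = random_unitary(rng, dim), random_unitary(rng, dim)

psi0 = np.zeros(dim, dtype=complex)
psi0[0] = 1.0                      # the initial state (1, 0, ..., 0)^T

w = U @ V @ psi0                   # eqn (1)

# eqn (2): with basis ordering |q1 q2 ...>, the first half of the amplitudes
# has the first (observed) qubit "up" and the second half "down".
p = np.abs(w) ** 2
y_hat = p[: dim // 2].sum() - p[dim // 2:].sum()
print(y_hat)
```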
In a quantum circuit, V(x) and U(θ) correspond to the encoding and interaction steps of the qubits, respectively (Fig. 1). Before calculation, all qubit vectors point upward along the z-axis (corresponding to the vector (1, 0, …, 0)^T). The vectors are changed by the rotation gates of V(x). The encoded states are further rotated and entangled according to another matrix, U(θ). Finally, the direction of one qubit (or, in principle, any number of qubits) is observed to obtain the interacted result.15,17 The regression model can learn a variety of functions because of the universality of the quantum circuit28 and the non-linear transformation steps during prediction (i.e., x ↦ V(x), θ ↦ U(θ), and the final prediction by eqn (2)).15
In eqn (1), the naïve determination of V and U is not easy because they are 2^n × 2^n-dimensional matrices (i.e., over 10^18 parameters with n = 30). In QCL, the matrices are prepared as repeated products of elementary gate components, such as Ri,x(t), Ri,y(t), and CNOTi,j (eqn (3), Fig. 2).
$$ U(\theta),\; V(\mathbf{x}) = \prod_k G_k, \qquad G_k \in \{ R_{i,x}(t),\, R_{i,y}(t),\, \mathrm{CNOT}_{i,j} \} \tag{3} $$
Ri,x(t) and Ri,y(t) are rotation gates, changing the i-th qubit state while leaving the others unaffected.28 One qubit state (without entanglement) can be visualized as an arrow in a sphere (Bloch sphere, Fig. 2a). The gates rotate the i-th qubit about the x- (or y-) axis (eqn (4) and (5)). A CNOTi,j gate switches the state of the j-th qubit according to the condition of the i-th qubit (Fig. 2b, eqn (6)). The gate is similar to an XOR operation in classical circuits. The three components are known as a universal gate set, from whose products an arbitrary quantum circuit can be made.28
$$ R_{i,x}(t) = \exp\!\left(-\frac{it}{2} X_i\right) = \begin{pmatrix} \cos\frac{t}{2} & -i\sin\frac{t}{2} \\ -i\sin\frac{t}{2} & \cos\frac{t}{2} \end{pmatrix} \tag{4} $$

$$ R_{i,y}(t) = \exp\!\left(-\frac{it}{2} Y_i\right) = \begin{pmatrix} \cos\frac{t}{2} & -\sin\frac{t}{2} \\ \sin\frac{t}{2} & \cos\frac{t}{2} \end{pmatrix} \tag{5} $$

$$ \mathrm{CNOT}_{i,j} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \quad \text{(on the subspace of control qubit } i \text{ and target qubit } j\text{)} \tag{6} $$
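The gate matrices of eqns (4)–(6) are easy to write down explicitly. A numpy sketch follows, including the tensor-product embedding that lifts a single-qubit gate into the 2^n-dimensional space; rotation sign conventions vary between libraries, and the exp(−itX/2) form above is assumed.

```python
import numpy as np

def rx(t):
    """R_x(t) of eqn (4), assuming the exp(-i t X / 2) convention."""
    return np.array([[np.cos(t / 2), -1j * np.sin(t / 2)],
                     [-1j * np.sin(t / 2), np.cos(t / 2)]])

def ry(t):
    """R_y(t) of eqn (5)."""
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2), np.cos(t / 2)]])

def embed(gate, i, n):
    """Lift a single-qubit gate acting on qubit i into an n-qubit operator
    by tensoring identities on all other qubits."""
    op = np.eye(1)
    for k in range(n):
        op = np.kron(op, gate if k == i else np.eye(2))
    return op

# CNOT of eqn (6) on a two-qubit system (control: qubit 0, target: qubit 1)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# e.g., R_{1,y}(0.3) on a 2-qubit system is the 4x4 unitary:
print(embed(ry(0.3), 1, 2).round(3))
```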
Due to the restriction of quantum physics, the interacted state w itself is not observable by actual quantum systems. Instead, other parameters, such as the probabilities of upward (↑) or downward (↓) eigenstates, are experimentally observable (p↑ and p↓, respectively, Fig. 1). During actual quantum computing, such eigenstates are sampled via repeated calculations, and the probability difference between the two is calculated to obtain ŷ (eqn (7)).
$$ \hat{y} = p_{\uparrow} - p_{\downarrow} \tag{7} $$
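In code, eqn (7) is a one-liner once the measurement counts are available; the counts dictionary below is a hypothetical example of what 1000 shots on the observed qubit might return.

```python
def y_hat_from_counts(counts, shots):
    """Eqn (7): estimate y_hat = p_up - p_down from measurement counts,
    where '0' denotes the upward and '1' the downward eigenstate."""
    return (counts.get('0', 0) - counts.get('1', 0)) / shots

print(y_hat_from_counts({'0': 640, '1': 360}, 1000))   # -> 0.28
```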
For a clearer understanding, we calculated analytical solutions of ŷ for some simple quantum circuits (Table S1†). An example quantum circuit, encoding x = (x1, x2)^T by four rotational gates and interacting through one CNOT and one rotational gate, is shown in Fig. 3a. Even this simple circuit gives a very complex analytical solution of ŷ from eqn (2), consisting of repeated trigonometric functions (Fig. 3a). The complexity should mathematically correspond to the superparallel and entangled nature of the quantum system.
Despite the complex expression of ŷ in QCL, the value always ranges from −1 to +1 because of the unitary nature of the operational matrices V and U (V†V = I, U†U = I, where † denotes the conjugate transpose).15,28 This study tries to clarify the actual effects of such complex yet systematic prediction algorithms in various regression tasks.
Fig. 4 (a) Predicting the function of y = sin(x), x, or e^(x−1) using QCL, support vector machine for regression (SVR), random forest regression (RFR), Bayesian ridge regression (BYR), and Gaussian process regressor (GPR) (RBF + Dot-product). In the case of QCL, circuit parameters of n = 2 and m = 3 were chosen. Predictions were made from state-vector calculations. Other results are shown in Fig. S2.† (b) Predicting the function of y = sin(x) using an actual quantum computer (IBM Quantum) with m = 2, 3, or 4. Models were trained by state-vector calculations. Full results are shown in Fig. S12.†
Preliminary circuit optimizations reported previously15,17 and our own optimization revealed that the following configuration was concise and practical for regression tasks (for details, see the results and explanations in Fig. S1†). First, the inputted value x should be encoded by two rotational gates, Ri,y(xi) and Ri,x(xi). Then, neighboring qubits should be made to interact through CNOT gates, followed by θ-dependent rotations Ri,y(θj)Ri,x(θj+1)Ri,y(θj+2). The CNOT interaction and rotation steps should be repeated m times (Fig. 3b and S1†).
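A sketch of this configuration with the Qulacs simulator (used later for the state-vector calculations) is given below. The layer ordering and parameter indexing are our reading of the description: this layout has 3nm trainable parameters, so the exact circuit reported here (15 parameters for n = 2, m = 3) may differ slightly; note also that Qulacs defines rotation gates with the opposite sign convention to eqns (4) and (5).

```python
import numpy as np
from qulacs import QuantumCircuit, QuantumState, Observable

def build_circuit(x, theta, n=2, m=3):
    """Encode each x_i by R_y(x_i) R_x(x_i), then repeat m times: a ring of
    CNOTs followed by trainable R_y R_x R_y rotations on every qubit."""
    circuit = QuantumCircuit(n)
    for i in range(n):                      # encoding step, V(x)
        circuit.add_RY_gate(i, x[i])
        circuit.add_RX_gate(i, x[i])
    k = 0
    for _ in range(m):                      # interaction steps, U(theta)
        for i in range(n):
            circuit.add_CNOT_gate(i, (i + 1) % n)
        for i in range(n):
            circuit.add_RY_gate(i, theta[k])
            circuit.add_RX_gate(i, theta[k + 1])
            circuit.add_RY_gate(i, theta[k + 2])
            k += 3
    return circuit

def predict(x, theta, n=2, m=3, scale=2.0):
    """State-vector prediction: <Z> on the first qubit, times the prefactor."""
    state = QuantumState(n)                 # initialized to |0...0>
    build_circuit(x, theta, n, m).update_quantum_state(state)
    observable = Observable(n)
    observable.add_operator(1.0, "Z 0")
    return scale * observable.get_expectation_value(state)

theta = np.random.default_rng(0).uniform(0, 2 * np.pi, size=3 * 2 * 3)
print(predict(np.array([0.5, 0.5]), theta))
```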
Some notes on circuit design should be mentioned. Mitarai et al. proposed15 the use of sin−1 or cos−1 to preconvert xi for linear encoding of the inputted value (Table S1†). On the other hand, we noticed that the conversion induced unfavorable bending of ŷ around |xi| ≅ 1, giving prediction errors (Fig. S1†). The bending was caused by the large curvature of the inverse functions (sin−1, cos−1) near ±1.
For qubit interaction, three gates (Ry, Rx, and Ry) were applied to each qubit, because at least three rotations are needed to realize an arbitrary rotation of the complex vectors (i.e., X–Y decomposition).15 Instead of systematic CNOT gates,17 non-parameterized random qubit interactions, known as the transverse-field Ising model, could be employed.15 However, the regression was unstable and sometimes failed, depending on the randomness of the qubit interactions with a small circuit depth (m ≤ 3), which motivated us to use repeated CNOT gates (Fig. S1†).
A QCL circuit with a qubit number of n = 2 and a depth of m = 3 was selected to learn the one-dimensional functions. Here, the one-dimensional input was duplicated into a vector x = (x1, x1) and encoded in the two-qubit circuit, enabling higher expressiveness of fθ(x) (Fig. S1†). The final output was scaled by a constant prefactor of two as a hyperparameter unless noted otherwise (ŷ ↦ 2ŷ).15,17
The QCL circuit was able to learn versatile functions: y = x, sin(x/2), sin(x), and e^(x−1) (Fig. 4a and S2†). The model was trained with about six randomly sampled training records and predicted about ten random testing records. No significant prediction errors were detected (Fig. 4a and S2†).
The unique advantage of the QCL model is highlighted by comparing it with the conventional regression models.18 We examined standard algorithms of support vector machine for regression (SVR), random forest regression (RFR), Bayesian ridge regression (BYR), and Gaussian process regressor (GPR) (Fig. 4a, S2, S3 and Table S2†).5 SVR works on the kernel trick, enabling the learning of even non-linear functions. RFR is a standard ensemble model of decision trees. Its reliable prediction is frequently employed with experimental material databases.5 BYR is a robust model for probabilistic linear regression, potentially strong against small datasets. GPR is similar to SVR, but its stochastic process can offer more flexible fitting with smaller datasets.
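For reference, these baselines can be instantiated with scikit-learn in a few lines; the y = sin(x) toy data below are for illustration only, and default hyperparameters are assumed as in the Experimental section.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import BayesianRidge
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, DotProduct, WhiteKernel

models = {
    "SVR": SVR(kernel="rbf"),
    "RFR": RandomForestRegressor(n_estimators=100),
    "BYR": BayesianRidge(),
    "GPR": GaussianProcessRegressor(kernel=RBF() + DotProduct() + WhiteKernel()),
}

rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, (6, 1)); y_train = np.sin(X_train).ravel()
X_test = rng.uniform(-1, 1, (10, 1)); y_test = np.sin(X_test).ravel()

for name, model in models.items():
    model.fit(X_train, y_train)
    mse = np.mean((model.predict(X_test) - y_test) ** 2)
    print(f"{name}: MSE = {mse:.4f}")
```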
SVR, RFR, and BYR could not predict all of y = x, sin(x), and e^(x−1). Prediction in extrapolating regions, where xi and ŷ range outside the training datasets, was unsuccessful (Fig. 4a and S2†). The SVR model, assuming a non-linear Gaussian kernel, mimicked sin(x) but gave a bent curve in the untrained regions of y = x. RFR displayed similar responses to SVR, with unfavorable step-wise predictions caused by the non-continuous decision-tree algorithm. The linear BYR model could predict y = x, but never sin(x). Even when their hyperparameters were changed, the three models could not predict the functions (Fig. S3†). Due to these algorithmic biases, many conventional supervised models cannot switch between linear and non-linear predictions.
We also examined more complex machine learning models: GPR and the multilayer perceptron (MLP). GPR with radial basis function (RBF)-type kernels offered promising predictions due to its non-linear and stochastic algorithm (Fig. 4a, S3†).18 MLPs with different activation functions (ReLU, tanh, and sigmoid) and hidden layer numbers (1 to 4) did not afford sufficient performance. The models could switch between linear and non-linear functions, but larger errors were obtained because of too many trainable parameters relative to the inputted data.
The performances of the regression models were validated by repeating the random data preparation and learning processes 30 times (Fig. S4, Table S3†). On average, two GPR models, with RBF or RBF + Dot-product kernels, exhibited the smallest errors. The next best was QCL regression: the model, capable of handling linear and non-linear relationships from small databases, offers a new option for regression tasks on various datasets.
We calculated |wi|^2 for the trained QCL models of y = sin(x) and x (Fig. 5a, b and S6†). Bent curves were observed for each term as a function of x, even for the linear function y = x. This means that the QCL model worked through non-linear processes even in linear systems. The final output was made from the slight difference between |w1|^2 + |w2|^2 and |w3|^2 + |w4|^2.
Fig. 5 (a) Prediction process of y = sin(x) by a trained QCL model (m = 3, n = 2) or multilayer perceptron (MLP, 8-dimensional hidden layer and ReLU activation function). The model was trained with 24 random records (gray plots). Black lines show predictions, and colored lines represent latent variables. (b) Prediction process for y = x. (c) Extrapolating predictions by QCL, GPR (RBF), and MLP models. After randomly generating 100 points, 70% of the data with high y were selected as testing (extrapolating) data. Full results are shown in Fig. S7 and S8.†
As a control for QCL, MLP regressors were examined. The MLP model contained one 8- (or 16-) dimensional hidden layer with a standard non-linear activation function of ReLU or tanh.29 The overall designs of QCL and MLP are somewhat similar (Fig. S5b†). Both models encode x into a first latent vector w′, convert it into another state w by a θ-dependent transformation, and finally calculate ŷ from w. The main differences are that (a) QCL maintains complex-valued latent variables, whereas MLP usually holds real numbers, and (b) only a linear (more precisely, unitary) transformation is available in QCL for the conversion of w′ to w.
MLP models could not predict the one-dimensional functions from small training data (around 20 records, Fig. 5a, S6†). Even the simple formula y = sin(x) could not be fitted by MLP, regardless of the hyperparameters employed (hidden layer sizes of 8 or 16 and activation functions of ReLU or tanh, Fig. S6†). The simplest case, y = x, was successful, yet fits to y = sin(x/2) and e^(x−1) partially failed. Although a more complex network design (i.e., deeper learning) would enhance the fitting, it also induces overfitting and requires larger datasets.
The QCL model is specialized in mimicking gently sloping, non-linear functions. The model gave better performance in predicting the linear, e^(x−1), and sin(x/2) functions compared with GPR and MLP (Fig. 5, S7 and S8†). The prediction error did not change drastically even when the extrapolation ratio in the dataset was changed from 10% to 90%, whereas that of the others did (Fig. S8†). On the other hand, poorer results were obtained with the steeper functions sin(x) and sin(2x); the QCL model with the current parameters was unable to fit sin(2x) (n = 2, m = 3, Fig. S2†).
Of course, QCL models can be tuned to fit sin(2x) by optimizing hyperparameters (e.g., changing the scaling prefactor c in ŷ ↦ cŷ, Fig. S9†). With a larger prefactor, more accurate fitting was achieved. However, the steeper fitting simultaneously spoiled the extrapolating prediction, yielding larger errors; tuning to complex curves induced a side effect in extrapolation tasks. In this article, we decided to focus on the gently sloping, non-linear character of QCL with the specific configuration mentioned before. This character was also beneficial in predicting molecular properties from chemical structures (vide infra).
The smooth characteristics of the QCL model originated from its small number of trainable parameters and the regularization effect of the quantum gates. The model had far fewer trainable parameters than MLP. The dimension of θ for QCL (n = 2, m = 3) was only 15, whereas 27 and 51 parameters were needed even for the unsuccessful MLP models (with hidden layer dimensions of 8 and 16, respectively). The fewer trainable parameters and the continuous sinusoidal basis resulted in smooth curves. Furthermore, the unitary restriction of |w1|^2 + |w2|^2 + |w3|^2 + |w4|^2 = 1 should also have suppressed outlier predictions, acting as regularization.
The smooth regression design was beneficial for fitting noisy functions (Fig. S10†). QCL, GPR, and MLP models were fitted to sin(x) or x with added Gaussian random noise. QCL and GPR could basically fit the data when the noise level was 0 to ca. 40% of the original functions. On the other hand, the predictions by the standard MLP easily bent unnaturally because of overfitting to the noisy data. The fewer trainable parameters and the unitary restriction of QCL should have contributed to its adequate noise tolerance.
In summary, the current QCL model has a chance to outperform conventional linear and non-linear regression algorithms when smooth curves can be assumed for the original datasets. Although actual chemical and material systems do not always meet this requirement, the success of sparse (linear) modeling5,11 encourages researchers to expand the idea to smooth non-linear functions via QCL or other algorithms. Further tuning of QCL models will also extend their capability to more complex functions, which should be examined in future research.
Instead of calculating state vectors using eqn (1), predictions can also be made by observing the eigenvalues of an actual quantum system: this is QCL on real hardware (eqn (7)). The calculation cost will not increase exponentially because nature automatically performs the computation according to quantum mechanics.
The probabilistic sampling was examined with an IBM quantum computing machine (Fig. S12†). The model was trained via the state-vector method. Then, we calculated the statistical probabilities of the upward (↑) and downward (↓) eigenvalues (p↑ and p↓) from the quantum system to predict ŷ = p↑ − p↓ (eqn (7)).
Quantum sampling suffered from more significant prediction errors than the classical state-vector calculation. The mean squared error (MSE) for the training dataset of y = sin(x), with a circuit of qubit number n = 2 and depth m = 2, was 0.0007 and 0.15 for the state-vector and quantum sampling methods, respectively. When the circuit depth was increased to 3 or 4, the predicted values no longer resembled the original trigonometric curves. The errors were mainly caused by the computational noise of the quantum system (Fig. S12†).19 For practical usage, the number of quantum gates in the circuit must be reduced to suppress the effects of noise. More facile access to quantum machines is also essential because prediction of just one record takes about 10^1–10^3 seconds on the heavily crowded cloud system. The superparallel advantage of quantum machines for QCL will be achieved when the computers can handle large qubit numbers n (≫10) with negligible noise and prompt server responses.
Apart from hardware, the development of theoretical approaches is also essential. For instance, QCL accepts only a limited domain of ŷ and xi. The unitarity of the operational matrices restricts the predicted value to −1 ≤ ŷ ≤ 1. Although not mandatory, the explanatory variable xi should range within −π ≤ xi ≤ π owing to the periodicity of the trigonometric functions in the rotational gates (eqn (4) and (5)). For practical regression tasks, linear or non-linear conversions may be needed (e.g., a sigmoid, 1/(1 + e^(−x)), and a logit, log(ŷ/(1 − ŷ))), whereas xi and y were simply set within [−1, +1] in this theoretical study.
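A sketch of such a conversion pair (sigmoid in, logit out), assuming the target is squashed before training and recovered after prediction:

```python
import numpy as np

def sigmoid(x):
    """1/(1 + e^(-x)): map an unbounded variable into (0, 1);
    2*sigmoid(x) - 1 further maps it into the (-1, +1) domain of QCL."""
    return 1.0 / (1.0 + np.exp(-x))

def logit(y_hat):
    """log(y/(1 - y)): invert the sigmoid to recover the original scale."""
    return np.log(y_hat / (1.0 - y_hat))

y = np.array([-3.0, 0.0, 4.2])
assert np.allclose(logit(sigmoid(y)), y)   # round trip
```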
As explanatory variables, molecular features in the databases were calculated with the conventional ca. 200-dimensional descriptor set of RDKit.31 The method facilely quantifies molecular characteristics through various indicators, such as molecular weight and the number of specific atoms in a molecule. Due to the high calculation cost of QCL, the descriptors were compressed to an 8-dimensional vector by principal component analysis.32 All explanatory and target variables were normalized within [−1, +1].
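A sketch of this featurization pipeline, combining RDKit's descriptor list with scikit-learn; the exact descriptor subset and scaling details used in the study are assumptions here.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

def featurize(smiles_list, n_components=8):
    """Compute the ca. 200 RDKit descriptors per molecule, compress them to
    n_components dimensions by PCA, and normalize into [-1, +1]."""
    rows = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        rows.append([func(mol) for _, func in Descriptors.descList])
    X = np.nan_to_num(np.array(rows))       # guard against failed descriptors
    X = PCA(n_components=n_components).fit_transform(X)
    return MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)

smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "CCCC", "C=CC", "CO", "CCCl", "CBr"]
print(featurize(smiles).shape)              # -> (9, 8)
```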
Small datasets were prepared artificially, assuming realistic materials informatics projects. From the master databases, 8, 16, 32, 64, 128, 256, or 512 records were sampled randomly. Then, the top 20% of records by y were extracted as the testing data: these served as model tasks for extrapolating regression. The random selection and regression tasks were repeated 2000/(dataset size) times for statistical verification (Fig. S13†).
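A sketch of this splitting protocol (the function and seed handling are ours):

```python
import numpy as np

def extrapolative_split(X, y, size, top_fraction=0.2, seed=0):
    """Randomly sample `size` records from the master database, then hold out
    the top `top_fraction` of y as the extrapolating test set."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), size=size, replace=False)
    Xs, ys = X[idx], y[idx]
    order = np.argsort(ys)
    n_test = int(round(top_fraction * size))
    train, test = order[:-n_test], order[-n_test:]
    return Xs[train], ys[train], Xs[test], ys[test]

# repeated 2000/(dataset size) times for statistical verification, e.g.:
# for seed in range(2000 // 64):
#     X_tr, y_tr, X_te, y_te = extrapolative_split(X, y, 64, seed=seed)
```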
QCL offered better prediction performance than the conventional models under several conditions. For instance, QCL exhibited the smallest MSE of 0.25 for the testing data with the melting point database of 64 random records (Fig. 6a). Larger errors were observed with the other models (RFR: 0.35, SVR: 0.30, BYR: 0.57, GPR: 0.61). Most of the ŷ values from RFR and SVR remained within the range of the trained y, meaning that extrapolation was unsuccessful due to their decision-tree and radial basis function kernel-based algorithms.2
Fig. 6 Regression results for chemical properties. (a) Actual and predicted parameters for the melting point dataset, using QCL (n = 8, m = 3) and other regressors. The dataset size was 64. The top 20% of y records were extracted as testing data, whose MSE is shown as orange numbers. (b) MSE for the regression tasks of melting point as a function of dataset size. Datasets were generated randomly and repeatedly. Transparent regions show standard errors with 68% confidence intervals. Results for PLS are not shown because the average MSE was too large. (c) Results for ESOL. The results with other databases (Fig. S14†) and results for interpolating tasks are shown in Fig. S15 and S16.† RBF + Dot-product was used for GPR.
The linear-compatible models BYR and GPR made some extrapolating predictions, exceeding the maximum y of the training records (Fig. 6a). However, the models underestimated several test cases, giving large MSEs of 0.57 and 0.61, respectively. Another linear regression algorithm, partial least squares regression (PLS), was also examined as a standard model for materials informatics.33 Nevertheless, the model suffered from the largest MSE of 0.94. We suspect that the linear models could not faithfully capture the non-linearity of the experimental systems.
The models' performances were examined by repeating the random shuffling and regressions (Fig. 6b, c, S14†). Up to a dataset size of 100, QCL almost always displayed the smallest error among the models (Fig. 6b). The quantum model was also robust with the tiny datasets of ESOL and Solv (Fig. 6c and S14†). The QCL model also performed well in regular interpolating regression tasks, where 20% of the testing data were sampled randomly (Fig. S15†). The model exhibited the best performance with the ESOL datasets of up to 32 records. Naturally, other models sometimes outperformed QCL under different conditions. There is no omnipotent algorithm applicable to any problem (no-free-lunch theorem).34 More careful analysis of the prediction processes in each case is needed to pursue better performance in future research.
Although we currently have no definitive explanation for the remarkable performance of QCL, the gently sloping assumption about the datasets might be a key to its prediction. As demonstrated with the one-dimensional functions, the QCL model could fit linear and smooth curves (Fig. 4). If the experimental molecular structure (x)–property (y) relationships do not fluctuate strongly, their data trends can be mimicked by QCL. We are examining the data trends more carefully by considering multivariable factors and distinguishing which functions are suitable for QCL.
A drawback of QCL for material exploration is the limited dimension of the explanatory parameters. When the conventional models conducted regressions without dimension reduction, they offered better performance than QCL (Fig. S14 and S15†). From another perspective, however, the still-large prediction errors of QCL could be resolved by expanding the dimension. Preliminarily selecting essential parameters by other methods, such as sparse modeling,12 will also be critical for utilizing QCL.
The method of encoding x into quantum circuits is another challenge for QCL.15–17,26 The current model did not require two qubits for one variable xi, in contrast to the one-dimensional regressions (Fig. S1†). No significant improvement in prediction was detected even when the descriptors were compressed to 4-dimensional vectors and inputted to the 8-qubit model (i.e., x = (x1, x1, x2, x2, …, x4, x4), Fig. S16†). The success of single-qubit encoding may be explained by the exponential nature of the state vectors (i.e., 2^n-dimensional vectors and fully connected interactions). Encoding multiple values in one qubit is gradually becoming possible,26,28 and such circuit optimization will also increase the accessible dimensions of the explanatory parameters.
$$ |\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \qquad |0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \; |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \; |\alpha|^2 + |\beta|^2 = 1 \tag{8} $$
For an n-qubit system, 2^n-dimensional bases are needed to describe an arbitrary quantum state because of quantum entanglement.28 These bases can be constructed as tensor products of |0⟩ and |1⟩ (eqn (9)).
$$ |i_1 i_2 \cdots i_n\rangle = |i_1\rangle \otimes |i_2\rangle \otimes \cdots \otimes |i_n\rangle, \qquad i_k \in \{0, 1\} \tag{9} $$
In quantum computing, the initial state |0…0⟩ = |0⟩ ⊗ |0⟩ ⊗ … ⊗ |0⟩ = (1, 0, …, 0)^T is transformed into another state |ψ⟩ by the repeated application of quantum gates (e.g., Rx, Ry, and CNOT) as unitary matrices (eqn (1) and (3)). In actual quantum systems, the state vector |ψ⟩ itself cannot be observed, but the expected values of some Hermitian operators are observable. Although there are many possible measurement schemes, the most straightforward and popular one is to detect the upward (↑) or downward (↓) eigenstates of one qubit along the z-axis (eqn (7)).15,28 Its mathematical expression is given by applying the Pauli Z operator to the first qubit in the circuit (eqn (10)).
$$ \hat{y} = \langle \psi | Z_1 | \psi \rangle, \qquad Z_1 = Z \otimes I \otimes \cdots \otimes I \tag{10} $$
For actual quantum computation and its simulation, predictions were conducted using the Qiskit library.40 For higher accuracy, the training parameter θ was preliminarily determined by state-vector calculation with Qulacs. During prediction, sampling was repeated 1000 times for each record to obtain p↑ and p↓. The cloud service of IBM Quantum systems was used for quantum computing. Five-qubit systems were employed for sampling (mainly a machine named ibmq_quito, with a quantum volume of 16 and CLOPS of 2.5K).
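A sketch of the sampling workflow with Qiskit; a local AerSimulator stands in for the IBM cloud backend, and the small circuit below is a stand-in for a trained QCL circuit, not the one used in the study.

```python
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

# Stand-in two-qubit circuit (encoding + one CNOT + one rotation)
qc = QuantumCircuit(2, 1)
qc.ry(0.5, 0); qc.rx(0.5, 0)
qc.ry(0.5, 1); qc.rx(0.5, 1)
qc.cx(0, 1)
qc.ry(0.3, 0)
qc.measure(0, 0)                    # observe the first qubit only

backend = AerSimulator()            # replace with an IBM Quantum cloud backend
shots = 1000
counts = backend.run(transpile(qc, backend), shots=shots).result().get_counts()
y_hat = (counts.get('0', 0) - counts.get('1', 0)) / shots   # eqn (7)
print(counts, y_hat)
```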
The following conventional models were introduced using the scikit-learn module: support vector machine for regression (SVR, radial basis function kernel), random forest regression (RFR, 100 decision trees), Bayesian ridge regression (BYR), Gaussian process regressor (GPR), and partial least squares (PLS) regression (default dimension of 8). GPR models were constructed using selected kernels plus a white kernel. Unless noted otherwise, default hyperparameters were used. MLP models were prepared using the Keras library.41 The model had a one-dimensional input layer, one (or multiple) 8- or 16-dimensional hidden layer(s), and a one-dimensional output layer (multiple hidden layers were examined in Fig. S3 and S4†). ReLU, sigmoid, or tanh activation functions were introduced in the model. All training data (24 records) were inputted into the model simultaneously, using the MSE loss and the Adam optimizer. Due to the limited records, training was systematically repeated for 1000 epochs without making validation datasets.
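A sketch of the MLP control model under these settings with Keras; initializer and learning-rate details beyond those stated are left at library defaults.

```python
import numpy as np
from tensorflow import keras

# One 8-dimensional hidden layer with ReLU; MSE loss and the Adam optimizer.
model = keras.Sequential([
    keras.layers.Dense(8, activation="relu", input_shape=(1,)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (24, 1))     # all 24 records in one batch
y = np.sin(X)
model.fit(X, y, batch_size=len(X), epochs=1000, verbose=0)
print(model.predict(X[:3], verbose=0).ravel())
```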
For regression, quantum circuit models with n = 8 and m = 3 were employed. Datasets were prepared by randomly sampling 8, 16, 32, 64, 128, 256, or 512 records from the master databases (Fig. S13†). All variables (y, xi) in each dataset were normalized within [−1, +1]. As testing data, the top 20% of y records were extracted in Fig. 6 and S14;† random 20% subsets were extracted for testing in Fig. S15 and S16.† The random dataset generation and prediction processes were repeated 2000/(dataset size) times for statistical verification. The figures display the mean squared error (MSE) of the test data as box plots. The maximum of the y-axis was set to 4 for easier reading (excessive outliers are not shown in the graphs). Unless noted otherwise, default hyperparameters of the scikit-learn library were used for the conventional models.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2dd00090c |