Sonaldeep
Halder
a,
Anish
Dey
b,
Chinmay
Shrikhande
a and
Rahul
Maitra‡
*a
aDepartment of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India. E-mail: rmaitra@chem.iitb.ac.in
bDepartment of Chemical Sciences, Indian Institute of Science Education and Research Kolkata, West Bengal 741246, India
First published on 17th January 2024
The development of various dynamic ansatz-constructing techniques has ushered in a new era, making the practical exploitation of Noisy Intermediate-Scale Quantum (NISQ) hardware for molecular simulations increasingly viable. However, such ansatz construction protocols incur substantial measurement costs during their execution. This work involves the development of a novel protocol that capitalizes on regenerative machine learning methodologies and many-body perturbation theoretical measures to construct a highly expressive and shallow ansatz within the variational quantum eigensolver (VQE) framework with limited measurement costs. The regenerative machine learning model used in our work is trained with the basis vectors of a low-rank expansion of the N-electron Hilbert space to identify the dominant high-rank excited determinants without requiring a large number of quantum measurements. These selected excited determinants are iteratively incorporated within the ansatz through their low-rank decomposition. The reduction in the number of quantum measurements and ansatz depth manifests in the robustness of our method towards hardware noise, as demonstrated through numerical applications. Furthermore, the proposed method is highly compatible with state-of-the-art neural error mitigation techniques. This resource-efficient approach is quintessential for determining spectroscopic and other molecular properties, thereby facilitating the study of emerging chemical phenomena in the near-term quantum computing framework.
The use of neural network-based ML models to represent quantum states has been widespread in the realm of classical many-body theories and error mitigation protocols.19–33 The ability of these models to proactively learn the intricate interrelation between different basis functions that span the N-electron Hilbert space corresponding to a quantum state can be leveraged to generate its low complexity representation in quantum computers. The learned state, often called the neural quantum state (NQS), forms the backbone of neural quantum state tomography (NQST)34 and neural error mitigation35 (NEM). This manuscript entails the utilization of a Restricted Boltzmann Machine (RBM),36–38 a powerful regenerative ML model, to construct an expanded wavefunction in terms of the dominant many-body basis after learning the correlation from what may be ascribed as the “primary excitation subspace”. Simple many-body perturbative measures intimately guide this process. The generated ansatz corresponding to this expanded wavefunction further involves the inclusion of a suite of two-body operators with an effective one hole–one particle excitation – the so-called scatterers – resulting in an extremely low-depth yet highly expressive ansatz. Additionally, our method can be efficiently integrated with NEM, enhancing its efficacy for NISQ implementation.
In section 2.1, we briefly introduce the RBM's functioning and the generative process deployed to produce the dominant contributors (many-body basis) for the wavefunction expansion. In section 2.2, we set the background for the disentangled Unitary Coupled Cluster (dUCC) ansatz39 and Variational Quantum Eigensolver (VQE),4 which will be used to generate and optimize the wavefunction. In section 2.3, we introduce an innovative protocol that combines RBM and second-order Möller–Plesset perturbation theory (MP2) to craft a compact ansatz, leveraging the use of scatterer operators.
To model a probability distribution of m-dimensional training data in the form of computational basis measurements, an n-dimensional hidden representation of the state is constructed. A bias is assigned to each visible and hidden unit. A weight matrix is set to establish a connection between the visible and the hidden layers, which are of dimensions n × m. Connections only exist between the visible and hidden units, not between the units of the same layer. This restriction in the network topology makes RBMs different from conventional Boltzmann machines. RBM aims to find the optimal values of the biases and the weight matrix so that the probability distribution modeled explains the observed data well. RBM constructs a joint probability distribution for the configuration {v, h}
(1) |
(2) |
E(v, h) is parameterized by the biases and weights. The parameterized form of E(v, h) is written as
(3) |
The model is trained to minimize E(v, h). To do that, the log of likelihood function is constructed
(4) |
(5) |
(6) |
It must be noted that obtaining the derivative of a log-likelihood function can become intractable. Contrastive Divergence (CD), Persistent Contrastive Divergence (PCD) and Parallel Tempering (PT) are some of the approximations that are commonly employed to overcome this.
The model is trained on a collection of binary vectors (representing a many-body basis), with their probability distribution ascertained by measuring an initially prepared approximate wavefunction. The hidden layers of the model decipher the correlation existing within this wavefunction. Once the training is complete, the model generates new binary vectors, preserving the correlation it learned before. The generation is accomplished through Gibbs sampling. Starting from a given binary input vector (v0), a hidden layer representation (h0) is calculated based on p(h|v0). This, in turn, generates a new representation of the visible layer based on p(v|h0). This recursive process generates new binary vectors v′. They correspond to the most dominant contributors to the wavefunction of a given many-fermion system. The exact parametric representation of this state is built up using the dUCC ansatz, which is further variationally optimized. This leads to section 2.2, where we briefly discuss the composition of the dUCC ansatz and the variational principle employed to optimize the ansatz parameters.
|ψ(θ)〉 = Û(θ)|Φo〉. | (7) |
For the dUCC ansatz, this unitary is characterized by a pool of ordered, non-commuting, anti-hermitian particle–hole operators ({}) with the reference state taken to be the single Hartree–Fock (HF) determinant |Φo〉 = |χiχj…〉, with χs being the spin-orbitals. The ansatz can be written as
(8) |
μ = μ − †μ | (9) |
μ = †a†b…ji | (10) |
In the above equations, μ represents a multi-index particle–hole excitation structure as defined by the string of creation (†) and annihilation () operators with the indices {i, j, …} denoting the occupied spin-orbitals in the Hartree–Fock state and {a, b, …} denoting the unoccupied spin-orbitals. μ acts on the reference state to generate a many-body basis,
μ|Φo〉 = |Φμ〉 | (11) |
In a quantum computing framework, the operators described in eqn (7)–(11) are realized in terms of quantum gates and computational basis. Required transformations are carried out using standard mapping techniques. As such, these many-body bases ({|Φμ〉}) can also be represented as binary vectors (or bit strings). In this work, these vectors are often referred to as configurations.
For all practical applications, the dUCC ansatz is constructed using only a subset of the total possible excitation operators (μ). The corresponding parameters are optimized by invoking variational minimization of the electronic energy.
(12) |
In adherence to the Rayleigh–Ritz principle, the variationally obtained minimum energy gives the upper bound to the exact ground state energy (Eo) for the given molecular Hamiltonian.
(13) |
The inherent expressibility40 of the chosen ansatz assumes a pivotal role in generating a trial state that recovers a substantial amount of correlation energy. It can be achieved through the incorporation of higher-order excitations (μ) in the ansatz. However, implementing such an ansatz on quantum hardware requires very deep quantum circuits. In light of the limitations posed by current quantum devices, the execution of such circuits becomes unfeasible. Developing novel techniques to generate a compact and expressive ansatz becomes essential for practically utilizing quantum computing platforms for accurate molecular energy calculations.
|ΨSD〉 = ÛSD(θopt)|Φo〉 | (14) |
(15) |
Here, Σm denotes the set of bit strings of length m. Eqn (15) denotes the expansion of |ΨSD〉 in the computational basis ({|X〉}). During the variational optimization of parameters associated with ÛSD(θ), one may choose to do a partial optimization. As will be evident in the subsequent steps, it is the relative |CX|2 (eqn (21)) that are important and not their exact values.
(16) |
{S} → single excitations | (17) |
{D} → double excitations | (18) |
The probability order for the singles and doubles operator blocks independently follows
(19) |
(20) |
(21) |
The ground and various excited determinants are mapped from the many-body basis to the computational basis:
(22) |
|Φμ〉 = μ|Φo〉 ∀μ ∈ ÛSD | (23) |
The wavefunction generated using ÛdUCCRBM(θ) at this stage spans what can be called as the primary excitation subspace. An approximate wavefunction, which only occupies the primary excitation subspace, captures limited correlation.
(24) |
{|Yμ}〉 represents the dominant contributors to the secondary excitation space.
(25) |
[, D] → T | (26) |
The scatterers can be configured in a nested commutator form to generate even higher-order excitations such as quadruples.
(27) |
Such implicit generation of the high-rank excitations is only possible when the scatterers possess specific destruction orbital labels that are common to one of the indices of the cluster operators such that they satisfy non-commutativity. Let us say we only account for dominant triples generated by RBM. In that case, we take suitable non-commuting scatterers that combine with rank two excitation operators already present in ÛdUCCRBM leading to the desired triples. The appearance of the high-rank excitations through the nested commutators is a direct consequence of the disentangled structure of the unitary (see S2 in the ESI†). Thus, the chosen scatterer and the excitation operator (with which the scatterer is non-commutative) are paired up in a factorized manner. After such incorporation, ÛdUCCRBM can be written as:
(28) |
The overall depth of ÛdUCCRBM is greatly reduced due to the introduction of scatterers and will be evident in the Results and discussion section.
The interwoven structure of the ansatz that includes high rank correlation through its decomposition into lower rank operators is reminiscent of a double exponential (factorized) coupled cluster ansatz41 and its unitarized variant.7 Such double exponential structure of the waveoperator in terms of two-body operators is crucial for the exactness of the wavefunction for which the variational minimum analytically satisfies the contracted Schrödinger equation (CSE)42 – a necessary and sufficient condition for the wavefunction to satisfy the Schrödinger equation. Contrarily, the wavefunctions generated from a single exponential with generalized two-body operators43,44 do not satisfy the CSE.42 The CSE and its anti-hermitian variant (ACSE) leads to the direct determination of energy and two-electron reduced density matrices of many-electron molecules45,46 and is shown to be more expressive and efficient than the UCC ansatz. The ACSE has been implemented for quantum simulations, both on simulators and devices47–51 at the cost of shallow quantum circuits.
(29) |
The most dominant combination is ascertained by the largest value of the product of the MP2 measures of the associated scatterers, given it exceeds a predefined threshold (set to 10−6 here). The higher-order excitation (T in eqn (29)) is excluded from the ansatz when no combination meets this criterion. This approach is based on the rationale that a more substantial MP2 value of the scatterer enhances its ability to facilitate connections between the low-rank and high-rank excitations, which should be from a many-body perturbation theory viewpoint. Subsequently, the included high-rank excitations are reintroduced to the model (in the form of binary vectors) for further training. The probabilities for these high-rank excitations are determined by the product of two factors: the probability of the double excitation from |ΨSD〉 and the squared modulus of the MP2 values of the scatterers involved in creating the desired high-rank excitation. Steps 4 to 6 are iterated with the improved training set until the RBM ceases to generate new high-rank configurations. This marks the termination of our protocol. The resultant ansatz (ÛdUCCRBM), which we will call the RBM-dUCC ansatz, represents the final output of this protocol.
The expansion of the initial wavefunction to incorporate the contribution from high-order excitations occurs in steps. First, we run the sequence of the abovementioned steps, starting from |ΨSD〉 to target all dominant triply excited configurations. When RBM does not produce any new such configurations, we stop the generation procedure. The ansatz, (ÛdUCCRBM), at this stage, can be denoted as RBM-dUCCSDTS, where the subscript S on T signifies that this ansatz generates triply excited configurations through the utilization of scatterers. It may be noted that while we focus on generating the secondary subspace through triply excited configurations at the leading order, a scatterer that is present later in ÛdUCCRBM may combine with it to form a quadruply excited configuration. Although we call our ansatz RBM-dUCCSDTS, it may still implicitly produce some higher excited configurations, such as quadruples. This event is fortuitous and results in higher-than-desired expressibility.
The optimization of the RBM-dUCCSDTS parameters is performed in the VQE framework. The resultant state, which now expands the secondary subspace, contains dominant triply excited configurations and may contain higher order excitations due to fortuitous combinations. This state can be fed to the protocol described in steps 2 to 6 to further produce explicit quadruples by constructing RBM-dUCCSDTSQS. This would require explicit cascades of scatterers as described in eqn (27). Such a procedure can be continued to make the ansatz even more expressive. It is to be noted that as the single excitation operators appear at the end, no scatterer can act on them to produce redundant configurations. A real quantum device has inherent noise, potentially introducing errors in obtaining |ΨSD〉. However, once a regenerative ML model (such as RBM) learns the probability distribution, it can mitigate the errors by employing powerful NEM. As the remaining steps do not involve any quantum measurement, no additional error accumulates. The ease of integrating NEM into our method adds to its elegance.
Before proceeding further, it is important to discuss the feasibility and costs associated with learning a quantum state using a neural network, specifically in the context of Step-3 in our protocol. This process draws parallels with Neural Quantum State Tomography (NQST),34 an efficient machine learning-based Quantum State Tomography (QST) technique that sidesteps the substantial computational costs of traditional brute-force QST methods. In their work, Torlai et al.34 showcased the effectiveness of optimal Restricted Boltzmann Machine (RBM) models in efficiently performing QST for highly entangled states described by over a hundred qubits. Consequently, our approach, similarly employing RBM to learn the initial quantum state, holds promise for application in large-scale systems. Once the machine learning (ML) model is trained, it deciphers correlations within the learned state and symbolically generates dominant excited determinants. This generative process utilizes Gibbs sampling37 to directly generate dominant determinants based on the weights and biases of visible and hidden nodes set during training. Importantly, we avoid exhaustive searches through the entire space of possible determinants to identify dominance. The symbolic knowledge of dominant determinants is then used to construct an appropriate shallow depth ansatz. Hence, using an optimal RBM structure in terms of the number of hidden nodes, learning rate, etc.,52 one can efficiently learn an approximate state and accurately generate dominant determinants. Enhancing the efficiency of the generation process involves configuring the model or its components to preserve the spatial and spin symmetry of the generated determinants. In our study, as detailed in the Results and discussion section, we implement tower sampling to retain spin symmetry. Generating dominant determinants using RBM is conceptually a variant of selected configuration interaction. As elucidated by Herzog et al.,33 the main challenge in this RBM-based approach arises from verifying whether the RBM-generated excited determinants are already incorporated within the ansatz at a given step. For every proposed determinant, this scales as O(Ndet2),33 with Ndet being the number of determinants already incorporated “within” the ansatz. With the increase in system size, this does not requisitely approach the complexity of Full Configuration Interaction (FCI) as Ndet represents the set of dominant configurations, which practically increases sub-exponentially with the system size, even for highly correlated systems.
To summarize our procedure, we first obtain a low-level approximation to the wavefunction (using a low-rank ansatz such as dUCCSD) from a quantum device and learn the probability distribution using RBM. The learned wavefunction is now expanded iteratively with the help of many body perturbative measures. We end up with a highly compact ansatz capable of inducing high-order configurations required to describe the correlation effects properly. The overview of our procedure is depicted in Fig. 2. In section 3, we showcase the efficacy of this method by generating dominant triply excited configurations starting from a dUCCSD ansatz. Moreover, we describe the significantly low gate depth of the constructed ansatz. We consider a variety of molecules at various geometries to carry out this study, highlighting our method's general applicability.
Apart from the energy accuracy, we also provide the circuit depth for conventional dUCCSD, dUCCSDT, and RBM-dUCCSDTS. As can be discerned from Fig. 3, the energy obtained after variationally optimizing RBM-dUCCSDTS is very close to that obtained using conventional dUCCSDT with a difference of Hartree, all while using far fewer CNOT gates (a measure of circuit depth). The former even has fewer CNOT gates than conventional dUCCSD. This tremendous reduction in the number of CNOT gates reflects the high suitability of the RBM-generated ansatz for the NISQ hardware. As the energy for RBM-dUCCSDTS is not below the conventional dUCCSDT for any of the tested systems, the fortuitous generation of higher-than-desired configurations (that is, beyond triples) has not occurred here. As we explicitly target triply excited configurations, another important assessment would be to check the overlap between the optimized wavefunctions generated using conventional dUCCSDT and RBM-dUCCSDTS ansatzes. Thus, we provide the overlap between these two wavefunctions in Fig. 4. It can be discerned from this plot that the overlap difference from unity is or less. This indicates the ability of RBM to recognize the dominant triply excited configurations properly. Additionally, it highlights the efficacy of MP2 theory in guiding ML predictions.
Fig. 4 Difference of the overlap (S) of optimized wavefunctions generated by the RBM-dUCCSDTS ansatz (ΨdUCCSDTRBMS) and that by conventional dUCCSDT (ΨdUCCSDTConv) from unity. Here, S = 〈ΨdUCCSDTRBMS|ΨdUCCSDTConv〉 and the molecular systems taken are identical to that used during the study depicted in Fig. 3. |
A comparison of our protocol with the popular ADAPT-VQE6 is presented in Fig. S1 of the ESI.† Under the noiseless case, the ADAPT protocol does produce an exceptionally compact ansatz capable of capturing large correlation energy (with minutely less accuracy than RBM-dUCCSDTS). However, the noise present in NISQ hardware can impair ADAPT's ability to generate an optimum ansatz with sufficient expressibility.40 This point is further elaborated in section 3.2. RBM-dUCC construction only requires measurements during the initial state preparation and, thereby, is substantially resilient to noise (corroborated by Fig. 5 and Table S1 of the ESI†).
Another challenge that requires scrutiny is the capacity of our devised method to effectively handle systems characterized by pronounced multireference traits in their ground states against other known to be parameter-efficient methods. For this purpose, we test our protocol on a linear chain H6 (STO-3G) with the nearest neighbor bond distance set to 2 Å. We compare our results with the k-uCJ ansatz,56 which is known for exceptional performance within multireference systems. As can be discerned from Table 1, the results produced by RBM-dUCCSDTS are comparable in terms of variational parameters (an indirect indication of circuit depth) and accuracy. However, our dynamic methodology possesses the advantage of tailoring the ansatz based on the specific correlations inherent in a molecular system. This flexibility is in contrast to fixed-size ansatzes like k-uCJ, which only consider the molecular size while building the ansatz. This adaptive nature is crucial for constructing a shallow depth ansatz while dealing with low to moderately correlated systems.
Ansatz used for ground state energy evaluation | Number of variational parameters | Error in correlation energy (in percentage) |
---|---|---|
1-uCJ | 77 | 1.40 |
2-uCJ | 154 | 0.05 |
RBM-dUCCSDTS | 127 | 0.48 |
1. Single qubit gate errors composed of a depolarizing error channel followed by the thermal relaxation error channel on the respective qubit.
2. Two qubit gate errors composed of a two-qubit depolarizing error channel followed by single qubit thermal relaxation error on the involved qubits.
3. Single qubit readout-errors on measurement.
Under the same noisy backend, we also provide a comparative performance of conventional dUCCSD and dUCCSDT ansatzes. For variational optimization of the ansatz parameters, we use the Simultaneous Perturbation Stochastic Approximation (SPSA)57 with the maximum iteration set to 500 and the initial point taken to be zero for all parameters. In general, SPSA is considered to be a good optimizer under noise as it requires fewer measurements during the optimization as compared to CG. The number of shots used to obtain the requisite expectation values was set to 10000. To reduce the time of measurement simulations, we levied the normal distribution approximation on the expectation values.54 The resultant outcomes are portrayed through the graphical representation in Fig. 5. For this study, we used linear BH (rB–H = 2.25 Å). The large number of CNOT gates in the dUCCSDT ansatz (≈19000) results in a substantial accumulation of error. This renders the VQE optimization scheme completely useless. Even in the case of conventional dUCCSD, the trajectory is not well-behaved and leads to higher energy. RBM-dUCCSDTS provides the best results in terms of accuracy and stability. This is a direct effect of exceptionally few gates in this ansatz. We performed analysis on two variations of RBM-dUCCSDTS.
1. RBM-dUCCSDTS – 1: the initial state (|ΨSD〉) was prepared under the noise and was used to train RBM. The final ansatz obtained after the sequence of steps outlined in section 2.3 was also variationally optimized under the same noise.
2. RBM-dUCCSDTS – 2: the initial state (|ΨSD〉) was prepared with noiseless simulation. However, the final ansatz generated through RBM was optimized under noise.
For both variations, the accuracy is superior to conventional dUCCSD. Although we cannot produce a noiseless initial state in real quantum hardware, our study aims to show the deteriorating effect of using a noisy initial wavefunction during the generation of the RBM-dUCC ansatz. However, in this case, one can seamlessly use NEM to mitigate the errors and obtain a well-trained RBM. This would make the red curve (Fig. 5) move down towards the orange one. One can easily extend NEM to the final generated ansatz, too. This would further enhance the accuracy. It must be noted that in the study depicted in Fig. 5, we are not focussing on the absolute accuracy of the various ansatzes but their relative behavior on average. One must apply layers of error mitigation protocols (including NEM) to bring the results within the chemical accuracies. As the depth of the RBM-dUCC ansatz is substantially low, the cost overhead when applying such error-mitigating protocols would, in general, also decrease.
A standout feature of our methodology lies in its ability to operate without expectation-value evaluations once an approximate initial state is established. This provides two significant advantages: it reduces the measurement overhead and becomes particularly beneficial when considering the inherent errors associated with measurements on NISQ hardware. Any ansatz constructing methodology that heavily relies on such expectation terms to tailor the ansatz would inevitably grapple with these errors, resulting in a final ansatz that significantly deviates from the ideal one. Hence, an optimal protocol must generate an ansatz well-suited to the coherence time of NISQ hardware and achieve this in a manner that minimizes susceptibility to noise. The protocol expounded in this article adeptly incorporates both of these essential features as exemplified in Fig. 3 and 5. Methods such as ADAPT-VQE,6 which depend on expectation value measurements during ansatz construction, are highly likely to show some complications under hardware noise. In the ESI,† we present a comparison of the performance between the ADAPT ansatz and RBM-dUCCSDTS. It is noted that the ADAPT ansatz exhibits some sensitivity to noise despite generating an impressively shallow depth ansatz in noiseless simulations. In contrast, RBM-dUCCSDTS performs consistently well both with and without noise. For this comparison, we do not use additional error mitigation techniques.35,58–63
Footnotes |
† Electronic supplementary information (ESI) available: A detailed description of the construction of the Restricted Boltzmann Machine (RBM), a brief discussion on scatterer operators and a comparison between RBM-dUCC and ADAPT ansatzes. See DOI: https://doi.org/10.1039/d3sc05807g |
‡ Centre of Excellence in Quantum Information, Computing, Science & Technology, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India (Email: E-mail: rmaitra@chem.iitb.ac.in). |
This journal is © The Royal Society of Chemistry 2024 |