Arif Ullah*a, Yu Huang a, Ming Yang a and Pavlo O. Dral*bc
aSchool of Physics and Optoelectronic Engineering, Anhui University, Hefei, 230601, Anhui, China. E-mail: arif@ahu.edu.cn
bState Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, 361005, Fujian, China. E-mail: dral@xmu.edu.cn
cInstitute of Physics, Faculty of Physics, Astronomy, and Informatics, Nicolaus Copernicus University in Toruń, ul. Grudziądzka 5, 87-100 Toruń, Poland
First published on 5th September 2024
Neural networks (NNs) accelerate simulations of quantum dissipative dynamics. Ensuring that these simulations adhere to fundamental physical laws is crucial, but has been largely ignored in the state-of-the-art NN approaches. We show that this may lead to implausible results measured by violation of the trace conservation. To recover the correct physical behavior, we develop physics-informed NNs (PINNs) that mitigate the violations to a good extent. Beyond that, we propose a novel uncertainty-aware approach that enforces perfect trace conservation by design, surpassing PINNs.
Neural networks (NNs) present an efficient approach to learn complex spatio-temporal dynamics in high-dimensional space. NNs and other machine learning (ML) methods have proven to be proficient at predicting future time evolution of quantum states as a function of historical dynamics.19–27 In addition, NNs can directly predict the future quantum states as a function of time and/or simulation parameters.28–32
However, a crucial aspect of quantum simulations is adherence to fundamental physical principles. In simulating open quantum systems, an approach must conserve the trace (the sum of probabilities for all possible states) of the reduced density matrix (RDM, $\tilde{\rho}_{\mathrm{S}}$), which should always be equal to 1, i.e., $\mathrm{Tr}_{\mathrm{S}}(\tilde{\rho}_{\mathrm{S}}) = 1$, where $\mathrm{Tr}_{\mathrm{S}}$ represents the trace over the system degrees of freedom.
Despite the appeal of NNs, existing research on ML-based simulations of quantum dissipative dynamics has largely ignored trace conservation.19–30,32 To the best of our knowledge, only one study has mentioned, albeit in the context of a relatively simple system (spin-boson), that ML models, given sufficient data, were able to implicitly learn trace conservation to a reasonable degree.21 However, we cannot expect this to hold in general, especially in much more complex situations or when it is difficult to obtain enough data for implicit learning of trace conservation. In general, ML models can implicitly learn physical laws from data, but if left unconstrained or applied to situations too far from the training data, they can also fail spectacularly.
Physics-Informed Neural Networks (PINNs), introduced in 2017,33–35 present a promising solution to this problem.36,37 By incorporating physical constraints directly into the neural network architecture, PINNs ensure that the model's predictions adhere to underlying physical laws. This approach has been successfully applied across various fields, including fluid dynamics,38,39 seismic inversions in 2D acoustic media,40 chemical simulations,41 quantum dynamics,42 and electronic structure calculations.43
In this paper, we explore whether NNs inherently conserve trace and demonstrate that unconstrained models can lead to unphysical results due to trace violations. To address this, we develop physics-informed neural networks that significantly reduce trace conservation violations. However, we find that even with the integration of physical knowledge, physics-informed NNs alone are not sufficient. To ensure correct physical behavior, we introduce an uncertainty-aware hard constraint (U-aware HC) approach that enforces perfect trace conservation by design.
The subsequent sections of this paper are structured as follows. In the “Theory and methodology” section, we establish the foundational theory of open quantum systems and detail the various NN models employed in our study, including physics-agnostic and unconstrained NNs. We highlight the trace violations by these models and introduce physics-informed NNs (PINNs). Additionally, we discuss the associated loss functions used for training these models and introduce the U-aware HC constraint for rigorous enforcement of physical constraints. Following that, in the “Results and discussion” section, we present our findings, comparing the performance of our PINN approach and HC constraint against existing models. We discuss the effectiveness of these approaches in enforcing physical laws and achieving accurate simulations. Finally, in the “Concluding remarks” section, we summarize our key findings, explore the broader implications of our study, and outline potential future research directions.
$$\dot{\rho}(t) = -i[H, \rho(t)], \tag{1}$$
$$\tilde{\rho}_{\mathrm{S}}(t) = \mathrm{Tr}_{\mathrm{E}}\left(U(t,0)\,\rho(0)\,U^{\dagger}(t,0)\right), \tag{2}$$
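As a minimal numerical illustration of eqn (1) and (2), the sketch below propagates a toy two-qubit density matrix unitarily and then traces out the second qubit. The Hamiltonian, its coupling values, and the function names are hypothetical choices made for illustration (with ħ = 1) and are not taken from this work; the only point is that the exact RDM keeps unit trace at every time step.

```python
import numpy as np

def evolve(rho0, H, t):
    """Return U rho0 U^dagger with U = exp(-i H t), using hbar = 1."""
    evals, evecs = np.linalg.eigh(H)  # H is assumed Hermitian
    U = evecs @ np.diag(np.exp(-1j * evals * t)) @ evecs.conj().T
    return U @ rho0 @ U.conj().T

def partial_trace_env(rho, dim_s, dim_e):
    """Trace out the environment of a density matrix on H_S (x) H_E."""
    rho = rho.reshape(dim_s, dim_e, dim_s, dim_e)
    return np.einsum("ikjk->ij", rho)

# Toy model (hypothetical): one system qubit coupled to one "bath" qubit.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
eye2 = np.eye(2, dtype=complex)
H = np.kron(sx, eye2) + 0.5 * np.kron(eye2, sz) + 0.3 * np.kron(sz, sx)

# Initial state: system in the excited state, environment maximally mixed.
rho_s0 = np.array([[1, 0], [0, 0]], dtype=complex)
rho0 = np.kron(rho_s0, eye2 / 2)

times = np.linspace(0.0, 5.0, 51)
rdms = np.array([partial_trace_env(evolve(rho0, H, t), 2, 2) for t in times])
traces = np.einsum("tii->t", rdms).real  # trace of the RDM at each time step
```

Here `traces` stays equal to 1 up to floating-point error, which is the property an NN surrogate of the dynamics is not guaranteed to reproduce.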
SB model: the SB model describes the temporal evolution of a qubit system (two-state system) interacting with an environmental bath comprising multiple independent harmonic oscillators. The system's total Hamiltonian, expressed in the basis of the excited (|e〉) and ground (|g〉) states, is given by:
(3) |
(4) |
FMO complex: the FMO complex, a trimer in green sulfur bacteria, plays a crucial role in photosynthesis. Each monomer in the complex contains chlorophyll molecules that act as energy transfer sites, typically numbering seven or eight.45 The energy transfer within an FMO monomer is described by the Frenkel exciton model Hamiltonian:46
(5) |
For our analysis, we utilize the same ohmic spectral density function with a Drude–Lorentz cutoff as in eqn (4), assuming a uniform spectral density across all sites.
(6) |
(7) |
(8) |
Here, we classify purely data-driven NNs into two categories: “physics-agnostic NNs” and “unconstrained NNs”. Physics-agnostic NNs are models that are not exposed to the complete data and thus remain unaware of the underlying physical laws and constraints. Unconstrained NNs, in contrast, are exposed to the entire data but do not incorporate physical constraints in their loss functions.
To emphasize the issue of trace violation by these data-driven NNs, we show their performance in Fig. 1 with two examples: the relaxation dynamics within the SB model and the excitation energy transfer (EET) in the 7-site Fenna–Matthews–Olson (FMO) complex. As shown, these data-driven models fail to conserve the trace in both processes. In each case, we utilize convolutional neural networks (CNNs) and OSTL-based recursive dynamics propagation (Rec-OSTL)
(9) |
We use the MLQD package47 and train the models on data from the QD3SET-1 database48 (see the Results and discussion section for details). The training approach mirrors state-of-the-art methods reported previously.20,29
In essence, for the physics-agnostic scenario (Fig. 1A and C), we train individual CNNs for each diagonal RDM element, employing a loss function that gauges the deviation of NN-predicted values from their reference counterparts $\tilde{\rho}_{\mathrm{S},nn}$:
(10) |
As these models are not exposed to the dynamics of all states, they lack knowledge of trace conservation. We show that a much better solution is the unconstrained NN—a single, multi-output CNN designed to learn all RDM elements, incorporating a loss function that aggregates errors across all states (sites) (Fig. 1B and D):
(11) |
However, despite being exposed to the dynamics of all states, this solution still exhibits minor but noticeable trace violations. It is important to note that trace violations can be reduced to some extent with additional training, as demonstrated in Fig. S1 of the ESI.† However, further improvement becomes limited as the model approaches the point of overfitting. Additionally, our observations indicate that the improvement in trace conservation with increasing memory time tm is somewhat unpredictable and does not follow a consistent trend. Despite this, there was a noticeable improvement in the accuracy of the dynamics predictions, as shown in Table S1.†
(12) |
(13) |
In these equations, the weight factors α and η tune the relative importance of the data-fitting error and of the deviations from trace conservation, respectively. Here we use α = 2.0 and η = 1.0. Note that the unconstrained NN with the loss defined by eqn (11) is a special case of the PINN with α = 1.0 and η = 0.
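To make the role of the two weight factors concrete, the following sketch shows one plausible form of a soft-constrained loss. It assumes a mean-squared data term and a mean-squared trace penalty, and it is restricted to the diagonal RDM elements for brevity; the function name and these functional forms are assumptions for illustration, not a reproduction of eqn (12) and (13). In actual training the same expression would be written in an automatic-differentiation framework rather than NumPy.

```python
import numpy as np

def pinn_loss(pred_diag, ref_diag, alpha=2.0, eta=1.0):
    """Soft-constrained loss on diagonal RDM elements (illustrative form).

    pred_diag, ref_diag: arrays of shape (n_steps, n_states).
    alpha weights the data-fitting term, eta the trace-conservation penalty.
    """
    data_term = np.mean((pred_diag - ref_diag) ** 2)
    trace_term = np.mean((pred_diag.sum(axis=1) - 1.0) ** 2)
    return alpha * data_term + eta * trace_term

# Reference populations that sum to one at each time step.
ref = np.array([[0.7, 0.3], [0.6, 0.4]])
# A prediction that overshoots every population by 0.01 (trace = 1.02).
pred = ref + 0.01

loss_pinn = pinn_loss(pred, ref, alpha=2.0, eta=1.0)
loss_unconstrained = pinn_loss(pred, ref, alpha=1.0, eta=0.0)
```

With α = 1.0 and η = 0 the penalty vanishes and only the data term remains, which mirrors the special case noted above. Because the penalty only adds to the loss, a small trace violation can survive training, which is why such constraints are called soft.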
While PINNs significantly improve trace conservation compared to purely data-driven NNs, they can still exhibit minor violations (as we will demonstrate later). This is because the physical constraints incorporated within the PINN loss function are typically “soft”: the model is nudged towards satisfying the constraints during training, but the constraints are not strictly enforced.49,51
To overcome the limitations of PINNs, we propose a novel approach that enforces trace conservation by design. This approach utilizes a U-aware HC (uncertainty-aware hard-coded) constraint, guaranteeing strict adherence to physical laws during simulations. Unlike PINNs, the U-aware HC constraint operates outside of the loss function. This allows for a more direct and rigorous enforcement of the trace conservation law, rectifying potential violations during the simulation process.
The key idea is as follows: after making predictions with machine learning models, there will inevitably be a deviation from perfect trace conservation. We can calculate this residual deviation for each time step as:
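$$\Delta\mathrm{Tr}(t) = 1 - \sum_{n}\tilde{\rho}_{\mathrm{S},nn}(t). \tag{14}$$
We can redistribute the residual deviation among the states as:
$$\tilde{\rho}^{\mathrm{HC}}_{\mathrm{S},nn}(t) = \tilde{\rho}_{\mathrm{S},nn}(t) + w_n(t)\,\Delta\mathrm{Tr}(t). \tag{15}$$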
Here, the state-specific weighting factors $w_n$ must be chosen such that the corrected trace equals one, which requires them to sum to one at each time step. The choice should also be statistically motivated: different states might be predicted with different uncertainty, and more certain predictions should receive smaller corrections (smaller weighting factors). Hence, we also need state-specific uncertainty quantification (UQ) of NN predictions. A similar problem arises when partial atomic charges are predicted by statistical models, since they do not necessarily add up to integer values: the suggested solution was likewise to redistribute the deviation from the correct total charge over atoms based on the UQ calculated as the disagreement between the models in an ensemble.52,53 This shows how solutions from one research field can inspire solutions in an unrelated one.
Here we introduce a new approach for UQ. We train an additional, auxiliary multi-output CNN with the same loss function as the main PINN, but we shift the reference values by a prior $p_2$ (we assume that the main PINN model is trained with prior $p_1 = 0$). In other words, we train the CNN on $\tilde{\rho}_{\mathrm{S}} + p_2 J$ (where $J$ is the matrix with all elements equal to 1), with the predictions given by:
$$\tilde{\rho}^{\mathrm{aux}}_{\mathrm{S},nn}(t) = \tilde{\rho}^{\text{aux-NN}}_{\mathrm{S},nn}(t) - p_2 J. \tag{16}$$
The UQ metric is then given as the absolute deviation of $\tilde{\rho}^{\mathrm{aux}}_{\mathrm{S},nn}(t)$ from the main model predictions:
$$D_{nn}(t) = \left|\tilde{\rho}^{\mathrm{aux}}_{\mathrm{S},nn}(t) - \tilde{\rho}_{\mathrm{S},nn}(t)\right|. \tag{17}$$
We suggest obtaining the state-specific weighting factors $w_n$ as the normalized distances:
$$w_n(t) = \frac{D_{nn}(t)}{\sum_{m} D_{mm}(t)}. \tag{18}$$
Implementing eqn (15) with the weighting factors defined by eqn (18) ensures that $\mathrm{Tr}_{\mathrm{S}}(\tilde{\rho}^{\mathrm{HC}}_{\mathrm{S}}(t)) = 1$. It is crucial to distinguish our proposed U-aware HC constraint-based approach from the conventional trace normalization technique, $\tilde{\rho}_{\mathrm{S}}/\mathrm{Tr}_{\mathrm{S}}(\tilde{\rho}_{\mathrm{S}})$, commonly employed in non-trace-conserving traditional methods.
Our proposed U-aware HC constraint approach stands out for the following reasons:
• Generality: the U-aware HC constraint approach is purely machine learning-based and is not limited to trace conservation. It can be tailored to enforce various physical constraints across diverse domains within machine learning studies. For example, it could be used to ensure the preservation of total charge in simulations of molecular systems, especially when learning individual charges for each atom.
• Uncertainty-aware correction: the U-aware HC constraint approach goes beyond simple normalization by incorporating a UQ metric (eqn (17)) along with a weighting factor (eqn (18)). This allows for targeted corrections. States (or sites) with greater uncertainty (deviations) receive larger corrections, while those with smaller deviations receive smaller adjustments. This ensures a refined correction process tailored to the level of uncertainty observed.
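The correction procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration acting on the diagonal RDM elements only: the residual is taken as one minus the predicted trace, the weights are the per-state distances normalized to sum to one, and the uniform-weight fallback for vanishing distances is our own assumption rather than part of the described method. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def u_aware_hc(main_diag, aux_raw_diag, p2=0.1, eps=1e-12):
    """Uncertainty-aware redistribution of the trace residual.

    main_diag:    (n_steps, n_states) populations from the main model (prior 0).
    aux_raw_diag: (n_steps, n_states) raw outputs of the auxiliary model,
                  which was trained on targets shifted by the prior p2.
    """
    aux_diag = aux_raw_diag - p2                    # undo the prior shift
    dist = np.abs(aux_diag - main_diag)             # per-state UQ metric
    delta_tr = 1.0 - main_diag.sum(axis=1, keepdims=True)  # trace residual
    dist_sum = dist.sum(axis=1, keepdims=True)
    n_states = main_diag.shape[1]
    # Normalized distances; uniform weights if all distances vanish (assumption).
    weights = np.where(dist_sum > eps,
                       dist / np.maximum(dist_sum, eps),
                       1.0 / n_states)
    return main_diag + weights * delta_tr

# Synthetic stand-ins for model outputs (not data from this work).
rng = np.random.default_rng(0)
ref = rng.dirichlet(np.ones(7), size=5)             # rows sum to one
main = ref + 1e-3 * rng.standard_normal(ref.shape)
aux_raw = ref + 0.1 + 1e-3 * rng.standard_normal(ref.shape)

corrected = u_aware_hc(main, aux_raw, p2=0.1)
normalized = main / main.sum(axis=1, keepdims=True)  # conventional rescaling
```

Both `corrected` and `normalized` have unit trace, but they distribute the residual differently: plain normalization rescales every population in proportion to its value, whereas the uncertainty-aware correction shifts each population in proportion to how strongly the two models disagree on it.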
For the SB model, we acquire high-quality training data from the publicly available QD3SET-1 database.48 This comprehensive database provides pre-computed dynamics using the hierarchical equations of motion (HEOM) approach.11,54,55 The specific training dataset consists of 1000 trajectories simulated across a four-dimensional parameter space encompassing system-bath coupling strength, bath reorganization energy, bath relaxation rate, and inverse temperature (represented by ε/Δ, λ/Δ, γ/Δ, and βΔ, respectively). In a similar manner, training data for the 7-site FMO complex were also extracted from the QD3SET-1 database. This dataset encompasses 1000 training instances, capturing the dynamics for both possible initial excitation sites (site-1 and site-6) within the complex. In the considered data set, the dynamics is propagated for a range of simulation parameters chosen from a three-dimensional space. The method used for propagation is the trace-conserving local thermalizing Lindblad master equation (LTLME)56 with the system Hamiltonian parameterized by Adolphs and Renger.57
(19) |
For the training process, we adopted OSTL-based recursive dynamics propagation (eqn (9)), where the RDM $\tilde{\rho}_{\mathrm{S}}(t)$ at each time step is transformed into a 1D vector of dimension M = number of sites + (2 × number of upper off-diagonal terms). Because the RDM is Hermitian, only the upper off-diagonal terms are learned, and the real and imaginary parts of each off-diagonal term are separated. More details can be found in ref. 47. The target is multi-time-step dynamics with the same shape as the input. Here we predict the dynamics of 20 time steps in one shot, which is then fed to the model recursively to predict the next 20 time steps. In all cases, we trained a CNN model, implemented in the MLQD package,47 and the uncertainty-aware HC constraint is incorporated with priors set as (p1, p2) = (0, 0.1).
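The data layout and the recursive scheme described above can be illustrated with the sketch below. The ordering of elements inside the 1D vector (diagonal first, then real parts, then imaginary parts of the upper triangle) is an assumption made for illustration and may differ from the ordering used in MLQD; the `persistence` model is a hypothetical stand-in for a trained CNN, and all function names are ours.

```python
import numpy as np

def rdm_to_vector(rdm):
    """Flatten an n x n Hermitian RDM into a real vector of length n + 2 * n(n-1)/2."""
    n = rdm.shape[0]
    iu = np.triu_indices(n, k=1)
    upper = rdm[iu]
    return np.concatenate([rdm.diagonal().real, upper.real, upper.imag])

def vector_to_rdm(vec, n):
    """Rebuild the full Hermitian RDM from the flattened representation."""
    iu = np.triu_indices(n, k=1)
    n_off = len(iu[0])
    rdm = np.zeros((n, n), dtype=complex)
    rdm[np.diag_indices(n)] = vec[:n]
    upper = vec[n:n + n_off] + 1j * vec[n + n_off:]
    rdm[iu] = upper
    rdm[(iu[1], iu[0])] = upper.conj()  # lower triangle by Hermiticity
    return rdm

def propagate_recursively(model, seed, n_total):
    """Extend a seed of shape (window, M) to n_total steps, one window at a time."""
    window = seed.shape[0]
    traj = seed.copy()
    while traj.shape[0] < n_total:
        traj = np.vstack([traj, model(traj[-window:])])
    return traj[:n_total]

# Round trip for a random Hermitian 7 x 7 matrix standing in for an RDM.
rng = np.random.default_rng(1)
a = rng.standard_normal((7, 7)) + 1j * rng.standard_normal((7, 7))
rdm = (a + a.conj().T) / 2
vec = rdm_to_vector(rdm)  # length 7 + 2 * 21 = 49

# Dummy model: repeats the last step of the input window (placeholder for a CNN).
persistence = lambda w: np.repeat(w[-1:], w.shape[0], axis=0)
seed = np.tile(np.array([1.0, 0.0, 0.0, 0.0]), (20, 1))  # 20 steps, M = 4
traj = propagate_recursively(persistence, seed, n_total=100)
```

For the two-state SB model this layout gives M = 2 + 2 × 1 = 4, and for the 7-site FMO complex M = 7 + 2 × 21 = 49.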
To improve training efficiency, we utilized farthest point sampling28,58 to select a subset of training trajectories. For both the symmetric SB model (ε/Δ = 0) and FMO complex with initial excitation on site-1, 400 trajectories were chosen for training, with the remaining used for testing.
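For readers unfamiliar with farthest point sampling, a generic greedy variant is sketched below. It assumes Euclidean distances in an already rescaled parameter space and an arbitrary starting point; the feature space, scaling, and starting point used in the cited works may differ, and the random parameter set here is purely hypothetical.

```python
import numpy as np

def farthest_point_sampling(points, n_select, start=0):
    """Greedily pick n_select indices so that each new point is the one
    farthest from all points selected so far."""
    points = np.asarray(points, dtype=float)
    selected = [start]
    # Distance from every point to its nearest selected point.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(n_select - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return np.array(selected)

# Hypothetical example: 500 trajectories described by three rescaled parameters.
rng = np.random.default_rng(0)
params = rng.uniform(size=(500, 3))
train_idx = farthest_point_sampling(params, 400)
test_idx = np.setdiff1d(np.arange(len(params)), train_idx)
```

The selected trajectories cover the parameter space more evenly than a uniform random draw, and the unselected indices form the test set.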
In our study, we trained CNN models with identical architectures across all four scenarios. The models used for dynamics propagation yielded nearly identical validation losses, with approximately 1.2 × 10−5 in the SB case and 1.1 × 10−7 in the FMO complex. Introducing trace constraints and adding a prior do impact computational efficiency. Including a trace constraint in the loss function increases its complexity, and the addition of a prior makes the model more challenging to fit, potentially leading to longer training times.
For example, in our experiments, the unconstrained NN model for the FMO complex reached a validation loss of 1.01 × 10−7 at epoch 194. In contrast, the PINN model with the same architecture achieved a similar loss of 1.62 × 10−7 at epoch 785, and the auxiliary model in the case of PINN with U-aware HC attained a loss of 2.22 × 10−7 at epoch 1142. On our machine (Nvidia GeForce RTX 4090 GPU), each epoch took approximately 1 second, resulting in total training times of 194 seconds, 785 seconds, and 1142 seconds, respectively. While the addition of constraints and priors increases computational time, the overall increase is not significant given the computational resources available today.
Fig. 2 demonstrates the effectiveness of the PINNs and the uncertainty-aware HC constraint in maintaining trace conservation during simulations of quantum dissipative dynamics for the SB model and the FMO complex. We revisit the same cases as presented in Fig. 1 for purely data-driven NNs. As expected, the PINNs (Fig. 2A and C) show a significant improvement in trace conservation compared to purely data-driven neural networks (Fig. 1). However, as previously discussed, PINNs rely on “soft constraints” within the loss function, which can lead to minor deviations from perfect trace conservation.
Fig. 2 Trace conservation in NN-based simulations using PINNs and the uncertainty-aware HC constraint approach. This figure replicates Fig. 1 (data-driven NN) for SB model and FMO complex, demonstrating improved conservation with PINNs (panels A and C) and perfect trace conservation achieved by combining U-aware HC constraint with PINNs (panels B and D). In the case of SB model, an initial period of tmΔ = 2.0 serves as a seed for the model's predictions and results are presented for a test trajectory with characteristic frequency γ/Δ = 9.0, system-bath coupling λ/Δ = 0.6, and inverse temperature βΔ = 1.0. For the FMO complex, an initial dynamics of tm= 0.2 ps is provided as an input and the initial excitation is considered on site-1, with parameters γ = 400 cm−1, λ = 40 cm−1, and temperature T = 90 K. |
Perfect trace conservation is achieved via the U-aware HC constraint, as demonstrated in Fig. 2B and D. By explicitly incorporating this constraint within the PINNs framework, we maintain perfect trace conservation throughout the simulations for both the SB model and the FMO complex. This finding underscores the benefit of enforcing strict physical constraints by design, rather than solely relying on the model's ability to learn physical principles indirectly.
Additionally, we present the corresponding population dynamics for all four cases in Fig. S2 and S3 of the ESI.† To evaluate the accuracy of each model in dynamics propagation, we provide the MAE averaged over all time steps for each state (site) in Table 1. From the MAE comparison, we observe that all models have tiny errors for populations, so trace conservation did not have much impact on the quality of the dynamics in the studied cases. However, trace conservation might have a big impact in cases where ML struggles to learn and predict dynamics with such accuracy. As described above, the additional computational cost of enforcing trace conservation is modest, so there is little justification for risking non-conserving approaches, which may break down and behave even worse than in Fig. 1. In any case, using trace-conserving approaches can be considered a good prophylactic against unphysical behavior.
Model | SB, n = 1 | SB, n = 2 | SB, Avg(n) | FMO, n = 1 | FMO, n = 2 | FMO, n = 3 | FMO, n = 4 | FMO, n = 5 | FMO, n = 6 | FMO, n = 7 | FMO, Avg(n) |
---|---|---|---|---|---|---|---|---|---|---|---|
Physics-agnostic NN | 4.37 | 4.20 | 4.29 | 2.57 | 1.39 | 7.91 | 1.99 | 2.06 | 7.62* | 9.12* | 2.51 |
Unconstrained NN | 7.10 | 7.42 | 7.26 | 1.76 | 2.60 | 2.34 | 6.23* | 5.91* | 6.94* | 4.40* | 1.29 |
PINN | 6.14 | 6.10 | 6.12 | 3.93 | 1.13 | 2.76 | 2.15 | 1.55 | 6.15* | 2.17 | 2.04 |
PINN + U-aware HC | 6.08 | 6.08 | 6.08 | 1.41 | 1.53 | 5.22 | 1.26 | 1.54 | 6.73* | 2.10 | 1.96 |
a All values are in units of 10−3 except for those marked with *, which are in units of 10−4.
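The per-state errors in Table 1 are mean absolute errors averaged over time steps. A minimal sketch of that bookkeeping is given below; the array layout (time steps along the first axis, states along the second) and the toy numbers are assumptions for illustration, not data from this work.

```python
import numpy as np

def mae_per_state(pred_diag, ref_diag):
    """MAE of each population, averaged over time steps (axis 0)."""
    return np.mean(np.abs(pred_diag - ref_diag), axis=0)

# Toy two-state example with two time steps.
ref = np.array([[0.61, 0.39], [0.52, 0.48]])
pred = np.array([[0.60, 0.40], [0.50, 0.50]])

mae_n = mae_per_state(pred, ref)  # one value per state
mae_avg = mae_n.mean()            # corresponds to an Avg(n) column
```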
First, purely data-driven NN models, including physics-agnostic and unconstrained NNs, can effectively capture correlations between state-specific populations. However, they lack explicit enforcement of physical laws, leading to potential violations of trace conservation.
Second, PINNs offer a significant improvement by incorporating physical knowledge into the loss function. This method penalizes deviations from physical constraints, enhancing the accuracy of simulations. Despite this advancement, PINNs still rely on “soft constraints,” which can result in minor violations of physical constraints like trace conservation.
Finally, the U-aware HC constraint approach addresses the limitations of PINNs by enforcing trace conservation by design rather than solely through the loss function. The U-aware HC constraint utilizes uncertainty quantification techniques to redistribute residual errors and correct potential trace violations, ensuring physically consistent simulations throughout.
It is important to note that we did not explicitly enforce a positivity constraint in our case, since all diagonal elements remained strictly positive; such a constraint could be incorporated if necessary.
To conclude, our findings underscore the importance of integrating well-defined physical constraints into NN models. The methods developed in this study are broadly applicable and can be adapted to enforce other essential constraints in various domains. For instance, in molecular simulations where individual atomic charges are learned, our different-prior approach for uncertainty quantification as well as an approach for redistributing residual error in atomic charges could be used as an alternative to existing, related approaches52,53 for ensuring total charge conservation. By extending these techniques, we can improve the fidelity and reliability of NN-based simulations across a wide range of scientific and engineering applications.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4dd00153b |
This journal is © The Royal Society of Chemistry 2024 |