In Kyung Baek,‡a Soo Hyung Lee,‡a Yoon Ho Jang,a Hyungjun Park,a Jaehyun Kim,a Sunwoo Cheong,a Sung Keun Shim,a Janguk Han,a Joon-Kyu Han,b Gwang Sik Jeon,a Dong Hoon Shin,a Kyung Seok Woo*a and Cheol Seong Hwang*a
aDepartment of Materials Science and Engineering, and Inter-University Semiconductor Research Center, Seoul National University, Seoul, 08826, Republic of Korea. E-mail: kevinwoo@snu.ac.kr; cheolsh@snu.ac.kr
bSystem Semiconductor Engineering and Department of Electronic Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul 04107, Republic of Korea
First published on 5th April 2024
Bayesian networks and Bayesian inference, which forecast uncertain causal relationships within a stochastic framework, are used in various artificial intelligence applications. However, implementing Bayesian inference in hardware circuits has shortcomings regarding device performance and circuit complexity. This work proposes a Bayesian network and inference circuit using a Cu0.1Te0.9/HfO2/Pt volatile memristor as a probabilistic-bit neuron whose probability of being ‘true’ or ‘false’ can be controlled. Nodal probabilities within the network are feasibly sampled with low errors, even with the device's cycle-to-cycle variations. Furthermore, Bayesian inference of all conditional probabilities within the network is implemented with low power (<186 nW) and energy consumption (441.4 fJ) and a normalized mean squared error of ∼7.5 × 10−4, using division feedback logic with a varying learning rate to suppress the inherent variation of the memristor. The suggested memristor-based Bayesian network shows the potential to replace the conventional complementary metal oxide semiconductor-based Bayesian estimation method with a power-efficient stochastic computing method.
Fig. 1a shows a simple example of a Bayesian network consisting of four variables: ‘Cloudy’, ‘Sprinkler’, ‘Rain’, and ‘Wet grass’.7 The network consists of nodes and edges, representing an individual variable and a relationship between two variables, respectively. The edges are shown as arrows indicating the direction of the causal relationship, where the starting and ending points of the arrows represent the cause (parent node) and the result (child node), respectively. Furthermore, a conditional probability table (CPT) is assigned to each node to show the conditional dependency between the node and its parent node. The CPT lists the probabilities that the given node is ‘True’ or ‘False’, depending on the state of the parent node being either ‘True’ or ‘False.’
P(Cloudy = T) is a prior probability, the probability of the weather being ‘Cloudy’ estimated from long-term observed data.8 Also, P(Sprinkler = T|Cloudy = F) represents the likelihood, which can be determined from the observed conditional probability in the CPT data in Fig. 1a. Fig. 1b illustrates the causal relationship within the Bayesian network in Fig. 1a, including the prior probabilities and likelihoods.
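For reference, these quantities combine through Bayes' theorem; written for one pair of nodes in this network (a standard identity, not reproduced from Fig. 1),

$$P(\mathrm{C} = T \mid \mathrm{S} = T) = \frac{P(\mathrm{S} = T \mid \mathrm{C} = T)\, P(\mathrm{C} = T)}{P(\mathrm{S} = T)},$$

where the left-hand side is the posterior, the numerator is the likelihood multiplied by the prior, and the denominator is the evidence.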
Bayesian inference refers to the computation of the posterior probability, which is unavailable in the CPT data.8 For instance, P(W = T) is a nodal probability not explicitly shown in the CPT. Marginalization based on Bayes' theorem must be conducted to infer this probability (ESI, Note S1†). P(W = T|C = T) is the hidden conditional probability signifying the causal relationship between ‘Wet grass’ and its grandparent node, ‘Cloudy’. In addition, P(R = T|W = T) is the inverse conditional probability characterizing the relationship between the cause and the result in the opposite direction. Bayesian inference enables finding the hidden and inverse conditional probabilities. Note S2 of the ESI† shows the Bayesian inference of the network displayed in Fig. 1a, where the Bayesian inference of P(R = T|W = T) requires extensive analytic computations, including multiplication and marginalization processes to convert probabilities into likelihoods. Therefore, the calculation complexity increases exponentially as the node number in the Bayesian network increases. Specifically, the complexity of the analytic calculation in the Bayesian inference is O(2ⁿ) for the binary case, where ‘n’ is the number of nodes within the Bayesian network.9
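As an illustration of this scaling, the following sketch performs exact inference by brute-force enumeration. The CPT values are assumptions: they are the standard textbook parameters for this example network, chosen here because they reproduce the theoretical values quoted later in Tables 1 and 2 (e.g., P(W = T) ≈ 0.647 and Ppost(R = T|W = T) ≈ 0.708):

```python
from itertools import product

# CPTs for the Cloudy/Sprinkler/Rain/Wet-grass network. These are the standard
# textbook values (assumed here); they reproduce Table 1, e.g. P(W=T) = 0.647.
P_C = 0.5                                       # P(Cloudy = T)
P_S = {True: 0.1, False: 0.5}                   # P(Sprinkler = T | Cloudy)
P_R = {True: 0.8, False: 0.2}                   # P(Rain = T | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.00}   # P(Wet = T | S, R)

def joint(c, s, r, w):
    """Joint probability of one full assignment via the chain rule on the DAG."""
    p = P_C if c else 1 - P_C
    p *= P_S[c] if s else 1 - P_S[c]
    p *= P_R[c] if r else 1 - P_R[c]
    p *= P_W[(s, r)] if w else 1 - P_W[(s, r)]
    return p

# Exact inference by enumeration: the sums range over all 2^n assignments.
p_w = sum(joint(c, s, r, True) for c, s, r in product([True, False], repeat=3))
p_rw = sum(joint(c, s, True, True) for c, s in product([True, False], repeat=2))
print(f"P(W=T) = {p_w:.3f}")              # 0.647
print(f"P(R=T|W=T) = {p_rw / p_w:.3f}")   # 0.708
```

Each additional binary node doubles the number of terms in these sums, which is precisely the O(2ⁿ) cost that the sampling hardware below avoids.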
In a general Bayesian network, many nodes may have multiple parent and child nodes, and multiple hops to ancestors and descendants may be present, where hop means the number of edges between the two nodes. For example, the Bayesian network of schizophrenia and mixed dementia diagnosis has 29 nodes, with several nodes having up to 5 parent nodes and 4 child nodes.10 In such cases, the arithmetic operations involved in Bayesian inference become computationally challenging with conventional complementary metal-oxide-semiconductor (CMOS) technology.11–13
Besides, Bayesian inference requires random numbers to calculate the probability. In conventional CMOS technology, lookup tables, comparators, and linear feedback shift registers (LFSRs) have been used to generate random numbers. However, CMOS hardware is inherently deterministic; for example, a thirty-two-stage LFSR used to extract pseudo-random numbers required ∼1200 transistors.14 The conventional CMOS-based algorithm also requires complex floating-point calculations, consuming excessive energy.15,16 All these factors render implementing Bayesian inference in CMOS circuits challenging regarding area and power consumption.
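For context, a minimal Fibonacci-style LFSR sketch (not the circuit of ref. 14; the tap set (32, 22, 2, 1) is one standard maximal-length choice) illustrates why such generators are deterministic:

```python
def lfsr32(state: int):
    """32-bit Fibonacci LFSR with taps (32, 22, 2, 1), one standard
    maximal-length choice: the same seed always yields the same bit stream,
    so the 'randomness' is entirely deterministic."""
    while True:
        bit = ((state >> 31) ^ (state >> 21) ^ (state >> 1) ^ state) & 1
        state = ((state << 1) | bit) & 0xFFFFFFFF
        yield bit

gen = lfsr32(0xACE1)                    # arbitrary nonzero seed
print([next(gen) for _ in range(16)])   # reproducible pseudo-random bits
```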
On the other hand, due to their inherent stochastic properties, emerging memory devices, such as magnetic tunnel junctions (MTJs) and memristors, have been utilized as random number generators.17–21 The MTJ offers robust operation but requires a complex thin-film material stack, complicating its fabrication. Also, its low on/off ratio (only 2∼3) causes errors during output sensing, requiring additional amplifying circuits. In contrast, the memristor consists of a simple metal-insulator-metal (MIM) structure with an on/off ratio of several orders of magnitude, negating the demerits of MTJ devices. However, its non-volatile memory switching requires repeated application of RESET voltages (switching from the low resistance state (LRS) to the high resistance state (HRS)), which demands an additional voltage source and time step.22 In contrast, a threshold switching (TS) device, which spontaneously returns to the HRS from the LRS after the SET (switching from the HRS to the LRS) even without a RESET voltage, can alleviate this problem, rendering it a suitable random source for Bayesian circuits.
This study suggests an efficient circuit for the Bayesian network and Bayesian inference using a Cu0.1Te0.9/HfO2/Pt (CTHP) diffusive memristor, exhibiting TS behavior with an on/off ratio exceeding 10⁴ (ref. 23). A probabilistic-bit (p-bit) neuron capable of controlling its spiking probability by varying the input voltage was demonstrated using this TS device. In the Bayesian network hardware, each p-bit neuron represents a node, and positive edge-triggered D flip-flops and a 2ⁿ × 1 multiplexer (MUX), where ‘n’ signifies the number of parent nodes, constitute the edges. The probability of each node being ‘True’ was derived through parallel sampling in the Bayesian network hardware using the p-bit neurons. Furthermore, Bayesian inference was implemented by calculating conditional probability through the intersection and division of sampled nodal probabilities using an additional peripheral p-bit neuron. A feedback procedure with an exponentially decreasing learning rate was incorporated to counter the inherent memristor noise, enhancing the accuracy of the Bayesian inference even for complex Bayesian networks.
Cu-based filamentary switching memristors usually exhibit non-volatile behavior because a large number of Cu ions are injected into the oxide, forming thick Cu filaments.34 Conversely, the device shows volatile TS behavior when Cu and Te are co-sputtered with a sufficiently small atomic ratio of Cu (e.g., Cu0.1Te0.9, as in this work). In this case, the number of Cu ions driven into the HfO2 film decreases, and the filament size falls below the threshold for stable filament formation.35 Consequently, when the voltage is removed, the filament dissolves to reduce the interface energy between the Cu filament and the HfO2 matrix, producing the TS behavior.
Fig. 3a shows 40 consecutive current–voltage (I–V) curves with a 10 nA compliance current. After the electroforming process occurred at 3.25 V during the first I–V sweep, the device showed volatile switching with threshold voltages distributed between 1.5 V and 2.7 V. A sufficiently high voltage is required to ionize Cu atoms and nucleate the first Cu filaments at the Pt surface inside the pristine HfO2. After electroforming, the effective thickness of the oxide decreases due to the residual Cu filament within the oxide, thus reducing the threshold voltage.26
The intrinsic stochasticity of the threshold voltage is derived from the random detachment of Cu nanoclusters from the active electrode (Cu0.1Te0.9). The device switches to an on-state by a positive voltage and spontaneously returns to an off-state upon voltage removal, exhibiting TS behavior. In contrast, Fig. S3 of the ESI† shows that the device with Cu0.2Te0.8 does not exhibit a stable TS behavior since the amount of Cu clusters remaining in the oxide increases during switching. Moreover, in the case of Cu0.3Te0.7, the set voltage shifts to the lower voltage region during the sequential DC sweeps, ultimately exhibiting non-volatile resistive switching (RS) behavior. As a result, the Cu0.1Te0.9 device that shows a stochastic TS behavior without memory was selected for the Bayesian network implementation.
The pulse operation further confirmed the TS behavior of the Cu0.1Te0.9 device, as shown in Fig. 3b. With a 5.8 V input voltage (Vin), the CTHP memristor switches to the on-state after a delay of ∼70 μs. After the pulse terminates, the CTHP memristor returns to its off-state with a relaxation time of ∼500 μs. These stochastic TS behaviors of the CTHP memristor could be adopted to compose a p-bit neuron, as discussed below.
A p-bit neuron circuit consisting of a CTHP memristor, a series resistor Rs (2.2 MΩ), and a comparator (HA17393, Renesas, Japan) is implemented, as shown in the inset of Fig. 3c. It is designed to output either Vdd (4.6 V in this work) or 0 V probabilistically, where the input voltage controls the probability. As the input voltage increases, the probability of the memristor switching to the on-state increases. Consequently, the voltage applied to the comparator exceeds its reference voltage (Vref) of 0.3 V more frequently, thus yielding a higher probability of outputting Vdd. Fig. S4 of the ESI† shows the p-bit outputs at three different input voltages (5.40 V, 5.60 V, and 5.80 V). For the p-bit generation, each cycle has a pulse length of 400 μs with 10 ns leading and trailing times, and the pulse cycle was set to 4 ms. Fig. 3c shows the spiking probability of the p-bit neuron circuit as a function of the input voltage, with the average and standard deviation (SD) calculated from 512 samples at each voltage point. The spiking probability in response to input pulses follows a sigmoidal relation, suitable for the Bayesian network. Fig. 3d shows the endurance of the CTHP-based p-bit neuron through the uniform HRS and LRS resistances during 4 × 10⁶ cycles under the same pulse length and cycle as in Fig. 3c. The p-bit neuron can operate for far more than 4 × 10⁶ cycles because the endurance test was conducted at 7 V, which switches the CTHP memristor with 100% probability.
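A behavioral model of the p-bit neuron is useful for the circuit discussions below. The sigmoid midpoint and slope here are illustrative assumptions fitted to the operating points reported in this work (50% spiking at 5.612 V and 10% at 5.49 V):

```python
import math
import random

V0, K = 5.612, 0.055   # midpoint and slope: illustrative values fitted to the
                       # reported points (50% spiking at 5.612 V, 10% at 5.49 V)

def spike_probability(v_in):
    """Sigmoidal spiking probability of the p-bit neuron vs. input voltage."""
    return 1.0 / (1.0 + math.exp(-(v_in - V0) / K))

def p_bit(v_in):
    """One stochastic cycle: returns 1 (output Vdd) or 0 (ground)."""
    return int(random.random() < spike_probability(v_in))

samples = [p_bit(5.612) for _ in range(512)]
print(sum(samples) / len(samples))   # ~0.5, within sampling noise
```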
Fig. 4 Working principle of the p-bit neuron-based Bayesian network. (a) Schematic of the interconnection between two nodes in a simple Bayesian network. Two nodes from Fig. 1a, ‘Cloudy’ and ‘Sprinkler,’ are represented as p-bit neurons in dashed boxes. The interconnection between the two nodes consists of a positive edge-triggered D flip-flop and a multiplexer (MUX). The MUX interconnects the two nodes by selecting the input voltage for the ‘Sprinkler’ node according to the output of the ‘Cloudy’ node. (b) The timing diagram for the circuit in (a). The output of the p-bit neuron, Vout, is generated probabilistically for each node according to the Vin. The delay between Vin and Vout is due to the delay time of the memristor. The D flip-flop samples the input (Vout, Cloudy) at every rising edge of the clock and updates the output (OutCloudy).
First, an input pulse voltage of 5.612 V is applied to the p-bit neuron of the ‘Cloudy’ node. The probability that the neuron produces Vout, Cloudy is 50%, thereby defining the value of Pprior(C = T). Then, Vout, Cloudy feeds into a D flip-flop acting as a buffer memory, and the output of the D flip-flop (OutCloudy) enters a MUX, which stores the CPT data as voltage values. Subsequent pulses are selected according to the binary states of the parent nodes, and the amplitudes of the pulses are determined from the CPT.
Fig. 4b illustrates the timing diagram of the interconnection circuit between the ‘Cloudy’ and ‘Sprinkler’ nodes. Following the clock signal, Vin, Cloudy is applied to the input of the p-bit neuron of the ‘Cloudy’ node with a pulse length of 400 μs and a period of 4 ms (first row). Vout, Cloudy (=4.6 V) in response to Vin, Cloudy is generated from the p-bit neuron with a varying delay time in each cycle, marked as a red line when it is ‘1’ or a blue line (ground) when it is ‘0’ (second row). OutCloudy (a 3.3 V pulse in this work) is updated with the Vout, Cloudy value at the rising edge of the clock signal through the D flip-flop, which synchronizes the outputs of all nodes at each cycle (third row). This synchronization is necessary for multiple-parent cases with different delay times. After the 2 × 1 MUX receives OutCloudy as an input, it generates a voltage signal that defines the spiking probability of the ‘Sprinkler’ node. For instance, if OutCloudy is ‘1’ (i.e., 3.3 V), the MUX yields an output of 5.49 V (fourth row), corresponding to a 10% spiking probability of the ‘Sprinkler’ node. Therefore, for example, during 100 sampling periods, ∼50 of the OutCloudy pulses are ‘1’. These ∼50 cycles then induce ∼5 OutSprinkler pulses being ‘1’ (fifth and sixth rows) among the 50 operation cycles of the ‘Sprinkler’ node. In this way, Vin, Sprinkler encodes the conditional probability P(S = T|C = T). For the remaining ∼50 cases of OutCloudy being ‘0’, the MUX yields an output of 5.612 V (fourth row), which then induces ∼25 OutSprinkler pulses being ‘1’ during the remaining 50 operation cycles. In this case, the conditional probability refers to P(S = T|C = F). Consequently, the ‘Sprinkler’ node output, OutSprinkler, encodes the entire probability P(S = T). As shown in Table 1, the theoretical value of P(S = T) is 0.3, which can be derived from the above experiment using P(S = T) = P(S = T|C = T)P(C = T) + P(S = T|C = F)P(C = F) = 0.1 × 0.5 + 0.5 × 0.5 = 0.3. The CTHP memristor exhibits volatile TS behavior, eliminating the RESET process throughout these repeated sampling cycles.
| Nodal probability | Theoretical | Inference | 100 samples | 1000 samples |
|---|---|---|---|---|
| P(C = T) | 0.5 | Mean | 0.499 | 0.500 |
| | | SD | 0.043 | 0.044 |
| P(S = T) | 0.3 | Mean | 0.308 | 0.301 |
| | | SD | 0.042 | 0.040 |
| P(R = T) | 0.5 | Mean | 0.498 | 0.501 |
| | | SD | 0.039 | 0.044 |
| P(W = T) | 0.647 | Mean | 0.653 | 0.647 |
| | | SD | 0.042 | 0.039 |
A similar circuit can represent the entire Bayesian network shown in Fig. 1. Fig. 5 shows the overall circuit diagram of the Bayesian network composed of four p-bit neurons. The probability values between the nodes are encoded as the amplitudes of the voltage pulses of the MUX connecting the nodes. As the ‘Wet grass’ node has two parents, a 4 × 1 MUX receives the synchronized OutSprinkler and OutRain pulse streams as inputs. Subsequently, Vin, Wet grass is selected from four voltage sources according to the binary states of OutSprinkler and OutRain. Therefore, P(C = T), P(S = T), P(R = T), and P(W = T) can be derived through parallel sampling of the respective node outputs. Here, parallel sampling indicates simultaneous counting of the Out signals of each node for a given Vin, Cloudy.
Fig. 5 Implementation of a simple Bayesian network. A schematic of the Bayesian network in Fig. 1a, consisting of four p-bit neuron circuits. Each node corresponds to a CTHP-based p-bit neuron circuit.
Moreover, the sampling process (O(1)) replaces the analytical Bayesian inference (O(2ⁿ)) of P(S = T), P(R = T), and P(W = T), which are not explicitly provided in the CPT. Specifically, the analytical Bayesian inference of P(W = T) consists of probability marginalization over the CPTs of the parent nodes. The calculation of the nodal probabilities is detailed in Note S1 of the ESI.†
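In software terms, this parallel sampling is ancestral sampling of the DAG; a minimal sketch (reusing the CPT values assumed earlier) mirrors how each MUX selects the child's spiking probability from its parents' latched states:

```python
import random

P_W_CPT = {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.90, (False, False): 0.00}

def sample_network():
    """One synchronized cycle: each node spikes with the probability selected
    by its parents' latched outputs, mirroring the MUX-chosen input voltage."""
    c = random.random() < 0.5
    s = random.random() < (0.1 if c else 0.5)   # 2x1 MUX driven by Out_Cloudy
    r = random.random() < (0.8 if c else 0.2)
    w = random.random() < P_W_CPT[(s, r)]       # 4x1 MUX driven by Out_S, Out_R
    return c, s, r, w

N = 128 * 100                                   # 100 samples of 128 pulses each
counts = [0, 0, 0, 0]
for _ in range(N):
    for i, state in enumerate(sample_network()):
        counts[i] += state
print([round(k / N, 3) for k in counts])        # ~[0.5, 0.3, 0.5, 0.647]
```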
Table 1 summarizes the inference results of individual probabilities obtained from 100 and 1000 samples for each node shown in Fig. 5. A single sampling result is achieved by counting the number of output spikes resulting from the 128 input pulses into each node. The inferred mean values of the probabilities show proximity to the theoretical values with the normalized mean square error (NMSE) of 1.05 × 10−4 for 100 samples and 1.61 × 10−6 for 1000 samples. The cycle-to-cycle variation of CTHP memristors may have resulted in deviations from the mean values. Still, their SD was only ∼0.04, suggesting the robustness of the suggested method to infer the nodal probabilities. Moreover, the device's cycle-to-cycle variation, which resulted in a sigmoid curve variation (Fig. 3), did not affect the inference accuracy significantly, as shown in Fig. S5 of the ESI.†
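The NMSE quoted here and below compares the inferred means $\hat{p}_i$ with the theoretical probabilities $p_i$; the exact normalization is not spelled out in this excerpt, and one common convention, assumed here, is

$$\mathrm{NMSE} = \frac{\sum_i \left(\hat{p}_i - p_i\right)^2}{\sum_i p_i^2}.$$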
Suppose that, for example, Ppost(R = T|W = T) is sought, corresponding to the probability of rain when wet grass is observed, which is not known a priori from the given CPTs. This value can be found by complicated theoretical means, as shown in Note S2 of the ESI,† or through inference using the suggested p-bit Bayesian circuit shown in Fig. 6. Ppost(R = T|W = T) can be expressed as P(R = T ∩ W = T)/P(W = T) by Bayes' theorem. An AND gate (the upper AND gate in the left portion of Fig. 6a) efficiently implements P(R = T ∩ W = T) in the numerator by receiving pulses from two p-bit neurons as inputs (P(R = T) and P(W = T), which are reported in Table 1). In other words, the AND gate outputs a pulse only when the two inputs are simultaneously ‘1’. It should be noted that these two probability values have a conditional interrelationship.
On the other hand, dividing the P(R = T ∩ W = T) by P(W = T) requires additional circuit elements composed of an additional peripheral node and division feedback logic, as shown in Fig. 6a. The idea behind this suggested circuit is that the probability for the additional peripheral p-bit neuron (Peri node), Pperi, is assumed to correspond to the P(R = T ∩ W = T)/P(W = T) value. Thus, its value is taken as the solution to the problem when the inference error becomes sufficiently small. Then, the outputs of the ‘Wet grass’ and Peri nodes are input to another AND gate (lower AND gate in Fig. 6a), and the output of this AND gate corresponds to P(W = T) × Pperi because these two nodes are independent. Finally, the difference between the outputs of the two AND gates, defined as the error, ε, in the right portion of Fig. 6a, is estimated, which is then minimized by varying the input voltage to the Peri node. The ε minimization steps are described below.
The Pperi is initially set to 0.5 by inputting 5.612 V to this node. Then, after sampling 128 pulses from each node representing P(R = T), P(W = T), and Pperi, two AND gates output the intersection of the input p-bit pulses. For the probability calculation, the number of spiking pulses is divided by the total pulse number of 128.
Following the intersection calculation, the division feedback logic is utilized to infer the posterior probability from the two output pulse streams of the AND gates. In the division feedback logic block shown in the right portion of Fig. 6a, Pperi is adjusted to equalize the numbers of spiking pulses from the two AND gates. To perform this equalization, the difference between the two probabilities, ε, is calculated using a field-programmable gate array (the equation in the feedback logic block of Fig. 6a). Subsequently, the feedback voltage directed to the peripheral p-bit neuron is modified to minimize ε. In this feedback stage, ε is multiplied by the learning rate ηn (ηn = α × exp(−β × n/N), where n is the current iteration and N is the total number of iterations) to determine the desired change in the subsequent Pperi (δPperi). As a result, the feedback probability Pn+1 is equal to Pn + εn × ηn. The probability feedback process is described as follows.
Pn+1 = Pn + εn × ηn (1)
Starting with P0 = 0.5, Pn+1 corresponds to the spiking probability of the peripheral node after the (n+1)th feedback. εn and ηn are the error and the learning rate at the (n+1)th feedback, respectively. Subsequently, the relationship between the feedback voltage and the spiking probability is shown as
Pn+1 = f(Vn+1) (2)
The spiking probability in response to the feedback voltage after the (n+1)th feedback follows the sigmoidal function, as shown in Fig. 2f. Therefore, the (n+1)th feedback voltage is given by
Vn+1 = f−1(Pn+1) (3)
The (n+1)th feedback voltage directed to the peripheral p-bit neuron is an inverse function of the sigmoidal function. During the feedback iteration, the learning rate (ηn) exponentially decreases as the ‘n’ increases, allowing for a gradual and incremental feedback mechanism. After twenty feedback iterations (Pperi = P20), the ε is minimized, and finally,
P(R = T ∩ W = T) ≈ P(W = T) × Pperi (4)
Pperi ≈ P(R = T ∩ W = T)/P(W = T) = Ppost(R = T|W = T) (5)
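A compact simulation of this division feedback loop is sketched below under the sigmoid model assumed earlier. The learning-rate constants α and β are illustrative, as their values are not given in this excerpt, and the upper AND gate is shortcut by sampling the joint probability P(R = T ∩ W = T) ≈ 0.458 (= 0.708 × 0.647) directly rather than from correlated node pulses:

```python
import math
import random

V0, K = 5.612, 0.055        # sigmoid parameters from the p-bit model above
ALPHA, BETA = 1.0, 3.0      # learning-rate constants (illustrative assumptions)
TOTAL_ITER, PULSES = 20, 128

def pulses(p, n=PULSES):
    """n stochastic output pulses from a node spiking with probability p."""
    return [random.random() < p for _ in range(n)]

def inv_sigmoid(p):
    """f^-1: map a target spiking probability to a feedback voltage (eqn (3))."""
    return V0 + K * math.log(p / (1.0 - p))

P_RW, P_W = 0.458, 0.647    # P(R=T ∩ W=T) and P(W=T) for the Fig. 1a network
p_peri = 0.5                # P0 = 0.5, i.e. a 5.612 V input to the Peri node
for n in range(TOTAL_ITER):
    upper = pulses(P_RW)    # upper AND gate (correlated R and W pulses, shortcut)
    lower = [w and q for w, q in zip(pulses(P_W), pulses(p_peri))]  # independent
    eps = (sum(upper) - sum(lower)) / PULSES        # error between the AND gates
    eta = ALPHA * math.exp(-BETA * n / TOTAL_ITER)  # exponentially decaying rate
    p_peri = min(max(p_peri + eps * eta, 1e-3), 1.0 - 1e-3)  # eqn (1), clamped
    v_feedback = inv_sigmoid(p_peri)                # next Peri input voltage
print(round(p_peri, 3))     # converges near 0.708 = P_post(R=T|W=T)
```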
Fig. 6b shows the feedback results for five posterior probabilities of the network in Fig. 1a. Pperi rapidly approaches the target value in the early iterations due to the high η. In the later iterations, the feedback is suppressed, preventing deviation from the target value. This process is similar to the simulated annealing method in the p-bit network.39 Throughout the inference, the numbers of feedback iterations and pulses were chosen as 20 and 128, respectively. These values were selected considering the tradeoff between calculation overhead and accuracy, detailed in Fig. S6 of the ESI.†
Table 2 summarizes the inference results of the five posterior probabilities. For the probabilities of nodes being ‘False’, the p-bit neuron outputs were inverted using a NOT gate. The mean values of all the posterior probabilities in the Bayesian network are precisely inferred, with low NMSEs of 6.58 × 10−4 and 6.91 × 10−4 for 100 and 1000 samples, respectively, indicating that the division feedback logic infers the correct answers even within 100 samples. The SD values are also low (∼0.02) for 100 and 1000 samples, suggesting that the influence of the device variation is minimal. Further details regarding the variance tolerance of the proposed method are provided in Fig. S7 and S8 of the ESI.†
| Nodal probability | Theoretical | Inference | 100 samples | 1000 samples |
|---|---|---|---|---|
| Ppost(S = T \| W = T) | 0.430 | Mean | 0.427 | 0.430 |
| | | SD | 0.020 | 0.022 |
| Ppost(R = T \| W = T) | 0.708 | Mean | 0.711 | 0.707 |
| | | SD | 0.022 | 0.019 |
| Ppost(C = T \| W = T) | 0.576 | Mean | 0.578 | 0.575 |
| | | SD | 0.022 | 0.021 |
| Ppost(W = F \| S = F) | 0.473 | Mean | 0.474 | 0.472 |
| | | SD | 0.022 | 0.022 |
| Ppost(W = F \| R = F) | 0.622 | Mean | 0.619 | 0.619 |
| | | SD | 0.023 | 0.021 |
Finally, the potential of the suggested method for inference in a complex Bayesian network was examined using a Bayesian network with 20 nodes and 7 layers, where the CPTs between the nodes are randomly generated, as shown in Fig. 7a. Fig. 7b shows the hardware implementation for node 4 in the network, where an 8 × 1 MUX is utilized to encode the CPT from its three parents (nodes 3, 16, and 17).
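The MUX address is simply the concatenated binary state of the parent nodes; a one-function sketch (the eight voltage levels below are hypothetical placeholders, not values from the paper):

```python
def mux_select(cpt_voltages, parent_states):
    """Select the child's input voltage from the 2^n CPT entries, addressed
    by the concatenated binary states of its n parents (n = 3 for node 4)."""
    idx = 0
    for s in parent_states:                 # e.g. (Out_3, Out_16, Out_17)
        idx = (idx << 1) | int(s)
    return cpt_voltages[idx]

# Eight voltage levels encoding P(node 4 = T | nodes 3, 16, 17); hypothetical.
v_in_node4 = mux_select([5.40, 5.46, 5.52, 5.58, 5.64, 5.70, 5.76, 5.82],
                        (True, False, True))
print(v_in_node4)   # 5.70 (index 0b101 = 5)
```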
The inference results of the suggested method are shown in Fig. 7c and d. Fig. 7c provides an overview of the theoretical posterior probability values across the entire network, calculated by a method similar to that in Note S2 of the ESI,† while Fig. 7d illustrates the inference outcomes of the posterior probabilities using the suggested Bayesian network circuit. The two maps compare 380 posterior probabilities, excluding the 20 trivial probabilities of nodes conditioned on themselves (white squares in Fig. 7c and d). The inference results in Fig. 7d show the mean value of 100 inferences for each posterior probability. The inference results match the theoretical results well, implying that the suggested method can be used to analyze complex networks in applications such as autonomous vehicles, medical diagnosis, and forecasting.40–42
Table 3 shows five instances of inference outcomes for two sample counts (100 and 1000). In most of these conditional probabilities, the condition and result nodes are significantly distant; for example, six hops separate nodes 1 and 15. Nevertheless, the SD value is within 0.02 for most probabilities. This capacity for precise inference is further demonstrated by the low SD values (<0.03) of all the inference results, even with 100 samples, as presented in Fig. S9 of the ESI.† The NMSE of all the mean inference probabilities in this complex Bayesian network is 3.37 × 10−3 for both 100 and 1000 samples, demonstrating accurate, noise-suppressed inference with only 100 samples, even in a complex Bayesian network.
| Nodal probability | Theoretical | Inference | 100 samples | 1000 samples |
|---|---|---|---|---|
| Ppost(19 = T \| 0 = T) | 0.660 | Mean | 0.658 | 0.660 |
| | | SD | 0.020 | 0.020 |
| Ppost(2 = T \| 10 = T) | 0.820 | Mean | 0.811 | 0.814 |
| | | SD | 0.019 | 0.017 |
| Ppost(1 = T \| 15 = T) | 0.130 | Mean | 0.133 | 0.133 |
| | | SD | 0.020 | 0.015 |
| Ppost(15 = F \| 1 = F) | 0.331 | Mean | 0.333 | 0.331 |
| | | SD | 0.020 | 0.020 |
| Ppost(13 = F \| 10 = F) | 0.515 | Mean | 0.514 | 0.513 |
| | | SD | 0.023 | 0.022 |
In contrast to the analytical approach, which suffers from an exponential increase in computational resources with the increasing number of nodes, the proposed method achieves accurate inference of posterior probabilities by utilizing a constant number of pulses and feedback iterations. Further details regarding the inference and feedback are described in Fig. S10 of the ESI.†
Table 4 summarizes the comparison between Bayesian inference circuits using various devices. The simple device structure, high on/off ratio, and volatility of the CTHP memristor reduce the number of transistors required in the CTHP-based p-bit circuit compared with previous studies.11,14,38,43 Remarkably, the power consumption per random neuron output of a CTHP p-bit neuron was significantly lower than that of CMOS-based LFSRs. The lower power consumption of the CTHP p-bit neuron is attributed to replacing the random bit generation of a conventional LFSR with the inherently stochastic CTHP TS device. The CTHP p-bit neuron could be operated with a maximum power consumption of 186 nW, the estimation of which is detailed in the Experimental section below and Fig. S11 of the ESI.† Moreover, the CTHP p-bit neuron with a low current level generates random bits with lower power than the MTJ- and SiOx nanorod-based circuits of previous studies, which additionally required a reset scheme or a long pulse width for the probability representation.38,43 The detailed breakdown and calculation of the energy consumption in the suggested CTHP p-bit neuron are included in Table S1 and Note S3 of the ESI.†,44 Regarding the accuracy of the Bayesian inference, the inference circuit based on the CTHP p-bit neuron achieved a lower NMSE on the four-node network than the MTJ- and SiOx nanorod-based circuits achieved on networks of similar size (∼five nodes).38,43 Furthermore, the inference for a more complex Bayesian network consisting of 20 nodes showed an NMSE (3.37 × 10−3) comparable to those of the other studies on simpler (∼five nodes) networks.
| | CMOS11,14 | MTJ43 | SiOx nanorods38 | This work |
|---|---|---|---|---|
| Device structure | Complex | Complex | Simple MIM | Simple MIM |
| On/off ratio | — | 2∼3 | 10⁴∼10⁵ | 10⁴ |
| Device volatility | Volatile | Non-volatile | Non-volatile | Volatile |
| Number of transistors | >1200 | >35 | 10 | 10 |
| Power consumption | 33.06 mW | 158.9 μW | 4.06 μW | <186 nW |
| Energy | 275.6 μJ | 692.4 fJ | 1.767 pJ | 441.4 fJ |
| Accuracy (NMSE) | — | 1.24 × 10−3 | 2.41 × 10−2 | 7.5 × 10−4 |
The power consumed by the series resistor (PRS) is obtained from the node voltage (Vnode) measured across it (eqn (6)).

PRS = Vnode²/RS (6)
The power consumption of the CTHP (PCTHP) is determined by the voltage across it and its effective resistance RCTHP (eqn (7)), where (Vin − Vnode) is equal to VCTHP.
PCTHP = VCTHP²/RCTHP = (Vin − Vnode)²/RCTHP (7)
Kirchhoff's voltage law shows that RCTHP can be represented as RS × (Vin − Vnode)/Vnode. Therefore, PCTHP can be presented as eqn (8).
PCTHP = (Vin − Vnode) × Vnode/RS (8)
As a result, the total power consumption is given as eqn (9).
Ptotal = PRS + PCTHP = Vin × Vnode/RS (9)
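A numerical check of eqns (6)–(9) with the series resistor used in this work (RS = 2.2 MΩ) and a hypothetical node voltage chosen to land near the reported <186 nW bound:

```python
R_S = 2.2e6                # series resistor from the p-bit circuit (ohms)

def power_breakdown(v_in, v_node):
    """Powers from eqns (6), (8), and (9); v_node is the voltage across R_S
    and (v_in - v_node) drops across the CTHP memristor."""
    p_rs = v_node ** 2 / R_S                    # eqn (6)
    p_cthp = (v_in - v_node) * v_node / R_S     # eqn (8)
    p_total = v_in * v_node / R_S               # eqn (9) = p_rs + p_cthp
    return p_cthp, p_rs, p_total

# Hypothetical operating point (V_node is assumed, not reported in this excerpt):
p_cthp, p_rs, p_total = power_breakdown(5.8, 0.07)
print(f"{p_total * 1e9:.0f} nW total")          # ~185 nW, below the 186 nW bound
```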
Footnotes
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3na01166f |
‡ These authors contributed equally. |
This journal is © The Royal Society of Chemistry 2024 |