Minsun Cho ab, Marin Franot ac, O-Joun Lee d and Sungyeop Jung *ae
aAdvanced Institute of Convergence Technology, Seoul National University, Suwon, 16229, Republic of Korea
bInstitute of Industrial Technology, Korea University, Sejong 30019, Republic of Korea
cÉcole Nationale Supérieure d’Électrotechnique, d’Électronique, d’Informatique, d’Hydraulique et des Télécommunications, Toulouse INP, Toulouse, 31000, France
dDepartment of Artificial Intelligence, The Catholic University of Korea, Bucheon 14662, Republic of Korea
eDepartment of AI Semiconductor Engineering, Korea University, Sejong 30019, Republic of Korea. E-mail: sungyeopjung@korea.ac.kr
First published on 10th September 2024
We present an approach that adopts deep neural networks for the development of a compact model for transistors, namely a neural compact model, and includes transfer learning to enhance accuracy and reduce model development time. We examine the effectiveness of this approach when the electrical data for neural networks are scarce and costly and when the electrical characteristics to be modeled are highly non-linear. Using technology computer-aided design simulations, we constructed a dataset of the electrical characteristics of organic field-effect transistors with Gaussian disorder that exhibit highly non-linear current–voltage curves. Subsequently, we developed neural compact models by modifying conventional deep learning models and validated the effectiveness of transfer learning through various experiments. We show that the neural compact model with transfer learning provides equivalent accuracy with a significantly shorter training time.
A LUT compact model is a set of tables of electrical characteristics2 together with a search function, interpolation routines, and an interface to the circuit simulator. A LUT compact model is computationally simple, but relatively large tables are required to ensure its accuracy. An analytical compact model is a set of mathematical equations derived from device physics which can model the device behavior over the entire operation regime using a set of device parameters obtained from a few representative measurements. An analytical compact model is translated into computer languages, such as Verilog-A or C, which makes it easy to distribute and interface with a circuit simulator. In the meantime, as more complicated physical phenomena are introduced due to distinctive material characteristics, device structures, and operation mechanisms, the number of model equations and parameters increases. As a consequence, parameter extraction that ensures an accurate fit becomes a complicated task. In addition, compact model development can take a significantly long time (e.g. several years).
An ANN-based compact device model, or a neural compact model, has been reported for more accurate and faster modeling.3–5 The effects of scaling6 and novel device technology7 have been studied. However, most existing neural compact models have limited accuracy when modeling design instances (W, L and T), bias (VDS and VGS), and technology parameters, and result in non-physical behavior such as non-zero current at VDS = 0 V and/or an asymmetric I–V model. Furthermore, indirect approaches such as using a physics-augmentation loss function7 or coupling with a standard analytical compact model8 still require significant technology expertise despite the possibility of reducing the size of the neural network. Few studies have presented a loss–epoch plot of the training and test data9 to discuss overfitting, which is the production of a neural model that corresponds too closely or exactly to a particular set of data and may therefore fail to fit additional data or predict future observations reliably. Often, the technology computer-aided design (TCAD) simulation used to obtain the training/test data is not calibrated to experimental data.10 Unlike a diode,11 a calibrated TCAD dataset exploring the temperature domain is not available for transistors,12–14 perhaps due to the cost and scarcity of calibrated experimental data.
Organic field-effect transistors (FETs) have been studied as key electronic devices for flexible and printed electronics owing to the semiconducting properties of molecules and polymers.15,16 Charge transport in organic semiconductors with Gaussian disorder is manifested by thermally assisted hopping via energetically and spatially random sites, which exhibit a complex temperature, charge carrier concentration and electric-field dependence.17,18 There have been reports on analytical compact models for organic FETs,7,19 models based on the transition function20 and the overdrive voltage21 to cover all operation voltages, and models that consider non-ideal effects in channel length modulation,22 mobility and contact resistance,23 and temperature variation.24,25 Despite continuous effort, it takes some time for a complete compact model to be established. Moreover, an analytical compact model is not yet available for transistors with interesting device behavior, such as negative transconductance26 or negative differential resistance.27
To address these issues, a neural compact model based on deep learning without physics-informed equations is presented, which could relax the requirement for technology expertise. In addition, we propose a method for training a neural compact model efficiently using transfer learning. The rest of this paper is organized as follows. In the Results and discussion section, we present the dataset construction, the charge carrier transport and electrical characteristics of organic FETs, the base and transfer learning algorithms and function approximation results, and the circuit simulation results. We also discuss the robustness of the neural compact model's accuracy and efficiency against data scarcity. In order to evaluate the accuracy, we adopt the mean absolute percentage error (MAPE), which is stricter than the R-squared value that is common in the deep learning community. Finally, we conclude this paper.
To describe this behavior using a neural compact model, we constructed a dataset using TCAD simulation (Atlas version 5.14.0.R) calibrated against the experimental measurement.18 In detail, VGS varies from −60 to 5 V in steps of 1 V, VDS varies from −60 to 15 V in steps of 5 V, and the temperature was varied from 200 to 300 K in steps of 20 K (Fig. 1(b)). In order to ensure a sufficient number of data points for learning, 100 Monte-Carlo simulations were conducted for each set of VDS, VGS and T and augmented to 10 000 by Gaussian noise injection. The raw data require preprocessing and classification steps before being applied to the neural network (Fig. 1(c)). The data undergo normalization for training, during which the min–max scaling formula29 is used to rescale the data so that all features lie between 0 and 1. The data was then split in an 8:2 ratio to separate the training data from the validation data, and the validation data is employed to assess the predictions.
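As a concrete illustration, a minimal Python sketch of this preprocessing step is given below. The array names, the noise level, and the choice of perturbing the current values are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def min_max_scale(x):
    """Rescale each column (feature) to the [0, 1] range (min-max normalization)."""
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return (x - x_min) / (x_max - x_min)

rng = np.random.default_rng(0)

# Placeholder arrays: inputs are (VGS, VDS, T) triples, the target is the drain current ID.
X = rng.uniform(size=(1000, 3))
y = rng.uniform(size=(1000, 1))

# Gaussian noise injection, illustrating the augmentation step
# (the noise level and perturbed quantity are assumptions, not the paper's values).
X_aug = np.vstack([X] * 10)
y_aug = np.vstack([y] + [y + rng.normal(scale=0.01, size=y.shape) for _ in range(9)])

X_scaled, y_scaled = min_max_scale(X_aug), min_max_scale(y_aug)

# 8:2 split into training and validation sets.
idx = rng.permutation(len(X_scaled))
split = int(0.8 * len(X_scaled))
X_train, y_train = X_scaled[idx[:split]], y_scaled[idx[:split]]
X_val, y_val = X_scaled[idx[split:]], y_scaled[idx[split:]]
```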
We conducted PCA on both the calibrated TCAD simulation data and the experimental data to examine the quality of the TCAD simulation data and to confirm its similarity to the experimental data. PCA efficiently extracts important information without requiring many parameters, thereby making it possible to represent complex data in lower dimensions. The purpose of PCA is to find and project onto the axes that preserve as much of the variance of the original data as possible. In essence, the leading basis vectors carry the significant information, and reducing the dimensions entails determining which components are key factors and which are deemed noise. PCA is advantageous for applying and utilizing data in modeling, as it helps identify patterns between data points and enhances data analysis.30,31
The key function involved reducing the dataset to a given number of principal components (T, VDS) using a Python environment. We computed the eigenvalues (eigvals) and eigenvectors (eigvecs) using functions provided by the NumPy library and selected the necessary number of principal components based on them. Subsequently, each sample was projected onto the principal components to generate new feature vectors. PCA was then performed by extracting the data from the T and VDS columns. This process facilitated a comprehensive analysis of the dataset's structure and patterns through visualization.
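A minimal sketch of such a NumPy-based PCA (eigen-decomposition of the covariance matrix followed by projection onto the leading components) is shown below; the array shapes are placeholders, not the actual dataset dimensions.

```python
import numpy as np

def pca_project(data, n_components=2):
    """Project samples onto the leading principal components via eigen-decomposition."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh: the covariance matrix is symmetric
    order = np.argsort(eigvals)[::-1]            # sort components by explained variance
    components = eigvecs[:, order[:n_components]]
    return centered @ components, eigvals[order]

# Illustrative use: each row is one ID-VGS curve labelled by its (T, VDS) condition.
rng = np.random.default_rng(0)
curves = rng.normal(size=(200, 66))              # placeholder for the curve dataset
scores, variances = pca_project(curves)          # scores[:, 0] is PCA1, scores[:, 1] is PCA2
```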
Fig. 2 illustrates the visualization results of the ID–VGS curve data of the calibrated TCAD simulation. In Fig. 2(a) and (b), VDS and temperature are, respectively, set as labels. Along principal component analysis axis 1 (PCA1), the variation in temperature for a fixed VDS condition (−60 V) spreads the data by at most 0.8 (ΔmaxT) between 280 K and 300 K, whereas the variation in VDS for a fixed temperature (T = 300 K) spreads the data by at most 3.8 (ΔmaxVDS) between 0 V and −15 V. Meanwhile, the variation along PCA2 is negligible compared to that along PCA1, smaller by a factor of about 10−5. Therefore, the effects of both temperature and VDS are significant, with the latter being at most about 2 times more pronounced. This suggests that the temperature variation could be easier to learn than the VDS variation. It can thus be inferred that, in the context of transfer learning, a model pre-trained with data from all VDS conditions at 300 K, 260 K, and 220 K would infer the other temperature data more easily. In summary, PCA can provide information on possible challenges and guide the assessment of a model's performance in transfer learning scenarios. Additionally, the experimental data exhibit a PCA pattern similar to that of the TCAD simulation data, suggesting consistency across different data sources (see Fig. S1 in ESI†).
The input layer consists of three neurons: VGS, VDS, and T. The hidden layers have a tandem structure of fully connected (FC) layers and convolution (Conv) layers, consisting of 2 FC layers (512 neurons each), 2 Conv layers and 3 FC layers (1024, 512, and 66 neurons, respectively). We conducted an initial experiment with an MLP network composed of three FC layers; however, its accuracy and ability to describe the hump characteristics were not sufficiently good. To mitigate this issue, the presented layer structure was chosen. Among the seven hidden layers, the Conv layers were introduced to extract features for different subnets based on VGS and VDS. The two FC layers preceding the Conv layers were designed to capture the overall characteristics of the curve and the electrical properties of the learning model. The three FC layers serve to apply the local features captured by the preceding Conv layers to the prediction of a single output, the current. The first FC layer uses the tanh function as the activation function for VGS and the sigmoid function as the activation function for VDS, forming different subnets, so that the effects of VGS and VDS can be represented separately for each type of neuron. The other layers adopt the ReLU activation function. In particular, the neurons in the second FC layer are connected to both the tanh and sigmoid neurons of the first FC layer; these neurons represent the coupled effects of VGS and VDS on the channel potential and carrier concentration. For faster and more stable training, batch normalization (BN) was conducted at each layer's input.35 The output layer has a single neuron representing the drain current ID. The neural network is trained for 200 epochs, and the model is evaluated using the mean square error (MSE) loss function. Detailed information about the hyperparameters used and the model training is listed in Table 1.
| Parameters | Base model | Transfer model |
|---|---|---|
| Hidden layers | 7 | 7 |
| Optimizer | Adam; RMSprop | RMSprop |
| Epochs | 200 | 200 |
| Activation function | Tanh; sigmoid; ReLU | Tanh; sigmoid; ReLU |
| Learning rate | 0.001 | 0.001 |
| Loss function | MSE | MSE |
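A schematic PyTorch sketch of the tandem FC–Conv structure described above is given below. The framework, the Conv-layer kernel sizes and channel counts, and the exact wiring of the tanh/sigmoid subnets are not specified in the text and are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class NeuralCompactModel(nn.Module):
    """Schematic tandem FC-Conv network mapping (VGS, VDS, T) to the drain current ID.

    Layer sizes follow the text (2 FC x 512, 2 Conv, FC 1024/512/66, single output);
    kernel sizes, channel counts and the subnet wiring are assumptions.
    """
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(3, 512)
        self.bn1 = nn.BatchNorm1d(512)
        self.fc2 = nn.Linear(512, 512)
        self.bn2 = nn.BatchNorm1d(512)
        # Two 1-D convolution layers extracting local features along the 512-wide vector.
        self.conv = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=3, padding=1), nn.BatchNorm1d(8), nn.ReLU(),
            nn.Conv1d(8, 1, kernel_size=3, padding=1), nn.BatchNorm1d(1), nn.ReLU(),
        )
        # Three FC layers mapping the local features to a single drain-current output.
        self.fc3 = nn.Linear(512, 1024)
        self.fc4 = nn.Linear(1024, 512)
        self.fc5 = nn.Linear(512, 66)
        self.out = nn.Linear(66, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # First FC layer: split the neurons into tanh (VGS) and sigmoid (VDS) subnets.
        h = self.bn1(self.fc1(x))
        h = torch.cat([torch.tanh(h[:, :256]), torch.sigmoid(h[:, 256:])], dim=1)
        h = self.relu(self.bn2(self.fc2(h)))
        h = self.conv(h.unsqueeze(1)).squeeze(1)   # (batch, 1, 512) -> (batch, 512)
        h = self.relu(self.fc3(h))
        h = self.relu(self.fc4(h))
        h = self.relu(self.fc5(h))
        return self.out(h)                         # single neuron: drain current ID

model = NeuralCompactModel()
pred = model(torch.rand(4, 3))                     # batch of (VGS, VDS, T) -> ID
```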
Next, we explain the transfer learning used to develop a more effective and efficient neural compact model. Transfer learning benefits from the knowledge acquired during pre-training and updates the model through fine-tuning using backpropagation, which can reduce the total computational time and cost. In this study, as shown in Fig. 3(b), we performed transfer learning in the temperature T and VDS domains. For the former, we conducted pre-training for three temperatures, namely 300, 260 and 220 K, and then fine-tuned the model to predict the results for three unseen temperatures, namely 280, 240 and 200 K. For the latter, we conducted pre-training for four VDS values, namely −60, −40, −20 and 0 V, and then fine-tuned the model for five unseen VDS values, namely −50, −30, −10, 5 and 15 V. The configuration of the hidden layers remains the same as in the base model, and only the third to fifth FC layers (FC3–FC5) were updated during fine-tuning (orange arrow and lines). The key evaluation metrics for our model are how fast it predicts the unseen variables and how accurately it represents the device characteristics under these conditions. We believe that these aspects can be observed through the epoch-loss graph as well as the I–V curves on linear and semilog scales.
Detailed information regarding the hyperparameters used and defined for the transfer model is provided in Table 1. As shown in Table 1, the most significant difference in hyperparameters between the base learning model and the transfer learning model is the exclusive use of the RMSprop optimizer.36 Both classified as gradient-descent optimization methods, the Adam37 and RMSprop38 optimizers have frequently been compared. In this paper, we decided to employ the RMSprop optimizer based on the results and discussion in the ESI†, which demonstrate that the RMSprop optimizer better captures the hump characteristics specific to our datasets.
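A minimal PyTorch sketch of this fine-tuning step is shown below, assuming the NeuralCompactModel class from the previous sketch; the data loader and the pre-trained weights are placeholders rather than the authors' training pipeline.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# `NeuralCompactModel` is the class from the previous sketch; in practice the model
# would be loaded from a checkpoint pre-trained on the seen conditions
# (e.g. T = 300, 260 and 220 K at all VDS).
model = NeuralCompactModel()

# Placeholder loader standing in for the unseen-condition (fine-tuning) data.
finetune_loader = DataLoader(
    TensorDataset(torch.rand(256, 3), torch.rand(256, 1)), batch_size=32)

# Freeze every parameter, then re-enable only FC3-FC5 for fine-tuning.
for p in model.parameters():
    p.requires_grad = False
for layer in (model.fc3, model.fc4, model.fc5):
    for p in layer.parameters():
        p.requires_grad = True

# RMSprop optimizer and MSE loss, following Table 1.
optimizer = torch.optim.RMSprop(
    (p for p in model.parameters() if p.requires_grad), lr=0.001)
loss_fn = torch.nn.MSELoss()

for epoch in range(200):
    for x, y in finetune_loader:      # unseen-temperature (or unseen-VDS) batches
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()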
The MAPE is defined as
MAPE = (100%/N) Σi=1…N |(ID,itrue − ID,ipred)/ID,itrue| (1)
where N is the number of data points and ID,itrue and ID,ipred denote the true and predicted drain currents, respectively.
At this point, we compare the MAPE values of our model with those of other models in the literature. The MAPE depends strongly on factors such as the number of data samples and the number of labels in the model. A MAPE of the ID–VGS characteristics under 15% is commonly considered a good evaluation metric.39 MAPEs of S-parameters with respect to VGS, VDS and frequency of around 0.2–2% have been reported.40 Our model computes the MAPE of the ID–VGS characteristics in the T and VDS domains; the dataset is considerably larger and more complex (ID varies exponentially), yet the model accuracy remains high.
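For reference, a small sketch of how such a MAPE is evaluated over a set of predicted drain currents is given below; the numerical values are placeholders.

```python
import numpy as np

def mape(i_true, i_pred):
    """Mean absolute percentage error between true and predicted drain currents, in percent."""
    i_true, i_pred = np.asarray(i_true, dtype=float), np.asarray(i_pred, dtype=float)
    return 100.0 * np.mean(np.abs((i_true - i_pred) / i_true))

# Illustrative check with placeholder drain-current values (in A).
print(mape([1e-6, 2e-6, 5e-7], [1.1e-6, 1.9e-6, 5.2e-7]))  # approximately 6.3
```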
In addition to these good figures of merit, the 'hump' feature that is more pronounced at lower temperatures is successfully modelled (see the semilog plots in Fig. 6). The fine-tuning took 5 h 31 min 22 s for transfer learning in the temperature domain.
We further examined the device characteristics and consistency not only through the ID–VGS curves on linear and semilog scales but also through their first- and second-order derivatives. Fig. 6(c) and (d) present the first-order derivative (g) and second-order derivative (h) of the current with respect to VGS. It can be observed for representative VDS conditions that the predicted values and test data show close consistency across all temperature conditions. The model has demonstrated its ability to handle predictions for all considered VGS, VDS and temperature conditions with excellent performance, using the knowledge of the effects of carrier concentration, electric field, and temperature acquired by transfer learning. Similar to the base learning, we confirmed that the transfer learning in the temperature domain is free from overfitting by referring to the MSE loss of the training and test data with respect to the number of epochs (see Fig. S2(b), ESI†).
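Such derivative-based consistency checks can be reproduced numerically, for example with finite differences. The sketch below uses a placeholder curve rather than the actual data or model predictions.

```python
import numpy as np

# Placeholder ID-VGS sweep; in practice vgs and i_d come from the test data or model output.
vgs = np.linspace(-60.0, 5.0, 66)      # gate-source voltage grid (V)
i_d = -1e-9 * np.exp(-vgs / 10.0)      # placeholder p-type drain-current curve (A)

g = np.gradient(i_d, vgs)              # first-order derivative dID/dVGS
h = np.gradient(g, vgs)                # second-order derivative d2ID/dVGS2
```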
In terms of accuracy and time cost, we compared the base learning model without transfer learning, with transfer learning in the temperature domain, and with transfer learning in the VDS domain (Table 2). First, the MSE loss, R-squared, and MAPE values indicate that a slightly higher accuracy was achieved with transfer learning. In addition, transfer learning in the temperature domain was more accurate than that in the VDS domain, which could originate from the spread of the data points shown in the PCA plots. Second, the total time was reduced by about half when transfer learning was adopted, demonstrating the efficiency of the transfer learning approach. In detail, we analyzed the time cost of transfer learning by separating the total time into the pre-training time (three seen temperature conditions and four seen VDS conditions) and the fine-tuning time (three unseen temperature conditions and five unseen VDS conditions). These are compared, for a fair comparison, to the total time required to train and test the base learning model on the unseen data (the data used for fine-tuning in transfer learning). In the temperature domain, transfer learning took 5 h 31 min, which is about half of the homologous base learning (10 h 40 min). Meanwhile, transfer learning in the VDS domain took 7 h 24 min, which is about half of the homologous base learning (16 h 31 min). This analysis confirms that the transfer learning model, under the same conditions, outperforms the base model in terms of accuracy and time cost. Furthermore, the advantages of the transfer model can also be observed in Fig. 5. The scatter plot of the experimental values against the model's predictions with transfer learning (in both the T and VDS domains) aligns almost perfectly on the y = x line, which supports that the accuracy does not degrade upon applying transfer learning.
| Model type | Non-transfer learning model (T domain) | Non-transfer learning model (VDS domain) | Transfer learning model (T domain) | Transfer learning model (VDS domain) |
|---|---|---|---|---|
| Pre-training time | — | — | 10 h 27 min 52 s | 12 h 11 min 13 s |
| Fine-tuning time | — | — | 5 h 31 min 22 s | 7 h 24 min 56 s |
| Total time | 10 h 40 min 51 s | 16 h 31 min 23 s | 5 h 31 min 22 s | 7 h 24 min 56 s |
| MSE loss | 0.0354 | 0.0481 | 0.0226 | 0.0318 |
| R-squared | 0.998 | 0.996 | 0.999 | 0.998 |
| MAPE (%) | 6.53 | 8.17 | 4.32 | 7.28 |
In more detail, the difference in the total time for transfer learning on unseen data between the temperature (5 h 31 min) and VDS (7 h 24 min) experiments originates from the difference in the number of data labels. The effective total time, i.e. (total time)/(sample number × unseen label number in the transfer domain × label number in the other domain), is comparable. Here, the total time represents the time taken for transfer learning on the unseen data. In the temperature domain, the time taken for the three unseen labels in the transfer domain (T = 280, 240 and 200 K) over 16 VDS labels, each comprising the same number of data samples (100 samples), is 5 h 31 min (331 min), equating to 331/(3 × 16) ≈ 7 min per label. Meanwhile, in the VDS domain, which consists of five unseen labels in the transfer domain (VDS = −50, −30, −10, 5 and 15 V) over 6 temperature labels, the time required is 444/(5 × 6) ≈ 14.8 min per label.
Finally, we confirmed that transfer learning in the drain–source voltage domain is free from the overfitting problem by referring to the MSE loss of the training and test data with respect to the number of epochs (see Fig. S2(c), ESI†).
Fig. 8 Schematic circuit diagram of a resistive-load inverter. R = 2.8 MΩ, VDD = −60 V. Voltage transfer curves at various temperatures: T = 300, 280, 220, and 200 K.
Fig. 8 shows the voltage transfer curves of the resistive-load inverter gate at various temperatures obtained using the neural compact model and TCAD simulation (as the ground truth). The input voltage Vin was varied from 0 to −60 V. The simulation results are summarized in Table 3. The neural compact model accurately predicts the switching voltage Vsw and its shift toward a more negative voltage with decreasing temperature. In addition, the neural compact model successfully describes the noise margin and its narrowing with decreasing temperature. The proposed neural compact model achieved an error of less than 5% under all temperature conditions, where the error is defined as error = |((neural compact model) − (ground truth))/(neural compact model)| × 100.
| Parameter | | 300 K | 280 K | 220 K | 200 K |
|---|---|---|---|---|---|
| NMH (V) | Neural compact model | 20.76 | 20.70 | 12.14 | 7.20 |
| | Ground truth | 20.70 | 20.50 | 12.60 | 7.29 |
| | Error (%) | 0.29 | 0.97 | 3.79 | 1.25 |
| NML (V) | Neural compact model | 5.85 | 4.20 | 4.46 | 5.66 |
| | Ground truth | 6.0 | 4.30 | 4.50 | 5.89 |
| | Error (%) | 2.56 | 0.97 | 0.90 | 4.06 |
| VSW (V) | Neural compact model | −26.3 | −27.6 | −34.1 | −36.8 |
| | Ground truth | −26.5 | −28.0 | −34.0 | −37.0 |
| | Error (%) | 0.8 | 1.4 | 0.3 | 0.5 |
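To illustrate how a trained neural compact model can drive such a circuit simulation, the Python sketch below solves the inverter node equation VDD − Vout = R × ID(VGS = Vin, VDS = Vout, T) with a bracketing root finder. The drain-current stand-in, the threshold voltage, and the circuit topology details are assumptions for illustration and do not reproduce the authors' simulation.

```python
import numpy as np
from scipy.optimize import brentq

R = 2.8e6      # load resistance (ohm), from Fig. 8
VDD = -60.0    # supply voltage (V)

def drain_current(vgs, vds, temperature=300.0):
    """Placeholder for the trained neural compact model ID(VGS, VDS, T).

    A crude p-type square-law stand-in is used so that the sketch runs; in practice
    this call evaluates the trained network on normalized inputs (temperature is
    ignored by this stand-in).
    """
    vth = -5.0                                   # assumed threshold voltage (V)
    vov = min(vgs - vth, 0.0)                    # overdrive voltage (negative when on)
    if vov == 0.0:
        return 0.0                               # transistor off
    k = 2e-8
    if vds <= vov:                               # saturation region
        return -k * vov ** 2
    return -k * (2.0 * vov * vds - vds ** 2)     # linear (triode) region

def inverter_output(vin, temperature=300.0):
    """Solve the node equation VDD - Vout = R * ID(Vin, Vout, T) for the inverter output."""
    f = lambda vout: (VDD - vout) - R * drain_current(vin, vout, temperature)
    return brentq(f, VDD, 0.0)                   # Vout lies between VDD and 0 V

vin_sweep = np.linspace(0.0, -60.0, 25)
vtc = [inverter_output(v) for v in vin_sweep]    # voltage transfer curve at 300 K
```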
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4tc01224k
This journal is © The Royal Society of Chemistry 2024 |