A neural compact model based on transfer learning for organic FETs with Gaussian disorder

Minsun Cho ab, Marin Franot ac, O-Joun Lee d and Sungyeop Jung *ae
aAdvanced Institute of Convergence Technology, Seoul National University, Suwon, 16229, Republic of Korea
bInstitute of Industrial Technology, Korea University, Sejong 30019, Republic of Korea
cÉcole Nationale Supérieure d’Électrotechnique, d’Électronique, d’Informatique, d’Hydraulique et des Télécommunications, Toulouse INP, Toulouse, 31000, France
dDepartment of Artificial Intelligence, The Catholic University of Korea, Bucheon 14662, Republic of Korea
eDepartment of AI Semiconductor Engineering, Korea University, Sejong 30019, Republic of Korea. E-mail: sungyeopjung@korea.ac.kr

Received 27th March 2024, Accepted 9th September 2024

First published on 10th September 2024


Abstract

We present an approach that adopts deep neural networks to develop a compact model for transistors, namely a neural compact model, and incorporates transfer learning to enhance accuracy and reduce model development time. We examine the effectiveness of this approach when the electrical data for the neural networks are scarce and costly and when the electrical characteristics to be modeled are highly non-linear. Using technology computer-aided design simulations, we constructed a dataset of the electrical characteristics of organic field-effect transistors with Gaussian disorder, which exhibit highly non-linear current–voltage curves. Subsequently, we developed neural compact models by modifying conventional deep learning models and validated the effectiveness of transfer learning through various experiments. We showed that the neural compact model with transfer learning provides equivalent accuracy at a significantly shorter training time.


Introduction

A compact model of a circuit element is sufficiently simple to be incorporated in circuit simulators, sufficiently accurate to make the result of the simulation useful to circuit designers, and rigorous enough to capture the device physics.1 To improve simplicity, accuracy and physical rigor, look-up table (LUT), analytical and artificial neural network (ANN) compact models have been introduced.

A LUT compact model is a set of tables of electrical characteristics2 together with a search function, interpolation routines, and an interface to the circuit simulator. A LUT compact model is computationally simple, although relatively large tables are required to ensure its accuracy. An analytical compact model is a set of mathematical equations derived from device physics, which can model the device behavior over the entire operation regime using a set of device parameters obtained from a few representative measurements. An analytical compact model is translated into computer languages, such as Verilog-A or C, which makes it easy to distribute and interface with a circuit simulator. However, as more complicated physical phenomena are introduced owing to distinctive material characteristics, device structures, and operation mechanisms, the number of model equations and parameters increases. As a consequence, parameter extraction that ensures an accurate fit becomes a complicated task. In addition, compact model development can take a significantly long time (i.e. several years).

An ANN-based compact device model, or a neural compact model, has been reported for more accurate and faster modeling.3–5 The effects of scaling6 and novel device technology7 have been studied. However, most of the existing neural compact models have limited accuracy for modeling across design instances (W, L and T), bias (VDS and VGS), and technology parameters, and result in non-physical behavior such as non-zero current at VDS = 0 V and/or an asymmetric IV model. Furthermore, indirect approaches such as using a physics-augmented loss function7 or coupling with a standard analytical compact model8 still require significant technology expertise despite the possibility of reducing the size of the neural network. Few studies have presented a loss–epoch plot of the training and test data9 to discuss overfitting, i.e. the production of a neural model that corresponds too closely to a particular set of data and may therefore fail to fit additional data or predict future observations reliably. Often, the technology computer-aided design (TCAD) simulation used to obtain the training/test data is not calibrated to experimental data.10 Unlike for a diode,11 a calibrated TCAD dataset exploring the temperature domain is not available for transistors,12–14 perhaps due to the cost of experiments and the scarcity of calibrated experimental datasets.

Organic field-effect transistors (FETs) have been studied as key electronic devices for flexible and printed electronics owing to the semiconducting properties of molecules and polymers.15,16 Charge transport in organic semiconductors with Gaussian disorder is manifested by thermally assisted hopping via energetically and spatially random sites, which exhibits a complex temperature, charge carrier concentration and electric-field dependence.17,18 There have been reports on analytical compact models for organic FETs,7,19 including models based on the transition function20 and the overdrive voltage21 to cover all operation voltages, and models that consider non-ideal effects in channel length modulation,22 mobility and contact resistance23 and temperature variation.24,25 Despite continuous effort, it takes some time for a complete compact model to be established. Moreover, an analytical compact model is not yet available for transistors with interesting device behavior, such as negative transconductance26 or negative differential resistance.27

To address these issues, a neural compact model based on deep learning without physics-informed equations is presented, which could relax the requirement for technology expertise. In addition, we propose a method for training a neural compact model efficiently using transfer learning. The rest of this paper is organized as follows. In the Results and discussion section, we present the dataset construction, the charge carrier transport and electrical characteristics of organic FETs, the base and transfer learning algorithms and function approximation results, and the circuit simulation results. We also discuss the robustness against data scarcity in terms of the accuracy and efficiency of the neural compact model. In order to evaluate the accuracy, we adopt the mean absolute percentage error (MAPE), which is stricter than the R-squared value that is common in the deep learning community. Finally, we conclude this paper.

Methods

High mobility organic FET dataset acquisition

Fig. 1(a) shows the structure and charge transport mechanism of the IDTBT (indacenodithiophene-co-benzo-thiadiazole)-based organic FET. Despite the highly ordered face-on morphology of the IDTBT thin film that leads to high mobility,28 the intermolecular distance (ID) requires thermally assisted hopping via energetically and spatially random sites to complete the charge transport (Fig. 1(a)).18 Therefore, the mobility, and hence the IDVGS characteristics, exhibit a dependence on the temperature, field and charge carrier concentration.18
Fig. 1 (a) Chemical structure of indacenodithiophene-co-benzo-thiadiazole (IDTBT), the charge carrier transport via thermally assisted hopping through inter-molecular distance (ID) in the face-on semiconductor thin film of the top-gate bottom-contact organic field-effect transistor, energy and spatial diagram of the thermally assisted hopping under Gaussian disorder model, (b) calibration of measurement-TCAD data at various temperatures, and (c) dataset generation, learning and regression (a.k.a. function approximation) using a neural network.

To describe this behavior using a neural compact model, we constructed a dataset using TCAD simulation (Atlas version 5.14.0.R) calibrated against the experimental measurements.18 VGS varies from −60 to 5 V in steps of 1 V, and VDS varies from −60 to 15 V in steps of 5 V. The temperature was varied from 200 to 300 K in steps of 20 K (Fig. 1(b)). In order to ensure a sufficient number of data points for learning, 100 Monte-Carlo simulations were conducted for each set of VDS, VGS and T, and the data were augmented by Gaussian noise injection to 10 000. The raw data require preprocessing and classification steps before being applied to the neural network (Fig. 1(c)). The data undergo normalization for training, during which the min–max scaling formula29 is used to rescale the data so that all features lie between 0 and 1. To separate the data for training, prediction, and validation, the data were split in an 8 : 2 ratio. Subsequently, the held-out validation data are employed to assess the predicted data.
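As an illustration only (not the authors' preprocessing script), the noise augmentation, min–max scaling and 8 : 2 split described above could be implemented in a few lines of NumPy; the array names, noise level and random seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# X: (N, 3) array of (V_GS, V_DS, T) points from the TCAD sweeps,
# y: (N,) array of the corresponding drain currents (illustrative names).
def augment(X, y, copies=100, sigma=0.01):
    """Replicate each point `copies` times and inject Gaussian noise on the current."""
    X_aug = np.repeat(X, copies, axis=0)
    y_aug = np.repeat(y, copies) * (1.0 + sigma * rng.standard_normal(copies * len(y)))
    return X_aug, y_aug

def min_max_scale(A):
    """Rescale every feature (column) of A to the [0, 1] interval."""
    a_min, a_max = A.min(axis=0), A.max(axis=0)
    return (A - a_min) / (a_max - a_min)

def split_8_2(X, y):
    """Random 8 : 2 split into training and held-out test/validation data."""
    idx = rng.permutation(len(X))
    n_train = int(0.8 * len(X))
    return (X[idx[:n_train]], y[idx[:n_train]]), (X[idx[n_train:]], y[idx[n_train:]])
```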

We conducted principal component analysis (PCA) on both the calibrated TCAD simulation data and the experimental data to examine the quality of the TCAD simulation data and to confirm its similarity to the experimental data. PCA efficiently extracts important information without requiring many parameters, thereby making it possible to represent complex data in lower dimensions. The purpose of PCA is to find and project onto the axes that preserve the variance of the original data as much as possible. In essence, the leading principal components carry most of the information, and reducing the dimensions entails determining which components are key factors and which are deemed noise. PCA is advantageous for applying and utilizing data in modeling, as it helps identify patterns between data points and enhances data analysis.30,31

The key routine reduces the dataset to a given number of principal components, with the samples labeled by T and VDS, in a Python environment. We computed the eigenvalues (eigvals) and eigenvectors (eigvecs) using functions provided by the NumPy library and selected the necessary number of principal components based on them. Subsequently, each sample was projected onto the principal components to generate new feature vectors. PCA was then performed by extracting the data from the T and VDS columns. This process facilitated a comprehensive analysis of the dataset's structure and patterns through visualization.
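A minimal sketch of this procedure, assuming the IDVGS curves are stacked row-wise into a matrix; the function name and the choice of two components are illustrative, not taken from the authors' code.

```python
import numpy as np

def pca_project(data, n_components=2):
    """Project `data` (n_samples x n_features, e.g. I_D values sampled along V_GS)
    onto its first `n_components` principal axes."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # sort descending by explained variance
    components = eigvecs[:, order[:n_components]]
    return centered @ components                # new feature vectors (PCA1, PCA2 scores)

# Example: scores = pca_project(id_vgs_matrix); scatter PCA1 vs PCA2,
# colouring each point by the T or V_DS label of its I_D-V_GS curve.
```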

Fig. 2 illustrates the visualization results of the IDVGS curve data of the calibrated TCAD simulation. In Fig. 2(a) and (b), temperature and VDS are, respectively, set as labels. Along principal component axis 1 (PCA1), the variation in temperature at a fixed VDS condition (−60 V) spreads the data by at most 0.8 (ΔmaxT), between 280 K and 300 K, while the variation in VDS at a fixed temperature (T = 300 K) spreads the data by at most 3.8 (ΔmaxVDS), between 0 V and −15 V. Meanwhile, the variation along PCA2 is negligible, being about a factor of 10^−5 of that along PCA1. Therefore, the effects of both temperature and VDS are significant, with the latter being, at maximum, about 2 times more pronounced. This suggests that the temperature variation could be easier to learn than the VDS variation. It can be inferred that, in the context of transfer learning, a model that has been pre-trained with data from all VDS conditions at 300 K, 260 K, and 220 K would infer the other temperature data more easily. In summary, PCA could provide information on possible challenges and guide the assessment of a model's performance in transfer learning scenarios. Additionally, the experimental data exhibit a PCA pattern similar to that of the TCAD simulation data, suggesting consistency across different data sources (see Fig. S1 in ESI).


Fig. 2 PCA plot results depicting the distribution of current data with respect to VGS, labeled as temperature and VDS, are shown. In (a), the analysis is based on temperature, encompassing all data points of the IV curves. Conversely, (b) presents the analysis based on VDS, incorporating all data points of the IV curves.

Base and transfer learning model

Fig. 3(a) shows the neural compact model adopting a 1-dimensional (1-D) convolutional neural network (CNN) architecture. In a CNN,32 convolutions are applied to extract features: the first operand of a convolution is the input data of the network, while the second is a kernel, a sliding filter used to extract features. The term ‘1-D’ signifies that both the convolutional kernels and the data sequences have a one-dimensional structure. 1-D CNNs have been commonly employed for tasks such as time-series analysis, text analysis,33 and fault detection.34 The IDVGS characteristic data has, in essence, the same structure. Therefore, a 1-D CNN was employed to exploit both the sequential invariance of the data, by performing the same operations across different segments of the input data points, and its feature extraction capability for the hump shape of the current–voltage characteristics. In addition, the 1-D CNN could prevent the original correlations among the current, gate–source voltage, drain–source voltage, and temperature from being destroyed by a direct conversion into a multi-dimensional form.34
Fig. 3 (a) Schematic diagram of the structure of the 1-dimensional convolutional neural network (1-D CNN) in tandem with fully connected (FC) layers used in this study. (b) Schematic diagram of neural compact modeling by transfer learning. For transfer learning, knowledge from pre-training (yellow line) is transferred to fine-tuning (orange line).

The input layer consists of three neurons: VGS, VDS, and T. The hidden layers have a tandem structure of fully connected (FC) layers and convolution (Conv) layers, consisting of 2 FC layers (512 neurons each), 2 Conv layers and 3 FC layers (1024, 512, and 66 neurons, respectively). We conducted an initial experiment with a multilayer perceptron (MLP) network composed of three FC layers; however, in terms of the accuracy and the ability to describe the hump characteristics, its performance was not sufficiently good. To mitigate this issue, the presented layer structure was chosen. Among the seven layers, each convolution layer was introduced to extract features for different subnets based on VGS and VDS. The two FC layers preceding the convolution layers were designed to capture the overall characteristics of the curve and the electrical properties of the learning model. The three FC layers serve to apply the local features captured by the preceding convolutional layers to the prediction of a single output, which is the current. The first FC layer uses the tanh function as the activation function for VGS and the sigmoid function as the activation function for VDS, forming different subnets. Therefore, the effects of VGS and VDS can be represented separately for each type of neuron. The other layers adopt the ReLU activation function. In particular, the neurons in the second FC layer are connected to both tanh and sigmoid neurons from the first FC layer. These neurons represent the coupled effects of VGS and VDS on the channel potential and carrier concentration. For faster and more stable training, batch normalization (BN) was conducted at each layer's input.35 The output layer has a single neuron representing the drain current ID. The neural network is trained for 200 epochs, and the model is evaluated using the mean square error (MSE) loss function. Detailed information about the hyperparameters used and the model training is listed in Table 1; a PyTorch sketch of one possible reading of this layer structure is given after the table.

Table 1 Hyperparameters for the base model without transfer learning and transfer model
Parameters          | Base model          | Transfer model
Hidden layers       | 7                   | 7
Optimizer           | Adam; RMSprop       | RMSprop
Epochs              | 200                 | 200
Activation function | Tanh; sigmoid; ReLU | Tanh; sigmoid; ReLU
Learning rate       | 0.001               | 0.001
Loss function       | MSE                 | MSE
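For concreteness, one possible PyTorch reading of the layer structure described above is sketched below. The kernel sizes, channel counts, the even split of the first FC layer between the VGS (tanh) and VDS (sigmoid) subnets, and the placement of batch normalization are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class NeuralCompactModel(nn.Module):
    """Sketch of the FC + 1-D CNN tandem: 2 FC layers (512 neurons each),
    2 Conv1d layers, 3 FC layers (1024, 512, 66) and a single output neuron."""
    def __init__(self):
        super().__init__()
        # FC1: tanh subnet for V_GS and sigmoid subnet for V_DS (T fed to both).
        self.fc1_vgs = nn.Linear(2, 256)
        self.fc1_vds = nn.Linear(2, 256)
        self.bn1 = nn.BatchNorm1d(512)
        # FC2: couples the two subnets (effects of V_GS and V_DS on the channel).
        self.fc2 = nn.Linear(512, 512)
        self.bn2 = nn.BatchNorm1d(512)
        # Two 1-D convolutions over the 512-long feature vector.
        self.conv = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=5, padding=2), nn.BatchNorm1d(8), nn.ReLU(),
            nn.Conv1d(8, 8, kernel_size=5, padding=2), nn.BatchNorm1d(8), nn.ReLU(),
        )
        # Three FC layers (1024, 512, 66) and the single-neuron output (I_D).
        self.head = nn.Sequential(
            nn.Linear(8 * 512, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 66), nn.ReLU(),
            nn.Linear(66, 1),
        )

    def forward(self, x):                      # x: (batch, 3) = (V_GS, V_DS, T), min-max scaled
        vgs = torch.tanh(self.fc1_vgs(x[:, [0, 2]]))
        vds = torch.sigmoid(self.fc1_vds(x[:, [1, 2]]))
        h = self.bn1(torch.cat([vgs, vds], dim=1))
        h = torch.relu(self.bn2(self.fc2(h)))
        h = self.conv(h.unsqueeze(1))          # (batch, 1, 512) -> (batch, 8, 512)
        return self.head(h.flatten(1))         # predicted drain current I_D
```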


Next, we explain the transfer learning used to develop a more effective and efficient neural compact model. Transfer learning benefits from the knowledge acquired during pre-training and updates the model through fine-tuning using backpropagation, which could reduce the total computational time and cost. In this study, as shown in Fig. 3(b), we performed transfer learning in the temperature T and drain–source voltage VDS domains. For the former, we conducted pre-training at three temperatures (300, 260 and 220 K) and then fine-tuned the model to predict the results at three unseen temperatures (280, 240 and 200 K). For the latter, we conducted pre-training at four VDS values (−60, −40, −20 and 0 V) and then fine-tuned the model for five unseen VDS values (−50, −30, −10, 5 and 15 V). The configuration of the hidden layers remains the same as in the base model. Only the third to fifth FC layers (FC3–FC5) were updated during fine-tuning (orange arrow and lines); a minimal fine-tuning sketch is given below. The key evaluation metrics for our model are how fast it predicts the unseen variables and how accurately it represents the device characteristics under these conditions. We believe that these aspects can be observed through the epoch-loss graph as well as the IV curves on linear and semilog scales.
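A minimal sketch of this fine-tuning step, reusing the hypothetical NeuralCompactModel class sketched above; the checkpoint file name and the indexing used to select FC3–FC5 follow that sketch and are therefore assumptions.

```python
import torch

# Load the weights obtained by pre-training on the seen conditions (hypothetical file).
model = NeuralCompactModel()
model.load_state_dict(torch.load("pretrained_seen_conditions.pt"))

# Freeze everything, then re-enable gradients only for FC3-FC5
# (the 1024-, 512- and 66-neuron layers); the conv layers and FC1/FC2 stay frozen.
for p in model.parameters():
    p.requires_grad = False
for layer in list(model.head)[:6]:   # the three FC layers (and their ReLUs) before the output neuron
    for p in layer.parameters():
        p.requires_grad = True

# Fine-tune on the unseen-condition data with RMSprop and the MSE loss (Table 1).
optimizer = torch.optim.RMSprop(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)
loss_fn = torch.nn.MSELoss()
```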

Detailed information regarding the hyperparameters used and defined for the transfer model is provided in Table 1. As shown in Table 1, the most significant difference in hyperparameters between the base learning model and the transfer learning model is the exclusive use of the RMSprop optimizer.36 Both being classified as gradient-descent optimization methods, the Adam37 and RMSprop38 optimizers have been frequently compared. In this paper, we decided to employ the RMSprop optimizer based on the results and discussion in the ESI, which demonstrate that the RMSprop optimizer offers better enhancements for the hump characteristics specific to our datasets.

Hardware and software environment

The model training was conducted utilizing both a CPU environment (12th Gen Intel(R) Core(TM) i9-12900K, 3.20 GHz, 64.0 GB RAM) and a GPU environment (GeForce RTX 4090). Through the torch device configuration, the model operates seamlessly in both the CPU and GPU environments. The software setup for model training was carried out entirely in a Python 3.7 environment, utilizing the TensorFlow and PyTorch libraries for building and training the models. Data selection was based on identifying the signal length differences in the log files and selecting the data with the highest number of cases as training data.
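The torch device configuration mentioned above typically amounts to a few lines; a sketch, again using the hypothetical model class from the earlier example:

```python
import torch

# Run on the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = NeuralCompactModel().to(device)
# Input batches must be moved to the same device, e.g. x = x.to(device).
```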

Results and discussion

Effectiveness of transfer learning on the neural compact model

Base neural compact model. For the base neural compact model without transfer learning, a base learning model was trained on the IDVGS data for all VDS and temperature conditions. The number of epochs was determined to be 200 by comparing the MSE loss of the training and test data at various epoch numbers (see Fig. S2(a) in the ESI), with an early stopping option. In Fig. 4(a) and (b), the IDVGS curves of the predicted and test data are plotted for all VDS conditions at 300 K as a representative case. We emphasize that the transfer characteristics exhibit neither linear behavior in the linear regime (VDS < VGS − VT) nor quadratic behavior in the saturation regime (VDS > VGS − VT). In addition, the ‘hump’ feature in IDVGS for −60 V < VDS < 5 V was effectively modeled using the base neural compact model. Regarding the accuracy of the base learning, the neural compact model can be evaluated by the goodness of fit in Fig. 4(c) and by the value of the loss obtained through the MSE loss function. This shows that the predicted and test data are almost identical for all conditions (see Fig. 5(a)), confirming that the neural compact model reproduces the IDVGS relationship well. Additionally, the MSE loss of 0.0354, the R-squared value of 0.998, and the MAPE of 6.53% also support the accuracy of the model. In this paper, the MAPE (mean absolute percentage error) was selected as one of the model evaluation metrics, and its formula is as follows:
 
$$\text{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \quad (1)$$

where $y_i$ is the test (ground-truth) drain current, $\hat{y}_i$ is the predicted drain current, and $n$ is the number of data points.
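For reference, eqn (1) translates directly into NumPy; this generic sketch is not the authors' evaluation code:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error of eqn (1), in percent.
    Points with y_true == 0 would need to be excluded beforehand."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))
```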

Fig. 4 Results of the base learning model: predicted (filled circles) and test (open circles) IDVGS characteristics for various VDS values at 300 K. VDS varies from −60 V to 15 V in increments of 5 V (denoted by a different color). (a) On the linear scale, and (b) on the log scale. Each color of the symbols represents the same VDS conditions in (a) and (b). (c) The MSE loss function of the base learning versus epochs: the prediction loss (blue line) and the test loss (orange line).

Fig. 5 Variance graphs of the experimental and predicted values for y in relation to temperature are presented in (a) and (b). The current of the non-transfer model is plotted in (a), while the current of the transfer model is plotted in (b). Graphs (a) and (b) include results for temperatures of 280 K, 240 K, and 200 K for an accurate comparison. The current variance graph for y between the experimental and predicted values related to the drain–source voltage is plotted in (c). Graphs (a) and (c) include the results for VDS of −50 V, −30 V, −10 V and 0 V.
Transfer learning for unseen temperature conditions. The results of the transfer learning, i.e. the predicted and test IDVGS curves, are shown in Fig. 6. Transfer learning in the temperature domain, as evident from the result graphs for the two representative VDS conditions with the most distinct differences (VDS = −5 V and −60 V), shows high accuracy for all temperature conditions, both seen (300, 260, and 220 K) and unseen (280, 240, and 200 K). The R-squared score of 0.999 also indicates excellent agreement, being close to 1. The MSE loss was 0.0226 and the MAPE was 4.32% (see Table 2).
Fig. 6 IDVGS characteristics under linear (VDS = −5 V) (a) and (e) and saturation (VDS = −60 V) (b) and (f) regimes on linear and semilog scales. The results of the transfer learning, prediction data (circle) and test data (filled circle) consist of red (seen) and blue (unseen) data. (c) and (d) First-order derivative (i.e. transconductance, gm) of (a) and (b), and (g) and (h) Second-order derivative of (a) and (b). The temperature conditions of the data range from 300 K to 200 K in decrements of 20 K. For transfer learning, the temperature conditions of the seen data are 300 K, 260 K, and 220 K, and those of the unseen data are 280 K, 240 K, and 200 K.

At this point, we compare the MAPE values of our model with those of other models in the literature. The MAPE depends strongly on factors such as the number of data samples and the number of labels in the model. A MAPE of the IDVGS characteristics below 15% is commonly considered good.39 There have been reports of MAPEs of S-parameters with respect to VGS, VDS and frequency of around 0.2–2%.40 Our model computes the MAPE of the IDVGS characteristics in the T and VDS domains; the dataset is considerably larger and more complex (ID varies exponentially), and the model accuracy is high.

In addition to these good figures of merit, the ‘hump’ feature, which is more pronounced at lower temperatures, is successfully modeled (see the semilog plots in Fig. 6). The fine-tuning took 5 h 31 min 22 s for transfer learning in the temperature domain.

We further examined the device characteristics and their consistency not only by the IDVGS curves on linear and semilog scales but also by their first- and second-order derivatives. Fig. 6(c) and (d) present the first-order derivative (i.e. the transconductance, gm) of the current with respect to VGS, and Fig. 6(g) and (h) the second-order derivative. It can be observed from the representative VDS conditions that the predicted values and test data show close consistency across all temperature conditions. The model has demonstrated its ability to handle predictions for all considered VGS, VDS and temperature conditions with excellent performance, using the knowledge of the effects of carrier concentration, electric field, and temperature acquired by transfer learning. Similar to the base learning, we confirmed that the transfer learning in the temperature domain is free from the overfitting problem by referring to the MSE loss of the training and test data with respect to the number of epochs (see Fig. S2(b), ESI).
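The derivatives shown in Fig. 6 can be obtained numerically from a predicted transfer curve, for instance with NumPy's gradient; a sketch, where vgs and i_d are assumed to hold a single predicted IDVGS sweep:

```python
import numpy as np

def transfer_curve_derivatives(vgs, i_d):
    """Numerical derivatives of a predicted I_D-V_GS sweep:
    gm  = dI_D/dV_GS (transconductance), gm2 = d2I_D/dV_GS2."""
    gm = np.gradient(i_d, vgs)
    gm2 = np.gradient(gm, vgs)
    return gm, gm2
```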

Transfer learning for unseen drain–source voltage conditions. Next, we performed transfer learning for the VDS conditions to demonstrate the generalization performance of the neural compact model. As shown in Fig. 7, we confirmed that the neural compact model predicts all VDS conditions (−60 to 15 V at 10 V intervals) at two representative temperature conditions (300 and 200 K). The prediction graphs show high consistency with the test values for all conditions, i.e. both seen and unseen VDS conditions, on linear and logarithmic scales. Furthermore, similar to the modeling of the temperature conditions, excellent agreement was obtained for the first and second derivatives of ID with respect to VGS (Fig. 7(c), (d), (g) and (h)). The MSE loss for transfer learning in the VDS domain was 0.0318, the overall R-squared value was 0.998, and the MAPE was 7.28%. The fine-tuning time was 7 h 24 min, which is similar to that in the temperature domain.
Fig. 7 IDVGS characteristics at temperature T = 300 K ((a) and (e)) and T = 200 K ((b) and (f)) on linear and semilog scales. The results of the transfer learning, prediction data (circle) and test data (filled circle) consist of red (seen) and blue (unseen) data. (c) and (d) First-order derivative (i.e. transconductance, gm) of (a) and (b), and (g) and (h) second-order derivative of (a) and (b). The VDS conditions of the data range from −60 V to 15 V in decrements of 10 V. For transfer learning, the VDS conditions of the seen data are −60 V, −40 V, −20 V and 0 V, and those of the unseen data are −50 V, −30 V, −10 V, 5 V and 15 V.

In terms of accuracy and time cost, we compared the base learning model without transfer learning, with transfer learning in the temperature domain, and with transfer learning in the VDS domain (Table 2). First, the MSE loss, R-squared, and MAPE values indicate that a slightly higher accuracy was achieved with transfer learning. In addition, transfer learning in the temperature domain was more accurate than that in the VDS domain. This could originate from the spread of the data points shown in the PCA plots. Second, the total time was reduced by about half when transfer learning was adopted, demonstrating the efficiency of the transfer learning approach. In detail, we analyzed the time cost of transfer learning by classifying the total time into pre-training time (three seen temperature conditions and four seen VDS conditions) and fine-tuning time (three unseen temperature conditions and five unseen VDS conditions). These are compared to the total time required to train and test the base learning model on the unseen data (the data used for fine-tuning in transfer learning) for a fair comparison. In the temperature domain, transfer learning took 5 h 31 min, about half the time of the homologous base learning (10 h 40 min). Meanwhile, transfer learning in the VDS domain took 7 h 24 min, about half the time of the homologous base learning (16 h 31 min). This analysis confirms that the transfer learning model, under the same conditions, outperforms the base model in terms of accuracy and time cost. Furthermore, the advantages of the transfer model can also be observed in Fig. 5. The scatter plots of the experimental values against the model's predictions with transfer learning (in both the T and VDS domains) align almost perfectly on the y = x line, which confirms that the accuracy does not degrade upon applying transfer learning.

Table 2 Comparison of accuracy and time cost between the neural compact models using non-transfer (base) and transfer learning
Model type        | Non-transfer learning model          | Transfer learning model
                  | T domain          | VDS domain       | T domain          | VDS domain
Pre-training time | —                 | —                | 10 h 27 min 52 s  | 12 h 11 min 13 s
Fine-tuning time  | —                 | —                | 5 h 31 min 22 s   | 7 h 24 min 56 s
Total time        | 10 h 40 min 51 s  | 16 h 31 min 23 s | 5 h 31 min 22 s   | 7 h 24 min 56 s
MSE loss          | 0.0354            | 0.0481           | 0.0226            | 0.0318
R-squared         | 0.998             | 0.996            | 0.999             | 0.998
MAPE (%)          | 6.53              | 8.17             | 4.32              | 7.28


In more detail, the difference in the total time for transfer learning on the unseen data between the temperature (5 h 31 min) and VDS (7 h 24 min) experiments originates from the difference in the number of data labels. The effective total time, i.e. (total time)/(sample number × unseen label number in the transfer domain × label number in the other domain), is comparable. Here, the total time represents the time taken for transfer learning on the unseen data. In the temperature domain, the time taken for the three unseen labels in the transfer domain (T = 280, 240 and 200 K) over 16 VDS labels, each comprising the same number of data samples (100 samples), is 5 h 31 min, equating to approximately 6 min per label. Meanwhile, in the VDS domain, which consists of five unseen labels in the transfer domain (VDS = −50, −30, −10, 5 and 15 V) over 6 temperature labels each, the time required is approximately 14.8 min per label.

Finally, we confirmed that transfer learning in the drain–source voltage domain is free from the overfitting problem by referring to the MSE loss of the training and test data with respect to the number of epochs (see Fig. S2(c), ESI).

Circuit simulation using the neural compact model

A circuit simulation using the neural compact model is presented in this subsection. The operation of a resistive-load inverter gate, consisting of a resistor of 2.8 MΩ connected between VDD and the output and a p-type organic FET whose gate, source, and drain are connected, respectively, to the input, ground, and the output (see Fig. 8), was simulated at various temperature conditions. DC analysis was conducted to verify the effect of the hump on the voltage transfer characteristics. The results were characterized in terms of the voltage transfer curve, Vin versus Vout. In order to investigate the accuracy, the data from the mixed-mode simulation were compared to those of the neural compact model.
Fig. 8 Schematic circuit diagram of a resistive-load inverter. R = 2.8 MΩ, VDD = −60 V. Voltage transfer curves at various temperatures: T = 300, 280, 220, and 200 K.

Fig. 8 shows the voltage transfer curves of the resistive-load inverter gate at various temperatures using the neural compact model and the TCAD simulation (as the ground truth). The input voltage Vin was varied from 0 to −60 V. The simulation results are summarized in Table 3. The neural compact model accurately predicts the switching voltage Vsw and its shift toward a more negative voltage with decreasing temperature. In addition, the neural compact model successfully describes the noise margin and its narrowing with decreasing temperature. The proposed neural compact model achieved an error of less than 5% under all temperature conditions, where the error is defined as error = |(neural compact model − ground truth)/(neural compact model)| × 100%.
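As an illustration of how such a DC sweep could be performed with the neural compact model (not the authors' mixed-mode setup), the output voltage at each input voltage can be found by solving Kirchhoff's current law at the output node; the bisection routine, the sign conventions and the assumption that the model returns a de-normalized drain current in amperes are all illustrative.

```python
import torch

VDD, R = -60.0, 2.8e6        # supply voltage (V) and load resistance (ohm), as in Fig. 8

def vout_for_vin(model, vin, temp, tol=1e-3):
    """Solve (VDD - Vout)/R = I_D(V_GS = Vin, V_DS = Vout, T) for Vout by bisection.
    Assumes `model` is in eval mode and maps (V_GS, V_DS, T) to the drain current in A
    (input normalization / output de-normalization omitted for brevity)."""
    def residual(vout):
        with torch.no_grad():
            x = torch.tensor([[vin, vout, temp]], dtype=torch.float32)
            i_d = model(x).item()
        return (VDD - vout) / R - i_d          # Kirchhoff's current law at the output node
    lo, hi = VDD, 0.0                          # Vout is bounded by the supply rails
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(lo) * residual(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Voltage transfer curve at 300 K:
# vtc = [vout_for_vin(model, vin, 300.0) for vin in torch.linspace(0.0, -60.0, 121).tolist()]
```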

Table 3 Comparison of the noise margin and switching voltage error between the neural compact model and ground truth
Parameter |                      | 300 K  | 280 K  | 220 K  | 200 K
NMH (V)   | Neural compact model | 20.76  | 20.70  | 12.14  | 7.20
          | Ground truth         | 20.70  | 20.50  | 12.60  | 7.29
          | Error (%)            | 0.29   | 0.97   | 3.79   | 1.25
NML (V)   | Neural compact model | 5.85   | 4.20   | 4.46   | 5.66
          | Ground truth         | 6.0    | 4.30   | 4.50   | 5.89
          | Error (%)            | 2.56   | 0.97   | 0.90   | 4.06
VSW (V)   | Neural compact model | −26.3  | −27.6  | −34.1  | −36.8
          | Ground truth         | −26.5  | −28.0  | −34.0  | −37.0
          | Error (%)            | 0.8    | 1.4    | 0.3    | 0.5


Robustness analysis for data scarcity

We conducted transfer learning with various amounts of target data to assess the performance of our model more systematically. Transfer learning could be efficient and effective when data are scarce; therefore, the amount of target data was set to 100 sets to represent the case where the full amount of data is available, and was decreased to 50, 20, and 10 sets to represent situations of data scarcity. The results of the transfer learning in the temperature and drain voltage domains are summarized in Fig. 9. For the temperature domain (Fig. 9(a)), the initial attempt with 100 sets demonstrated excellent evaluation metrics, showing an MSE loss of 0.0226, R-squared of 0.9975, and MAPE of 4.32%. Through fine-tuning with 50, 20, and 10 sets, the model consistently exhibited loss and R-squared values similar to the initial attempt, validating the robustness of our transfer learning model. Notably, a more abrupt degradation in the MAPE was observed from 10 sets (MAPE = 23.8%) onwards, suggesting that 20 sets (MAPE = 12.7%) could be the most appropriate number of data samples. For the drain voltage domain (Fig. 9(b)), the initial attempt with 100 sets provided an MSE loss of 0.0318, R-squared of 0.99822, and MAPE of 7.28%. When fine-tuning with 50, 20, and 10 sets, the model showed a degradation pattern similar to that in the temperature domain, with a pronounced increase of the MAPE from 10 sets (MAPE = 28.7%) onwards, indicating that 20 sets (MAPE = 15.9%) could be more desirable. Comparing the MAPE values between transfer learning in the temperature and drain voltage domains, it can be inferred that transfer learning in the drain voltage domain is the more difficult task. The figure-of-merit values are summarized in Tables S1 and S2 (ESI). In addition, the accuracy of the model can be analyzed by the current–voltage plots, i.e. the transfer curve IDVGS (on both linear and semi-logarithmic scales) as well as the first and second derivatives of the transfer curve (see Fig. S4–S12, ESI).
Fig. 9 Effect of the number of data samples on the accuracy of the neural compact model represented by means of MSE (in red), R-squared (in pink) and MAPE (in green): transfer learning in (a) the temperature domain, and (b) the drain voltage domain.

Conclusions

We demonstrated a neural compact model for organic field-effect transistors with Gaussian disorder, which exhibit a carrier concentration, electric field and temperature dependence of the mobility. The complex charge transport physics is successfully incorporated into the neural compact model without any hand-crafted modeling effort and without sacrificing accuracy. The advantages of neural network-based device modeling have been demonstrated, particularly in terms of accuracy and time cost. Furthermore, beyond the scope of conventional models, we succeeded in applying a knowledge-driven transfer learning approach for regression in unseen domains. This approach not only maintains the good accuracy of the base model but also enables time-efficient modeling. Finally, we demonstrated that a neural compact model can be assessed by the quality of the learning data as well as by the neural network itself; the size, distribution, and scarcity of the data are important factors that determine the accuracy and time cost of a neural compact model. We expect that a neural compact model could be an alternative choice, particularly for designing circuits with semiconductor devices that exhibit complex non-linear IV characteristics.

Author contributions

MC conducted data curation and formal analysis, and wrote the original draft; MF conducted data curation; OJL participated in the review and editing; and SJ conducted conceptualisation, formal analysis, review and editing, administered the project and supervised the work.

Data availability

The data supporting this article have been included as part of the ESI. The data and codes are available upon reasonable request.

Conflicts of interest

There are no conflicts to declare.

Acknowledgements

This work was supported in part by the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (MSIT) of Korea under grants NRF-2022M3F3A2A01076569, NRF-2021R1F1A1064384 and NRF-2022R1F1A1065516.

References

1. G. Gildenblat, Compact Modeling: Principles, Techniques and Applications, Springer, Netherlands, 2010.
2. P. Meijer, IEEE Trans. Circuits Syst., 1990, 37, 335–346.
3. F. Wang and Q.-J. Zhang, 1997 IEEE MTT-S International Microwave Symposium Digest, 1997, 2, 627–630.
4. H. B. Hammouda, M. Mhiri, Z. Gafsi and K. Besbes, Am. J. Appl. Sci., 2008, 5, 385–391.
5. J. Xu and D. E. Root, 2015 IEEE MTT-S International Conference on Numerical Electromagnetic and Multiphysics Modeling and Optimization (NEMO), 2015, pp. 1–3.
6. H. Habal, D. Tsonev and M. Schweikardt, MLCAD 2020 - Proceedings of the 2020 ACM/IEEE Workshop on Machine Learning for CAD, 2020, 111–116.
7. Y. Kim, S. Myung, J. Ryu, C. Jeong and D. S. Kim, International Conference on Simulation of Semiconductor Processes and Devices, SISPAD, 2020, 257–260.
8. M.-Y. Kao, H. Kam and C. Hu, IEEE Electron Device Lett., 2022, 43, 974–977.
9. J. Lim and C. Shin, IEEE Access, 2020, 8, 158237.
10. T. Hirtz, S. Huurman, H. Tian, Y. Yang and T.-L. Ren, J. Semicond., 2021, 42, 124101.
11. H. Y. Wong, M. Xiao, B. Wang, Y. K. Chiu, X. Yan, J. Ma, K. Sasaki, H. Wang and Y. Zhang, IEEE J. Electron Devices Soc., 2020, 8, 992–1000.
12. S. Woo, H. Jeong, J. Choi, H. Cho, J.-T. Kong and S. Kim, Electronics, 2022, 11, 2761.
13. K. Ko, J. K. Lee, M. Kang, J. Jeon and H. Shin, IEEE Trans. Electron Devices, 2019, 66, 4474–4477.
14. Q. Chen and G. Chen, 2016 7th International Conference on Computer Aided Design for Thin-Film Transistor Technologies (CAD-TFT), 2016, pp. 1–1.
15. A. Tsumura, H. Koezuka and T. Ando, Appl. Phys. Lett., 1986, 49, 1210–1212.
16. G. Horowitz, Adv. Mater., 1998, 10, 365–377.
17. S. Jung, Y. Lee, A. Plews, A. Nejim, Y. Bonnassieux and G. Horowitz, IEEE Trans. Electron Devices, 2021, 68, 307–310.
18. Y. Lee, S. Jung, A. Plews, A. Nejim, O. Simonetti, L. Giraudet, S. D. Baranovskii, F. Gebhard, K. Meerholz, S. Jung, G. Horowitz and Y. Bonnassieux, Phys. Rev. Appl., 2021, 15, 024021.
19. C.-H. Kim, Y. Bonnassieux and G. Horowitz, IEEE Trans. Electron Devices, 2013, 61, 278–287.
20. M. Estrada, A. Cerdeira, J. Puigdollers, L. Resendiz, J. Pallares, L. Marsal, C. Voz and B. Iñiguez, Solid-State Electron., 2005, 49, 1009–1016.
21. O. Marinov, M. J. Deen, U. Zschieschang and H. Klauk, IEEE Trans. Electron Devices, 2009, 56, 2952–2961.
22. B. Iñiguez, R. Picos, D. Veksler, A. Koudymov, M. S. Shur, T. Ytterdal and W. Jackson, Solid-State Electron., 2008, 52, 400–405.
23. S. Jung, J. W. Jin, V. Mosser, Y. Bonnassieux and G. Horowitz, IEEE Trans. Electron Devices, 2019, 66, 4894–4900.
24. J. Park, Y. Lee, G. Horowitz, S. Jung and Y. Bonnassieux, J. Mater. Chem. C, 2023, 11, 13579–13585.
25. M. Estrada, A. Cerdeira, I. Mejia, M. Avila, R. Picos, L. Marsal, J. Pallares and B. Iñiguez, Microelectron. Eng., 2010, 87, 2565–2570.
26. H.-S. Shin, H. Yoo and C.-H. Kim, IEEE Trans. Electron Devices, 2022, 69, 5149–5154.
27. S. Kim, T. Park, H. J. Yun and H. Yoo, Adv. Mater. Technol., 2022, 7, 2201028.
28. D. Venkateshvaran, M. Nikolka, A. Sadhanala, V. Lemaur, M. Zelazny, M. Kepa, M. Hurhangee, A. J. Kronemeijer, V. Pecunia, I. Nasrallah, I. Romanov, K. Broch, I. McCulloch, D. Emin, Y. Olivier, J. Cornil, D. Beljonne and H. Sirringhaus, Nature, 2014, 515, 384–388.
29. V. N. G. Raju, K. P. Lakshmi, V. M. Jain, A. Kalidindi and V. Padma, 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT), 2020, pp. 729–735.
30. J. Zhang, H. Guo and Z. Chen, J. Phys.: Conf. Ser., 2021, 1873, 012058.
31. J. Shlens, A Tutorial on Principal Component Analysis, 2014, https://arxiv.org/abs/1404.1100.
32. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel, Neural Comput., 1989, 1, 541–551.
33. Y. Kim, Convolutional Neural Networks for Sentence Classification, 2014, https://arxiv.org/abs/1408.5882.
34. O. Janssens, V. Slavkovikj, B. Vervisch, K. Stockman, M. Loccufier, S. Verstockt, R. Van de Walle and S. Van Hoecke, J. Sound Vib., 2016, 377, 331–345.
35. S. Ioffe and C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015.
36. G. Hinton, N. Srivastava and K. Swersky, Lecture 6e: RMSprop: Divide the gradient by a running average of its recent magnitude, https://www.cs.toronto.edu/tijmen/csc321/slides/lecture_slides_lec6.pdf.
37. D. P. Kingma and J. Ba, Adam: A Method for Stochastic Optimization, 2017.
38. S. Ruder, An overview of gradient descent optimization algorithms, 2017.
39. H. Habal, D. Tsonev and M. Schweikardt, Proceedings of the 2020 ACM/IEEE Workshop on Machine Learning for CAD, 2020, pp. 111–116.
40. Z. Marinković, G. Crupi, D. M.-P. Schreurs, A. Caddemi and V. Marković, Microelectron. Eng., 2011, 88, 3158–3163.

Footnote

Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4tc01224k

This journal is © The Royal Society of Chemistry 2024