Nita Samantaray *a, Arjun Singh *a and Anu Tonk b
a Department of Applied Sciences, The Northcap University, Gurugram, India. E-mail: nita20asd006@ncuindia.edu; arjunsingh@ncuindia.edu
b Department of Multidisciplinary Engineering, The Northcap University, Gurugram, India
First published on 5th October 2024
Perovskite solar cells (PSCs) have attracted attention for their high efficiency and commercial viability. However, the efficiency of a PSC depends on various factors. One important parameter is the bandgap of the active layer, which governs how much light is absorbed and thus influences the overall performance of the solar cell. Predicting the bandgap of the active layer is therefore important for an effective fabrication process. In this study, we compared six machine learning (ML) models for predicting the bandgap. The models were trained on a dataset of 500 devices with perovskite compositions such as MAPbI3, FAPbI3, CsSnI3 and CsMAPbI3, obtained from The Perovskite Database Project, and were then validated on a separate dataset of 50 devices. The ML methods used were random forest, gradient boosting regressor, k-nearest neighbours (KNN), AdaBoost, Gaussian process regressor, and bagging. Of the candidate parameters, which also included the perovskite dimension, perovskite thickness, perovskite deposition temperature, and perovskite deposition time, the A coefficient, B coefficient, and C coefficient were selected as the feature parameters for the models. The random forest model performed best, with a low mean absolute error (MAE) of 0.000775, a low mean squared error (MSE) of 0.00000920, and a high coefficient of determination (r2) of 0.9994.
Sustainability spotlight
Our research aims to enhance the efficiency of perovskite solar cells (PSCs) by accurately predicting the bandgap of the active layer, a critical factor in light absorption and overall performance. We evaluated six machine learning models, using a dataset of 500 devices, to predict the bandgap. Among these, the random forest model demonstrated superior performance with a low mean absolute error (MAE) of 0.000775, a low mean squared error (MSE) of 0.00000920, and a high coefficient of determination (r2) of 0.9994. Our work supports the UN Sustainable Development Goals: affordable and clean energy (SDG 7); industry, innovation, and infrastructure (SDG 9); and climate action (SDG 13).
To improve the efficiency of PSCs, extensive research into understanding and optimizing the properties of perovskite materials has been carried out. The bandgap of the active layer is a key parameter that directly impacts the absorption spectrum and photon-to-electron conversion efficiency of PSCs.4 Consequently, predicting and optimizing the bandgap is essential for achieving high-performance PSCs. Traditional methods for predicting the bandgap rely on complex theoretical models and experimental techniques, which can be time-consuming and resource-intensive.5 In recent years, ML techniques have emerged as powerful tools for predicting material properties with high accuracy and efficiency.6
In 2024, Miah et al. emphasized the critical role of bandgap tuning in enhancing both the performance and stability of PSCs, offering insights into mechanisms that optimize efficiency while addressing degradation factors. Perovskite materials exhibit excellent optoelectronic properties and efficiency in solar cells, but their low stability hinders commercialization.7
In 2024, Ghosh et al. published a study that focused on predicting the bandgaps of nitride perovskites using four machine learning (ML) models: multi-layer perceptron (MLP), gradient boosted decision tree (GBDT), support vector regression (SVR), and random forest regression (RFR). The models were trained on 1563 nitride perovskites with bandgaps between 1.0 and 3.1 eV.8
Sadhu et al. recently suggested that ML models accurately forecast PSC parameters, with key features such as the grain size, band gap, and electron/hole mobility driving performance optimization for commercialization. The study focused on the analysis and prediction of the performance of PSCs using machine learning techniques.9
In our study, we investigated the accuracy of various ML models in predicting the bandgap of perovskite materials used in PSCs. We utilized a dataset from The Perovskite Database3 comprising information on 500 devices, including different perovskite compositions such as MAPbI3, FAPbI3, CsSnI3, and CsMAPbI3. The dataset contains a range of feature parameters relevant to the fabrication process, including perovskite dimensions, coefficients, thickness, deposition temperature, and deposition time. Our primary objective is to compare the performance of six different ML models in predicting the bandgap of the perovskite material, and identify the most accurate and reliable predictive model.
To achieve our objective, we employed six ML methods: random forest, gradient boosting regressor, k-nearest neighbors (KNN), AdaBoost, Gaussian process regressor, and bagging. These models were trained on a dataset of 500 devices and subsequently validated using a separate dataset of 50 devices. The performance of each model was evaluated using three metrics: the mean absolute error (MAE), mean squared error (MSE), and coefficient of determination (r2). Additionally, we analysed the feature importance of the selected parameters to gain insight into their influence on bandgap prediction.
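The train-then-validate comparison described here can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes scikit-learn and NumPy are available, uses mostly default hyperparameters (the `alpha` and `normalize_y` settings of the Gaussian process are arbitrary choices made for numerical stability), and replaces the real data with synthetic values generated by a hypothetical `make_synthetic` helper. Only the six model families and the 500/50 device split come from the text.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)


def make_synthetic(n):
    """Synthetic stand-in data (NOT from The Perovskite Database):
    three coefficient-like features and a made-up bandgap-like target."""
    X = rng.uniform(0.0, 1.0, size=(n, 3))
    noise = rng.normal(0.0, 0.01, size=n)
    y = 1.5 + 0.3 * X[:, 0] + 0.5 * X[:, 1] + 0.2 * X[:, 2] + noise
    return X, y


# 500 training devices and a separate 50-device validation set, as in the text.
X_train, y_train = make_synthetic(500)
X_val, y_val = make_synthetic(50)

# Hyperparameters are assumptions; the study's settings are not given here.
models = {
    "Random forest": RandomForestRegressor(random_state=0),
    "Gradient boosting regressor": GradientBoostingRegressor(random_state=0),
    "KNN": KNeighborsRegressor(),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "Gaussian process regressor": GaussianProcessRegressor(alpha=1e-2, normalize_y=True),
    "Bagging": BaggingRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)        # fit on the training devices only
    pred = model.predict(X_val)        # predict on the held-out devices
    results[name] = {
        "MAE": mean_absolute_error(y_val, pred),
        "MSE": mean_squared_error(y_val, pred),
        "r2": r2_score(y_val, pred),
    }

for name, scores in results.items():
    print(f"{name:28s} MAE={scores['MAE']:.4f} MSE={scores['MSE']:.6f} r2={scores['r2']:.4f}")
```

Because the data are synthetic, the printed scores say nothing about the real devices; the sketch only shows the shape of the comparison, with every model fitted on the same training set and scored on the same held-out set.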
Our results demonstrate that the random forest model outperforms the other ML models in predicting the bandgap of perovskite materials for PSCs, exhibiting low MAE and MSE values, and a high coefficient of determination (r2). This highlights the accuracy of the random forest model in capturing the complex relationships between the input parameters and the bandgap of the active layer. Furthermore, our analysis sheds light on the importance of specific feature parameters in determining the bandgap, providing valuable guidance for optimizing the fabrication process of PSCs. Overall, this study contributes to advancing the understanding and predictive capabilities of ML techniques in the field of perovskite solar cells, paving the way for enhanced device performance and widespread adoption of this promising renewable energy technology.
The random forest regression model in Fig. 1 accurately predicts the bandgap, enabling researchers to systematically explore and fine-tune perovskite compositions, dimensions, and deposition parameters.
Heatmap analysis was used to identify the feature parameters most relevant to predicting the bandgap of perovskite materials for perovskite solar cells (PSCs). Heatmaps provide a visual representation of the correlation between each input feature parameter and the target variable (bandgap).15 The original dataset for this study includes device-level parameters such as the power conversion efficiency (PCE), open-circuit voltage, short-circuit current, and fill factor. However, since the bandgap pertains specifically to the active layer, i.e., the perovskite material, rather than the entire perovskite solar cell, we focused on parameters directly associated with the active layer.16 These parameters include the perovskite dimension, perovskite thickness, perovskite deposition temperature, perovskite deposition time, and the A, B, and C coefficients.
By analysing the heatmap shown in Fig. 2, we identified the feature parameters with the strongest correlations to the bandgap, which indicates their importance in the predictive model. The A coefficient, B coefficient, and C coefficient have the highest positive correlation coefficients (0.70, 0.83, and 0.63, respectively) and are more strongly correlated with the bandgap than the other parameters. We therefore used the A coefficient, B coefficient, and C coefficient for training the models and performing further analysis.
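A correlation-based selection of this kind can be sketched in a few lines. The sketch assumes the heatmap reports Pearson correlation coefficients, which the text does not state explicitly, and the 0.6 threshold, the column names, and all numeric values below are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold):
    """Return {name: r} for features whose |r| with the target meets the threshold."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return {name: r for name, r in scores.items() if abs(r) >= threshold}


# Toy values for illustration only (NOT taken from The Perovskite Database).
bandgap = [1.50, 1.55, 1.60, 1.65, 1.70, 1.75]
columns = {
    "a_coefficient": [0.10, 0.20, 0.25, 0.40, 0.45, 0.60],  # rises with the bandgap
    "thickness_nm": [400, 350, 420, 380, 410, 390],          # only weakly related
}

# The 0.6 cut-off is a hypothetical choice, not a value from the study.
selected = select_features(columns, bandgap, threshold=0.6)
print(selected)
```

With these toy numbers only the coefficient-like column passes the cut-off, which mirrors the way the three coefficients were kept and the remaining active-layer parameters were dropped.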
Visualizing these correlations as a heatmap clarified how each parameter relates to the bandgap. The color-coded matrix gives an intuitive view of the relative significance of each feature, which makes it easier to identify the most impactful factors. As a visual and analytical tool, the heatmap improves the interpretability of feature importance and helps researchers focus on the parameters that matter most for bandgap prediction and the overall performance of PSCs.
The performance and accuracy of the models were then verified using the following metrics.
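$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left\lvert y_i - \hat{y}_i \right\rvert \tag{1}$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{2}$$

$$r^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{3}$$

Here $y_i$ is the reported bandgap of device $i$, $\hat{y}_i$ is the predicted bandgap, $\bar{y}$ is the mean of the reported bandgaps, and $n$ is the number of devices.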
These metrics offer quantitative assessments of a model's generalization capacity, providing insights into its correlation with the data, accuracy in predicting target outputs, and overall error magnitude. By analysing these metrics, researchers and practitioners can evaluate and compare the efficiency of different trained models.
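For concreteness, the three metrics used in this study (MAE, MSE, and r2) can be computed from their standard definitions as in the sketch below. The function names and the four sample values are illustrative assumptions, not taken from the study.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reported and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference; penalizes large errors more than MAE does."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up bandgap-like values for illustration only.
y_true = [1.50, 1.60, 1.70, 1.80]
y_pred = [1.52, 1.58, 1.71, 1.79]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(f"MAE={mae:.4f}  MSE={mse:.6f}  r2={r2:.4f}")
```

MAE and MSE are error measures, so lower is better. An r2 of 1 indicates a perfect fit, and a model that always predicts the mean of the reported values scores 0.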
Sl. No. | ML model | Mean absolute error (MAE) | Mean squared error (MSE) | Coefficient of determination (r2) |
---|---|---|---|---|
1. | Random forest | 0.000775 | 0.00000920 | 0.9994 |
2. | Gradient boosting regressor | 0.0222 | 0.0041 | 0.8841 |
3. | k-Nearest neighbors (KNN) | 0.0365 | 0.0143 | 0.5914 |
4. | AdaBoost | 0.0283 | 0.0029 | 0.9163 |
5. | Gaussian process regressor | 0.0351 | 0.0082 | 0.7662 |
6. | Bagging | 0.0448 | 0.0162 | 0.6673 |
We also visualized the performance metrics of each model using a line graph with markers, in which each line corresponds to one performance metric and each marker gives the value for an individual machine learning model. The table above compares the mean absolute error (MAE), mean squared error (MSE), and coefficient of determination (r2) across the six models. Both views show the same pattern: random forest has the lowest MAE and MSE values and the highest r2 of all models evaluated.
Table 2 enhances the comprehensibility of our findings, enabling researchers and practitioners in the field to easily discern the relative performance of different machine learning approaches for predicting the bandgap of perovskite materials in PSCs.
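The comparison in the results table can also be reproduced programmatically. The short sketch below copies the six rows from the table, ranks the models by each metric, and applies one basic consistency check (the squared MAE can never exceed the MSE). The variable names are illustrative.

```python
# (model, MAE, MSE, r2) copied from the results table above.
TABLE = [
    ("Random forest", 0.000775, 0.00000920, 0.9994),
    ("Gradient boosting regressor", 0.0222, 0.0041, 0.8841),
    ("k-Nearest neighbors (KNN)", 0.0365, 0.0143, 0.5914),
    ("AdaBoost", 0.0283, 0.0029, 0.9163),
    ("Gaussian process regressor", 0.0351, 0.0082, 0.7662),
    ("Bagging", 0.0448, 0.0162, 0.6673),
]

# Lower is better for MAE and MSE; higher is better for r2.
by_mae = sorted(TABLE, key=lambda row: row[1])
by_mse = sorted(TABLE, key=lambda row: row[2])
by_r2 = sorted(TABLE, key=lambda row: row[3], reverse=True)

# Sanity check: for any set of errors, MAE squared cannot exceed MSE.
consistent = all(mae ** 2 <= mse for _, mae, mse, _ in TABLE)

print("Ranked by r2: ", [row[0] for row in by_r2])
print("Ranked by MAE:", [row[0] for row in by_mae])
print("Ranked by MSE:", [row[0] for row in by_mse])
print("MAE^2 <= MSE for every row:", consistent)
```

Random forest ranks first on all three metrics, but the order of the remaining models depends on the metric chosen. For example, gradient boosting has a lower MAE than AdaBoost, while AdaBoost has the lower MSE and the higher r2.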
Several directions remain open for future research in this domain. Firstly, the feature parameters used in the ML models can be optimized to further enhance prediction accuracy. Additionally, more efficient ML algorithms and ensemble techniques may yield even better performance in bandgap prediction.17 More advanced machine learning models (such as neural networks) or Transformer models (such as PolyNC or polyBERT) can also be considered for this work.18–20 Furthermore, applying the same model to different datasets would provide a more comprehensive understanding of the factors influencing bandgap variation in PSCs. Overall, continued research in this field holds the promise of advancing the development of efficient and commercially viable perovskite solar cells.
This model can also be integrated into the material design workflow. For instance, when new compositions of perovskite materials (e.g., mixed halides or organic-inorganic hybrids) are proposed, the model can quickly estimate the bandgap without extensive experimental trials. This could significantly reduce the time and cost associated with experimental material characterization, allowing for more efficient screening of potential parameters for high-efficiency solar cells.
In addition, the model could be used to guide the fabrication process by offering real-time predictions of bandgap during deposition stages, helping engineers maintain optimal conditions. This would not only improve device performance, but also ensure reproducibility across different manufacturing batches.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4su00370e
This journal is © The Royal Society of Chemistry 2024