Armi Tiihonen *a, Kati Miettunen ab, Janne Halme a, Sakari Lepikko a, Aapo Poskela a and Peter D. Lund a
a New Energy Technologies, Department of Applied Physics, Aalto University, P.O. Box 15100, 00076 Aalto, Finland. E-mail: armi.tiihonen@gmail.com
b Biobased Colloids and Materials, Department of Bioproducts and Biosystems, Aalto University, P.O. Box 16300, 00076 Aalto, Finland
First published on 5th February 2018
The success of perovskite and dye-sensitized solar cells will depend on their stability over the whole life-time. Aging tests are of utmost importance to identify deficiencies and to suggest cell improvements. Here we analyzed the quality of 261 recent aging tests and found serious shortcomings in current practices. For example, in about 50% of the studies only one sample was considered, meaning that the sample size was too small for statistical significance. We propose a new procedure for aging tests based on careful planning and scientific reporting. This includes estimating the required sample size for an aging test and avoiding so-called nuisance factors, i.e. unintended variations always present in real world testing. The improved procedure can provide more reliable information on stability and lifetime, which could contribute to better understanding of degradation mechanisms important for improving these photovoltaic technologies.
Broader context
Perovskite and dye-sensitized solar cells are promising third generation photovoltaic technologies. In less than a decade, the conversion efficiency of perovskite solar cells has increased almost tenfold, reaching up to 20 percent. Dye solar cells can be manufactured from a variety of materials and in different colors, showing wide application areas with commercial potential. Though the lifetime of both PV technologies has constantly improved, they still suffer from major degradation problems, which hamper their market breakthrough. Research on cell stability is therefore of utmost importance, but the quality of related research has shown major shortcomings. Here we comprehensively analyzed the methods used in stability and aging tests, which revealed severe weaknesses in current practices, in particular insufficient reporting and inadequate sample sizes for statistical relevance. As these deficiencies may hamper the progress of perovskite and dye-sensitized solar cells, a new procedure for aging tests is proposed including detailed instructions on how to achieve high quality in such experiments.
The number of aging studies is small compared to the number of studies focused on improving the already high efficiencies, as a simple comparison reveals: 18% of the articles related to DSCs or PSCs listed in the Web of Science mention stability in their topic (details in ESI,† Section S5), whereas efficiency is mentioned in 79% of the articles. Recently, the need for more stability research for both DSCs and PSCs has been recognized.1,7 Additionally, increased attention is directed at the standards of stability testing: researchers1,7,8 and publishers9 have called for better reporting and more uniform methods.
The main motivation in aging tests is (1) to determine the real lifetime of the cells, and (2) to compare the durability of different types of cells under certain stress factors. The first objective is challenging, because the desired lifetimes of the cells are often very long. In practice, the lifetime of the cells is investigated by accelerated aging tests. The challenge is in determining the accelerating factor of the aging test accurately enough, which is a topic still under research even for commercial silicon solar cells.10
The latter objective of comparing cell types is simple in principle: if everybody performs the same test under the same conditions, the results should be comparable. Currently, uniformity and repeatability are poor in the stability testing of both DSCs and PSCs, as demonstrated in Section 2. This situation leads to seemingly contradictory research results, a recognized problem in the literature.1 This in turn hampers the progress of the whole field, as important phenomena may remain unnoticed for too long because of a lack of information on the circumstances of the aging tests.
Similar problems have been found in other fields, where they have been addressed by standardizing the aging tests. This approach has also been suggested at least for PSCs.7 Commercialized technologies – silicon and thin film solar cells – have standardized tests for estimating the durability of PV modules against prolonged exposure in climates that are specified in the standard (IEC 61215-1:2016). These tests are for initial durability testing, not for estimating the long-term stability or lifetime of the modules, although they are sometimes used as the basis for such estimations.11
The IEC 61215-1:2016 tests cannot be adopted as such for emerging PV technologies. To begin with, these third-generation technologies are too young to meet many of the established evaluations for outdoor testing, and their market entry will most likely happen in milder indoor conditions. For instance, a damp heat test designed for terrestrial thin-film solar cells at a temperature of 85 °C and 85% air humidity (IEC 61215-1:2016) would degrade most DSCs and PSCs quickly. Overly harsh tests leading to a rapid failure are unsuitable for research purposes: something more than a binary pass/fail resolution is needed to detect whether there is progress in stability. Moreover, many laboratories researching third-generation solar cells lack the equipment to perform the detailed and laborious tests that are designed for commercial large-scale manufacturers. Another example of standardization is the International Summit on Organic and Hybrid Photovoltaic Stability (ISOS) protocol for organic PVs12 that has been agreed on by a wide international consortium of organic PV researchers. ISOS is designed for research purposes, and it is divided into three levels ranging from advanced to very basic, which encourages more groups to include stability studies in their research. The ISOS protocol also serves as a practical starting point for designing stability tests for DSCs and PSCs.
Here we present the current state of stability research of perovskite and dye solar cells, investigated with a focus on the methods and practices of performing aging tests instead of the more commonly investigated findings from aging tests. Based on our literature survey, we present practical procedures for improving the effectiveness and quality of aging testing. Our recommended procedures can be applied to all standards of testing – the focus is to maximize accumulated knowledge and accelerate aging research, regardless of how extensive or humble your facilities may be.
Our results show that the state of aging testing of PSCs and DSCs is alarming. Efforts of the whole community are required for swift corrections. Thus, we propose a series of international summits for agreeing on the principles of stability testing of these cells. The improved methods could greatly enhance the progress in stability research in the future.
In some studies, the aging tests have possibly been performed on more cells, but the data are presented for only one cell. Unfortunately, the scientific audience has no means of recognizing whether the study had more cells or not. Additionally, presenting the data for only one cell needlessly discards information about the repeatability of the results. Fig. 1 shows that the second most common option for group sizes is to refer to the samples in plural without reporting the exact group size. Stating the number of cells is more informative for the scientific audience, as omitting it unnecessarily complicates the interpretation of the results.
Small groups of, e.g., fewer than five cells are also typically insufficient for statistically significant conclusions, although they result in more reliable conclusions than a comparison of single cells. Small cell groups can be used for acquiring tentative data about the differences between the cell types, but the quantitative data are unreliable because the information about the variation of the results is inadequate. As a result, the impact of the resulting article decreases. Therefore, it is worthwhile to target statistically acceptable group sizes (see Section 3.2 for determining sufficient group sizes). In only 10% of the investigated tests, ten or more cells have been prepared for each cell group (Fig. 1), demonstrating that increasing group sizes to statistically acceptable levels is possible.
38% of the investigated tests are performed for encapsulated or sealed cells, 61% for open devices. Most open devices are PSCs, probably because many DSC types contain liquid electrolytes that would soon leak out or evaporate from a cell left without proper sealing.
More than 60% of the investigated aging tests are performed at open circuit voltage, i.e., an operating regime that corresponds to the storage of the cells, and those are mainly done in dark conditions (Fig. 2a). Reverse bias and short circuit conditions are rarely applied to the cells, possibly because realizing these conditions in an aging test setup requires more effort than open circuit. They are also seemingly atypical operation states of a cell. However, reverse bias conditions exist in panels that are partly shadowed, for example, and short circuit conditions might appear in a damaged cell or panel. A continuous current–voltage curve measurement (IV), applied as a condition in 8% of the aging tests, corresponds directly to none of the operational states of the cell. The benefit of the repeated IV test stress is the continuous variation of the electric state, which also happens in actual cell operation, although typically at a significantly slower pace in daily cycles. Only roughly one-eighth of the aging tests are performed under load, which is the main operating state of the cell (Fig. 2a). The different operating conditions of the cell should be represented in stability research for the sake of completeness. Specifically, operation under load should be utilized more in aging tests because the stability at open circuit or other electric states does not necessarily correlate with stability in real-life use.13,14
Fig. 2 (a) The electric condition of the cells in the investigated aging tests. The cells are aged at open circuit (Voc) either under illumination or in the dark, under load, under reverse bias, by cycling IV repeatedly (IV), at short circuit (Isc), or the electric state remains unknown (unknown). Only a minority of aging tests are performed under operational conditions (i.e. under load). (b) The investigated aging tests divided into dark tests, tests illuminated with visible and/or ultraviolet light, and tests that did not mention if the illumination contained ultraviolet and/or visible light. Cells are dominantly aged in the dark. See ESI,† Section S5 for more detailed information on the classification.
The aging of the cells is also affected by the intensity of light. For the majority of illuminated aging tests, the visible light intensity is reported quantitatively as Fig. 3a indicates; typically tests are performed at 1 Sun intensity. The state of the reporting of visible illumination seems good. In contrast, only a minority of illuminated tests are performed with quantitatively stated UV intensity (Fig. 3b). A quarter of illuminated tests apply only visible light using LED lamps for example, and another quarter provide no information about the spectrum of the light. Commonly the presence of UV light is deduced from the reported lamp type or solar simulator model, but the intensity of UV light remains unknown (“some UV” in Fig. 3b, more details in ESI,† Section S5).
It seems that having a commercial solar simulator with a high-quality calibration cell is generally regarded as sufficient to describe the aging spectrum in detail. However, many standards for solar simulator spectral accuracy specify only the spectrum above 400 nm (e.g. SFS-EN 60904-9, JIS C 8912, and ASTM E927 for AM 1.5G). Thus, commercial solar simulators are not necessarily accurate in the UV part of the spectrum. In fact, they could emit no UV at all even if they are very accurate in the visible part of the spectrum.
Currently both UV and visible light intensities are reported numerically only in 31% of the investigated illuminated indoor tests. Reporting of the intensity separately for the visible and UV parts of the spectrum would be valuable because UV illumination strongly degrades some cell types.
Fig. 4a and b illustrate the temperature and humidity applied in dark and illuminated aging tests. For most aging tests, the humidity and/or temperature are reported only qualitatively, such as being “ambient”, or are not reported at all. Illuminated aging tests are commonly performed in a dry atmosphere and at low temperature. Dark storage tests are performed in a narrower temperature range than illuminated tests, but the humidity range is wider. The scarcity of published illuminated aging tests at temperatures above 60 °C or humidity above 10% suggests that these conditions remain a severe stress factor for DSCs and PSCs.
Fig. 4 The aging tests performed (a) in the dark and (b) under illumination classified based on the temperature and humidity level during the test. The tests that do not specify the value (“unknown”), define the value as ambient (“ambient”), or declare only the upper or lower boundary value (“halfbounded”) are also listed. Qualitatively reported or unknown temperature and humidity are common in aging tests, which complicates the comparison of test results. See ESI,† Section S5 for more information on the classification details.
This trend of missing reporting of environmental conditions during the aging tests spreads across all basic environmental variables. Numerical values are stated for visible and UV light intensities, humidity, and temperature in only one-third of aging tests. The absence of reporting suggests a lack of monitoring of the environmental variables, a situation that is problematic because environmental factors other than the intended ones (e.g., humidity in a light soaking test) could greatly affect the test result. Even if the monitoring was appropriate, the environmental details missing from the article complicate the comparison of results with other tests.
In 89% of the tests, the performance of the cells has been monitored with measurements during the aging test, in addition to the beginning and end of the test. The regular monitoring of performance provides information about the progress of degradation mechanisms that could be, for example, linear or step-wise. Typically the measurements seem to be performed manually on a regular basis during the test. Manual measurements are laborious, so working hours would be saved if the commonly repeated measurements were automated.
Whereas the final efficiency distribution is rather similar for both DSCs and PSCs, the distribution of aging test durations shown in Fig. 5 varies. PSCs are typically aged for less than 250 hours, whereas DSCs are aged for more than 1000 hours. A shorter test is perhaps regarded as sufficient for PSCs that are at the early stages of development with shorter lifetime expectations, whereas for DSCs with a longer history, the 1000 hours test has become a standard.
Notably, in comparison with dark tests, the majority of illuminated perovskite aging tests are very short (Fig. 5c) and end up with a more scattered final efficiency for the best cell group (Fig. 5a). This suggests that illumination remains a significant stress factor for the PSCs, whereas dark tests are more easily endured (although a simultaneous analysis of all the test conditions, which is out of the scope of this work, would be needed to confirm this hypothesis). It is, however, clear that our understanding of the aging of the PSCs would increase if longer aging tests were conducted more commonly, even if they resulted in a lower final efficiency.
Thus, intrinsic and extrinsic stability are closely linked to the selection of the possible encapsulation for the cells. Encapsulation isolates the active components of the cell from the environment, whereas open devices remain susceptible to incoming impurities, the loss of active cell materials from the cell, and mechanical stress, such as bending or scratching of the cell. From the industrial viewpoint, encapsulated cells resemble cells in a well-sealed solar panel, whereas open devices bring information about the stability in case of a failed sealing. The quality of the encapsulation varies with the method16–20 and should be confirmed to be sufficient for the planned conditions.
Typically the aging test is performed with two or more cell groups in order to have a reference group to which the stability of the investigated group is compared, e.g., an illuminated aging test with a reference group stored in the dark as in ref. 21. The groups can be compared with a statistical test, for example a t-test for two cell groups, analysis of variance for multiple groups, or analysis of covariance for compensation of nuisance factors (see Section 3.3). The sufficient group size can be estimated by calculating the statistical power of the statistical test that you plan to use (see a detailed example for t-test in ESI,† Section S4). Often, long-term aging tests require larger group sizes than tests without aging. This is caused by nuisance factors (see Section 3.3) that increase variation in the final performance of the cells and accumulate during the whole aging test. For example, light intensity variations between the aged cells during the aging test could act as a nuisance factor.
The reality of solar cell research is that the cell preparation often requires manual work. The space in aging test devices is limited as well. This holds true especially for light soaking tests, where the cells cannot be stacked, unlike in dark tests. These factors create pressure to decrease the group size from the optimum value. In such cases, it is preferable to decrease the number of cell groups instead. Additionally, if one cell preparation method is well-established with low variation but the other results in more variation, more cells could be allocated to the latter group.
In practice, some cells could fail during the cell preparation or the aging test because of factors unrelated to the research question of the aging study (e.g., sealing failures or a breakdown of the electrical contact). The likelihood of such cell failures should be taken into account by increasing the group sizes correspondingly.
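As a rough illustration of the group-size planning described above, the following sketch estimates the number of cells per group for a two-group t-test and then inflates it for expected cell failures. It is not the procedure of ESI Section S4: it relies on the normal approximation plus a commonly used small-sample correction term, and the function name `cells_per_group`, the default significance level of 0.05, the target power of 0.8, and the `failure_rate` parameter are all assumptions made for this example.

```python
import math
from statistics import NormalDist


def cells_per_group(effect_in_sd, alpha=0.05, power=0.8, failure_rate=0.0):
    """Approximate cells needed per group for a two-sided, two-sample t-test.

    effect_in_sd : smallest difference between group means worth detecting,
                   expressed in units of the within-group standard deviation
    alpha        : accepted probability of a false positive
    power        : desired probability of detecting the difference
    failure_rate : assumed fraction of cells lost to sealing failures,
                   broken contacts and similar unrelated causes
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    # Normal-approximation sample size for comparing two means
    n = 2 * ((z_alpha + z_power) / effect_in_sd) ** 2
    # Additive small-sample correction because a t-test is used, not a z-test
    n += z_alpha ** 2 / 4
    # Prepare extra cells so that enough survive to the final analysis
    return math.ceil(n / (1 - failure_rate))


print(cells_per_group(2.0))                    # 5
print(cells_per_group(2.0, failure_rate=0.2))  # 7
print(cells_per_group(1.0))                    # 17
```

The estimate grows quickly as the difference of interest shrinks relative to the cell-to-cell variation, which is why nuisance factors that inflate the variation also inflate the required group size.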
Fig. 6 Illumination level is acting as a nuisance factor of the efficiency of dye solar cells after 700 hours of aging under illumination22 because the post-aging efficiencies form a line with negative slope with respect to the illumination intensity (measured separately for each cell). A reprinted derivative from ref. 22 under CC-BY license.
In the primary strategy, minimization, the cell preparation procedure and the environmental conditions are ensured to remain constant between the cells and not to contain unexpected factors. A healthy dose of paranoia may be very helpful in this. If exercised, it does not significantly complicate or slow down the work.
For example, the order in which the cells are prepared can affect their performance: the components might adsorb contamination during the cell preparation, and thus the cells prepared first might have better stability than the later ones. Thus preparing and measuring different cell types in an alternating fashion is a very easy way to prevent false positive and negative results with practically zero extra work. Regarding the nuisance factors related to cell materials, different material batches might result in different stability, and even the cell assembly date might affect the results if ambient conditions (especially humidity) vary greatly from day to day. Therefore it is advisable to prepare all the cells during a short time period and when applicable from the same material batches. If that is not feasible, it would be worthwhile to have reference cells for each assembling session to verify equal quality.
Aspects related to illumination are likely nuisance factors. For many cell configurations, the spectrum could greatly affect the results of the aging test.1,22,23 Therefore, one should be aware of the spectrum of the light (especially whether it contains UV light) and the effects of possible filters on it. The spatial variations of illumination intensity across the aging platform could be tens of percent and still remain unnoticed by the human eye because the eye adapts well to different intensities. The simplest option to measure the spatial intensity distribution is to use a photodiode that is sensitive to the applied illumination spectrum. In our light soaking system, we record the light intensity for each cell separately, for example on a weekly basis, in addition to constant tracking of a few spots. Significant environmental factors other than intensity should also be followed during the aging test. Just stating “ambient” is not enough. To give an example, ESI,† Fig. S7 illustrates the indoor air humidity varying greatly in both the short term (between days) and the long term (between seasons). These variations certainly affect the aging of moisture-sensitive unsealed devices.
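The per-cell intensity records mentioned above can be summarized with very little code. The sketch below computes a spatial non-uniformity figure as (max - min)/(max + min), the definition commonly used in solar simulator classification, and the intensity of each cell relative to the platform mean. The cell labels and the readings (in Sun) are invented for illustration and are not data from this work.

```python
def spatial_nonuniformity(readings):
    """Return (max - min) / (max + min) in percent for intensity readings."""
    highest, lowest = max(readings), min(readings)
    return 100.0 * (highest - lowest) / (highest + lowest)


def relative_intensities(readings_by_cell):
    """Return each cell's intensity divided by the mean over all cells."""
    mean = sum(readings_by_cell.values()) / len(readings_by_cell)
    return {cell: value / mean for cell, value in readings_by_cell.items()}


# Hypothetical weekly photodiode readings at each cell position, in Sun
weekly_readings = {
    "cell_01": 1.04,
    "cell_02": 0.98,
    "cell_03": 0.91,
    "cell_04": 1.07,
}

print(f"non-uniformity: {spatial_nonuniformity(list(weekly_readings.values())):.1f} %")
for cell, ratio in relative_intensities(weekly_readings).items():
    print(f"{cell}: {ratio:.2f} x platform mean")
```

Storing the per-cell values alongside the performance data keeps the option open to treat intensity as a covariate later.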
Nuisance factors could remain significant even after they are minimized, their importance could be detected only after the aging test, or they might be impossible to control during the test. In these cases, the alternative strategy is to compensate for the most significant nuisance factor(s) after the aging test with regression analysis or analysis of covariance (ANCOVA), for example. ANCOVA is used for determining if the groups are different regardless of a covariate, that is, the nuisance factor (application example in ref. 22). The compensation is naturally practicable only for nuisance factors that are measurable.
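To make the compensation idea concrete, here is a minimal sketch of the adjusted-means step of a one-covariate ANCOVA: a common within-group slope of efficiency versus light intensity is estimated, and each group mean is then shifted to the overall mean intensity. It does not compute the F-test or p-value of a full ANCOVA, it assumes the slope is the same in every group, and the function name and the data are hypothetical (they are not taken from ref. 22).

```python
def ancova_adjusted_means(groups):
    """Adjust group means of a response for one measured nuisance covariate.

    groups maps a group name to a list of (covariate, response) pairs,
    e.g. (light intensity in Sun, efficiency in % after aging).
    Returns the pooled within-group slope and the adjusted group means.
    """
    sum_xy = 0.0
    sum_xx = 0.0
    all_x = []
    group_means = {}
    for name, pairs in groups.items():
        xs = [x for x, _ in pairs]
        ys = [y for _, y in pairs]
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
        # Accumulate within-group sums so that group offsets do not bias the slope
        sum_xy += sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        sum_xx += sum((x - mean_x) ** 2 for x in xs)
        group_means[name] = (mean_x, mean_y)
        all_x.extend(xs)
    slope = sum_xy / sum_xx
    grand_mean_x = sum(all_x) / len(all_x)
    adjusted = {
        name: mean_y - slope * (mean_x - grand_mean_x)
        for name, (mean_x, mean_y) in group_means.items()
    }
    return slope, adjusted


# Hypothetical data: the test group happened to sit at brighter positions
cells = {
    "reference": [(0.8, 5.4), (0.9, 5.2), (1.0, 5.0), (1.1, 4.8)],
    "test": [(1.0, 5.5), (1.1, 5.3), (1.2, 5.1), (1.3, 4.9)],
}
slope, adjusted = ancova_adjusted_means(cells)
print(f"slope: {slope:.2f} % per Sun")
print(f"adjusted difference: {adjusted['test'] - adjusted['reference']:.2f} %")
```

In this invented data set the raw group means differ by only 0.1 percentage points, whereas the intensity-adjusted difference is 0.5, which shows how an unevenly distributed nuisance factor can mask a real difference between groups.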
The measurements should be selected and performed so that they do not add unnecessary nuisance factors to the aging test. Some tests, such as electrochemical impedance spectroscopy (EIS) or IV cycling, could affect the electrochemical state of the cells and consequently the results of the following measurements of the same sample. Thus, the measurement sequence should be kept unchanged during the aging test.
Sometimes, a measurement can even accelerate or activate the degradation of the cells. For example, sweeping the IV curve of cells with metal substrates far into reverse bias can trigger corrosion. Corrosion reactions set in beyond a certain polarization, so polarization can be used to prevent corrosion but also to trigger it. Degradation reactions related to device polarization have also been observed in cells without metal electrodes.24 As another example, the cells could be so sensitive to UV light that prolonged and repeated IV measurements under full spectrum illumination could trigger cell degradation even though the cells might otherwise pass the aging test. The IV curve of the cell could be measured before and after the whole measurement sequence to confirm that the cells remain stable during the measurements.
The measurements can be performed either only before and after the aging test, or repeatedly during the aging test. Measuring the cells only before and after the aging test saves working hours but gives limited information on the degradation phenomena. For ambitious aging studies, the frequent monitoring of all the cells and the environmental parameters, like in ref. 25, is recommended. Tracking the environmental parameters permits the compensation for nuisance factors when necessary, and monitoring the cells allows investigating nonlinear degradation phenomena. Additionally, laborious measurements can be timed optimally so that the aging signs are visible in the cells but they still operate well enough for the measurement. For example, electrochemical impedance spectroscopy could provide detailed information about the degrading components of the cell but the results become difficult to analyze if the cell is too degraded (see a demonstration in ESI,† Section S3).
The parameters could be tracked automatically or manually. Automatic measurements can save working hours, and in some cases, they increase the accuracy of the data collection. For example, the variations in humidity in ESI,† Fig. S7, partly connected to the different level of the air-conditioning outside office hours, could remain unnoticed without automated measurements.
Reporting outliers, that is, cells dropped from the final analysis, increases the reliability of the study. Cells with major scratches or leaking electrolytes are typical justified outliers that could lead to false conclusions if they were kept in the analysis. There are also statistical methods, such as Peirce's criterion,28 for objectively detecting outliers from the measurement data.
Using statistical methods in the analysis of results increases the weight of the article because they provide a collectively agreed basis for the conclusions. Statistical methods can be utilized if the assumptions of the selected statistical test are met and the cell groups are large enough. The assumptions could include having no outliers, equal variances in the groups, normally distributed data, or equal group sizes. The sufficient group size for statistical testing varies from test to test. For example, a t-test can technically be performed even for groups of two cells.
However, the probability of getting false negative results increases if the groups are small. In practice, this means that no difference between the two groups is detected even if there is one, unless the difference is very notable. To give an example, if the group size is three cells and the difference between the distributions is twice their standard deviation, the probability of not detecting the difference between the two groups is more than 50%.29 Increasing the group size to five cells already decreases the probability to approximately 20%.29 Additionally, numerical simulations show that small groups combined with the violation of the assumptions of the t-test lead to an increased probability of both false positive and false negative results.29
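The false negative probabilities quoted above can be checked with a small Monte Carlo experiment. The sketch below is not the simulation of ref. 29: it assumes normally distributed data with equal variances, equal group sizes, a two-sided pooled-variance t-test, and a significance level of 0.05. The critical value is found by integrating the Student t density numerically so that only the Python standard library is needed; all function names are made up for this example.

```python
import math
import random


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson integration of the density on [0, |t|]."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    s = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * s * h / 3)


def t_critical(df, alpha=0.05):
    """Find the |t| value whose two-sided p-value equals alpha by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_sided_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_and_variance(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def false_negative_rate(n, effect_in_sd=2.0, alpha=0.05, trials=20000, seed=1):
    """Fraction of simulated experiments in which a real difference is missed."""
    rng = random.Random(seed)
    crit = t_critical(2 * n - 2, alpha)
    misses = 0
    for _ in range(trials):
        ref = [rng.gauss(0.0, 1.0) for _ in range(n)]
        test = [rng.gauss(effect_in_sd, 1.0) for _ in range(n)]
        m_ref, v_ref = mean_and_variance(ref)
        m_test, v_test = mean_and_variance(test)
        pooled = (v_ref + v_test) / 2  # pooled variance for equal group sizes
        t = (m_test - m_ref) / math.sqrt(pooled * 2 / n)
        if abs(t) <= crit:
            misses += 1
    return misses / trials


for n in (3, 5, 10):
    print(f"{n} cells per group: false negative rate {false_negative_rate(n):.2f}")
```

Under these assumptions the simulated rates should land close to the figures cited above for three and five cells; replacing `rng.gauss` with a skewed distribution or using unequal variances is one way to explore the assumption violations mentioned in the text.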
Last but not least, the significance of the results should be expressed whether or not statistical methods are applied in the analysis. In principle, even the smallest differences between two distributions could be detected and be statistically significant if enough samples are investigated. But a remarkable difference is not only statistically significant but also practically noteworthy. Since the definition of noteworthiness varies from person to person, every research team should ask themselves if the acquired differences are large enough to matter. For example, would a statistically significant 5% higher mean short circuit current density in a test group compared to a reference group be a noteworthy result, or should the results be regarded practically equal? How about 5% higher mean open circuit voltage?
As a community, we could be bolder in aging testing, since the typical applied environmental conditions are still limited to storage conditions, for example dark tests under ambient temperature and moisture, and the open circuit state of the cells. More tests under load and illumination are needed for mapping the stability and degradation of the cells under all the conditions in which the cells are actually used. One issue that is often left unspoken is nuisance factors creating unintended variance in the test results. The best defense against nuisance factors is to eliminate them, but is all hope lost if the nuisance factors persist anyway? The comforting news is that their effect can be analyzed with statistical methods, in the best case giving additional insight into degradation (e.g., degradation as a function of light intensity), provided that the research team is proactive in monitoring potential nuisance factors.
The state of aging testing of perovskite and dye solar cells, investigated thoroughly in this work, calls for swift improvements. The whole community should collaborate in the process. Thus, we suggest a series of international summits for determining the definitive standards of stability testing of these cells. This move could greatly enhance the progress in stability research in the future.
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c7ee02670f
This journal is © The Royal Society of Chemistry 2018