Saliha Saher a, Tu C. Le *b, Tamar L. Greaves c, Jennifer M. Pringle d, Douglas R. MacFarlane a and Karolina Matuszek *a
a School of Chemistry, Monash University, Clayton, VIC 3800, Australia. E-mail: karolina.matuszek@monash.edu
b School of Engineering, STEM College, RMIT University, GPO Box 2476, Melbourne, Victoria 3001, Australia. E-mail: tu.le@rmit.edu.au
c School of Science, STEM College, RMIT University, 124 La Trobe Street, Melbourne, VIC 3000, Australia
d Institute of Frontier Materials, Deakin University, 221 Burwood, VIC 3125, Australia
First published on 24th July 2025
Protic organic salts have great potential to be used as phase change materials for thermal energy storage. However, tuning their melting temperatures and maximising their energy storage density (enthalpy of fusion) is a great challenge. The structures of the cation and anion play a crucial role in determining the thermal properties of protic organic salts. In this study, linear and non-linear machine learning models are used to predict the melting temperature (Tm) and enthalpy of fusion (ΔHf) of 182 possible protic salts using thermal properties (Tm and ΔHf) of 69 protic salts for training models. An additional feature of this study was the investigation of the prediction accuracy of models for salts with solid–solid phase transitions. It was found that the presence of solid–solid transition/s greatly impacted the ΔHf predictions. The best linear models for ΔHf were obtained for salts having no solid–solid transitions (R2 of 0.82, standard error of estimation (SEE) of 4 kJ mol−1). Tm predictions remained unaffected by the presence of solid–solid transitions. The best linear model for Tm prediction achieved R2 of 0.63, and SEE of 28 °C. The non-linear models showed marginally lower performance compared to linear models. Experimental cross-validation demonstrated the acceptable predictive ability of linear models for both Tm and ΔHf. This study opens new avenues for exploring the molecular origins of PCM properties and advancing the development of efficient energy storage materials.
Organic salts have recently emerged as potential intermediate temperature PCMs offering high thermal stability, low vapor pressure, and tuneable thermophysical properties.9–11 Since the structure of the constituent ions in organic salts drives their properties, an appropriate combination of cations and anions can yield an efficient PCM with high energy storage density. However, a nearly infinite number of possible cation–anion combinations and a poor understanding of the underlying structure–property relationships pose a huge challenge to designing organic salt-based PCMs with the desired thermal properties. The identification of optimal PCMs through the synthesis and testing of a vast number of possible salts is impractical, highlighting the need for advanced techniques to design efficient materials. Machine learning (ML) is an effective tool that has been successfully used to predict various attributes of ionic liquids including density,12–14 viscosity,14–17 surface tension,14,18 melting temperature,19–22 thermal conductivity,17 refractive index,15 heat capacity,17,23 and toxicity.24,25 However, ML remains largely underutilized in predicting the most important properties of PCMs, i.e. Tm and ΔHf. In 2009, Zhu et al.26 first reported a quantitative structure–property relationship (QSPR) model to predict ΔHf of imidazolium and quaternary ammonium based ionic liquids using six descriptors (the energy of the lowest unoccupied molecular orbital, dipole moment, surface area, volume, shortest H-bond distance and cation–anion interaction energy). A good correlation (R2 = 0.90 and standard deviation = 4.79 kJ mol−1) was found between calculated and experimental values of ΔHf.26 Bai et al.27 developed QSPR models for the prediction of ΔHf for 40 ionic liquids (ILs), divided into four subsets: a mix of all 40 different ILs, 22 imidazolium cation based ILs, 10 halide anion containing ILs, and 9 imidazolium halide ILs.
The models used different numbers of descriptors, which were selected by stepwise addition of effective quantum chemical descriptors and removal of ineffective ones. The predicted and experimental ΔHf for the ionic liquids studied by Bai et al.27 showed good correlation (R2 between 0.93 and 0.97) for all four models, while the imidazolium halide model showed good predictability and validity.27 These promising reports suggest that ML methods can be applied to the prediction of the melting temperature and enthalpy of fusion of organic salt-based PCMs.
Beyond the realm of ionic liquids, machine learning in thermal energy storage is a rapidly expanding field, exploring different dimensions including prediction of eutectic and composite PCM compositions,28,29 thermal performance prediction, optimisation of PCM-based thermal energy storage systems,30–34 and thermophysical property prediction of PCMs.35–38 A neural network and a linear model have been developed to predict the ΔHf and particle size of micro-encapsulated paraffin wax to understand the influence of synthesis variables (e.g. the paraffin wax/styrene mass ratio).39 Wang et al.28 constructed a back propagation artificial neural network (ANN) model to predict the composition and melting temperature of a eutectic mixture of KCl–NaF with high accuracy. Wang et al.40 also combined particle swarm optimization and a backpropagation neural network for successful prediction of the ΔHf of binary and ternary eutectic mixtures of inorganic molten salts, with R2 values of 0.93 and 0.94, respectively. Kottala et al.41 developed an ANN model to predict differential scanning calorimetry outcomes, i.e. the heat flow and temperature of a composite PCM (LiNO3 + NaCl) with various mass fractions of expanded graphite. The comparison between predicted and measured values showed a high performance of the model with an R2 of 0.98. Pan et al.42,43 developed ML models to predict the thermophysical properties (density, expansion coefficient, heat capacity, diffusion coefficient, thermal conductivity and viscosity) of molten ZnCl2 and ZnCl2–NaCl–KCl ternary eutectic mixtures, showing good agreement with experimental values.
Considering the reasonable success of various ML methods44–47 in predicting a range of ionic liquid properties, along with the diverse applications of ML in thermal energy storage, we decided to employ the technique to predict two key thermal properties, i.e. ΔHf and Tm, of protic organic salt PCMs. These predictive ML models will reduce the time and resources needed to explore efficient organic salt-based PCMs and will be highly beneficial in understanding the structure–property relationships at play in organic salt PCMs.
Fig. 1 Chemical structures, names and associated abbreviations of cations and anions used in the training dataset to construct the different machine learning models.
The combination of fourteen cations and thirteen anions yields 182 possible salts. Since synthesising all 182 salts is not practical, machine learning methods are employed in this study to predict two key thermal properties, melting temperature and enthalpy of fusion, identify important structural features, and efficiently explore the chemical space. This approach significantly reduces the consumption of time and resources. It should be noted that to avoid any variation in the experimental values arising from the use of different differential scanning calorimetry methods (i.e. different heating rates and different instruments), data was taken initially only for protic organic salts that have been investigated as PCMs by our group. Additionally, 17 protic salts were synthesised to enrich the input data so that robust and well-representative models could be developed. After data collection, all ΔHf were converted into standard units of kJ mol−1 instead of J g−1 as the former unit can better relate to molecular level structure–property relationships. The melting temperatures of the salts varied from 72 °C to 227 °C and enthalpies of fusion varied from 3.4 kJ mol−1 to 36 kJ mol−1. The thermal properties of all 69 salts including 52 from the literature (Table S1, ESI†) and 17 newly synthesised salts (Table S2, ESI†) used in the training set are given in ESI.†
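The size of the candidate space and the unit conversion described above can be sketched in a few lines. This is a minimal illustration rather than the authors' workflow: the ion labels are placeholders (not the abbreviations in Fig. 1), the helper name `j_per_g_to_kj_per_mol` is hypothetical, and the example enthalpy and molar mass are made-up values chosen only to show the arithmetic.

```python
from itertools import product

# Placeholder labels: the real ions are those shown in Fig. 1.
cations = [f"cation_{i}" for i in range(1, 15)]  # 14 cations
anions = [f"anion_{j}" for j in range(1, 14)]    # 13 anions

# Every cation-anion pairing is one candidate salt.
candidate_salts = list(product(cations, anions))


def j_per_g_to_kj_per_mol(dh_j_per_g: float, molar_mass_g_per_mol: float) -> float:
    """Convert an enthalpy of fusion from J g^-1 to kJ mol^-1."""
    return dh_j_per_g * molar_mass_g_per_mol / 1000.0


print(len(candidate_salts))  # 14 x 13 pairings
# Hypothetical salt: 150 J g^-1 with a molar mass of 200 g mol^-1.
print(j_per_g_to_kj_per_mol(150.0, 200.0))
```

Expressing ΔHf per mole removes the effect of molar mass, so two salts with the same gravimetric enthalpy but different ion sizes are no longer treated as equivalent.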
Output | Data points | Model | Effective weights | R2 | SEE (°C for Tm; kJ mol−1 for ΔHf) |
---|---|---|---|---|---|
Tm | 69 | MLREM_TM69 | 20 | 0.63 | 28 |
Tm | 69 | BRANNLP_TM69 | 13 | 0.51 | 25 |
ΔHf | 69 | MLREM_HF69 | 18 | 0.65 | 5 |
ΔHf | 69 | BRANNLP_HF69 | 8 | 0.43 | 5 |
Tm | 42 | MLREM_TM42 | 19 | 0.62 | 29 |
Tm | 42 | BRANNLP_TM42 | 12 | 0.52 | 22 |
ΔHf | 42 | MLREM_HF42 | 15 | 0.82 | 4 |
ΔHf | 42 | BRANNLP_HF42 | 11 | 0.70 | 3 |
The correlation coefficient/coefficient of determination (R2) measures how well the model explains the variability of observed data around the mean (see eqn (1)). An R2 score close to 1 indicates a good model. However, this can be misleading, and therefore it is often used in conjunction with the standard error of estimation (SEE), which quantifies the average deviation of predicted values from measured values (eqn (2)), expressed in the same units as the target variable.62 In this context, SEE provides an absolute measure of error, making it a meaningful indicator of performance in this study. A lower SEE indicates better model performance.
[Eqn (1): definition of R2]
[Eqn (2): definition of SEE]
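For reference, the standard textbook forms of these two statistics are given below. They may differ in detail from eqn (1) and (2): in particular, the denominator of the SEE is an assumption here (some conventions use N, N − 2, or N minus the number of effective weights). In this sketch, y_i is a measured value, ŷ_i the corresponding predicted value, ȳ the mean of the measured values, N the number of data points and p the number of fitted parameters.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}
               {\sum_{i=1}^{N} \left(y_i - \bar{y}\right)^2}
\qquad
\mathrm{SEE} = \sqrt{\frac{\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}{N - p - 1}}
```

Because the SEE keeps the units of the target (°C for Tm, kJ mol−1 for ΔHf), it can be compared directly with experimental uncertainty, whereas R2 depends on the spread of the measured values in the dataset.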
Interestingly, the predictability of Tm by MLREM and BRANNLP was similar for both datasets. For dataset-a, MLREM had an R2 of 0.63 with an SEE of 28 °C, while the BRANNLP model showed a lower R2 of 0.51 but also a lower SEE of 25 °C. For dataset-b, the values of R2 remained mostly unchanged for both MLREM and BRANNLP models (0.62 and 0.52, respectively); however, there was a slight improvement in the SEE of the BRANNLP model (from 25 °C to 22 °C). This can be seen in the parity plots in Fig. 3(e and h). Considering the small sample size, the R2 value is reasonably acceptable for melting temperature prediction models; R2 values of 0.54–0.90 and SEE values between 25 °C and 45 °C have been reported previously.19,63–66
Overall, the presence of a solid–solid phase transition appears to have a strong influence on the accuracy of predicting the ΔHf, while the Tm models largely remain unaffected. Therefore, future predictions for melting enthalpies in systems with such transitions should carefully consider the presence of solid–solid phase transitions. The linear models showed similar performance to the BRANNLP models for both Tm and ΔHf, indicating that a simple linear relationship between descriptors and melting temperature or enthalpy of fusion can capture complex and hard-to-predict thermal properties like ΔHf and Tm. The comparative lack of additional success of the BRANNLP model could be due to several factors including the limited size and wide diversity of the training dataset.
Fig. 4 Scaled regression coefficients of descriptors used in the MLREM models for enthalpy of fusion (a) and (b) and melting temperature (c) and (d).
Although BRANNLP models do not provide explicit weights for individual descriptors, as their internal representations are distributed across multiple layers and neurons, they are still effective in selecting the most relevant descriptors.67 This selective activation allows the models to capture complex, nonlinear structure–property relationships that may not be easily detected by linear models. As a result, BRANNLP models can support the screening and prioritisation of promising material candidates, even without directly reporting descriptor importance values.
To enhance interpretability, BRANNLP models were used in combination with MLREM models. While BRANNLP offers superior predictive accuracy for property estimation, MLREM provides insight into the relative influence of individual descriptors. This complementary approach allows both accurate prediction and a better understanding of the underlying factors governing the material properties.
A detailed discussion on descriptors used in each model is provided in the following section, while a summarised list of descriptors used in all models is also provided in ESI† Tables S3 and S4.
Descriptors | Description | Model |
---|---|---|
AnionMs | Averaged electronic environments in molecules | MLREM_HF69 |
AnionH-047 | Count of hydrogen–carbon bonds | |
AnionUi | Presence of unsaturation in structure | |
CationC-025 | Count of R–CR–R structural units | |
CationC-026 | Number of halogenated alkyl fragments | |
CationC-033 | Number of R–CH⋯X fragments | |
CationC-042 | Number of X–CH⋯X fragments | |
CationH-046 | Number of H on sp3 carbon without adjacent X | |
CationH-048 | Number of H on primary, secondary and tertiary carbons | |
CationJhetZ | Connectivity index with Z-weighted distances | |
CationS0K | Molecular structure symmetry | |
CationT(N⋯O) | Nitrogen–oxygen separation distances | |
CationX3v | Direct count of three-bond connectivity | |
AnionC-001 | Number of CH3R/CH4 fragments | Common |
AnionO-057 | Number of aromatic hydroxyl groups | |
CationH-051 | Number of H attached to alpha-C | |
CationJhetm | Measure of connectivity through mass weighted distance | |
AnionnDB | Number of double bonds | MLREM_HF42 |
AnionnSO2OH | Number of sulfonic acids | |
AnionnX | Number of halogen atoms | |
AnionZM2V | Describes molecular structure by counting atom connections | |
AnionH-050 | Count of hydrogen connected to heteroatom | |
AnionT(O⋯F) | Sum of distance between O and F atoms | |
CationMAXDN | Measure of maximum negative structural changes | |
CationX3Av | Average connectivity across three bonds | |
CationX4Av | Average connectivity up to four bonds | |
CationX5A | Measures average connectivity up to five bonds |
On the other hand, CationX3v was the strongest negative contributor to enthalpy, along with the mean topological state of the anion (AnionMs), the number of saturated fragments (more specifically methyl-containing fragments, CH3R), the degree of substitution or saturation around the alpha carbon in the cation (CationH-051) and the number of haloalkyl fragments in the cation. In the other linear model for enthalpy of fusion, MLREM_HF42, 14 features were found to be effective, six of which were cationic. This model therefore appears to be based on a more balanced number of cationic and anionic features. The model captured the high-order connectivity in the cation (CationX4Av and CationX5Av) as a positive contributor to the enthalpy of fusion. Other features, such as the degree of unsaturation in the anion and the presence of carbonyl/enol/phenolic hydroxyl groups (indicating potential H-bonding), were also listed as important features for a high enthalpy of fusion. On the other hand, the presence of H attached to a heteroatom (AnionH-050), the presence of terminal methyl groups (AnionC-001) and the number of halogen atoms in the anion were found to have a negative correlation.
Overall, the MLREM models for enthalpy of fusion used different descriptors, except for four common ones (grouped in Table 2), each exhibiting the same type of correlation with ΔHf across both models. The presence of aromatic hydroxyl groups in the anion (AnionO-057) and the compact structure of the cations (CationJhetm) showed positive correlations, while terminal methyl groups (AnionC-001) and branching at the alpha carbon in the cation (CationH-051) displayed negative correlations.
The two BRANNLP models used different descriptors, with no overlap between them. For the BRANNLP_HF69 model (dataset-a), seven descriptors were used as input, including four cationic and three anionic descriptors, as shown in Table 3. The extent of electronegativity of the cation (CationMe), the connectivity pattern, e.g. linear or branched (CationJ), the number of pyridines in the cation (CationnPyridines), the number of hydrogen atoms attached to specific types of carbon atoms (sp3, sp2 and sp) in the cation (CationH-049), the number of unsubstituted benzenes (AnionnCbH) and the presence of a carbonyl group (AnionO-058) in the anion were all found to influence the ΔHf.
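The overlap between the two linear enthalpy models can be checked directly from the descriptor names listed in Table 2. The short sketch below is only a bookkeeping aid: the names are copied from the table, and the variable names are illustrative.

```python
# Descriptor names as listed in Table 2 (the "Common" rows appear in both sets).
mlrem_hf69 = {
    "AnionMs", "AnionH-047", "AnionUi", "CationC-025", "CationC-026",
    "CationC-033", "CationC-042", "CationH-046", "CationH-048",
    "CationJhetZ", "CationS0K", "CationT(N⋯O)", "CationX3v",
    "AnionC-001", "AnionO-057", "CationH-051", "CationJhetm",
}
mlrem_hf42 = {
    "AnionnDB", "AnionnSO2OH", "AnionnX", "AnionZM2V", "AnionH-050",
    "AnionT(O⋯F)", "CationMAXDN", "CationX3Av", "CationX4Av", "CationX5A",
    "AnionC-001", "AnionO-057", "CationH-051", "CationJhetm",
}

# Descriptors selected by both models.
shared = mlrem_hf69 & mlrem_hf42
print(sorted(shared))

# Split the MLREM_HF42 descriptors by the ion they describe.
n_cationic = sum(name.startswith("Cation") for name in mlrem_hf42)
print(len(mlrem_hf42), n_cationic)
```

The same set operations apply to the descriptor lists of the other model pairs in Tables 3 to 5.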
Descriptors | Description | Model |
---|---|---|
AnionnCbH | Number of free benzene rings | BRANNLP_HF69 |
AnionO-058 | Number of double bonded oxygen atoms | |
AnionX1A | Measure of average connectivity | |
CationH-049 | Number of H on crowded carbon centres | |
CationJ | Measure of connectivity through atomic distances | |
CationMe | Reflects atomic electronegativity, carbon-based scale | |
CationnPyridines | Number of pyridines | |
AnionMs | Averaged electronic environments in molecules | BRANNLP_HF42 |
AnionnDB | Number of double bonds | |
AnionnSO2OH | Number of sulfonic acids | |
AnionnX | Number of halogen atoms | |
AnionZM2V | Describes molecular structure by counting atom connections | |
AnionC-040 | Halide containing functional groups | |
AnionX3Av | Average of atom connectivity with three bonds | |
CationBLI | Quantifies benzene-like structure in molecules. | |
CationnArOH | Number of aromatic hydroxyls |
The BRANNLP_HF42 model (dataset-b) used nine descriptors including only two cationic and seven anionic descriptors (Table 3). The cationic features were mainly linked to the benzene-like aromaticity (CationBLI) and the presence of aromatic hydroxyl groups (CationnArOH). The anionic features included the amount of unsaturation (AnionnDB), the number of halogen atoms (AnionnX), the number of sulfonic groups (AnionnSO2OH), the complexity and branching (AnionZM2V), the local connectivity of atoms, focusing on their first, second, and third neighbours (AnionX3Av) and the presence of a functional group containing carbon and a heteroatom like CF3COO (AnionC-040).
Descriptors | Description of descriptors | Model |
---|---|---|
AnionMs | Mean electrotopological state | MLREM_TM69 |
AnionC-025 | Number of R–CR–R fragments | |
CationLop | Lopping centric index | |
AnionMAXDN | Maximal electrotopological negative variation | |
AnionMp | Mean atomic polarizability | |
AnionnDB | Number of double bonds | |
AnionS3K | 3-path Kier alpha-modified shape index | |
CationC-042 | Number of X–CH–X fragments | |
CationJhetp | Balaban-type index from polarizability weighted distance matrix | |
CationX3v | Valence connectivity index of order 3 | |
AnionH-050 | H attached to heteroatom (−) | Common |
AnionZM2V | Second Zagreb index by valence vertex degrees (−)(+) | |
CationJ | Balaban distance connectivity index | |
CationC-029 | Number of R–CX–X fragments | |
CationGMTIV | Gutman molecular topological index by valence vertex degrees | |
CationH-051 | Number of H attached to alpha-C (−) | |
CationN-069 | Number of Ar–NH2/X–NH2 fragments | |
CationnArNH2 | Number of primary amines (aromatic) | |
AnionF-085 | F attached to C2(sp2)-C4(sp2)/C1(sp)/C4(sp3)X | MLREM_TM42 |
AnionHy | Hydrophilic factor | |
AnionX3Av | Average valence connectivity index of order 3 | |
CationC-026 | Number of R–CX–R fragments | |
CationJhetm | Balaban-type index from mass weighted distance matrix | |
CationnArOH | Number of aromatic hydroxyls | |
CationnO | Number of oxygen atoms | |
CationO-057 | Number of phenol/enol/carboxyl OH | |
CationS0K | Kier symmetry index | |
CationT(N⋯O) | Sum of topological distances between N⋯O |
The MLREM_TM69 model had 13 descriptors showing positive contributions and five descriptors with negative correlation. The strongest positive correlation of Tm was with the connectivity of atoms in the cation over a range of three bonds (CationX3v), which may be linked to the potential inter-molecular interactions in the cation. The strongest negative correlation was with the number of tertiary alkyl fragments in the anion (AnionC-025), which can potentially affect the packing efficiency of the organic salt and decrease the melting temperature.
The MLREM_TM42 model for Tm had 12 descriptors showing positive correlation, and among them the strongest influence was observed for the degree of connectivity of each atom and the degree of branching in the cation (CationGMTIV). Six features showed negative correlation, and among them CationS0K (representing the symmetry of the cation) was the strongest negative contributor. This is contrary to the general understanding that symmetric molecules/ions pack efficiently, resulting in a high melting temperature. This may be the reason that the model did not have high accuracy in melting temperature prediction (R2 = 0.62, SEE = 29 °C). However, it should be noted that, in our first model, MLREM_TM69, CationS0K had a high positive correlation to the melting point.
In the two nonlinear models (BRANNLP_TM69 and BRANNLP_TM42) used for melting temperature prediction, largely different molecular descriptors were incorporated to capture structural features that influence the melting temperature, except for three common descriptors related to hydrogen atoms attached to a heteroatom in the anion (AnionH-050), polarity and rigidity in the cation (CationC-040) and the presence of polar groups, e.g. –OH, –NH2, –SO3H (CationTPSA(Tot)). All descriptors used in the BRANNLP models for Tm are listed in Table 5. In the BRANNLP_TM69 model only two descriptors, CationJ and AnionH-050 (which indicate the connectivity in the cation and the number of hydrogen atoms attached to heteroatoms), were the same as those captured by the linear model MLREM_TM69. Several new descriptors related to atom connectivity (CationpiPC06), electrotopological structure (AnionMs), H-bonding and polar interactions (CationnHAcc, CationTPSA(Tot) and AnionH-050), the number of halogen-containing functional groups (CationC-027, CationC-040) and hydrophobicity (AnionALOGP) were found to influence the melting point of the studied salts.
Descriptors | Description of descriptors | Model |
---|---|---|
CationJ | Balaban distance connectivity index | BRANNLP_TM69 |
AnionMs | Mean electrotopological state | |
AnionALOGP | Ghose–Crippen octanol–water partition coeff. (logp) | |
AnionTI1 | First Mohar index TI1 | |
CationC-027 | Number of R–CH–X fragments | |
CationnHAcc | Number of acceptor atoms for H-bonds (N O F) | |
CationpiPC06 | Molecular multiple path count of order 06 | |
AnionH-050 | H attached to heteroatom | Common |
CationC-040 | R–C(=X)–X/R–C≡X/X=C=X | |
CationTPSA(Tot) | Topological polar surface area using N,O,S,P polar contributions | |
AnionC-024 | R–CH—R | BRANNLP_TM42 |
AnionRBF | Rotatable bond fraction | |
CationH-050 | H attached to heteroatom | |
CationnHDon | Number of donor atoms for H-bonds (with N and O) | |
CationTI1 | First Mohar index TI1 | |
CationX0 | Connectivity index chi-0 | |
AnionZM2V | Second Zagreb index by valence vertex degrees | |
CationLop | Lopping centric index |
The BRANNLP_TM42 model exhibited a distinct feature selection, sharing only one descriptor, AnionZM2V, with the MLREM_TM42 model and another descriptor, CationTPSA(Tot), with the BRANNLP_TM69 model. New descriptors used in the BRANNLP_TM42 model include connectivity and topological descriptors (CationTI1, CationX0, and CationLop), structural fragments (CationC-040 and AnionC-024), H-bonding and polar interactions (CationnHDon and CationH-050) and a molecular flexibility descriptor in the anion (AnionRBF).
The synthesised organic salts were selected based on their structural similarity to those in the training set. Salts containing anions and cations within the model's chemical space were selected. For instance, the triazolium cation and chloride, benzenesulfonate and triflate anions were chosen because they appeared frequently in the training set, allowing us to assess whether the model accurately generalises to new but related compounds. Additionally, 2-methylpyridinium, 2-amino-3-hydroxypyridinium and pyridinium cations, along with ethanesulfonate and bromide anions were included to test the model's extrapolation capability. A comparison of predicted and measured enthalpies of fusion and melting temperatures for the seven new salts is given in Fig. 5 and Fig. 6 respectively.
Fig. 5 Predicted versus experimental enthalpies of fusion (ΔHf, kJ mol−1) for selected organic salt PCMs using four different regression models, highlighting model accuracy in enthalpy estimation.
Fig. 6 Comparison of measured and predicted melting temperatures (Tm, °C) for selected organic salt PCMs across four regression models, demonstrating model performance in melting point prediction.
The MLREM_HF42 model (R2 = 0.82, SEE = 4 kJ mol−1) had the lowest performance in experimental validation, with only three salts ([Tri]Br, [4-t-BuPyH][C2H5SO3], and [2-MePyH]Cl) showing enthalpies within the acceptable range, i.e. within the SEE. This model predicted enthalpies around 5 kJ mol−1 lower than measured for [PyH][C2H5SO3] and [2-MePyH][C2H5SO3], and 5 and 16 kJ mol−1 higher for [4-t-BuPyH][CF3SO3] and [2-NH2-3-OHPyH][C6H5SO3], respectively. Surprisingly, the BRANNLP_HF69 model, which had the lowest R2 (0.43, SEE = 5 kJ mol−1), showed good agreement between predicted and measured values, with five salts having ΔHf within the range of the SEE and only two salts, i.e. [PyH][C2H5SO3] and [2-NH2-3-OHPyH][C6H5SO3], showing enthalpies outside the acceptable range. The predicted enthalpy for [PyH][C2H5SO3] was 5 kJ mol−1 higher than measured, and for [2-NH2-3-OHPyH][C6H5SO3] the predicted enthalpy exceeded the measured value by 10 kJ mol−1. This highlights that R2 should be used cautiously and not solely relied upon as a performance indicator of the models. It is important to mention that two salts, [2-MePyH]Cl and [Tri]Br, exhibit solid–solid phase transitions (2 kJ mol−1) which could affect the ΔHf, but the experimental ΔHf for these salts is in good agreement with the ΔHf predicted by all four models. Overall, the four models failed to predict a reliable ΔHf for [2-NH2-3-OHPyH][C6H5SO3], with the predicted ΔHf significantly higher (by 10–16 kJ mol−1) than measured, which exceeded the expected deviation given that the instrument error is 5%. This salt was found to supercool and did not fully crystallize even after cooling down to −50 °C. Similar thermal behaviour was observed for [2-MePyH][C2H5SO3], which also shows poor agreement between measured and predicted ΔHf, except with the BRANNLP_HF69 model.
For the melting temperature, the comparison of experimental and predicted values for all four models (Fig. 6) showed that MLREM_TM69 was able to predict Tm with reasonable accuracy, with salts having Tm within the SEE range except for [2-MePyH][C2H5SO3], which showed a difference of 60 °C. It may be argued that the SEE for the MLREM_TM69 model was higher (28 °C) than that of BRANNLP_TM69 (25 °C), thereby helping the Tm values fall within the SEE range. However, MLREM_TM69 (SEE = 28 °C) has shown better performance than MLREM_TM42, which has a slightly higher SEE of 29 °C. BRANNLP_TM69 performed better than BRANNLP_TM42, with two salts, i.e. [2-NH2-3-OHPyH][C6H5SO3] and [2-MePyH][C2H5SO3], showing melting points outside the SEE range. It is evident from the bar plot in Fig. 6 that all models predicted a higher Tm (20 to 47 °C higher) for [2-NH2-3-OHPyH][C6H5SO3] than the measured Tm and a lower Tm (by 16 to 37 °C) for [Tri]Br.
Overall, while no model perfectly captured all experimental trends, BRANNLP_HF69 demonstrated the most consistent performance for enthalpy of fusion predictions, and MLREM_TM69 showed the best performance for predicting melting temperatures among the models. The experimental validation emphasises the limitations of relying solely on R2 as an indicator of model reliability, particularly for compounds with atypical thermal behaviour.
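The acceptance criterion used in this comparison, a prediction counting as acceptable when it lies within the SEE of the measured value, can be written as a small check. This is a sketch under stated assumptions: the function names are hypothetical, the inclusive comparison (≤ rather than <) is a guess, and the (measured, predicted) pairs are made-up numbers, not the values plotted in Fig. 5.

```python
def within_see(measured: float, predicted: float, see: float) -> bool:
    """True if the prediction lies within +/- SEE of the measured value."""
    return abs(predicted - measured) <= see


def count_within_see(pairs, see: float) -> int:
    """Count (measured, predicted) pairs whose absolute error is <= SEE."""
    return sum(within_see(m, p, see) for m, p in pairs)


# Hypothetical (measured, predicted) enthalpies in kJ mol^-1 for seven salts.
example_pairs = [
    (20.0, 22.0), (15.0, 10.0), (18.0, 19.5), (25.0, 24.0),
    (12.0, 28.0), (22.0, 17.0), (16.0, 21.0),
]

# Number of the seven illustrative salts accepted with an SEE of 4 kJ mol^-1.
print(count_within_see(example_pairs, see=4.0))
```

Counting hits against a fixed tolerance in physical units makes the validation independent of R2, which is why a model with a low training R2 can still pass more of these checks.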
The best model for Tm prediction (in terms of the statistical parameters R2 and SEE) in our study was MLREM_TM69, which achieved an R2 of 0.63 and an SEE of 28 °C. This places the model in a competitive range relative to established models in the literature. For instance, Venkatraman et al.21 reported a partial least squares regression (PLSR) model trained on a dataset of 2212 ILs, which yielded an R2 of 0.50 and an RMSE of 55 °C on the training set. Their non-linear models (e.g., support vector regression, random forest, and k-nearest neighbours) exhibited improved performance, with R2 values ranging from 0.64 to 0.67 and RMSE values between 46 °C and 49 °C. Low et al.19 employed a k-nearest neighbours approach on the same dataset of 2212 ILs to investigate the effect of different descriptor choices, and their best model achieved an R2 of 0.76 with an SEE of 38 °C. Although some of the models of Venkatraman et al.21 and Low et al.19 show a stronger correlation than our MLREM_TM69 model, their higher errors (38–55 °C) indicate a trade-off between correlation strength and predictive precision, with our model having a significantly lower SEE (28 °C). It should be noted that Venkatraman et al.21 and Low et al.19 used 2212 ILs, but 40% of the ILs in this dataset had bromide and chloride anions, which may have introduced a bias in the models. The multiple linear regression (MLR) and multilayer perceptron neural network (MLPNN) models by Fatemi et al.72 were quite accurate, with R2 of 0.91 and 0.97, respectively. However, the associated SEEs of 35 °C (MLR) and 22 °C (MLPNN) suggest that, despite the higher correlation coefficients, the SEE values are comparable to that of the MLREM_TM69 model presented here. Overall, while non-linear models often yield higher R2 values, the balance of accuracy and generalisability offered by our model affirms its utility, particularly in early-stage efforts.
In contrast to Tm, predictive modelling for ΔHf remains relatively underexplored in the literature. Among the limited number of studies, Zhu et al.26 developed a multiple linear regression (MLR) model based on 44 ionic liquids, predominantly imidazolium-based, and reported an R2 of 0.90 with a standard deviation of approximately 5 kJ mol−1. Similarly, Bai et al.27 constructed four QSPR models with training sets comprising 9 to 30 ILs, achieving R2 values between 0.93 and 0.97, and standard deviations of 3 kJ mol−1 for both. In our study, the MLREM_HF42 model emerged as the best-performing model for ΔHf prediction, achieving an R2 of 0.82 and an SEE of 4 kJ mol−1. This performance is highly competitive when compared to prior studies, particularly considering the broader diversity of the input dataset used in model training. While the absolute R2 values are slightly lower than those reported by Zhu et al.26 and Bai et al.,27 the greater dataset size and structural diversity in this work support a higher degree of model generalisability and practical relevance. This trade-off between marginally lower fit and broader applicability reflects a more realistic and deployable model, especially for screening new or less conventional ILs with unknown fusion enthalpies.
Although R2 was used as a performance indicator, this study highlights the importance of incorporating additional metrics like SEE and experimental validation to provide a comprehensive assessment of the model's performance. Experimental validation of the enthalpy of fusion models showed poor performance for the highest R2 model i.e. MLREM_HF42 (R2 = 0.82) and better performance for the lowest R2 model i.e. BRANNLP_HF69 (R2 = 0.43). Overall, both the enthalpy of fusion and melting temperature models demonstrated moderate prediction accuracy in experimental validation.
This study represents the first attempt to utilise machine learning for predicting the properties of protic organic salt PCMs for thermal energy storage. With the availability of more data, and the guidance of this work, more accurate and robust models can be developed in the future.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5ma00498e
This journal is © The Royal Society of Chemistry 2025