Mai Hayakawa a, Kosuke Sakano a, Rei Kumada a, Haruka Tobita a, Yasuhiko Igarashi b, Daniel Citterio a, Yuya Oaki a and Yuki Hiruta *a
a Department of Applied Chemistry, Faculty of Science and Technology, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan. E-mail: hiruta@applc.keio.ac.jp
b Faculty of Engineering, Information and Systems, University of Tsukuba, 1-1-1 Tennodai, Tsukuba 305-8573, Japan
First published on 26th April 2023
Thermo-responsive polymers having a lower critical solution temperature (LCST) have attracted attention for biological applications such as drug delivery, diagnosis, and coating materials. In recent years, machine learning has been used to predict LCST. However, because these methods targeted only copolymers combining specific monomer structures, they are not versatile, and multiple trials are still required to obtain new thermo-responsive polymers with the desired LCST. In this study, a prediction model for cloud point temperature (TCP) was built by sparse modeling for small data (SpM-S), a combination of materials informatics and chemical insight, using a small dataset of polymers collected from the literature as training data. This approach created a model that is interpretable, easy to calculate, and versatile. The prediction accuracy was validated using data from different literature sources and experimental test data. The model was able to predict the TCP of polymers containing monomers not included in the dataset as well as polymers containing monomers included in the dataset. The predictive model has the potential to guide the design of new thermo-responsive polymers and to contribute to their efficient development.
Thermo-responsive polymers, which exhibit a temperature-dependent phase transition, are among the most widely studied stimuli-responsive polymers. They have a lower critical solution temperature (LCST) or an upper critical solution temperature (UCST) in aqueous solution.9 Compared with UCST-type polymers, LCST-type polymers have been the subject of many studies in the biomedical field, because they undergo a phase transition at physiological temperature and conditions,10 and this transition is less sensitive to pH and ionic strength.11–13 Thanks to these advantages, LCST-type polymers have been actively studied.12,14 LCST phenomena have been characterized experimentally by various methods, including cloud point measurements,15 dynamic light scattering,16 and differential scanning calorimetry.17 Among these, simple assays of clouding behaviour remain the most common way to study this type of phase separation process, which is important in various applications of thermo-responsive polymers.18
In designing thermo-responsive polymers, control of the LCST is an important issue for applications. The LCST of a thermo-responsive polymer can be controlled by changing the monomer ratio,19–22 the molecular weight,23,24 and the structure of the end groups.25,26 Furthermore, the phase transition properties of thermo-responsive polymers are affected by the concentration of ionic species27,28 and the content of organic solvents.29,30 In addition to these experimental methods, some studies have predicted the cloud point temperature (TCP) by creating regression equations based on molecular weight, degree of polymerization, or monomer ratio.31–33 To develop new thermo-responsive polymers, it is necessary to select the type and composition ratio of monomers appropriately from an unlimited number of possible combinations. Therefore, controlling the LCST of thermo-responsive polymers remains a challenge for researchers.
Recently, machine learning has been applied to predict and control the physical properties of materials and to explore new ones.34–37 Although applying machine learning to polymers is generally difficult because of their complex structure–function relationships,38 it is beginning to be used in the field of thermo-responsive polymers.39,40 Kumar and co-workers demonstrated the prediction and control of the TCP of poly(2-oxazoline) via machine learning.39 Gradient boosting with decision trees enabled highly accurate TCP control in a design space consisting of four repeating units and various molecular weights. Although the model was limited to these structures because its descriptors, such as molecular weight and degree of polymerization, were specific to copolymers of the four poly(2-oxazoline) repeating units, the work provides motivation to extend the study of TCP control using machine learning. In addition, other groups have reported the prediction of the TCP of polymers consisting of N-isopropylacrylamide (NIPAAm) and methoxy triethyleneglycol acrylate (MTGA) in aqueous salt solutions using support vector regression.40 Although the application of this model is limited to poly(NIPAAm-MTGA), it shows that machine learning can be used to understand the phase-transition behaviour of LCST-type thermo-responsive polymers. However, these prediction models were applicable only to copolymers of specific monomer combinations. They cannot be applied to thermo-responsive polymers with monomers or copolymer compositions not used in the models. In addition, these studies used fixed explanatory variables and did not search among many chemical parameters for the variables that contribute significantly to TCP. There is still room for improvement in the development of TCP prediction models.
Therefore, we have attempted to establish guidelines for the TCP of thermo-responsive polymers using a new approach based on a machine learning method for small data.
In general, the success of prediction or exploration for materials depends on the amount of data.36 However, sufficient data is not always available for the target materials. Our group has focused on sparse modeling for small data (SpM-S). Sparse modeling is an approach that represents the whole of high-dimensional data using a limited number of descriptors extracted by machine learning.41 This approach has been applied to a variety of fields, such as image data processing and image compression.42 It is applicable to small data because it extracts a limited number of significant descriptors from high-dimensional data.43 SpM-S, which combines sparse modeling and chemical insights, has been successfully used to predict nanosheet size and yield as an example of process optimization, and to explore new organic anode and cathode materials for lithium ion batteries as an example of materials exploration.43–45 Here, we applied SpM-S to develop a TCP prediction model for thermo-responsive polymers based on a small dataset collected from the literature. Furthermore, the model was validated by predicting the TCP of thermo-responsive polymers with monomers and copolymer compositions that were not incorporated in the training dataset (Fig. 1).
A dataset containing 28 values of y and xn (n = 1–12) was prepared (Table S1 in the ESI†). Exhaustive search with linear regression (ES-LiR) was performed in Python. The results were summarized in a weight diagram, from which the descriptors were extracted in combination with our chemical insights. The linear prediction model was built from the selected descriptors after five-fold cross-validation.
First, TCP was set as the objective variable, and factors that could be related to it were collected as explanatory variables. The explanatory variables (xn: n = 1–12) were selected and prepared based on our insights and experience (Fig. 1a, Table 1 and Table S1†). They included both parameters related to polymers (xn: n = 1–3) and those related to monomers (xn: n = 4–12). Molecular weight and PDI, which are typically measured when polymers are synthesized, were selected as explanatory variables for the polymers. The polymer concentration, an experimental condition of the TCP measurement, was also selected as an explanatory variable. Physicochemical parameters that can be easily calculated by HSPiP and ChemDraw were selected as explanatory variables for the monomers used in the polymers. This allows us to build an interpretable and simple predictive model for TCP without experiments or complex simulations and calculations. In the case of copolymers, the parameters of each monomer were averaged by composition ratio and then used as explanatory variables. Consequently, homopolymers and copolymers do not need to be treated separately, and the approach can also be applied to copolymers constructed of more than three monomers. In this study, the influence of terminal substituent characteristics on TCP was considered to be low, and it was not included as an explanatory variable.
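The composition-ratio averaging described above can be sketched as follows. This is a minimal illustration, not the authors' code: the monomer names and descriptor values are hypothetical placeholders, and the real values would come from HSPiP and ChemDraw (Table S1†).

```python
from typing import Dict

# HYPOTHETICAL per-monomer descriptor values (placeholders, not HSPiP/ChemDraw output).
MONOMER_DESCRIPTORS: Dict[str, Dict[str, float]] = {
    "monomer_A": {"hsp_polarity": 10.0, "clogp": 0.5},
    "monomer_B": {"hsp_polarity": 12.0, "clogp": -0.1},
}


def average_descriptors(
    composition: Dict[str, float],
    table: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Return the composition-weighted mean of every monomer descriptor.

    `composition` maps monomer name to its amount (mol% or mole fraction);
    the amounts are normalized internally, so either unit works.
    """
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition must contain positive amounts")
    descriptor_names = next(iter(table.values())).keys()
    return {
        name: sum(amount / total * table[monomer][name]
                  for monomer, amount in composition.items())
        for name in descriptor_names
    }


# A homopolymer is simply the one-monomer case of the same function.
homopolymer = average_descriptors({"monomer_A": 100}, MONOMER_DESCRIPTORS)
copolymer = average_descriptors({"monomer_A": 70, "monomer_B": 30}, MONOMER_DESCRIPTORS)
print(homopolymer)
print(copolymer)
```

Because the function only sums over whatever monomers are listed, the same call handles homopolymers, binary copolymers, and copolymers with more components without special cases.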
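x n | Parameter | Applies to | Source |
---|---|---|---|
1 | Molecular weight | Polymer | Literature data |
2 | Polydispersity index (PDI) | Polymer | Literature data |
3 | Concentration | Polymer | Literature data |
4 | HSP dispersity | Monomer | Calculated by HSPiP |
5 | HSP polarity | Monomer | Calculated by HSPiP |
6 | HSP hydrogen bonding | Monomer | Calculated by HSPiP |
7 | Molecular weight | Monomer | Not stated |
8 | Boiling point | Monomer | Calculated by ChemDraw |
9 | Molar refractive index | Monomer | Calculated by ChemDraw |
10 | Topological polar surface area | Monomer | Calculated by ChemDraw |
11 | ClogP | Monomer | Calculated by ChemDraw |
12 | CMR | Monomer | Calculated by ChemDraw |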
Previously, it has been reported that the dominant parameter determining TCP is different between copolymers containing monomers of a brushy nature such as oligo(ethylene glycol) methacrylate (OEGMA) and those containing only non-brushy monomers. As for brush copolymers, graft density had great influence on the TCP, whereas for non-brush copolymers, surface area-normalized hydrophobicity of copolymers has strong influence on the TCP. Considering this knowledge, our predictive model, which uses the average value from the composition ratio of each monomer as explanatory variables, is appropriate for limited application to TCP prediction for non-brushy copolymers. In the present work, reported TCP data of non-brushy polymers, which consisted of monomers 1 to 15, were manually collected from literature references (Fig. 1b, Scheme 1, Fig. S1 and Tables S1, S2†).46–50 This dataset covers a variety of homopolymers and random copolymers, not limited to any particular structure. Finally, a small training dataset containing 28 objective variables (y) and 12 explanatory variables (x) was prepared (Fig. 1c and Table S1†).
The weight diagram shows the top 1000 prediction models with the lowest cross-validation error (CVE) values (Fig. 1d and 2). In the weight diagram, the coefficients of the models are indicated by a cold color for negative correlations and a warm color for positive correlations. Interpretation of the weight diagram and construction of the final model were performed from a chemical perspective. In the present results, x3 (concentration of polymer), x5 (HSP-polarity of monomer), x10 (tPSA of monomer), and x11 (ClogP of monomer) were densely colored blue in the weight diagram. From these variables, we selected the final descriptors x3 (concentration of polymer), x5 (HSP-polarity of monomer), and x11 (ClogP of monomer) based on chemical insights. First, it is known that polymer chains tend to aggregate in aqueous solution at higher polymer concentrations. Therefore, it was reasonable that the explanatory variable x3 (concentration of polymer) was extracted as a negatively correlated descriptor. Next, x5 (HSP-polarity of monomer) was extracted because it is one of the parameters related to solubility. In addition, x11 (ClogP of monomer) was extracted. ClogP is the calculated logarithm of the octanol–water partition coefficient and serves as a hydrophobicity index. The smaller the value, the more hydrophilic the compound. With increasing hydrophilicity, TCP also increases. Therefore, ClogP is considered a negatively correlated parameter. In contrast, x10 (tPSA of monomer) was not included in the final predictive model even though it was extracted as a negatively correlated descriptor. From a chemical perspective, an increase in polar surface area is expected to increase, not decrease, the TCP because of the overall increase in polarity. Therefore, tPSA was removed from the final model based on chemical insights.
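To make the procedure concrete, the following is a minimal pure-Python sketch of exhaustive search with linear regression: every non-empty descriptor subset is fitted by ordinary least squares, ranked by five-fold cross-validation error, and the coefficient signs of the top models are tallied, which is the information a weight diagram visualizes. It is not the authors' implementation. The data are synthetic, the fold assignment (index modulo k) and the use of RMSE as the CVE are assumptions, and all function names are illustrative.

```python
import itertools
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit(rows, y):
    """Ordinary least squares with intercept; returns [intercept, coef_1, ...]."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def cve(rows, y, k=5):
    """k-fold cross-validation error, reported here as RMSE over held-out points."""
    n, sq = len(y), 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        beta = fit([rows[i] for i in train], [y[i] for i in train])
        for i in range(fold, n, k):  # held-out indices of this fold
            pred = beta[0] + sum(b * v for b, v in zip(beta[1:], rows[i]))
            sq += (pred - y[i]) ** 2
    return (sq / n) ** 0.5


def exhaustive_search(X, y, k=5):
    """Score every non-empty descriptor subset; return (CVE, subset) sorted by CVE."""
    n_feat = len(X[0])
    scored = []
    for size in range(1, n_feat + 1):
        for subset in itertools.combinations(range(n_feat), size):
            rows = [[r[j] for j in subset] for r in X]
            scored.append((cve(rows, y, k), subset))
    return sorted(scored)


def sign_tally(X, y, ranked, top):
    """For the top models, count [negative, positive] coefficients per descriptor."""
    tally = {j: [0, 0] for j in range(len(X[0]))}
    for _, subset in ranked[:top]:
        beta = fit([[r[j] for j in subset] for r in X], y)
        for j, b in zip(subset, beta[1:]):
            tally[j][0 if b < 0 else 1] += 1
    return tally


# SYNTHETIC stand-in data: 28 samples, 6 descriptors; only descriptors 1 and 4 drive y.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(6)] for _ in range(28)]
y = [35.0 - 2.0 * r[1] + 3.0 * r[4] + random.gauss(0, 0.1) for r in X]

ranked = exhaustive_search(X, y)
best_cve, best_subset = ranked[0]
tally = sign_tally(X, y, ranked, top=10)
print(best_subset, round(best_cve, 3))
print(tally)
```

With 12 descriptors the same search covers 2^12 − 1 = 4095 subsets. A descriptor that appears with a consistent sign across most of the top-ranked models corresponds to a densely colored column in the weight diagram; whether such a descriptor is kept remains a chemical judgment, as with tPSA above.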
Surprisingly, x1 (molecular weight of polymer) correlated positively with the cloud point, although a negative correlation is expected.53 This is probably because the molecular weights of the polymers were determined by GPC, so absolute molecular weights are not comparable across the literature references owing to differences in the GPC columns and the standard polymer samples used. Based on the above considerations, the prediction model was described using x3 (concentration of polymer), x5 (HSP-polarity of monomer) and x11 (ClogP of monomer) (Fig. 1d and eqn (1)) with a root mean squared error (RMSE) of 7.13 °C, where each xn is normalized so that its mean is 0 and its standard deviation is 1.
$$y = -0.520x_{3} - 25.8x_{5} - 21.4x_{11} + 35.3 \tag{1}$$
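As a worked illustration of how eqn (1) is applied, the sketch below standardizes raw descriptor values and evaluates the linear model. The coefficients and intercept are those of eqn (1); the training-set means and standard deviations are hypothetical placeholders, since the real statistics depend on the dataset in Table S1† and are not given here.

```python
from typing import Dict, Tuple

# Coefficients and intercept of eqn (1); inputs must be standardized descriptors.
COEFFICIENTS: Dict[str, float] = {"x3": -0.520, "x5": -25.8, "x11": -21.4}
INTERCEPT = 35.3

# HYPOTHETICAL (mean, standard deviation) of each descriptor in the training set.
TRAINING_STATS: Dict[str, Tuple[float, float]] = {
    "x3": (1.0, 0.5),    # polymer concentration
    "x5": (10.0, 2.0),   # HSP polarity of monomer
    "x11": (0.5, 0.4),   # ClogP of monomer
}


def standardize(raw: Dict[str, float],
                stats: Dict[str, Tuple[float, float]] = TRAINING_STATS) -> Dict[str, float]:
    """Convert raw descriptor values to z-scores (mean 0, standard deviation 1)."""
    return {name: (raw[name] - mean) / std for name, (mean, std) in stats.items()}


def predict_tcp(raw: Dict[str, float],
                stats: Dict[str, Tuple[float, float]] = TRAINING_STATS) -> float:
    """Evaluate eqn (1) on raw descriptor values; returns the estimated TCP in °C."""
    z = standardize(raw, stats)
    return INTERCEPT + sum(coef * z[name] for name, coef in COEFFICIENTS.items())


# A polymer whose descriptors all sit at the (hypothetical) training means.
at_mean = predict_tcp({"x3": 1.0, "x5": 10.0, "x11": 0.5})
# Same polymer with ClogP one standard deviation higher (more hydrophobic).
more_hydrophobic = predict_tcp({"x3": 1.0, "x5": 10.0, "x11": 0.9})
print(at_mean, more_hydrophobic)
```

Because the descriptors are standardized, each coefficient reads directly as the change in estimated TCP per standard deviation of that descriptor, and the intercept is the estimate for a polymer at the training-set mean.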
Molecular weight was measured using GPC (Fig. S2†), and narrow PDI of synthesized copolymers was confirmed. Optical transmittance measurements were then performed using a UV-vis spectrophotometer to obtain cloud-point test data. The temperature-dependent optical transmittances of synthesized thermo-responsive polymers in water are shown in Fig. 4. All polymers exhibited a sharp phase-transition at different temperatures according to the monomer composition ratio. The TCP of copolymers depended on the composition ratio (Table 2). In the case of P(NIPAAm-co-DMAAm), a lower content of NIPAAm resulted in an increase in TCP, while the opposite trend was observed for P(NIPAAm-co-NNPAAm). In summary, test data for the prediction model was obtained over a wide temperature range by synthesizing NIPAAm-based thermo-responsive copolymers.
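This passage does not state how TCP was read from the transmittance curves. A common convention, assumed here and not confirmed by the source, takes the temperature at which transmittance falls to 50% on heating. The sketch below finds that crossing by linear interpolation on a synthetic curve; the function name and the data are illustrative only.

```python
from typing import Optional, Sequence


def cloud_point(temps: Sequence[float],
                transmittance: Sequence[float],
                threshold: float = 50.0) -> Optional[float]:
    """Temperature at which transmittance first drops below `threshold` (in %).

    Assumes `temps` is sorted in ascending order (a heating scan). Returns None
    when the curve never crosses the threshold.
    """
    points = list(zip(temps, transmittance))
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        if y0 >= threshold > y1:
            # Linear interpolation between the two bracketing measurements.
            return t0 + (y0 - threshold) * (t1 - t0) / (y0 - y1)
    return None


# SYNTHETIC heating scan: temperature in °C, transmittance in %.
scan_temps = [30.0, 31.0, 32.0, 33.0]
scan_transmittance = [100.0, 90.0, 10.0, 0.0]
tcp = cloud_point(scan_temps, scan_transmittance)
print(tcp)
```

A different threshold, such as the onset of turbidity, shifts the reported value, so TCP data from different sources are comparable only when the definition matches.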
Polymer | NIPAAm in feed [mol%] | Actual NIPAAm content [mol%] | Mn,NMR | Mn,GPC | PDI | TCP [°C] |
---|---|---|---|---|---|---|
P(NIPAAm90-co-DMAAm10) | 90 | 90 | 40600 | 25800 | 1.20 | 35.4 |
P(NIPAAm70-co-DMAAm30) | 70 | 70 | 39700 | 25600 | 1.16 | 44.7 |
P(NIPAAm50-co-DMAAm50) | 50 | 50 | 40900 | 21600 | 1.27 | 64.0 |
P(NIPAAm90-co-NNPAAm10) | 90 | 90 | 40400 | 28800 | 1.18 | 31.7 |
P(NIPAAm70-co-NNPAAm30) | 70 | 70 | 40300 | 26600 | 1.23 | 29.2 |
P(NIPAAm50-co-NNPAAm50) | 50 | 50 | 40400 | 28300 | 1.27 | 27.1 |
$$y = -1.00x_{3} - 33.0x_{5} - 34.3x_{11} + 37.4 \tag{2}$$
Fig. 5 Relationship between the TCPs estimated by the prediction model and the measured values: (a) prediction by eqn (1); (b) prediction by eqn (2) with corrected coefficients.
The RMSE was 5.43 °C for the test data from literature references (Fig. 5b, green plots). This value was comparable to that of the training data, indicating high prediction accuracy. Prediction accuracy was also confirmed with the synthesized copolymers P(NIPAAm-co-DMAAm), whose monomer species were used in the dataset. The test data of P(NIPAAm-co-DMAAm), whose TCP is higher than that of PNIPAAm, had an RMSE of 7.79 °C (Fig. 5b, orange plots). This RMSE was also comparable to that of the training data. NIPAAm and DMAAm are each included in the dataset as monomer species, but their copolymers are not. This result indicates that the model can be applied to TCP prediction of polymers whose monomer combination is not included in the dataset. The applicability of the model was also investigated for copolymers containing a monomer not included in the dataset. P(NIPAAm-co-NNPAAm), which exhibits a lower TCP than PNIPAAm, had an RMSE of 1.87 °C, showing high prediction accuracy (Fig. 5b, yellow plot). Of the monomers used to construct this polymer, NIPAAm is included in the dataset, while NNPAAm is not. Thus, the prediction model was also able to predict the TCP of polymers containing a monomer not in the dataset.
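The RMSE values quoted above follow the usual definition, the square root of the mean squared difference between measured and predicted TCP. A minimal sketch is given below. The measured values are those of P(NIPAAm-co-NNPAAm) in Table 2; the predicted values are hypothetical placeholders chosen for illustration and are not the model output reported in the paper.

```python
import math
from typing import Sequence


def rmse(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between paired measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Measured TCP of P(NIPAAm-co-NNPAAm) at 90, 70 and 50 mol% NIPAAm (Table 2, °C).
measured_tcp = [31.7, 29.2, 27.1]
# HYPOTHETICAL predictions, for illustration only.
predicted_tcp = [32.7, 27.2, 29.1]

error = rmse(measured_tcp, predicted_tcp)
print(round(error, 2))
```

With only three test points per copolymer series, a single large residual dominates the RMSE, which is worth keeping in mind when comparing the values for the different test sets.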
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3py00314k
This journal is © The Royal Society of Chemistry 2023