Shogo
Takasuka
a,
Shunto
Oikawa
a,
Takayoshi
Yoshimura
b,
Sho
Ito
a,
Yosuke
Harashima
a,
Tomoaki
Takayama
a,
Shigehito
Asano
c,
Akira
Kurosawa
c,
Tetsunori
Sugawara
c,
Miho
Hatanaka
b,
Tomoyuki
Miyao
a,
Takamitsu
Matsubara
a,
Yu-ya
Ohnishi
c,
Hiroharu
Ajiro
a and
Mikiya
Fujii
*a
aNara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan. E-mail: fujii.mikiya@ms.naist.jp
bKeio University, 4-1-1 Hiyoshi, Kohoku-ku, Yokohama-shi, Kanagawa 223-8521, Japan
cJSR Corporation, Shiodome Sumitomo Bldg., 1-9-2, Higashi-Shimbashi, Minato-ku, Tokyo 105-8640, Japan
First published on 8th May 2023
The properties of polymers are highly dependent on the combination and composition ratio of the monomers used to prepare them; however, the large number of available monomers makes an exhaustive investigation of all the possible combinations difficult. In the present study, five binary copolymers were prepared by radical polymerization using a flow reactor and the prediction performance of a machine learning model constructed using the obtained data was evaluated for the interpolation and extrapolation regions. Copolymer analysis was performed using ultra-high-performance liquid chromatography, and the measurement results were analysed to calculate the monomer conversion and monomer composition ratio in the polymer, which were used as objective variables. A prediction model was constructed using the process variables during polymerization and additional molecular descriptors (i.e., molecular flags (one-hot encoding), fingerprints or quantum chemical calculation values) related to the monomer type as explanatory variables. In the interpolated regions where all monomer types used were included in the training data, the prediction accuracy was high irrespective of the molecular descriptors added to the process variables. In the extrapolation region, the model that included explanatory variables corresponding to quantum chemical calculation values representing the energy generated when radical reactions occur, showed a high prediction accuracy for each objective variable. We found that quantum chemical calculation values (especially the molecular orbital energy of monomers in the extrapolation region) are important factors in the search for new binary copolymers prepared by radical polymerization. The proposed model is expected to accelerate the development of polymers using new monomers.
Materials exploration using ML has been investigated for a wide range of purposes, including searching for polymer compositions that exhibit required properties,8–10 classifying polymers by their crystalline phase and microstructure,11,12 and predicting the physical properties of polymers.13 As summarized by Hu et al., the number of different algorithms and learning models is increasing at an accelerating rate and, with it, the scope of exploration.14 ML can be conducted with a small amount of data; however, many data are preferable considering the scope of the material search and the desired prediction accuracy. Therefore, a high-throughput reactor is one of the critical elements for more efficient studies using ML. Material exploration15–17 and analysis18–20 using high-throughput instruments, whether in organic or inorganic chemistry, are already being actively studied in combination with ML. Coley et al., for example, used ML to predict the synthetic pathways for organic compounds, created a platform for their synthesis using a robotic flow device and demonstrated the platform's efficacy for 15 drugs or drug analogues.21 In addition, reactors in polymer synthesis are becoming increasingly high-throughput. Polymers have been used only in limited cases, such as applications that require photoreactive polymers,22 because they contain certain components produced by heterogeneity, introducing a risk of blocking the piping because of large viscosity changes during polymerization.
In recent years, microchemical approaches in the polymer field have been developed and polymerization by flow synthesis using a micromixer has been investigated.23 Flow synthesis is being actively studied in combination with ML, not only because it enables easy control of polymers and produces polymers with narrow molecular-weight distributions but also because it is highly efficient in production, enabling the collection of large amounts of data.24,25 Reis et al. synthesized copolymers using any combination of six monomers by radical polymerization in flow reactors.26 By incorporating ML methods, they identified more than 10 copolymer compositions that are superior to conventional materials within a search range representing less than 0.9% of the total composition space. Tan et al. used an in-line analyser to continuously acquire time-dependent analytical data, enabling the prediction of polystyrene conversion and molecular-weight distribution charts.27 Their results clearly show that the combination of ML and flow polymerization can be used to study polymers more efficiently. However, predicting polymerization using monomers not included in the learning process (extrapolated regions), which is highly desirable in the search for new materials makes it difficult to achieve high accuracy.
We have constructed a flow copolymerization system using a microflow mixer and are investigating copolymerization with more equal composition ratios for MMA–St and MMA–GMA copolymers.28 In the present work, we synthesized binary copolymers of MMA and GMA/St/4-acetoxystyrene (PACS)/tetrahydrofurfuryl methacrylate (THFMA)/cyclohexyl methacrylate (CHMA) via a free-radical method using the same flow copolymerization system used in our previous work. To more efficiently develop new materials, we used ML predictions to explore the extrapolation region for monomers that would provide desirable polymer properties.
(1) |
(2) |
Fig. 1 Chemical structures of monomers used in present work: GMA, St, PACS, THFMA and CHMA are classified as M1 and MMA as M2. |
Fig. 2 Model of flow synthesis reactor. Two bottles containing each monomer, initiator, and solvent are mixed using a micromixer, and synthesis is performed at the required temperature. |
Fig. 4 Overview of methods: explanatory variables, feature sets, ML algorithms, and objective variables. |
Feature set D is composed of the parameters calculated by density functional theory (DFT), consisting of (D-1) two reaction energies, (D-2) nine activation energies, (D-3) nine geometrical parameters, and (D-4) 16 orbital energies. Focusing on the polymerization of MMA and monomer X (X = GMA, ST, CHMA, THFMA, PACS), we note that the reaction starts from the attack of the radical initiator to the monomer (MMA or X), affording an MMA radical or X radical (denoted as MMA* or X*, respectively). Thus, the reaction energies for head- and tail-attack of the methyl radical (the model radical initiator) and monomer X (Fig. 5(a)) were used for feature set D-1. (Note that the attack to MMA was excluded from the feature set because it was commonly included in all our target reactions.) The second stage of polymerization is C–C bond formation between a monomer and a radical (Fig. 5(b)), such as (i) MMA and X*, (ii) X and MMA*, (iii) X and X* and (iv) MMA and MMA*. Because the transition state (TS) of C–C bond formation has three staggered conformations, each reaction could have three stable TS structures. Thus, the energy difference between the TS with three staggered conformations and the dissociation limit (radical + monomer) for reactions (i), (ii) and (iii) were used for feature set D-2, and the dihedral angles around the reactive C atoms at these TSs (Fig. 5(c)) were used for feature set D-3. We also gathered the features set D-4, i.e., the orbital energies of the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) for X, those of the singly occupied molecular orbital (SOMO) and LUMO for X* and the HOMO–SOMO and SOMO–LUMO energy gaps between monomer A and radical B*, where (A, B*) are (MMA, X*), (X, MMA*) and (X, X*).
The calculation scheme for the 36 aforementioned features was as follows: first, the monomer conformers were generated from the Simplified Molecular-Input Line-Entry System (SMILES) using the RDKit cheminformatics software. As many as five conformers with large structural differences were then extracted for further geometrical optimization at the xTB level of theory. Second, the geometries of the model radicals (i.e. the products of monomers and methyl radicals) were calculated using an automated reaction-path search method known as the artificial force-induced reaction (AFIR) method at the xTB level of theory. To gather the conformers of the head- and tail-type model radicals, we randomly selected the monomer conformers and applied the AFIR calculation by adding the artificial force between the head or tail C atom of the monomer and the C atom of the methyl radical. This calculation was continued until the last three AFIR calculations did not find a new product. Third, to gather the TSs between the monomer and head-type model radical, we applied the AFIR method by adding the force between the reactive C atoms (i.e., the head C atom of the monomer and the tail C atom of the model radical). All the appropriate local minima and maxima (whose reactive C–C bond length was 2.20–3.24 Å) along the AFIR reaction pathways were reoptimized without any restriction at the B3LYP-D3/def2-SVP level of theory. On the basis of the obtained TS structures, we prepared three staggered conformations by modifying the dihedral angle around the reactive C atoms and then reoptimizing them. The geometry optimization and AFIR calculations were conducted using the Global Reaction Route Mapping (GRRM) program with the energies and energy derivatives computed by the Gaussian 16 program (for the DFT level) and the ORCA program (for the xTB level).
Fig. 6 Comparison of batch and flow syntheses. Plots of actual y vs. estimated y in M1_conv. (a) batch synthesis and (b) flow synthesis. |
Fig. 7 Plot summarizing prediction results for interpolated regions for each ML algorithm (a) R2 and (b) MAE. |
With respect to the predictions of monomer conversion in the interpolation region and monomer composition ratio for polymers, we found that creating a learning model using information specific to each monomer (feature sets B–D) improved the prediction accuracy. There were no significant accuracy differences among the predictions of models based on feature sets B–D.
Fig. 8 Plot summarizing prediction results for extrapolation regions for each ML algorithm (a) R2 and (b) MAE. |
First, in the models with feature set B added to the features, the prediction accuracy of each objective variable increased only for the ν-SVR model; the prediction accuracy of the RF and PLS models showed a decreasing trend. For the ν-SVR model (feature set B), the monomer conversion rate, in particular, was improved; however, its prediction accuracy was insufficient (R2 ≈ 0.4). The fact that the predicted values for St and CHMA did not change in any of the models confirms the need to add features other than molecular flags (Fig. S8–S10†). Second, when feature set C was included in the explanatory variable, the prediction accuracy of each objective variable was improved only for the RF model, as in the case of the model with feature set B added. The ν-SVR model reduced the predictions except that for C_M1_CR, whereas the PLS model slightly improved the predictions for the monomer conversion ratio and reduced the predictions for the monomer composition ratio for polymers. The prediction accuracy of monomer conversion was improved compared with that of the RF model (feature set B). Feature set C (RDKit: 208 features) contains more detailed monomer information, including monomer descriptors, monomer molecular weight and the number of rotatable bonds in the monomers. Liu et al. predicted quantum chemical properties using conformers generated from RDKit;32 we therefore expected RDKit to contain information similar to quantum chemical calculations. We assumed that this information would increase the predicted value of the monomer conversion ratio. Because RF is a nonlinear model, its improved accuracy is attributable to the importance of quantum chemical calculations from the fingerprint being implicitly incorporated into the predictive model.
The use of feature set D substantially improved the prediction accuracy of each objective variable in the RF and PLS models. The ν-SVR model showed a trend similar to that observed upon the addition of feature set C. The RF and PLS models showed high prediction accuracy, with R2 values of 0.8 or higher for most of the objective variables. For the PLS model, the monomer conversion predictions using feature sets B and C did not change the predicted values of St and CHMA for some of the samples; however, this problem was solved by using feature set D. Feature set B contains only molecular flags and has less information about the characteristics of the monomers. Feature set C details monomer information before the monomer undergoes the radical reaction, such as descriptors and molecular weight, but does not include post-reaction information. However, feature set D mainly describes the electronic energy information when radical reactions occur, along with conformation information about intermediate products (dimers and products of the initiators and monomers) and a large amount of post-reaction information. Radical polymerization is characterized by an increase in the molecular weight of radical molecules as the reaction proceeds; among the feature sets B–D used in the present study, feature set D is considered to best reflect this relationship. Thus, the prediction accuracy of the extrapolation region was substantially improved when feature set D was added. The prediction accuracy of the two models was improved using the energy information in radical reactions calculated via quantum chemical calculations as a feature. These features are expected to be highly versatile and adaptable to various situations. Finally, we calculated important properties to investigate which features contribute to the prediction among feature sets A + D with improved accuracy of the extrapolated regions. This discussion focuses on M1_conv. of the RF model (feature set D), which has superior prediction accuracy for the extrapolated regions. Fig. 9 shows the ten most important features for the M1_conv. prediction in the RF model (feature set D) for various copolymers. In addition, we carried out recursive feature elimination (RFE) using a stepwise method to clearly reveal features with a low learning contribution (Table 1). Herein, the feature number represents the number of features used for training, and the features with the lowest contribution when trained with that number of features are listed. In other words, the smaller the feature number, the higher the contribution of that feature. A total of 41 features are listed in this table: 5 process variables (M1_composition_ratio, initiator_composition_ratio, SM, temperature and flow_rate) and 36 DFT calculated values. M is the M1 monomer, 00 is the M2 monomer, theta (60, 180, 300) is the target value of the dihedral angle created by two C atoms on the radical side and two C atoms on the monomer side, Real_theta is the theta value after TS structural optimization, DE_(00M, M00, MM)_TS_theta (60, 180, 300) is the energy of TS, E_(M, 00)_Rad_(SOMO, LUMO) is the energy of SOMO and LUMO of radicals, E_(M, 00)_Mon_(HOMO, LUMO) is the energy of HOMO and LUMO of monomers, DE_(M, 00)_decomposition_(head, tail) is the energy required to dissociate the initiator from the radical, DE_(00M, M00, MM)_SHgap and DE_(00M, M00, MM)_SLgap are the difference between the two orbital energies. Fig. 9 shows that all of the process variables (feature set A) were included in the top 10 and ranked high irrespective of the monomers used. In particular, flow_velocity showed the highest importance for all the copolymers, followed by temperature, with the combined importance of the two features being greater than 45%. These results indicate that the process variables affect the polymer synthesis. It should also be noted that flow_velocity and temperature, which are of high importance, are set values and do not faithfully represent the inside of the reaction system. Supporting process variables with data on flow velocity and temperature distribution in the design from the fluid simulation is expected to improve the prediction accuracy further. A trend was also observed in feature set D (quantum chemical calculated values) included in the important features. The features included were the HOMO, LUMO and SOMO energies of M1 and M1 radicals, the orbital energy gaps (SOMO–HOMO (SH) gap and SOMO–LOMO (SL) gap) and the energy gaps between the normal state and the transition state. The RFE results indicate that the HOMO, LUMO and SOMO energies of the M2 and M2 radicals, the energy required to dissociate the initiator, and the stereo conformational dihedral angle are less important. Interestingly, the importance of orbital energy with respect to M1 and M2 differed dramatically. Some of the orbital energies related to M1 were definitely included in the ten most important features; however, the four orbital energies related to M2 (variables with the “E_00” suffix) were removed by RFE within six iterations. This result is attributed to MMA being the only monomer that corresponds to M2 and all the copolymers containing MMA. The inclusion of the SH gap and SL gap among the important features indicates that not only the orbital energy of each state but also the difference between them is important. On the basis of these results, the orbital energy related to the monomer in the extrapolation region and the orbital energy gap between the monomer and the radical were concluded to be important features for predicting polymerization using monomers not included in the learning.
Fig. 9 Feature importance chart for each copolymer in feature sets A + D: (a) GMA, (b) St, (c) PACS, (d)THFMA and (e) CHMA. |
Feature number | GMA | St | PACS | THFMA | CHMA |
---|---|---|---|---|---|
41 | E_00_Rad_LUMO | E_00_Mon_HOMO | DE_00_decomposition_tail | DE_00_decomposition_head | E_00_Rad_SOMO |
40 | E_00_Mon_LUMO | E_00_Rad_LUMO | DE_00_decomposition_head | E_00_Rad_SOMO | E_00_Rad_LUMO |
39 | E_00_Mon_HOMO | E_00_Rad_SOMO | E_00_Mon_LUMO | E_00_Rad_LUMO | DE_00_decomposition_tail |
38 | DE_00_decomposition_tail | DE_00_decomposition_tail | E_00_Mon_HOMO | E_00_Mon_LUMO | DE_00_decomposition_head |
37 | DE_00_decomposition_head | DE_00_decomposition_head | E_00_Rad_LUMO | E_00_Mon_HOMO | E_00_Mon_LUMO |
36 | E_00_Rad_SOMO | E_00_Mon_LUMO | E_00_Rad_SOMO | DE_00_decomposition_tail | E_00_Mon_HOMO |
35 | DE_M_decomposition_tail | Real_theta_00M_theta300 | DE_MM_SHgap | DE_00M_SLgap | Real_theta_M00_theta300 |
34 | DE_MM_SHgap | Real_theta_M00_theta60 | DE_M_decomposition_tail | Real_theta_M00_theta300 | Real_theta_M00_theta60 |
33 | Real_theta_MM_theta180 | DE_MM_TS_theta180 | Real_theta_MM_theta60 | Real_theta_M00_theta180 | Real_theta_MM_theta180 |
32 | Real_theta_MM_theta60 | DE_M_decomposition_tail | Real_theta_00M_theta300 | E_M_Mon_LUMO | DE_MM_SHgap |
31 | Real_theta_M00_theta60 | Real_theta_MM_theta60 | Real_theta_M00_theta300 | Real_theta_00M_theta60 | Real_theta_MM_theta60 |
30 | Real_theta_00M_theta300 | Real_theta_M00_theta300 | Real_theta_MM_theta180 | Real_theta_MM_theta180 | DE_MM_TS_theta180 |
29 | Real_theta_M00_theta300 | E_M_Mon_LUMO | Real_theta_M00_theta180 | DE_M_decomposition_tail | Real_theta_M00_theta180 |
28 | E_M_Mon_LUMO | DE_00M_SLgap | Real_theta_M00_theta60 | Real_theta_00M_theta300 | DE_M_decomposition_tail |
27 | DE_MM_TS_theta180 | DE_MM_SHgap | Real_theta_00M_theta60 | Real_theta_MM_theta300 | Real_theta_00M_theta60 |
26 | DE_00M_SLgap | Real_theta_M00_theta180 | DE_MM_SLgap | DE_M00_SHgap | DE_00M_TS_theta300 |
25 | DE_M00_TS_theta60 | Real_theta_MM_theta180 | E_M_Mon_HOMO | Real_theta_00M_theta180 | E_M_Rad_LUMO |
24 | DE_MM_TS_theta60 | Real_theta_00M_theta60 | DE_00M_TS_theta60 | E_M_Rad_SOMO | DE_00M_SHgap |
23 | DE_00M_TS_theta60 | DE_M00_TS_theta180 | DE_MM_TS_theta300 | E_M_Mon_HOMO | DE_M00_TS_theta180 |
22 | E_M_Rad_SOMO | Real_theta_MM_theta300 | Real_theta_MM_theta300 | DE_00M_TS_theta300 | DE_00M_SLgap |
21 | DE_M00_TS_theta180 | DE_00M_TS_theta300 | DE_00M_TS_theta300 | DE_MM_TS_theta300 | E_M_Mon_LUMO |
20 | DE_M00_TS_theta300 | DE_M00_TS_theta300 | DE_M00_SLgap | DE_00M_SHgap | DE_M00_SLgap |
19 | E_M_Mon_HOMO | DE_MM_TS_theta60 | E_M_Rad_LUMO | DE_MM_TS_theta180 | DE_M00_SHgap |
18 | Real_theta_MM_theta300 | E_M_Rad_LUMO | Real_theta_00M_theta180 | DE_M_decomposition_head | Real_theta_MM_theta300 |
17 | DE_M_decomposition_head | M1_composition_ratio | DE_MM_TS_theta60 | DE_MM_TS_theta60 | DE_MM_SLgap |
16 | DE_00M_TS_theta180 | DE_M00_TS_theta60 | DE_M00_TS_theta180 | SM | DE_M_decomposition_head |
15 | DE_00M_TS_theta300 | DE_00M_TS_theta60 | E_M_Mon_LUMO | DE_00M_TS_theta180 | E_M_Rad_SOMO |
14 | DE_MM_TS_theta300 | DE_MM_SLgap | E_M_Rad_SOMO | E_M_Rad_LUMO | Real_theta_00M_theta180 |
13 | Real_theta_00M_theta180 | E_M_Mon_HOMO | DE_M_decomposition_head | DE_MM_SHgap | Real_theta_00M_theta300 |
12 | E_M_Rad_LUMO | DE_00M_TS_theta180 | M1_composition_ratio | DE_00M_TS_theta60 | DE_MM_TS_theta300 |
11 | Initiator_composition_ratio | DE_M_decomposition_head | DE_M00_SHgap | Initiator_composition_ratio | SM |
10 | DE_MM_SLgap | DE_M00_SLgap | DE_M00_TS_theta300 | DE_M00_TS_theta180 | E_M_Mon_HOMO |
9 | DE_M00_SHgap | DE_M00_SHgap | SM | DE_M00_SLgap | Initiator_composition_ratio |
8 | SM | Real_theta_00M_theta180 | DE_MM_TS_theta180 | M1_composition_ratio | DE_MM_TS_theta60 |
7 | M1_composition_ratio | E_M_Rad_SOMO | Initiator_composition_ratio | DE_MM_SLgap | M1_composition_ratio |
6 | DE_M00_SLgap | SM | DE_M00_TS_theta60 | Real_theta_MM_theta60 | DE_00M_TS_theta180 |
5 | Real_theta_00M_theta60 | Initiator_composition_ratio | DE_00M_SLgap | DE_M00_TS_theta300 | DE_00M_TS_theta60 |
4 | DE_00M_SHgap | DE_MM_TS_theta300 | DE_00M_TS_theta180 | DE_M00_TS_theta60 | DE_M00_TS_theta60 |
3 | Temperature | DE_00M_SHgap | Temperature | Temperature | Temperature |
2 | Flow_rate | Temperature | Flow_rate | Flow_rate | Flow_rate |
1 | Real_theta_M00_theta180 | Flow_rate | DE_00M_SHgap | Real_theta_M00_theta60 | DE_M00_TS_theta300 |
We found that the molecular orbital energy information for the monomer (extrapolated region) and the orbital energy gap with radicals are necessary for highly accurate prediction of the extrapolated region. It is thought that the inclusion of features using quantum chemical calculations in anionic polymerization, cationic polymerization, polycondensation, and radical polymerization will enable the prediction of extrapolation regions. In such cases, it is necessary to conduct highly reproducible experiments, such as those involving the flow reactor used in this study. However, polymerization using flow reactors is difficult for reactions such as polycondensation, in which the viscosity increases rapidly. Hence, it is necessary to find a reactor that can obtain highly reproducible results for each synthesis method. The proposed model is expected to accelerate the development of polymers using new monomers.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2dd00144f |
This journal is © The Royal Society of Chemistry 2023 |