G. P. Gakis*, I. G. Aviziotis and C. A. Charitidis
Research Lab of Advanced, Composite, Nano-Materials and Nanotechnology, Materials Science and Engineering Department, School of Chemical Engineering, National Technical University of Athens, 9 Heroon Polytechneiou Street, Zografos, Athens 15780, Greece. E-mail: gakisg@chemeng.ntua.gr; Tel: +30 2107723296
First published on 15th April 2025
The emerging applications of nanotechnology have led to the synthesis, production and use of a continuously increasing number of nanomaterials. In recent years, the focus has shifted to multicomponent nanomaterials (MCNMs), owing to the control they offer over functional properties. At the same time, the increasing exposure of ecosystems to such materials has raised concerns over their environmental hazard, with several in vivo and in vitro studies used to assess the ecotoxicity of MCNMs. The demanding nature of such methods has also driven the development of in silico methods, such as structure–activity relationship (SAR) models. Although such approaches have been developed for single component nanomaterials, models for the ecotoxicity of MCNMs are still sparse in the scientific literature. In this paper, we address the case of MCNM ecotoxicity by developing an in silico classification SAR computational framework. The models are built over a dataset of 652 ecotoxicity measurements for 214 metal and metal oxide MCNMs, towards bacteria, eukaryotes, fish, plants and crustaceans. This dataset is, to the best of the authors' knowledge, the largest dataset used for MCNM ecotoxicity. It is found that two descriptors can adequately classify different MCNMs based on their ecotoxicity over the whole heterogeneous dataset. These descriptors are the hydration enthalpy of the metal ion and the energy difference between the MCNM conduction band and the redox potential in biological media. Although the classification does not allow a quantitative ecotoxicity assessment, the heterogeneous nature of the dataset can reveal key MCNM features that induce toxic action, allowing a more holistic understanding of MCNM ecotoxicity, as well as the nature of the interaction between the different MCNM components.
Environmental significance

In recent years, nanotechnology research has focused on multi-component nanomaterials (MCNMs), which allow increased control over nanomaterial properties. However, concern has been raised regarding the safety of extended ecosystem exposure to such materials. The demanding nature of in vitro and in vivo methods has led to emerging in silico techniques, such as structure–activity relationship (SAR) models, for nanomaterial safety assessment. Nevertheless, such approaches for the case of MCNM ecotoxicity are limited, and are built using limited datasets. In this work, a classification approach is presented for MCNM ecotoxicity towards bacteria, eukaryotes, fish, plants, and crustaceans. The use of heterogeneous datasets allows a more holistic understanding of MCNM toxicity, assisting the synthesis of MCNMs that are safe-by-design towards ecosystems.
The emergence of ENM and MCNM applications has also raised concern regarding the safety of extended exposure to such materials. While such exposure can be intentional, with the use of these materials as drug carriers,20,21 non-intentional exposure to ENMs and MCNMs can occur in varying environments.22–25 Ecosystems can also be exposed to ENMs and MCNMs, throughout their manufacturing process, application and disposal stages, via different exposure routes.26–28
The safety of ENMs and MCNMs is usually assessed by measuring their toxicity. In this context, in vitro toxicity methods are used for a fast and cost-effective initial toxicity assessment,29 while in vivo methods, which are the most reliable, are used at later stages of regulatory risk assessment,30 as they are time consuming and are characterized by higher cost and ethical concerns regarding animal testing. However, neither type of toxicity assessment method can keep up with the innovation, synthesis, and application of novel nanomaterials.31 For this reason, in silico methods have emerged to assess the safety of nanomaterials, with computational models developed for biodistribution32,33 and toxicity,31,34,35 toward different cells and organisms.
During the last decades, the increasing applications of nanotechnology have led to a large number of research studies on nanomaterial toxicity assessment using in vitro36–38 and in vivo39–41 methods, as well as comparative reviews of the different toxicity assessment methods.42,43 In a similar way, interest has shifted towards in silico methods, and in particular structure–activity relationship (SAR) models44,45 based on the correlation of structural characteristics of the materials under study, known as descriptors, to biological activity endpoint data.46 Both quantitative (QSAR) and qualitative (classification SAR)47 models have been developed, especially for the case of metal oxide nanoparticles (NPs),47–49 but also for other types of ENMs.35,50 In silico models for such materials have been comprehensively reviewed by Buglak et al.44 and Li et al.51 However, most models are developed using a limited dataset,47,48,52,53 containing few toxicity data. This limitation can severely hinder the wider use of SAR approaches, as models developed using a limited and homogeneous dataset apply only to a range of data similar to their training datasets. This means that the model precision may decrease severely when compared with measurements that differ slightly in the experimental design or in the NP properties, such as size and shape, thus damaging the model reproducibility. Furthermore, when larger and more heterogeneous datasets are used for the model development,49,54–57 the extraction of mechanistic information from SAR models is challenging because of the complexity of the descriptors used. Complex descriptors can also be difficult to compute for novel NPs or more complex nanomaterial structures.
Furthermore, the lack of mechanistic understanding prevents QSAR models from being used more widely as decision-support tools for the design and synthesis of safe-by-design nanomaterials.58 Nevertheless, theoretical frameworks have been developed to provide a more cohesive, consistent and mechanistic understanding of metal oxide toxicity, while also leading to predictive SAR models for toxicity.31,53,59–63 Based on a similar framework, we recently used an extensive and heterogeneous dataset of toxicity measurements towards a wide range of cell lines and organisms, to develop a classification SAR model.64
Although in silico methods have been developed for pure metal oxides, the case of MCNMs is not yet sufficiently covered. In particular, the case of metal-loaded TiO2 MCNMs has been studied by means of QSAR modelling,65–69 while only the case of ZnO-based NPs has been covered besides the TiO2 MCNMs.70 Nevertheless, the limited number of MCNMs and toxicity data used to develop these models does not allow a more global understanding of MCNM toxicity mechanisms. In a recent work, we presented a classification SAR model using an extensive dataset for the case of cytotoxicity and antibacterial activity of metal and metal oxide MCNMs,71 which allowed a more mechanistic insight into the dominant MCNM toxicity pathways. Regarding the case of ecotoxicity, however, although there has been increasing interest in in vitro and in vivo assessment methods,72–75 in silico studies developing SAR models are still missing for MCNMs.
In this work, a classification SAR approach for the prediction of MCNM ecotoxicity is presented. The model is developed using an extensive dataset of 652 half-maximal concentration measurements for the ecotoxicity of metal and metal oxide MCNMs. The MCNMs considered in the present work consist of doped metal oxides, composite metal oxides, bimetallic NPs, as well as surface-loaded metal oxide NPs. Different subsets of data were used to build different models, based on the target organisms. In particular, models were developed for MCNM ecotoxicity towards E. coli, S. aureus, D. rerio, D. magna and C. albicans. Furthermore, the approach was extended to more heterogeneous datasets, consisting of MCNM ecotoxicity measurements towards different organism groups, namely bacteria, eukaryotes, fish, crustaceans and plants. Finally, the complete heterogeneous dataset was used for the development of a SAR model, showing that the approach can offer a more general insight regarding MCNM ecotoxicity.
The novelty of the approach lies in the size and nature of the dataset used for the development of such a model. The dataset used in the present study is, to the best of the authors' knowledge, the largest dataset of MCNM ecotoxicity measurements used for the development of a SAR model. Furthermore, the dataset is heterogeneous, consisting of ecotoxicity measurements towards bacteria, eukaryotes, fish, plants and crustaceans. Such a heterogeneous SAR approach has not been applied before to MCNM ecotoxicity, and the approach used to compute descriptors for surface-loaded MCNMs is also novel for ecotoxicity models. Finally, the aim of the present work is not restricted to the development of a predictive classification model, but extends to unravelling the key characteristics of MCNMs that induce ecotoxicity. The size and heterogeneous nature of the dataset used for the model development will also assist towards a more holistic and mechanistic understanding of the ecotoxic action of MCNMs and the interaction between their components.
Regarding the classification scheme, a measurement is characterized as toxic or non-toxic based on the criteria reported by Simeone and Costa,31 following the scheme presented in our previous works.64,71 Briefly, the measurement is classified as toxic if the logarithm of the concentration endpoint in molar units (mol L−1) does not exceed −2.5 (log(EC50) ≤ −2.5). However, in some works, EC50 concentrations are reported as being higher than the range of experimentally tested concentrations (EC50 > Cmax,tested). In such cases, if the maximum concentration yielded an effect of less than 50% and log(Cmax) > −2.5, the measurement is classified as non-toxic. On the other hand, if log(Cmax) ≤ −2.5, the following scheme is applied: if Cmax is at least 50% of the threshold concentration (Cthreshold, i.e. 10^−2.5 mol L−1), the measurement is classified as non-toxic; if, however, Cmax is less than 50% of the threshold value, the measurement is omitted from the dataset. The value of 50% was chosen arbitrarily, to ensure that the organism had been exposed to a significant amount of MCNM before a measurement is classified as non-toxic, while limiting the number of measurements omitted from the dataset.
In a similar way, if the concentration tested does not exceed the threshold value (log(Cmax) ≤ −2.5) and the effect is higher than 50%, the EC50 value is set to the concentration tested and the measurement is classified as toxic. Conversely, measurements with an effect higher than 50% where the concentration tested is above the threshold value (log(Cmax) > −2.5) are removed from the dataset. The classification scheme is summarized in Table 1.
Reported concentration | Condition | Classification |
---|---|---|
EC50 value reported | log(EC50) ≤ −2.5 | Toxic |
EC50 value reported | log(EC50) > −2.5 | Non-toxic |
Cmax yields effect >50% | log(Cmax) ≤ −2.5 | Toxic |
Cmax yields effect >50% | log(Cmax) > −2.5 | Omitted |
Cmax yields effect <50% | log(Cmax) > −2.5 | Non-toxic |
Cmax yields effect <50% | log(Cmax) ≤ −2.5, Cmax ≥ 0.5·Cthreshold | Non-toxic |
Cmax yields effect <50% | log(Cmax) ≤ −2.5, Cmax < 0.5·Cthreshold | Omitted |
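The decision rules of Table 1 can be sketched as a small function. This is only an illustration of the scheme as described in the text, not the authors' code: the function and argument names are hypothetical, concentrations are assumed to be in mol L−1, and an effect of exactly 50% (which the scheme does not address) is handled in the "<50%" branch by assumption.

```python
import math
from typing import Optional

LOG_THRESHOLD = -2.5               # log10 of the threshold concentration (mol/L)
C_THRESHOLD = 10 ** LOG_THRESHOLD  # roughly 3.16e-3 mol/L


def classify(ec50: Optional[float] = None,
             c_max: Optional[float] = None,
             effect_at_c_max: Optional[float] = None) -> Optional[str]:
    """Return 'toxic', 'non-toxic', or None when the measurement is omitted.

    ec50            -- reported EC50 in mol/L, if available
    c_max           -- highest tested concentration in mol/L (used if no EC50)
    effect_at_c_max -- effect observed at c_max, in percent
    """
    if ec50 is not None:
        return "toxic" if math.log10(ec50) <= LOG_THRESHOLD else "non-toxic"

    if c_max is None or effect_at_c_max is None:
        raise ValueError("need either ec50, or both c_max and effect_at_c_max")

    at_or_below_threshold = math.log10(c_max) <= LOG_THRESHOLD

    if effect_at_c_max > 50:
        # EC50 is set to c_max when c_max is at or below the threshold;
        # otherwise the true EC50 could lie on either side, so omit.
        return "toxic" if at_or_below_threshold else None

    # Effect of 50% or less at the highest tested concentration
    # (exactly 50% is placed here by assumption).
    if not at_or_below_threshold:
        return "non-toxic"
    return "non-toxic" if c_max >= 0.5 * C_THRESHOLD else None


print(classify(ec50=1e-3))                       # toxic
print(classify(c_max=1e-3, effect_at_c_max=20))  # None (omitted)
```

The `None` return makes the omitted cases explicit, so a caller can filter them out before building the dataset.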
Based on the above classification scheme, a final dataset of 652 MCNM ecotoxicity measurements is compiled, presented in the ESI† of the paper. The dataset consists of ecotoxicity measurements of bimetallic NPs, metal-doped metal oxide NPs, surface-loaded metal oxide NPs, and composite metal oxide NPs. The total dataset is divided into different subsets of data, corresponding to the individual organism that was exposed to the MCNMs. Datasets corresponding to the different organism groups are also created. The dataset size (minimum of 30 measurements) and the number of toxic and non-toxic measurements (minimum of 20% for each class) served as the criteria for the creation of a data subset. Based on the above scheme, 5 data subsets are created for individual organisms, as well as 5 datasets for organism groups. The complete dataset is also used for the model development. The final datasets are presented in Table 2.
Organism/organism group | No. of measurements | No. of NPs | % of toxic measurements | % of non-toxic measurements |
---|---|---|---|---|
Individual organisms | ||||
E. coli | 92 | 65 | 50 | 50 |
S. aureus | 70 | 45 | 50 | 50 |
D. rerio | 88 | 36 | 21.6 | 78.4 |
D. magna | 37 | 26 | 64.9 | 35.1 |
C. albicans | 36 | 31 | 66.7 | 33.3 |
Organism groups | ||||
Bacteria | 252 | 82 | 49.2 | 50.8 |
Eukaryotes | 123 | 69 | 57.7 | 42.3 |
Fish | 101 | 42 | 26.7 | 73.3 |
Crustaceans | 74 | 58 | 64.9 | 35.1 |
Plants | 102 | 40 | 30.4 | 69.6 |
Complete dataset | 652 | 214 | 46.2 | 53.8 |
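The two subset criteria stated above (at least 30 measurements, and at least 20% of the measurements in each class) can be checked against the figures of Table 2 with a few lines. The helper name and the dictionary layout are hypothetical; the numbers are copied from Table 2.

```python
# (number of measurements, % toxic) per dataset, as listed in Table 2.
TABLE_2 = {
    "E. coli": (92, 50.0),
    "S. aureus": (70, 50.0),
    "D. rerio": (88, 21.6),
    "D. magna": (37, 64.9),
    "C. albicans": (36, 66.7),
    "Bacteria": (252, 49.2),
    "Eukaryotes": (123, 57.7),
    "Fish": (101, 26.7),
    "Crustaceans": (74, 64.9),
    "Plants": (102, 30.4),
}


def qualifies_as_subset(n_measurements: int, pct_toxic: float,
                        min_size: int = 30,
                        min_class_pct: float = 20.0) -> bool:
    """Apply the size and class-balance criteria for creating a data subset."""
    pct_non_toxic = 100.0 - pct_toxic
    minority_pct = min(pct_toxic, pct_non_toxic)
    return n_measurements >= min_size and minority_pct >= min_class_pct


for name, (n, pct_toxic) in TABLE_2.items():
    print(f"{name}: {qualifies_as_subset(n, pct_toxic)}")
```

D. rerio is the closest to the balance limit, with 21.6% toxic measurements against the 20% minimum.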
![]() | (1) |
A different computational scheme is used for surface-loaded MCNMs, as presented in our previous work.71 Briefly, the composition of the MCNM surface, which is the area of interaction with the biological media, is computed. This is done by computing the total mass and molar amount (qi, in moles) of each of the particle components in a single particle, using the nominal density (25 °C) of each component (from online handbooks/databases), the mass/molar fractions, and the particle volume:
![]() | (2) |
The surface components are all assumed to be situated on the particle surface (qs,i = qi). The amount of the core component on the particle surface (qs,core) is computed based on the core component's unit cell:
![]() | (3) |
![]() | (4) |
For datasets with fewer than 100 measurements (n < 100), five-fold cross validation is used. For larger datasets (n ≥ 100), hold-out validation is used, with 80% of the data as a training set and 20% as a validation set. In the hold-out validation scheme, the model is trained using five-fold cross validation on the training set, while the validation is performed by comparing the trained model predictions with the ecotoxic class assigned to the measurements in the validation set. The data splitting is random and is performed using MATLAB®. The model training is performed using the Classification Learner toolkit, by implementing Support Vector Machines (SVM), k-Nearest-Neighbors (kNN) and Random Forests (RF). The optimal models were identified using different statistical metrics based on the resulting confusion matrix, namely accuracy, precision, sensitivity (or recall), and specificity (or selectivity), computed as in ref. 64 and 71. The receiver operating characteristic (ROC) curve was also used as a metric to identify the optimal models, while the models were deemed acceptable when the accuracy exceeded 80%.
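As a minimal sketch of the validation logic and the four metrics, the snippet below uses the standard confusion-matrix definitions; the source defers the exact formulas to ref. 64 and 71, so these textbook forms are an assumption, as is treating the toxic class as the positive class. The function names are hypothetical. Rounding the 20% validation share down is also an assumption, although it does reproduce the training/validation sizes listed in Table 7.

```python
def validation_scheme(n: int) -> str:
    """Choose the validation scheme from the dataset size, as described above."""
    return "5-fold cross validation" if n < 100 else "hold-out (80/20)"


def holdout_sizes(n: int) -> tuple:
    """Return (training size, validation size) for an 80/20 hold-out split.

    Assumption: the validation set is floor(0.2 * n).
    """
    n_val = n // 5
    return n - n_val, n_val


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics from confusion-matrix counts (toxic = positive class)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # recall
        "specificity": tn / (tn + fp),   # selectivity
    }


# Illustrative counts only, not taken from the paper.
metrics = confusion_metrics(tp=40, fp=5, tn=45, fn=10)
print(validation_scheme(74), holdout_sizes(252))
print({name: round(value, 3) for name, value in metrics.items()})
```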
As a second step, two feature selection methods, namely ReliefF and chi-square, are used to rank the remaining descriptors based on their relevance to the response variable (ecotoxic class), as described in section 2.2.3. The four highest ranked descriptors are kept, following the feature selection analysis. Representative results for the four highest ranked descriptors, as derived from the two methods using the complete dataset, are shown in Fig. 2. The descriptor ranking obtained with the two feature selection methods, for the different data subsets, is presented in Table 3. Note that Table 3 shows only the descriptors that were ranked among the four most relevant descriptors for at least one dataset.
Fig. 2 The four most relevant descriptors, for the complete dataset, as derived from a) ReliefF and b) chi-square methods.
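Dataset | ReliefF: HE | ReliefF: Dbio | ReliefF: S_IP | ReliefF: IP | ReliefF: X_M | ReliefF: VWR | ReliefF: Xion | Chi-square: HE | Chi-square: Dbio | Chi-square: S_IP | Chi-square: IP | Chi-square: X_M | Chi-square: VWR | Chi-square: Xion |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|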
E. coli | 3 | — | 2 | — | — | 1 | 4 | 2 | 1 | — | 3 | — | 4 | — |
S. aureus | 3 | — | 1 | 2 | 4 | — | — | 4 | 1 | — | — | 3 | 2 | — |
D. rerio | — | 2 | — | 4 | — | 1 | 3 | 1 | — | 3 | — | 2 | 4 | — |
D. magna | 2 | 4 | — | — | — | 1 | 3 | 1 | — | 2 | 3 | — | 4 | — |
C. albicans | 3 | 1 | 2 | — | — | — | 4 | 2 | 1 | 3 | 4 | — | — | — |
Bacteria | 2 | 4 | 1 | — | — | — | 3 | 2 | — | — | 4 | 1 | 3 | — |
Eukaryotes | 3 | 1 | 4 | 2 | — | — | — | 2 | 1 | — | — | 3 | — | 4 |
Fish | — | 1 | — | 3 | 2 | 4 | — | 2 | 1 | — | — | — | 4 | 3 |
Crustaceans | 1 | 3 | — | 4 | 2 | — | — | 1 | 2 | 3 | 4 | — | — | — |
Plants | 4 | 2 | — | — | 3 | 1 | — | 2 | — | 3 | 1 | — | — | 4 |
Complete dataset | 4 | 1 | — | — | — | 2 | 3 | 1 | 3 | — | 2 | 4 | — | — |
The descriptor ranking presented in Fig. 2 and Table 3 shows that seven descriptors are ranked within the four highest ranked descriptors at least once, for all the data subsets, using the two feature selection methods. Interestingly, HE (hydration enthalpy61,63,64,71), Dbio (energy difference between the conduction band of the MCNM and redox potential of biological media64,71), and IP (ionic potential of the metal component81) are ranked amongst the four highest ranked descriptors for all 11 datasets, using at least one feature selection method. Based on the above observations, the three abovementioned descriptors are kept for the model development, with different descriptor combinations being tested. The results are presented in the following sections, for the models developed for the different datasets.
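The observation that only HE, Dbio and IP reach the top four for every dataset can be re-derived from Table 3. The sketch below transcribes, for each dataset, the set of descriptors ranked 1 to 4 by each method (rank order is dropped, since only membership matters here); the data layout and variable names are illustrative.

```python
# For each dataset: (top-four set from ReliefF, top-four set from chi-square),
# transcribed from Table 3.
TOP_FOUR = {
    "E. coli": ({"HE", "S_IP", "VWR", "Xion"}, {"HE", "Dbio", "IP", "VWR"}),
    "S. aureus": ({"HE", "S_IP", "IP", "X_M"}, {"HE", "Dbio", "X_M", "VWR"}),
    "D. rerio": ({"Dbio", "IP", "VWR", "Xion"}, {"HE", "S_IP", "X_M", "VWR"}),
    "D. magna": ({"HE", "Dbio", "VWR", "Xion"}, {"HE", "S_IP", "IP", "VWR"}),
    "C. albicans": ({"HE", "Dbio", "S_IP", "Xion"}, {"HE", "Dbio", "S_IP", "IP"}),
    "Bacteria": ({"HE", "Dbio", "S_IP", "Xion"}, {"HE", "IP", "X_M", "VWR"}),
    "Eukaryotes": ({"HE", "Dbio", "S_IP", "IP"}, {"HE", "Dbio", "X_M", "Xion"}),
    "Fish": ({"Dbio", "IP", "X_M", "VWR"}, {"HE", "Dbio", "VWR", "Xion"}),
    "Crustaceans": ({"HE", "Dbio", "IP", "X_M"}, {"HE", "Dbio", "S_IP", "IP"}),
    "Plants": ({"HE", "Dbio", "X_M", "VWR"}, {"HE", "S_IP", "IP", "Xion"}),
    "Complete dataset": ({"HE", "Dbio", "VWR", "Xion"}, {"HE", "Dbio", "IP", "X_M"}),
}

# A descriptor is retained if, for every dataset, at least one of the two
# methods places it in the top four: intersect the per-dataset unions.
per_dataset_union = [relieff | chi2 for relieff, chi2 in TOP_FOUR.values()]
retained = set.intersection(*per_dataset_union)

print(sorted(retained))  # ['Dbio', 'HE', 'IP']
```

Each of the other four descriptors misses the top four in at least one dataset under both methods (for example VWR for C. albicans and S_IP for fish).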
Dataset / descriptor combination | HE, Dbio, IP | HE, Dbio | HE, IP | Dbio, IP | HE | Dbio | IP |
---|---|---|---|---|---|---|---|
E. coli | 95.2 | 94.6 | 85.9 | 90.2 | 70.7 | 53.3 | 78.3 |
S. aureus | 97.1 | 97.1 | 81.4 | 95.7 | 84.3 | 51.4 | 77.1 |
D. rerio | 89.8 | 93.2 | 87.5 | 81.8 | 83.0 | 71.6 | 78.4 |
D. magna | 86.5 | 89.2 | 89.2 | 86.5 | 89.2 | 78.4 | 83.8 |
C. albicans | 91.7 | 94.4 | 88.9 | 88.9 | 86.1 | 83.3 | 80.6 |
As seen from the results of Table 4, amongst the single descriptor models, HE is the most predictive descriptor for most datasets, with the exception of the model developed for the measurements towards E. coli, where IP is the most predictive descriptor. The inclusion of a second descriptor improved the model accuracy, with the combination of HE and Dbio being the most predictive set of descriptors, except for the D. magna dataset, where the accuracy did not improve with the addition of Dbio as a descriptor. Finally, the inclusion of all three descriptors produced either a non-significant improvement or a lower accuracy compared with the combination of HE and Dbio. Hence, in order to reduce the complexity of the model and enhance the interpretability of the results, the minimum number of descriptors that produces the highest accuracy is kept. This means that the two-descriptor combination of HE and Dbio is kept for the different models, except for the D. magna model, where the single descriptor HE is kept.
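One way to make this selection rule concrete is to pick the smallest descriptor set whose accuracy is within a tolerance of the best one. The 1 percentage point tolerance below is an assumption standing in for "non-significant" (the source gives no numerical criterion), and the function name is hypothetical; the accuracies are copied from Table 4.

```python
# Accuracy (%) per descriptor combination, from Table 4.
TABLE_4 = {
    "E. coli": {("HE", "Dbio", "IP"): 95.2, ("HE", "Dbio"): 94.6, ("HE", "IP"): 85.9,
                ("Dbio", "IP"): 90.2, ("HE",): 70.7, ("Dbio",): 53.3, ("IP",): 78.3},
    "S. aureus": {("HE", "Dbio", "IP"): 97.1, ("HE", "Dbio"): 97.1, ("HE", "IP"): 81.4,
                  ("Dbio", "IP"): 95.7, ("HE",): 84.3, ("Dbio",): 51.4, ("IP",): 77.1},
    "D. rerio": {("HE", "Dbio", "IP"): 89.8, ("HE", "Dbio"): 93.2, ("HE", "IP"): 87.5,
                 ("Dbio", "IP"): 81.8, ("HE",): 83.0, ("Dbio",): 71.6, ("IP",): 78.4},
    "D. magna": {("HE", "Dbio", "IP"): 86.5, ("HE", "Dbio"): 89.2, ("HE", "IP"): 89.2,
                 ("Dbio", "IP"): 86.5, ("HE",): 89.2, ("Dbio",): 78.4, ("IP",): 83.8},
    "C. albicans": {("HE", "Dbio", "IP"): 91.7, ("HE", "Dbio"): 94.4, ("HE", "IP"): 88.9,
                    ("Dbio", "IP"): 88.9, ("HE",): 86.1, ("Dbio",): 83.3, ("IP",): 80.6},
}


def select_descriptors(accuracy_by_combo: dict, tolerance: float = 1.0) -> tuple:
    """Smallest combination within `tolerance` points of the best accuracy.

    Ties in size are broken in favour of the higher accuracy.
    """
    best = max(accuracy_by_combo.values())
    candidates = [combo for combo, acc in accuracy_by_combo.items()
                  if best - acc <= tolerance]
    return min(candidates, key=lambda combo: (len(combo), -accuracy_by_combo[combo]))


for dataset, accuracies in TABLE_4.items():
    print(dataset, select_descriptors(accuracies))
```

Under this assumed tolerance the rule returns (HE, Dbio) for every organism except D. magna, where HE alone is returned, matching the choices described in the text.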
The statistical metrics for the optimal models developed using the ecotoxicity measurements within the individual organism datasets are presented in Table 5.
Organism | Descriptors | Validation scheme | Acc (%) | Prec (%) | Sens (%) | Sel (%) |
---|---|---|---|---|---|---|
E. coli | HE, Dbio | 5-Fold cross validation | 94.6 | 97.7 | 91.3 | 97.8 |
S. aureus | HE, Dbio | 5-Fold cross validation | 97.1 | 94.6 | 100 | 94.3 |
D. rerio | HE, Dbio | 5-Fold cross validation | 93.2 | 100 | 68.4 | 100 |
D. magna | HE | 5-Fold cross validation | 89.2 | 100 | 83.3 | 100 |
C. albicans | HE, Dbio | 5-Fold cross validation | 94.4 | 95.8 | 95.8 | 91.7 |
The statistical metrics presented in Table 5 show that the models developed have acceptable accuracy for all the individual organism datasets. The lowest accuracy is obtained by the D. magna model, which, however, still has an acceptable value of 89.2%. The developed models also show high values for precision, sensitivity and selectivity, with the exception of the D. rerio model, which shows a sensitivity of 68.4%. This lower sensitivity could be attributed to the more imbalanced nature of the dataset, as only 21.6% of the measurements are classified as toxic within the D. rerio ecotoxicity dataset. The very high values of precision and selectivity for the D. rerio dataset could also be attributed to this imbalance. In any case, the different models were able to classify the ecotoxicity measurements in the different datasets successfully, using the same descriptor combination of HE and Dbio, except for the D. magna model, which used only the HE descriptor. However, as seen in the results of Table 4, the addition of Dbio to the D. magna model neither decreased nor increased the accuracy.
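As a worked illustration of the imbalance effect, the confusion-matrix counts of the D. rerio model can be back-calculated from the reported figures (88 measurements, 21.6% toxic, sensitivity 68.4%, precision and selectivity 100%). The counts below are inferred here rather than reported in the source, and assume that the toxic class is the positive class:

```latex
\begin{align*}
N_{\text{toxic}} &= 0.216 \times 88 \approx 19, &
N_{\text{non-toxic}} &= 88 - 19 = 69, \\
\mathrm{TP} &= 13, \quad \mathrm{FN} = 6, &
\mathrm{FP} &= 0, \quad \mathrm{TN} = 69, \\
\text{Sens} &= \frac{13}{19} \approx 68.4\%, &
\text{Acc} &= \frac{13 + 69}{88} \approx 93.2\%.
\end{align*}
```

Six missed toxic measurements thus remove about a third of the sensitivity but fewer than 7 percentage points of accuracy, because the 69 non-toxic measurements dominate the total.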
The results presented in Tables 4 and 5 show that the classification SAR approach presented can predict the MCNM ecotoxicity towards the different organisms with acceptable accuracy, using similar descriptors. This could hint towards similar underlying mechanisms being responsible for the ecotoxicity of the different MCNMs towards the different organisms. Furthermore, the same results show that the additive mixture approach, as suggested by Mikolajczyk et al.,65 and the approach used for surface-loaded MCNMs previously presented,71 is able to produce predictive descriptors for MCNM ecotoxicity. This can assist towards the understanding of the interaction between the various MCNM components.
The small number of descriptors used for the classification SAR model development, together with the fact that similar descriptors are used for all the individual organism datasets, allows the mapping of the classification results over the descriptor space. The predicted ecotoxic class for the different MCNMs, towards the different organisms, is presented in Fig. 3, as a function of the two descriptors, HE and Dbio. It is noted that the results for the D. magna model are also presented over the same descriptor space for comparison, even though the model is developed using only HE as a descriptor.
Fig. 3 SAR model predictions of the ecotoxic class of the different MCNMs, towards the different datasets, as a function of the two descriptors (HE, Dbio).
The results of Fig. 3 show that similar ecotoxic class predictions are obtained over the descriptor space for all the different datasets. In all the classification SAR models developed, the ecotoxic MCNMs are characterized by a less negative HE and a lower Dbio value. Specifically, a threshold value of HE close to −50 eV and a Dbio value close to 1 eV separate the two classes, with the exception of the C. albicans model, where ecotoxic classification is also obtained for higher Dbio values. This similar behavior may hint towards similar dominating mechanisms for ecotoxic action among the MCNMs of the different datasets, towards different organisms. Such results are consistent with previous findings for the case of pure metal oxides,61–64 and may assist towards a more global understanding of the ecotoxic action of MCNMs. Further discussion regarding the ecotoxic mechanisms and the interpretation of model results is presented in a subsequent section of the present paper.
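The approximate class boundary described here can be written as a two-threshold rule. This is a deliberately crude sketch, not the trained SVM/kNN/RF models: the thresholds are the approximate values quoted in the text (about −50 eV for HE and about 1 eV for Dbio), the function name is hypothetical, and the descriptor values in the example calls are made up for illustration.

```python
HE_THRESHOLD_EV = -50.0   # approximate value quoted in the text
DBIO_THRESHOLD_EV = 1.0   # approximate value quoted in the text


def rough_ecotoxic_class(he_ev: float, dbio_ev: float) -> str:
    """Rule-of-thumb reading of the descriptor map.

    An MCNM falls in the ecotoxic region when its hydration enthalpy is
    less negative than the HE threshold AND its Dbio is below the Dbio
    threshold; everything else is treated as non-toxic.
    """
    less_negative_he = he_ev > HE_THRESHOLD_EV
    low_dbio = dbio_ev < DBIO_THRESHOLD_EV
    return "toxic" if (less_negative_he and low_dbio) else "non-toxic"


# Hypothetical descriptor values, for illustration only.
print(rough_ecotoxic_class(he_ev=-20.0, dbio_ev=0.3))  # toxic
print(rough_ecotoxic_class(he_ev=-80.0, dbio_ev=0.3))  # non-toxic
print(rough_ecotoxic_class(he_ev=-20.0, dbio_ev=2.5))  # non-toxic
```

Because the quoted thresholds are approximate and differ between datasets (C. albicans being the stated exception), such a rule is useful only for orientation on the descriptor map.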
As in the previous section, the different combinations of the descriptors identified in the selection step (HE, Dbio, IP) are used for SAR model development, in order to identify the optimal descriptor combination. The accuracy of the models built using the different descriptor combinations, for the organism groups and the complete dataset, is presented in Table 6. It is noted that for the descriptor combination tests, all the models were developed using five-fold cross validation.
Dataset / descriptor combination | HE, Dbio, IP | HE, Dbio | HE, IP | Dbio, IP | HE | Dbio | IP |
---|---|---|---|---|---|---|---|
Bacteria | 92.9 | 92.9 | 85.3 | 90.9 | 75.4 | 75.0 | 84.9 |
Eukaryotes | 93.5 | 92.7 | 81.3 | 87.0 | 71.5 | 85.4 | 72.4 |
Fish | 87.1 | 88.1 | 78.2 | 84.2 | 76.2 | 74.3 | 72.3 |
Crustaceans | 93.2 | 93.2 | 93.2 | 82.4 | 93.2 | 78.4 | 74.3 |
Plants | 87.3 | 87.3 | 85.3 | 83.3 | 80.4 | 69.6 | 85.3 |
Complete dataset | 89.9 | 89.6 | 75.6 | 80.8 | 76.4 | 64.0 | 72.9 |
The results of Table 6 show that among the single descriptor models, HE is the most predictive descriptor for the fish and crustaceans models, as well as for the complete dataset model. IP is the most predictive descriptor for the bacteria and plants models, while Dbio is the most predictive for the eukaryotes model. As for the case of individual organism models, the addition of a second descriptor improved the accuracy in all the heterogeneous models, except for the crustacean model, where the addition of a second descriptor to HE did not affect the model accuracy. Interestingly, the most predictive descriptor pair was found to be the HE and Dbio combination, as in the individual organism models. Finally, the addition of IP to the HE and Dbio combination led to no increase, or only an insignificant increase, in the model accuracy. From the above results, it is concluded that the optimal descriptor set is the combination of HE and Dbio, with the exception of the crustacean model, where the single descriptor model developed using HE was found to be optimal.
Using the abovementioned descriptor combinations, the optimal models are developed. The resulting statistical metrics for the optimal models using the ecotoxicity measurements within the heterogeneous datasets of organism groups and the complete dataset are presented in Table 7. Note that the models presented in Table 7 display different accuracies from those in Table 6 (except for the crustaceans model), because they are developed using a hold-out validation scheme, whereas a five-fold cross validation scheme was used for all the models of Table 6.
Organism | Descriptors | Validation scheme | Acc (%) | Prec (%) | Sens (%) | Sel (%) |
---|---|---|---|---|---|---|
Bacteria | HE, Dbio | Train. (n = 202), Val. (n = 50) | 94.0 | 90.5 | 95.0 | 93.3 |
Eukaryotes | HE, Dbio | Train. (n = 99), Val. (n = 24) | 91.7 | 92.9 | 92.9 | 90.0 |
Fish | HE, Dbio | Train. (n = 81), Val. (n = 20) | 85.0 | 80.0 | 66.7 | 92.3 |
Crustaceans | HE | 5-Fold cross validation | 93.2 | 97.8 | 91.7 | 96.2 |
Plants | HE, Dbio | Train. (n = 82), Val. (n = 20) | 90.0 | 100 | 75.0 | 100 |
Complete dataset | HE, Dbio | Train. (n = 522), Val. (n = 130) | 89.2 | 83.3 | 92.6 | 86.9 |
The results of Table 7 show that acceptable accuracy is obtained for all the models developed using the heterogeneous datasets for the organism groups, as well as the complete dataset. Furthermore, acceptable values are obtained for all statistical metrics across the different models, except for the sensitivity of the models developed using the fish and plants datasets. This lower model sensitivity could be attributed to the more imbalanced nature of these datasets (26.7% and 30.4% of toxic measurements, respectively). Nevertheless, the fact that the developed models can predict the ecotoxic class of the various MCNMs in the validation sets of the different organism group and complete datasets, using similar descriptors, supports the notion that similar mechanisms dominate the MCNM ecotoxic action. As in the case of the individual organism models, HE and Dbio are found to be the most predictive set of descriptors, except for the crustacean model, which was developed using only the HE descriptor. Moreover, the additive mixture approach used to compute the MCNM descriptors was again able to produce predictive descriptors for the different classification SAR models, allowing a more global understanding of the nature of the interaction between MCNM components.
The ecotoxic class predicted by the different classification SAR models, mapped over the space defined by the two model descriptors, is presented in Fig. 4. This mapping allows a clearer understanding of the effect of the different descriptors on the model results, and is possible due to the low number of descriptors used during the model development.
Fig. 4 SAR model predictions of the ecotoxic class of the different MCNMs, towards the different organism groups and complete datasets, as a function of the two descriptors (HE, Dbio).
As in the case of the models developed for the individual organism datasets (Fig. 3), the results of Fig. 4 present a similar ecotoxic class prediction over the descriptor space, for all models developed for the organism group and complete datasets. For all the different classification SAR models developed, the ecotoxic class is predicted for MCNMs that exhibit a less negative HE and a lower Dbio value. A similar trend is observed across all the SAR models, both for the case of individual organisms (Fig. 3) and organism groups (Fig. 4). This behavior supports the notion of similar mechanisms being dominant for the ecotoxic action of MCNMs towards different organisms. Such results may lead to a more holistic understanding of the ecotoxic action of MCNMs, which in turn can assist in the development of MCNMs with properties that increase or decrease the interaction of MCNMs with the abovementioned organisms, according to the desired application.
Dataset | Bounding box PCA | Convex hull | Centroid distance |
---|---|---|---|
E. coli | — | — | 2 |
S. aureus | — | — | 3 |
D. rerio | — | — | 2 |
D. magna | — | — | 2 |
C. albicans | — | — | 2 |
Bacteria | — | — | 12 |
Eukaryotes | — | — | 6 |
Fish | — | 2 | 2 |
Crustaceans | — | — | 4 |
Plants | — | — | 5 |
Complete dataset | — | 1 | 27 |
The results of Table 8 show that the Bounding box PCA method included all the MCNM measurements within the applicability domain of all the developed classification SAR models. The convex hull method also includes the total set of MCNM measurements within the applicability domain of the models developed for most datasets, with the exception of the fish dataset, where two measurements were deemed to be outside the applicability domain. A single measurement was also deemed to be outside the applicability domain for the complete dataset model, by the convex hull method. On the other hand, the centroid distance method defined narrower applicability domains, which did not include a number of MCNM measurements, for all the different ecotoxicity datasets. Such results show the dependency of the applicability domain on the method used for its definition. This dependency has been identified and discussed in scientific literature.92 The analysis of the applicability domain results reproduces the findings of previous SAR model development, where metal oxides with similar descriptor values are outside the applicability domain of the models.62,64
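To illustrate why a centroid-distance definition tends to give a narrower domain than a bounding box or convex hull, the following sketch implements one common variant of the method. The source does not state its distance metric or cutoff in this section, so both the Euclidean distance and the "mean plus k standard deviations" cutoff are assumptions, the function names are hypothetical, and the descriptor points are invented for the example.

```python
import math
from statistics import mean, pstdev


def centroid_distance_domain(train_points, k=3.0):
    """Return (centroid, cutoff) for a centroid-distance applicability domain.

    train_points -- list of descriptor tuples, e.g. (HE, Dbio)
    k            -- assumed multiplier: cutoff = mean distance + k * std
    In practice the descriptors would normally be scaled first, so that
    no single descriptor dominates the distance.
    """
    n_dims = len(train_points[0])
    centroid = tuple(mean(p[d] for p in train_points) for d in range(n_dims))
    distances = [math.dist(p, centroid) for p in train_points]
    cutoff = mean(distances) + k * pstdev(distances)
    return centroid, cutoff


def in_domain(point, centroid, cutoff):
    """True when the point lies within the cutoff distance of the centroid."""
    return math.dist(point, centroid) <= cutoff


# Hypothetical (HE, Dbio) training descriptors, for illustration only.
train = [(-40.0, 0.5), (-45.0, 0.8), (-50.0, 1.0), (-55.0, 1.2), (-60.0, 1.5)]
centroid, cutoff = centroid_distance_domain(train)

print(in_domain((-52.0, 1.1), centroid, cutoff))   # True
print(in_domain((-120.0, 4.0), centroid, cutoff))  # False
```

With a smaller `k`, points near the edge of the training cloud fall outside the domain even though they lie inside its bounding box and convex hull, which is the kind of method dependence reported in Table 8.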
The complete dataset model is used to predict the ecotoxic class of the different MCNMs, and the results are presented in Fig. 5a, over the space defined by the two descriptors used for the model development. In order to compare the model predictions to the actual experimental measurements, Fig. 5b is used to present the corresponding percentage of toxic measurements for each MCNM (as several MCNMs have multiple measurements within the dataset).
Fig. 5 a) Ecotoxic class prediction for the different MCNMs by the complete dataset model, b) ecotoxic measurements percentage for the different MCNMs in the complete dataset.
The results of Fig. 5a show that the ecotoxic MCNMs are characterized by a lower Dbio value, as well as a less negative HE value. Hence, using the physical interpretation of the two descriptors, MCNMs that have a conduction band energy close to the redox potential of biological pairs (lower Dbio value), together with a less exothermic hydration of their respective metal cations (less negative hydration enthalpy, HE), are more likely to be ecotoxic. Similar conclusions have been drawn from previous experimental and modelling works for metal oxide NPs.61–63 In line with those works, our previous works for the case of pure metal oxide toxicity,64 as well as metal oxide MCNM cytotoxicity,71 have also identified these descriptors as predictive. The measurements presented in Fig. 5b show that the percentage of ecotoxic experimental measurements for the different MCNMs follows the same trend over the descriptor space. Apart from the misclassification of some MCNMs, the experimental classification follows the model predictions, as the vast majority of ecotoxic MCNMs are situated in the space bounded by low Dbio and less negative HE values. Similarly, MCNMs that are characterized by a high Dbio or a highly negative HE value exhibit fewer ecotoxic measurements. The good agreement between the model classification and the measurements in the dataset not only shows the accuracy and predictive ability of the classification model, but also supports the notion that similar MCNM characteristics may induce ecotoxic action towards the different organisms taken into account in the dataset. These key characteristics can be adequately quantified by the two descriptors taken into account for the classification SAR model development.
It should be noted, however, that determining the exact boundary values of the descriptors that define whether an MCNM is ecotoxic is not possible with the present approach, as it does not aim at a quantitative ecotoxicity prediction, but rather at a qualitative assessment. Hence, the model results are sensitive to the classification scheme used. Furthermore, the models developed for datasets that are heterogeneous in terms of the tested organisms (Table 7 and Fig. 4) do not take into account the varying cell morphology of the different target organisms. Although the cell morphology has an influence on the ecotoxic action of the different MCNMs, all the different models were developed using similar descriptors, with good accuracy towards the ecotoxicity measurements, which may hint that the MCNMs have similar ecotoxic modes of action. However, these results do not mean that the toxicity mechanisms are unaffected by the morphology of the target cells, but rather that the dominating toxicity mechanisms are of a similar nature. Specifically, the cell morphology could strongly affect the ecotoxic action in a more quantitative way, through higher or lower uptake and MCNM-cell interaction rates. Due to the qualitative nature of the model presented in this work, the extraction of such quantitative information is not possible with the present approach. However, previous nanotoxicology studies have concluded that similar mechanisms govern the toxic action of metal oxides towards target cells or organisms of different morphologies,61,63 with the conduction band of nanoparticles found to induce electron transfer and toxicity towards BEAS-2B and RAW264.7 cells,63 as well as E. coli.61 On the basis of such observations, classification SAR approaches for nanomaterial toxicity using heterogeneous morphology datasets have been developed,49,54–57,64,71 assessing the toxicity of nanomaterials over a wide range of cell morphologies. In a similar way, the general trends seen in Fig. 5 are reproduced over a large and populous dataset of heterogeneous MCNM measurements, which encourages a more general understanding of MCNM ecotoxicity mechanisms.
As previously mentioned, Dbio expresses the energy difference between the conduction band and the mean redox potential of pairs in biological media.60,61,63,94,95 Hence, a low Dbio value means that electron transfer between the MCNM and the cell is more probable. This electron transfer, previously identified as a toxicity mechanism for the case of metal oxides, can increase the oxidative stress on the cells by unbalancing their reducing capacity.59,60 Furthermore, overlapping metal oxide conduction bands and biological redox potentials have been correlated with the production of reactive oxygen species (ROS).96 Such ROS can include hydrogen peroxide and OH radicals.63,97–103 Similar quantities have been used for the development of QSAR53 and classification SAR61–64,71 models, as well as for nanomaterial toxicity grouping,31 towards cells and organisms of different kinds, pointing to a more holistic metal oxide toxicity pathway.
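As a hedged illustration of the Dbio definition above, the following sketch computes the distance between a conduction band energy and the mean of a biological redox potential window. The window of −4.84 to −4.12 eV is a value commonly quoted in the metal oxide toxicity literature and is an assumption here, as are the use of an absolute difference, the function name and the example inputs.

```python
# Assumed biological redox potential window in eV (vacuum scale); a commonly
# quoted range, not necessarily the one used by the authors.
BIO_REDOX_RANGE_EV = (-4.84, -4.12)


def d_bio(conduction_band_ev, bio_range_ev=BIO_REDOX_RANGE_EV):
    """Absolute distance (eV) between the conduction band energy and the mean
    of the biological redox potential window. Taking the absolute value is an
    assumption of this sketch."""
    mean_bio = sum(bio_range_ev) / 2.0
    return abs(conduction_band_ev - mean_bio)


# Hypothetical conduction band energies (eV), for illustration only.
close_to_window = d_bio(-4.5)
far_from_window = d_bio(-3.0)
```

Under these assumptions, a conduction band that lies inside the window yields a small Dbio, which corresponds to the regime the text associates with more probable electron transfer.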
The hydration enthalpy (HE) denotes the energy released during the hydration of the metal ion released from the MCNM. In this way, the descriptor expresses the affinity of these metal ions for water molecules.61 A more negative HE value means that water molecules will be more strongly attracted to the metal ions, leading to an increase of the ion's hydration shell. In turn, a larger hydration shell hinders the ion's permeation through the cell membrane.104–108 As in previous works, HE is computed using Latimer's equation:61
[Equation image: Latimer's equation for HE] (5)
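To illustrate how a Latimer-type estimate behaves, the sketch below uses one commonly quoted textbook form, ΔH ≈ −60900·Z²/(r + 50) kJ mol⁻¹, with the ionic radius r in pm and Z the ion charge. The constants, the function name and the example inputs are assumptions of this sketch and may differ from those used in eq. (5).

```python
def latimer_hydration_enthalpy(charge, radius_pm, k=60900.0, offset_pm=50.0):
    """Estimate the hydration enthalpy (kJ/mol) of a cation.

    Uses a textbook Latimer-type form, -k * Z**2 / (r + offset), with r in pm.
    The default constants are assumptions, not values confirmed by the paper.
    """
    if radius_pm + offset_pm <= 0:
        raise ValueError("effective radius must be positive")
    return -k * charge ** 2 / (radius_pm + offset_pm)


# Illustrative inputs only: a singly charged larger ion and a doubly charged
# smaller ion.
he_mono = latimer_hydration_enthalpy(1, 116.0)
he_di = latimer_hydration_enthalpy(2, 86.0)
```

In this form, a higher charge or a smaller radius gives a more negative HE, which is the regime the text associates with a larger hydration shell and hindered membrane permeation.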
The release of metal ions has been concluded to be a dominant initial pathway of metal and metal oxide nanoparticle toxicity.97 Several underlying mechanisms regarding the toxic interaction between these ions and different cells have been identified, such as enzyme inactivation,111,112 cell membrane damage63,113 and the increase of oxidative stress.111,112,114–116 DNA damage has also been attributed to the interaction with certain metal ions.111,117,118
The predictive capability of the two descriptors across the various heterogeneous datasets could hint that the abovementioned mechanisms are the dominant pathways of metal and metal oxide MCNM ecotoxicity. However, it cannot be concluded whether one of the two pathways prevails over the other. As seen from the results of sections 3.2 and 3.3, single descriptor models were less accurate than models developed using the combination of the two descriptors, except for the D. magna and crustaceans models. For the rest of the models, the ecotoxic MCNMs exhibit a lower Dbio value and a less negative HE. This could mean that both electron exchange between the MCNM and the organism cells and the release of metal ions that permeate the cell must take place for the MCNM to induce ecotoxic action, as defined by the present classification scheme.
Regarding the nature of the interaction between the different components of the MCNMs in the different datasets, the additive mixture approach introduced by Mikolajczyk et al. for the case of MCNMs65 was able to yield descriptors that were predictive over the whole dataset range. The additive approach assumes that the different MCNM components have similar modes of action. While other approaches have been used for the descriptor calculation for mixtures of chemicals,119,120 QSAR model development for nanomaterial mixtures has mainly employed the additive mixture approach.82,83,121 Works that have developed QSARs for smaller MCNM datasets have also used similar approaches for the descriptors, showing high predictivity towards toxicity endpoints.65–68,70 In our previous work, we used the additive mixture approach, together with a novel approach to calculate descriptors for surface loaded MCNMs, to calculate predictive descriptors for the cytotoxicity classification over a large MCNM dataset, leading to the development of high accuracy models.71
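A minimal sketch of an additive mixture descriptor is given below, assuming that the MCNM descriptor is the mole-fraction-weighted sum of the component descriptors. The weighting by mole fraction, the function name and the numerical inputs are assumptions for illustration, and the separate treatment of surface loaded MCNMs mentioned above is not covered.

```python
def additive_descriptor(component_values, mole_fractions):
    """Mole-fraction-weighted sum of component descriptor values.

    Fractions are normalised by their total, so ratios such as (3, 1) are also
    accepted. The weighting scheme is an assumption of this sketch.
    """
    if len(component_values) != len(mole_fractions):
        raise ValueError("one mole fraction is needed per component")
    total = sum(mole_fractions)
    if total <= 0:
        raise ValueError("mole fractions must sum to a positive value")
    weighted = sum(x * d for x, d in zip(mole_fractions, component_values))
    return weighted / total


# Hypothetical two-component MCNM: component HE values (kJ/mol) and composition.
he_mixture = additive_descriptor([-400.0, -1800.0], [0.75, 0.25])
```

With this weighting the mixture descriptor always lies between the component values, which reflects the assumption of similar modes of action without synergistic or antagonistic terms.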
The results of the present work show that the exact same approach can be used to calculate descriptors that are also able to classify MCNMs based on their ecotoxicity towards a wide range of organisms, such as bacteria, eukaryotes, fish, plants, and crustaceans. As the additive mixture approach assumes, this could hint at similar ecotoxic pathways of the different MCNM components. The components in the present work are either metals or metal oxides, which have previously been reported to have similar modes of action.51,64 However, the qualitative nature of the classification approach presented in this work does not allow the impact of the individual components to be interpreted quantitatively with certainty. Nevertheless, quantitative studies have found that the compositional ratio of the MCNM components dictates their respective impact, showing an additive effect on toxicity.66,67 The large and heterogeneous nature of the MCNM ecotoxicity dataset used in the present work, along with the high accuracy of the developed models, supports the notion that this additive effect also occurs for MCNM ecotoxicity. In any case, this should be tested by developing quantitative models for more homogeneous datasets, under similar experimental conditions, which will be a subject of future work.
To summarize, the results presented in this study show the potential of data-based models, such as classification SAR approaches, to be used not only as predictive models, but also as tools for a more general understanding of toxicity mechanisms and modes of action. Used as an inductive tool to extract scientific information from large and heterogeneous datasets of toxicity measurements, such models can potentially assist the synthesis of safe-by-design nanomaterials. A potential example of such an approach is presented in the work of Feng et al.58 Building on previous research regarding the dependence of metal oxide toxicity on the conduction band energy, the authors managed to synthesize MCNMs with control over their biological activity by adequately tuning the conduction band energy of the produced nanomaterials, which is consistent with the findings of the present work.
In particular, the electron transfer between the MCNM and the biological pairs, as well as the release and transport of metal ions from the MCNMs, were deemed to be the dictating ecotoxic pathways for the different MCNMs. These findings are consistent with previous works on the cytotoxicity of metal oxide nanoparticles and MCNMs. However, the identification of these descriptors for ecotoxicity is novel in the case of MCNMs. The two descriptors that express the abovementioned mechanisms were computed based on the additive mixture approach, shedding light on the nature of the interaction between the MCNM components. These findings are consistent with previous works regarding the QSAR modelling of MCNM toxicity, where the additive mixture approach was found to produce predictive descriptors.
The novelty of the present study lies in the development of a SAR approach using a large, heterogeneous dataset for the prediction of MCNM ecotoxicity. Such an approach allows a more holistic understanding of the ecotoxic action of different MCNMs and their constituent components towards various organisms upon ecosystem exposure. The mechanistic information extracted by the present approach regarding MCNM ecotoxicity and the interaction of the multiple components of MCNMs can thus assist a more knowledge-driven MCNM ecotoxicity assessment, as well as the synthesis of safe-by-design MCNMs for various applications.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4en01183j
This journal is © The Royal Society of Chemistry 2025