Interpretable machine learning for investigating complex nanomaterial–plant–soil interactions†
Abstract
Soil serves as the main recipient of engineered nanomaterials (ENMs). Understanding the complex nanomaterial–plant–soil interactions is urgently needed to keep pace with safety concerns about ENMs. Machine learning, which is well suited to learning complex patterns, has been used to predict the root uptake and translocation of ENMs. However, these models usually operate as black boxes, which makes it difficult to extract information from them and to build trust in their predictions. In this study, we first integrated the establishment, performance analysis, post hoc interpretation, and interpretation validation of a light gradient boosting machine (LightGBM) model to investigate the root uptake of metal-oxide nanoparticles (MONPs) in the soil environment. Because only limited data were available, the influence of the dataset split and data preprocessing on model performance was discussed. Model predictions were explained with several post hoc interpretation methods to identify the key factors and to show how they affect the root uptake of MONPs and interact with other factors. A three-step validation of the interpretation results was presented to assess the reliability of the models and explanations. Further, the RuleFit algorithm, a rule-based ensemble method with good interpretability, was established to provide a model-based interpretation by generating rules, and its results were compared with those of the post hoc interpretation methods. These post hoc and model-based interpretation methods can be integrated with experiments to improve the understanding of the risks and benefits of exposing plants to ENMs and to help achieve a safety-by-design strategy for ENMs in numerous applications.
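To make the idea of post hoc interpretation concrete, the following is a minimal sketch of one model-agnostic technique, permutation feature importance. It is an illustration only: the abstract does not state which post hoc methods were used, the data are synthetic, and `toy_model` is a hypothetical stand-in for a trained predictor such as a LightGBM regressor. The function and variable names are assumptions introduced for this example.

```python
import random


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Return, per feature, the mean increase in MSE after shuffling that feature.

    `predict` is any callable mapping one feature row to a prediction, so the
    procedure treats the model as a black box (post hoc, model-agnostic).
    """
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link with the target.
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(shuffled) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Synthetic data (not from the study): the target depends strongly on
# feature 0, weakly on feature 1, and not at all on feature 2.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(200)]
y = [3.0 * a + 0.5 * b for a, b, _ in X]


def toy_model(row):
    # Hypothetical stand-in for a fitted black-box model.
    return 3.0 * row[0] + 0.5 * row[1]


scores = permutation_importance(toy_model, X, y)
for index, score in enumerate(scores):
    print(f"feature_{index}: mean MSE increase = {score:.4f}")
```

In this toy setting the ranking of the scores recovers the known structure of the synthetic target, which is the kind of sanity check that a validation of interpretation results can build on. With a real model, `toy_model` would be replaced by a wrapper around the fitted model's prediction call, and the importance would be computed on held-out data.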