Machine learning the peak emission wavelength of Mn⁴⁺-activated inorganic phosphors
Abstract
The optical properties of Mn⁴⁺-activated phosphors depend strongly on the host, and the relevant data are difficult to obtain. This makes the quantitative design of novel phosphors with a targeted emission wavelength particularly challenging. In the era of big data, computation is emerging as a vital complement to materials development. Compared with traditional calculation routes, machine learning (ML) models offer better solutions to problems in which the underlying physics is complicated and the data are sparse. In this study, we developed six ML models on a small, sparse data set to rapidly predict the peak emission wavelength of novel phosphors, and used Recursive Feature Elimination (RFE) to select the best features. Among them, the K-nearest neighbors regression (KNN.r) algorithm is a non-parametric regression technique that predicts the target variable of a sample from the targets of its most similar (nearest) neighbors in feature space. It handles sparse data sets without an obvious linear relationship well and gave the best prediction accuracy. The models were then used to predict the peak emission wavelengths of 278 potential Mn⁴⁺-activated phosphor hosts extracted from the Inorganic Crystal Structure Database (ICSD), and 19 samples were selected to validate the methods. To compensate for the limited physical and chemical insight offered by KNN, a visual data analysis based on hierarchical clustering analysis (HCA) was carried out, and a rule relating the peak emission wavelength to the host structure was proposed. Guided by the KNN and HCA results, six samples were selected for synthesis and characterization to verify our conclusions. The measured peak emission wavelengths of all these phosphors agree well with the predictions, indicating the success of our methodology. The established model and the proposed rule thus provide a new pathway for the rapid discovery of novel phosphors.
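The KNN regression idea described above can be illustrated with a minimal sketch. It assumes plain Euclidean distance and an unweighted average of the k nearest targets. The abstract does not state the k value, distance metric, or weighting used for KNN.r, so these choices, the function name `knn_regress`, and the toy numbers are hypothetical, not the authors' implementation or data. In practice the descriptors are usually standardized first so that no single feature dominates the distance; the sketch assumes already-scaled features.

```python
import math
from typing import List, Sequence


def knn_regress(train_X: List[Sequence[float]], train_y: List[float],
                query: Sequence[float], k: int = 3) -> float:
    """Hypothetical helper: predict the target for `query` as the mean target
    of its k nearest training samples (Euclidean distance, unweighted)."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    # No model parameters are fitted: the prediction comes directly from the
    # stored samples, which is what makes KNN regression non-parametric.
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Rank training samples by similarity (smaller distance = more similar).
    dists.sort(key=lambda pair: pair[0])
    nearest = dists[:k]
    # Assumption: each of the k neighbors contributes equally.
    return sum(y for _, y in nearest) / k


# Toy feature vectors and "wavelengths" (invented, not data from this study).
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_y = [650.0, 660.0, 670.0, 700.0]
print(knn_regress(train_X, train_y, [0.2, 0.2], k=3))  # → 660.0
```

Because the prediction is only an average over similar known hosts, the method needs no assumed functional form (for example, a linear relationship), which is why it suits small, sparse data sets; the trade-off is that it offers little physical interpretation on its own.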