Impact of atomistic or crystallographic descriptors for classification of gold nanoparticles†
Abstract
Machine learning models are known to be sensitive to the features used to train them, but there is currently no way to predict the impact of using different features prior to feature extraction. This is particularly important in highly multi-disciplinary fields such as nanotechnology, where samples can be characterised in many different ways depending on the preferences of individual researchers. Does it matter if nanomaterials are described using the interatomic coordinations or more complex order parameters? In this study we compare the results of supervised and unsupervised learning on a single set of gold nanoparticles that has been characterised by two different descriptors, each with a unique feature space. We find that there are some consistencies, and that model selection is descriptor-agnostic, but the level of detail and the type of information that can be extracted from the results are sensitive to the way the particles are described. Unsupervised clustering revealed that an atomistic descriptor provides a finer-grained interpretation, with clusters that are sub-clusters of those obtained with a more sophisticated crystallographic descriptor, which is consistent with both how the features were calculated and how they are interpreted in the domain. A supervised classifier revealed that the features responsible for the separation are related to the bulk structure regardless of the descriptor, but that the two descriptors capture different types of information. For the atomistic and crystallographic descriptors, the gradient boosting decision tree classifier gave superior results, with F1-scores of 0.96 and 0.98, respectively, and excellent precision and recall, even though the clustering presented a challenging multi-class classification problem.
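For readers less familiar with the metrics quoted above, the following minimal sketch shows how precision, recall and the F1-score can be computed for a multi-class problem. The abstract does not state how scores were averaged across classes, so the macro averaging used here is an assumption, and the function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that appears."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # class p was predicted wrongly
            fn[t] += 1      # class t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores (assumed averaging scheme)."""
    scores = per_class_prf(y_true, y_pred)
    if not scores:
        return 0.0
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical cluster labels used as classes, for illustration only.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 0]
print(per_class_prf(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 3))
```

With macro averaging every class contributes equally, whatever its size, which matters when clusters are unbalanced. A weighted average would instead favour the larger clusters.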