Automatic structural elucidation of vacancies in materials by active learning
Abstract
Finding the optimum structures of non-stoichiometric (berthollide) materials, such as 1D, 2D and 3D materials or nanoparticles (0D), is challenging because of the huge chemical and structural search space. Computational methods coupled with global optimization algorithms have been used successfully for this purpose. In this work, we have developed an artificial intelligence method based on active learning (AL), or Bayesian optimization, for the automatic structural elucidation of vacancies in solids and nanoparticles. AL uses machine learning regression algorithms and their uncertainties to decide, according to a policy, which unexplored structures to compute next, which increases the probability of finding the global minimum with few calculations. The methodology enables accurate, automated structural elucidation of vacancies, which are common in non-stoichiometric (berthollide) materials, and thus helps in understanding chemical processes in fields such as catalysis and environmental science. The AL vacancies method was implemented in QMLMaterial, the quantum machine learning software/agent for material design and discovery. In addition to the expected improvement (EI), two further acquisition functions for decision making were implemented: the lower confidence bound (LCB) and the probability of improvement (PI). The new software was applied to the automatic structural search of graphite (C36) with 3 (C36-3) and 4 (C36-4) carbon vacancies and of C60 fullerene with 4 carbon vacancies (C60-4). DFTB calculations were used to build the complex search surfaces at reasonably low computational cost. Furthermore, the AL method for vacancies, combined with DFT, made it possible to elucidate the optimum oxygen vacancy distribution in CaTiO3 perovskite, where the oxygen vacancies give rise to semiconducting behavior.
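The decision loop described above can be illustrated with a minimal sketch. This is not the QMLMaterial implementation: it assumes a plain NumPy Gaussian process with a fixed RBF kernel (no hyperparameter fitting), occupation vectors as a stand-in descriptor, illustrative defaults for `xi` and `kappa`, and a hypothetical toy energy (`toy_energy`) in place of the DFTB/DFT calls. All three acquisition functions are written for energy minimisation, so the candidate with the largest score is computed next.

```python
import math
from itertools import combinations

import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

# Acquisition functions for minimisation; a larger score means "compute next".
def expected_improvement(mu, sigma, best, xi=0.01):
    imp = best - mu - xi
    z = imp / sigma
    return imp * norm_cdf(z) + sigma * norm_pdf(z)

def probability_of_improvement(mu, sigma, best, xi=0.01):
    return norm_cdf((best - mu - xi) / sigma)

def lower_confidence_bound(mu, sigma, kappa=2.0):
    return -(mu - kappa * sigma)  # negated, so argmax picks the lowest bound

def rbf(A, B, length=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq / length ** 2)

def gp_predict(X_tr, y_tr, X_te, length=1.0, noise=1e-6):
    """Exact GP regression with fixed (assumed) hyperparameters."""
    y_mean, y_std = y_tr.mean(), (y_tr.std() or 1.0)
    yn = (y_tr - y_mean) / y_std
    L = np.linalg.cholesky(rbf(X_tr, X_tr, length) + noise * np.eye(len(X_tr)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, yn))
    K_s = rbf(X_te, X_tr, length)
    v = np.linalg.solve(L, K_s.T)
    var = np.clip(1.0 - (v ** 2).sum(axis=0), 1e-12, None)
    return y_mean + y_std * (K_s @ alpha), y_std * np.sqrt(var)

ACQ = {
    "EI": lambda mu, s, best: expected_improvement(mu, s, best),
    "PI": lambda mu, s, best: probability_of_improvement(mu, s, best),
    "LCB": lambda mu, s, best: lower_confidence_bound(mu, s),
}

def active_learning(X, energy_fn, n_init=5, n_iter=15, acq="EI", seed=0):
    """Select rows of X to evaluate, guided by the GP and an acquisition function."""
    rng = np.random.default_rng(seed)
    evaluated = [int(i) for i in rng.choice(len(X), size=n_init, replace=False)]
    energies = [energy_fn(i) for i in evaluated]
    for _ in range(n_iter):
        pool = np.setdiff1d(np.arange(len(X)), evaluated)  # unexplored candidates
        if pool.size == 0:
            break
        mu, sigma = gp_predict(X[evaluated], np.array(energies), X[pool])
        nxt = int(pool[np.argmax(ACQ[acq](mu, sigma, min(energies)))])
        evaluated.append(nxt)
        energies.append(energy_fn(nxt))  # the expensive DFTB/DFT step in practice
    k = int(np.argmin(energies))
    return evaluated[k], energies[k], evaluated

# Hypothetical toy problem: 2 vacancies on a 10-site ring; X = occupation vectors.
N_SITES = 10
configs = list(combinations(range(N_SITES), 2))
X = np.array([[0.0 if s in c else 1.0 for s in range(N_SITES)] for c in configs])

def toy_energy(i):
    a, b = configs[i]
    d = min(b - a, N_SITES - (b - a))  # ring distance between the two vacancies
    return -float(d) + 0.1 * a         # made-up energy, not a physical model

best_idx, best_e, history = active_learning(X, toy_energy, acq="EI")
```

Swapping `acq="EI"` for `"PI"` or `"LCB"` changes only the selection policy; the surrogate and the loop stay the same, which mirrors how the abstract frames the three acquisition functions as interchangeable decision rules.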
Throughout the work, a Gaussian process and its uncertainty were employed in the AL framework with the different acquisition functions (EI, LCB and PI) and two descriptors: the Ewald sum matrix and the sine matrix. Finally, the performance of the proposed AL method was compared with that of random search and a genetic algorithm.
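To make the sine-matrix descriptor concrete, here is a minimal NumPy sketch of one common form of it, which may differ in detail from the form used in this work: off-diagonal entries are Z_i Z_j / |B · Σ_k ê_k sin²(π ê_k B⁻¹(r_i − r_j))| and diagonal entries are 0.5 Z_i^2.4. Assumptions: lattice vectors are stored as rows of `cell`, and, because removing atoms changes the matrix size, the hypothetical helper `sorted_eigen_descriptor` turns it into a fixed-length vector of sorted, zero-padded eigenvalues. The positions and cell are illustrative only.

```python
import numpy as np

def sine_matrix(positions, cell, Z):
    """Sine matrix of a periodic structure (one common form of the descriptor).

    positions: (N, 3) Cartesian coordinates; cell: (3, 3) lattice vectors as rows;
    Z: (N,) atomic numbers.
    """
    positions = np.asarray(positions, dtype=float)
    cell = np.asarray(cell, dtype=float)
    Z = np.asarray(Z, dtype=float)
    diff = positions[:, None, :] - positions[None, :, :]  # (N, N, 3): r_i - r_j
    frac = diff @ np.linalg.inv(cell)                      # fractional differences
    # sin^2(pi x) has period 1 in fractional units, so periodic images are equivalent
    phi = np.linalg.norm((np.sin(np.pi * frac) ** 2) @ cell, axis=2)
    np.fill_diagonal(phi, 1.0)                             # avoid division by zero
    M = np.outer(Z, Z) / phi
    np.fill_diagonal(M, 0.5 * Z ** 2.4)                    # self-interaction term
    return M

def sorted_eigen_descriptor(M, n_max):
    """Permutation-invariant, fixed-length vector: eigenvalues sorted by magnitude."""
    eig = np.linalg.eigvalsh(M)          # M is symmetric
    eig = eig[np.argsort(-np.abs(eig))]
    out = np.zeros(n_max)
    out[: len(eig)] = eig                # zero-pad structures with vacancies
    return out

# Illustrative 4-atom carbon cell; removing atom 2 creates one vacancy.
cell = 3.0 * np.eye(3)
pos = np.array([[0.0, 0.0, 0.0], [1.5, 1.5, 0.0], [1.5, 0.0, 1.5], [0.0, 1.5, 1.5]])
Z = [6, 6, 6, 6]
M_full = sine_matrix(pos, cell, Z)
keep = [0, 1, 3]
M_vac = sine_matrix(pos[keep], cell, [Z[i] for i in keep])
d_full = sorted_eigen_descriptor(M_full, n_max=4)
d_vac = sorted_eigen_descriptor(M_vac, n_max=4)
```

Each candidate vacancy configuration then maps to a vector of the same length, which is what a Gaussian process regressor needs as input; the eigenvalue step trades some structural detail for invariance to atom ordering.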