Abstract
Atomistic simulation with machine learning-based potentials (MLPs) is an emerging tool for understanding materials' properties and behaviors and for predicting novel materials. Neural network potentials (NNPs) stand out in this field because they reproduce potential energy surfaces with accuracy comparable to ab initio electronic structure calculations while being several orders of magnitude faster. However, NNPs can perform poorly outside their training domain and often fail catastrophically when predicting rare events in molecular dynamics (MD) simulations. Rare events in atomistic modeling typically include chemical bond breaking and formation, phase transitions, and materials failure, all of which are critical for new materials design, synthesis, and manufacturing processes. In this study, we develop an automated active learning (AL) capability that combines NNPs with steered molecular dynamics, an enhanced sampling method, to capture bond-breaking events in alkane chains and derive NNPs for targeted applications. We develop a decision engine based on configurational similarity and uncertainty quantification (UQ), with data augmentation for effective AL loops, that distinguishes informative data among the enhanced-sampled configurations, and we show that the generated data set achieves an activation energy error of less than 1 kcal mol⁻¹. Furthermore, we devise a strategy to reduce training uncertainty within AL iterations through a carefully constructed data selection process that leverages an ensemble approach. Our study provides essential insight into the relationship between data and the performance of NNPs for the rare event of bond breaking under mechanical loading. It also highlights strategies for developing NNPs for a broader range of materials and applications through active learning.