Automated extraction of synthesis parameters of pulsed laser-deposited materials from scientific literature†
Abstract
The materials science literature contains a large amount of reliable, high-quality data, yet automatically extracting useful information from it, including processing parameters and materials property data, remains a challenge. The development of new materials typically relies on an experimental trial-and-error approach to identify optimized processing parameters. In this work, we present an approach at the intersection of Natural Language Processing (NLP) and materials science that focuses on extracting and analyzing the materials and processing parameters associated with Pulsed Laser Deposition (PLD). Using a Named Entity Recognition (NER) model built on MatSciBERT, a materials-science variant of BERT (Bidirectional Encoder Representations from Transformers), we identified and categorized different PLD synthesis parameters, including deposition temperature and pressure, laser energy, laser wavelength, thin-film material, and substrate. The workflow involved data acquisition from over 6000 research articles, followed by pre-processing, feature extraction, and model training. The trained NER model achieved micro- and macro-averaged F1 scores of 80.2% and 81.4%, respectively. These results highlight the potential of Literature-based Discovery (LBD) approaches for expediting materials discovery. The insights gained from this study are expected to drive advancements in materials research by streamlining information extraction through a searchable database, and to accelerate discoveries in the domain of Pulsed Laser Deposition.
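Micro- and macro-averaged F1 aggregate per-entity results differently. Micro-averaging pools true positives, false positives, and false negatives across all entity types before computing a single F1. Macro-averaging computes F1 separately for each entity type and takes the unweighted mean. The following is a minimal sketch of that difference, assuming entity-level (span) counts per label are already available; the class `EntityCounts`, the function names, and the labels and counts in the demo are hypothetical illustrations, not the study's code or data.

```python
"""Micro- vs macro-averaged F1 for NER from per-entity-type counts (illustrative sketch)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityCounts:
    tp: int  # predicted spans of this type that match a gold span of this type
    fp: int  # predicted spans of this type with no matching gold span
    fn: int  # gold spans of this type that the model missed


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), i.e. the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(per_type: dict[str, EntityCounts]) -> tuple[float, float]:
    # Micro: pool the counts over all entity types, then compute one F1.
    tp = sum(c.tp for c in per_type.values())
    fp = sum(c.fp for c in per_type.values())
    fn = sum(c.fn for c in per_type.values())
    micro = f1_score(tp, fp, fn)
    # Macro: compute F1 per entity type, then take the unweighted mean.
    macro = sum(f1_score(c.tp, c.fp, c.fn) for c in per_type.values()) / len(per_type)
    return micro, macro


if __name__ == "__main__":
    # Hypothetical toy counts for two illustrative labels (not results from this study).
    toy = {
        "DEPOSITION_TEMPERATURE": EntityCounts(tp=8, fp=2, fn=0),
        "SUBSTRATE": EntityCounts(tp=1, fp=1, fn=2),
    }
    micro, macro = micro_macro_f1(toy)
    print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

Running it prints `micro F1 = 0.783, macro F1 = 0.644`. The rarer, poorly recognized SUBSTRATE label pulls the macro average down more than the micro average because macro-averaging weights every entity type equally, regardless of how many mentions it has.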