Harnessing GPT-3.5 for text parsing in solid-state synthesis – case study of ternary chalcogenides

Maung Thway; Andre K. Y. Low; Samyak Khetan; Haiwen Dai; Jose Recatala-Gomez; Andy Paul Chen; Kedar Hippalgaonkar

doi:10.1039/D3DD00202K

Maung Thway,†^a Andre K. Y. Low,†^ab Samyak Khetan,^c Haiwen Dai,^a Jose Recatala-Gomez,^a Andy Paul Chen^a and Kedar Hippalgaonkar*^ab

Author affiliations

* Corresponding authors

^a School of Materials Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
E-mail: kedar@ntu.edu.sg

^b Institute of Materials Research and Engineering (IMRE), Agency for Science, Technology and Research (A*STAR), Singapore 138634, Singapore

^c Department of Metallurgical Engineering and Materials Science, Indian Institute of Technology Bombay, Maharashtra 400076, India

Abstract

Advancing state-of-the-art thermoelectric devices, which convert heat into electricity and vice versa, requires optimally doped single-phase compounds, and these in turn require solid-state synthesis of bulk materials. Data-driven approaches that learn these synthesis recipes need carefully curated data from large bodies of text, which may not be available for some materials, as well as a refined language processing algorithm, which presents a high barrier to entry. We propose applying Large Language Models (LLMs) to parse solid-state synthesis recipes, encapsulating all essential synthesis information intuitively in terms of primary and secondary heating peaks. Using a domain-expert curated dataset for a specific material (Gold Standard), we engineered a prompt set for GPT-3.5 to replicate the same dataset (Silver Standard), which it did with 73% overall accuracy. We then used the same prompt set to extract and infer synthesis conditions for other ternary chalcogenides. From a database of 168 research papers, we successfully parsed 61 papers, which we then used to develop a classifier to predict phase purity. Our methodology demonstrates the generalizability of LLMs for text parsing, specifically for materials with sparse literature and unbalanced reporting (since usually only positive results are shown). Our work provides a roadmap for future endeavors seeking to amalgamate LLMs with materials science research, heralding a potentially transformative paradigm in the synthesis and characterization of novel materials.
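
As a rough illustration of the comparison described in the abstract, the sketch below stores a recipe as a primary and an optional secondary heating peak and scores an LLM-parsed (Silver Standard) record set against an expert-curated (Gold Standard) one. The field names, the exact-match scoring rule, and the numeric values are all assumptions made for this example; the abstract does not specify the schema or how the 73% figure was computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HeatingPeak:
    """One heating step. Field names are hypothetical, not the paper's schema."""
    temperature_c: Optional[float]  # dwell temperature in degrees Celsius
    duration_h: Optional[float]     # dwell time in hours


@dataclass(frozen=True)
class Recipe:
    """A parsed solid-state synthesis recipe (assumed layout)."""
    primary: HeatingPeak
    secondary: Optional[HeatingPeak] = None  # None when no second step is reported


def flatten(recipe: Recipe) -> dict:
    """Flatten a recipe into field -> value pairs so fields can be compared."""
    secondary = recipe.secondary or HeatingPeak(None, None)
    return {
        "primary_temperature_c": recipe.primary.temperature_c,
        "primary_duration_h": recipe.primary.duration_h,
        "secondary_temperature_c": secondary.temperature_c,
        "secondary_duration_h": secondary.duration_h,
    }


def field_accuracy(gold: list, silver: list) -> float:
    """Fraction of fields where the parsed value exactly equals the expert value.

    Exact matching is an assumption; a real evaluation might allow tolerances.
    """
    if len(gold) != len(silver):
        raise ValueError("gold and silver must contain the same number of recipes")
    matches = 0
    total = 0
    for g, s in zip(gold, silver):
        g_flat, s_flat = flatten(g), flatten(s)
        for key, expected in g_flat.items():
            total += 1
            matches += int(expected == s_flat[key])
    return matches / total if total else 0.0


# Made-up values for illustration only; they are not taken from the paper.
gold = [
    Recipe(HeatingPeak(850.0, 24.0), HeatingPeak(500.0, 48.0)),
    Recipe(HeatingPeak(900.0, 12.0)),
]
silver = [
    Recipe(HeatingPeak(850.0, 24.0), HeatingPeak(500.0, 24.0)),  # one wrong duration
    Recipe(HeatingPeak(900.0, 12.0)),
]
accuracy = field_accuracy(gold, silver)
print(f"field-level accuracy: {accuracy:.3f}")
```

Scoring per field rather than per paper is one possible design choice: it gives partial credit when only one value in a recipe is parsed incorrectly.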

Article information

https://doi.org/10.1039/D3DD00202K

Article type: Paper
Submitted: 08 Oct 2023
Accepted: 21 Dec 2023
First published: 02 Jan 2024

This article is Open Access

Digital Discovery, 2024, 3, 328-336

Permissions

M. Thway, A. K. Y. Low, S. Khetan, H. Dai, J. Recatala-Gomez, A. P. Chen and K. Hippalgaonkar, Digital Discovery, 2024, 3, 328 DOI: 10.1039/D3DD00202K

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.
