Reconstructing the materials tetrahedron: challenges in materials information extraction

Kausik Hira; Mohd Zaki; Dhruvil Sheth; Mausam; N. M. Anoop Krishnan

doi:10.1039/D4DD00032C

Kausik Hira,^a Mohd Zaki,^b Dhruvil Sheth,^a Mausam*^a and N. M. Anoop Krishnan*^ab

Author affiliations

* Corresponding authors

^a Yardi School of Artificial Intelligence, Indian Institute of Technology, Delhi, India
E-mail: kausikhira@gmail.com, dhruvilsheth01@gmail.com, mausam@iitd.ac.in

^b Department of Civil Engineering, Indian Institute of Technology Delhi, India
E-mail: cez198233@iitd.ac.in, krishnan@iitd.ac.in

Abstract

The discovery of new materials has a documented history of propelling human progress for centuries. The behaviour of a material is a function of its composition, structure, and properties, which further depend on its processing and testing conditions. Recent developments in deep learning and natural language processing have enabled information extraction at scale from published literature such as peer-reviewed publications, books, and patents. However, this information is spread across multiple formats, such as tables, text, and images, with little or no uniformity in reporting style, which gives rise to several machine learning challenges. Here, we discuss, quantify, and document these challenges in automated information extraction (IE) from materials science literature towards the creation of a large materials science knowledge base. Specifically, we focus on IE from text and tables and outline several challenges with examples. We hope the present work inspires researchers to address the challenges in a coherent fashion, providing a fillip to IE towards developing a materials knowledge base.

This article is part of the themed collection: AI for Accelerated Materials Design, NeurIPS 2023

Article information

https://doi.org/10.1039/D4DD00032C

Article type: Paper
Submitted: 15 Jan 2024
Accepted: 16 Mar 2024
First published: 18 Mar 2024

This article is Open Access

Digital Discovery, 2024, 3, 1021-1037

K. Hira, M. Zaki, D. Sheth, Mausam and N. M. A. Krishnan, Digital Discovery, 2024, 3, 1021 DOI: 10.1039/D4DD00032C

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
