Assessment of fine-tuned large language models for real-world chemistry and material science applications†
* Corresponding authors
a Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, École Polytechnique Fédérale de Lausanne (EPFL), Rue de l’Industrie 17, CH-1951 Sion, Switzerland. E-mail: Berend.Smit@epfl.ch
b Instituto de Ciencia y Tecnología del Carbono (INCAR), CSIC, Francisco Pintado Fe 26, 33011 Oviedo, Spain
c Laboratory of Organic and Macromolecular Chemistry (IOMC), Friedrich Schiller University Jena, Humboldtstrasse 10, 07743 Jena, Germany
d Helmholtz Institute for Polymers in Energy Applications Jena (HIPOLE Jena), Lessingstrasse 12-14, 07743 Jena, Germany
e Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, UK
f Department of Energy Conversion and Storage, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
g Department of Chemistry, University of Oxford, Oxford OX1 3TA, UK
h Department of Chemical Engineering & Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, UK
i Department of Computer Science, University of Chicago, Chicago, IL 60637, USA
j Data Science and Learning Division, Argonne National Laboratory, Lemont, IL 60439, USA
k Department of Applied Science and Technology (DISAT), Politecnico di Torino, 10129 Torino, Italy
l Laboratory of Catalysis and Organic Synthesis (LCSO), Institute of Chemical Sciences and Engineering (ISIC), École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
m Laboratory for Computational Molecular Design (LCMD), Institute of Chemical Sciences and Engineering (ISIC), École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
n Department of Chemical and Biological Engineering, Koç University, Rumelifeneri Yolu, Sariyer, 34450 Istanbul, Turkey
o The Research Centre for Carbon Solutions (RCCS), School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh, UK
p BIGCHEM GmbH, Valerystraße 49, 85716 Unterschleißheim, Germany
q Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
r Institute of Metallic Biomaterials, Helmholtz-Zentrum Hereon, Geesthacht, Germany
s Polymer Reaction Design Group, School of Chemistry, Monash University, Clayton, VIC 3800, Australia
t Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge CB3 0HE, UK
u Department of Chemical Engineering, University of Waterloo, Waterloo, Canada
v Institute of Chemical Sciences, School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh, UK
w Chemical Engineering & Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
x Dipartimento di Chimica e Chimica Industriale, Unità di Ricerca INSTM, Università di Pisa, Via Giuseppe Moruzzi 13, 56124 Pisa, Italy
y Chemical Engineering Department, University of Mohaghegh Ardabili, P. O. Box 179, Ardabil, Iran
z Department of Chemical Engineering, College of Engineering, University of Tehran, Tehran, Iran
aa Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
ab Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, USA
ac Institute of Applied Synthetic Chemistry, TU Wien, Getreidemarkt 9, Vienna, Austria
ad Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
ae Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, USA
af Laboratory of Materials for Renewable Energy (LMER), Institut des Sciences et Ingénierie Chimiques, École Polytechnique Fédérale de Lausanne (EPFL), Rue de l'Industrie 17, CH-1951 Sion, Switzerland
Abstract
The current generation of large language models (LLMs) has limited chemical knowledge. It has recently been shown, however, that LLMs can learn to predict chemical properties through fine-tuning. Using natural language to train machine-learning models opens the door to a wider chemical audience, as field-specific featurization techniques can be omitted. In this work, we explore the potential and limitations of this approach. We study the performance of three fine-tuned open-source LLMs (GPT-J-6B, Llama-3.1-8B, and Mistral-7B) on a range of chemical questions. We benchmark their performance against “traditional” machine-learning models and find that, in most cases, the fine-tuned LLMs are superior for simple classification problems. Depending on the size of the dataset and the type of question, more sophisticated problems can also be addressed successfully. The most important conclusions of this work are that, for all datasets considered, conversion into an LLM fine-tuning training set is straightforward, and that fine-tuning with even relatively small datasets yields predictive models. These results suggest that the systematic use of LLMs to guide experiments and simulations will be a powerful technique in any research study, significantly reducing the number of unnecessary experiments or computations.
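To illustrate how straightforward this conversion can be, the following minimal Python sketch turns a toy tabular dataset into natural-language fine-tuning examples. It is our own illustration, not the exact pipeline used in this work: the question template, field names, label wording, and JSONL prompt/completion format are all assumptions.

import json

# Toy records standing in for a real chemistry dataset:
# a SMILES string and a binary water-solubility label (hypothetical values).
records = [
    {"smiles": "CCO", "soluble": True},
    {"smiles": "c1ccccc1", "soluble": False},
]

def to_example(record):
    # Render one data row as a natural-language question/answer pair.
    prompt = f"Is the molecule with SMILES {record['smiles']} soluble in water?"
    completion = "yes" if record["soluble"] else "no"
    return {"prompt": prompt, "completion": completion}

# Write one JSON object per line, a format accepted by common
# fine-tuning tools for causal LLMs.
with open("finetune_train.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(to_example(record)) + "\n")

Because the data are rendered as plain-text questions and answers, no molecular featurization (fingerprints, descriptors, graph encodings) is required, which is precisely what lowers the barrier for a wider chemical audience.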