Issue 5, 2024

Extrapolation validation (EV): a universal validation method for mitigating machine learning extrapolation risk

Abstract

Machine learning (ML) can inform decisions on major challenges in science and engineering, and its rapid development has led to advances in fields such as chemistry and medicine, earth and life sciences, and communications and transportation. Assessing how far the advice given by ML models can be trusted remains challenging, especially when the models are applied to samples outside their domain of application. Here, an untrustworthy application scenario (i.e., complete extrapolation failure) that can occur in models developed by ML methods involving tree algorithms is confirmed, and the root cause of the difficulty such models have in discovering novel materials and chemicals is revealed. Furthermore, a universal extrapolation risk evaluation scheme, termed the extrapolation validation (EV) method, is proposed; its applicability is not restricted to specific ML methods or model architectures. The EV method is used to quantitatively evaluate the extrapolation ability of 11 widely used ML methods and to quantify the extrapolation risk arising from variations of the independent variables in each method. The EV method also provides insights and solutions for evaluating the reliability of predictions on out-of-distribution samples and for selecting trustworthy ML methods.
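The extrapolation failure of tree-based models mentioned in the abstract can be illustrated with a small sketch. This is not the EV procedure from the article, whose details are not given on this page. It assumes one simple way to probe extrapolation: sort the samples by a single independent variable, train on the lower part of its range, and test on the upper part. The toy data, the helper names (`fit_tree`, `fit_line`, `extrapolation_split`) and the 80/20 split are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: NOT the EV procedure from the article.
# All names, data and the 80/20 split below are assumptions.
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(pairs, depth=3, min_leaf=2):
    """Fit a tiny 1-D regression tree on (x, y) pairs sorted by x.

    Returns a leaf value (float) or a tuple (threshold, left, right).
    """
    ys = [y for _, y in pairs]
    if depth == 0 or len(pairs) < 2 * min_leaf:
        return mean(ys)
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        cost = sse(ys[:i]) + sse(ys[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return mean(ys)
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return (threshold,
            fit_tree(pairs[:i], depth - 1, min_leaf),
            fit_tree(pairs[i:], depth - 1, min_leaf))


def predict_tree(tree, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def rmse(pred, true):
    return (sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)) ** 0.5


def extrapolation_split(xs, ys, train_fraction=0.8):
    """Train on the lowest x values, test on the highest (assumed scheme)."""
    pairs = sorted(zip(xs, ys))
    cut = int(len(pairs) * train_fraction)
    return pairs[:cut], pairs[cut:]


# Toy data: a noiseless linear trend, y = 2x + 1.
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 for x in xs]

train, test = extrapolation_split(xs, ys)
x_tr, y_tr = [x for x, _ in train], [y for _, y in train]
x_te, y_te = [x for x, _ in test], [y for _, y in test]

tree = fit_tree(train)
slope, intercept = fit_line(x_tr, y_tr)

tree_preds = [predict_tree(tree, x) for x in x_te]
line_preds = [slope * x + intercept for x in x_te]

tree_rmse = rmse(tree_preds, y_te)
line_rmse = rmse(line_preds, y_te)
print(f"tree RMSE outside training range: {tree_rmse:.3f}")
print(f"line RMSE outside training range: {line_rmse:.3e}")
```

In this toy setting, every test sample lies beyond the last split threshold, so the tree returns the same leaf value for all of them and can never predict above the largest training target. The linear fit follows the trend into the unseen range. A random train/test split would not expose this difference, which is the motivation for validating on samples sorted by an independent variable.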

Article information

Article type: Paper
Submitted: 29 Dec 2023
Accepted: 17 Apr 2024
First published: 19 Apr 2024
This article is Open Access
Creative Commons BY-NC license

Digital Discovery, 2024, 3, 1058-1067

M. Yu, Y. Zhou, Q. Wang and F. Yan, Digital Discovery, 2024, 3, 1058, DOI: 10.1039/D3DD00256J

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.
