Issue 5, 2022

The resolution-vs.-accuracy dilemma in machine learning modeling of electronic excitation spectra

Abstract

In this study, we explore the potential of machine learning for modeling molecular electronic spectral intensities as a continuous function in a given wavelength range. Since presently available chemical space datasets provide excitation energies and corresponding oscillator strengths for only a few valence transitions, we present a new dataset, bigQM7ω, with 12 880 molecules containing up to 7 CONF (C, O, N, F) atoms, and report their ground-state and excited-state properties. A publicly accessible web-based data-mining platform is presented to facilitate on-the-fly screening of several molecular properties, including harmonic vibrational and electronic spectra. We present all singlet electronic transitions from the ground state calculated using the time-dependent density functional theory framework with the ωB97XD exchange-correlation functional and a diffuse-function augmented basis set. The resulting spectra predominantly span the X-ray to deep-UV region (10–120 nm). To compare the target spectra with predictions based on small basis sets, we bin spectral intensities and show that good agreement is obtained only at the expense of resolution. By comparison, machine learning models with the latest structural representations, trained directly on <10% of the target data, recover the spectra of the remaining molecules with better accuracy at a desirable <1 nm wavelength resolution.
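The binning step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a stick spectrum given as excitation energies in eV with matching oscillator strengths, converts the energies to wavelengths, and sums the oscillator strengths falling into uniform wavelength bins over 10–120 nm. The function name `bin_spectrum`, its parameters, and the transition values are hypothetical and chosen only for illustration; the binning procedure actually used in the paper may differ.

```python
import numpy as np

# hc in eV*nm, so that wavelength_nm = EV_NM / energy_eV
EV_NM = 1239.841984


def bin_spectrum(energies_ev, osc_strengths,
                 lam_min=10.0, lam_max=120.0, bin_width=1.0):
    """Sum oscillator strengths into uniform wavelength bins (hypothetical helper).

    Returns the bin edges (nm) and the summed intensity per bin.
    Transitions outside [lam_min, lam_max] are ignored by np.histogram.
    """
    wavelengths = EV_NM / np.asarray(energies_ev, dtype=float)
    n_bins = int(round((lam_max - lam_min) / bin_width))
    edges = np.linspace(lam_min, lam_max, n_bins + 1)
    intensities, _ = np.histogram(
        wavelengths, bins=edges,
        weights=np.asarray(osc_strengths, dtype=float),
    )
    return edges, intensities


# Made-up transitions for illustration only (not taken from bigQM7ω)
energies_ev = [12.0, 15.6, 15.9, 40.0]
osc_strengths = [0.10, 0.25, 0.05, 0.30]

# Fine bins keep the two nearby transitions separate; coarse bins merge them.
edges_fine, fine = bin_spectrum(energies_ev, osc_strengths, bin_width=1.0)
edges_coarse, coarse = bin_spectrum(energies_ev, osc_strengths, bin_width=10.0)
```

With 1 nm bins the transitions near 78 nm and 79.5 nm fall into different bins, whereas with 10 nm bins their intensities are summed into one. This is the trade-off the abstract refers to: wider bins tolerate small errors in predicted peak positions but discard spectral resolution.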

Article information

Article type: Paper
Submitted: 30 Oct 2021
Accepted: 18 Aug 2022
First published: 18 Aug 2022
This article is Open Access (Creative Commons BY-NC license).

Digital Discovery, 2022, 1, 689-702

P. Kayastha, S. Chakraborty and R. Ramakrishnan, Digital Discovery, 2022, 1, 689 DOI: 10.1039/D1DD00031D

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.
