Issue 11, 2024

MolBar: a molecular identifier for inorganic and organic molecules with full support of stereoisomerism

Abstract

Before a new molecular structure is registered in a chemical structure database, a duplicate check is essential to ensure the integrity of the database. The Simplified Molecular Input Line Entry Specification (SMILES) and the IUPAC International Chemical Identifier (InChI) are two of the most widely used molecular identifiers for these checks. However, both have notable limitations for molecules from inorganic chemistry and for structures with non-central stereochemistry. Because these identifiers describe a molecule atom by atom, they cannot represent axial and planar chirality, where the stereoinformation must be assigned to a group of atoms rather than to a single stereocenter. To address this limitation, we introduce a novel chemical identifier called the Molecular Barcode (MolBar). Motivated by the field of theoretical chemistry, MolBar uses a fragment-based approach in addition to the conventional atomistic description. In this approach, the 3D structure of each fragment is normalized using a specialized force field and characterized by physically inspired matrices derived solely from atomic positions. The resulting permutation-invariant representation is constructed from the eigenvalue spectra of these matrices and encodes information on both bonding and stereochemistry. The robustness of MolBar is demonstrated through duplication and permutation invariance tests on the Molecule3D dataset of 3.9 million molecules. A Python implementation is available as open source and can be installed via pip install molbar.
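
The key property behind the abstract's permutation-invariant representation is that the eigenvalue spectrum of a matrix built from atomic positions does not change when the atoms are listed in a different order. The sketch below illustrates only that property. It assumes a Coulomb-style matrix (charge products over interatomic distances) as a stand-in, which unlike the abstract's matrices also uses nuclear charges; MolBar's actual matrices, fragmentation and force-field normalization are not reproduced. The names coulomb_matrix, symmetric_eigenvalues, spectrum_key and register are hypothetical. Eigenvalues come from a plain cyclic Jacobi method so that only the standard library is needed, and a rounded tuple of sorted eigenvalues serves as a toy duplicate-check key.

```python
import math


def coulomb_matrix(charges, coords):
    """Coulomb-style matrix: a stand-in for MolBar's own matrices (assumption)."""
    n = len(charges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * charges[i] ** 2.4  # usual Coulomb-matrix diagonal term
            else:
                m[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return m


def symmetric_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Sorted eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def spectrum_key(charges, coords, decimals=4):
    """Hypothetical barcode-like key: the sorted, rounded eigenvalue spectrum."""
    eigenvalues = symmetric_eigenvalues(coulomb_matrix(charges, coords))
    return tuple(round(x, decimals) for x in eigenvalues)


def register(registry, charges, coords):
    """Toy duplicate check: add the key and return True, or return False if already present."""
    key = spectrum_key(charges, coords)
    if key in registry:
        return False
    registry.add(key)
    return True


if __name__ == "__main__":
    # Water, then the same water with its atoms listed in a different order
    z = [8, 1, 1]
    xyz = [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)]
    z_perm = [1, 8, 1]
    xyz_perm = [(0.757, 0.586, 0.0), (0.0, 0.0, 0.0), (-0.757, 0.586, 0.0)]
    db = set()
    print(register(db, z, xyz))            # True: new entry
    print(register(db, z_perm, xyz_perm))  # False: same spectrum, flagged as duplicate
```

Two caveats on this toy design. A matrix built only from interatomic distances is identical for a molecule and its mirror image, so it cannot separate enantiomers and says nothing about the stereochemical part of the method. The rounding step is also a crude tolerance that only works when geometries are first brought to a common reference, which the abstract's force-field normalization of fragments appears intended to provide.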

Article information

Article type: Paper
Submitted: 30 Jun 2024
Accepted: 24 Sep 2024
First published: 10 Oct 2024
This article is Open Access under a Creative Commons BY license.

Digital Discovery, 2024, 3, 2298-2319

N. van Staalduinen and C. Bannwarth, Digital Discovery, 2024, 3, 2298 DOI: 10.1039/D4DD00208C

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
