AI-driven hypergraph network of organic chemistry: network statistics and applications in reaction classification
Abstract
Rapid discovery of new reactions and molecules in recent years has been facilitated by advances in high-throughput screening, access to a highly complex chemical design space, and the development of accurate molecular modeling frameworks. The growing chemistry literature therefore calls for a holistic study that identifies recent trends in organic chemistry and extrapolates them to infer possible future trajectories. To this end, several network theory-based studies have been reported that use a directed graph representation of chemical reactions. Here, we instead represent chemical reactions as a hypergraph, in which nodes represent the participating molecules and each hyperedge represents a reaction among the molecules it connects. We use a standard reaction dataset to construct a hypergraph network of organic chemistry and report its statistics, including degree distribution, average path length, assortativity (degree correlations), PageRank centrality, and graph-based clusters (communities). We also compute each statistic for an equivalent directed graph representation of the reactions to draw parallels and highlight differences between the two. To demonstrate the applicability of the hypergraph reaction representation to AI tasks, we generate dense hypergraph embeddings and use them for reaction classification. We conclude that the hypergraph representation is flexible, preserves reaction context, and uncovers insights that are not apparent in a traditional directed graph representation of chemical reactions.
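The contrast between the two representations can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the molecule labels and reactions are hypothetical toy data, the helper names `hypergraph_degrees` and `to_directed_edges` are introduced here for illustration, and node degree is assumed to mean the number of reactions a molecule takes part in. The sketch stores each reaction as one hyperedge (a pair of reactant and product sets) and then projects it onto pairwise reactant-to-product edges, which is one common way to obtain a directed graph.

```python
from collections import Counter
from itertools import product

# Hypothetical toy reactions, written as (reactants, products).
reactions = [
    ({"A", "B"}, {"C"}),        # A + B -> C
    ({"C", "D"}, {"E", "F"}),   # C + D -> E + F
    ({"A"}, {"E"}),             # A -> E
]

def hypergraph_degrees(reactions):
    """Count, for each molecule, the hyperedges (reactions) it takes part in."""
    degrees = Counter()
    for reactants, products in reactions:
        for molecule in reactants | products:
            degrees[molecule] += 1
    return degrees

def to_directed_edges(reactions):
    """Project every reaction onto pairwise reactant -> product edges."""
    edges = set()
    for reactants, products in reactions:
        edges.update(product(reactants, products))
    return edges

hyper_deg = hypergraph_degrees(reactions)
edges = to_directed_edges(reactions)
out_deg = Counter(src for src, _ in edges)
in_deg = Counter(dst for _, dst in edges)

# Reaction context is lost in the projection: one two-reactant reaction and
# two separate single-reactant reactions yield the same directed edges.
one_step = [({"A", "B"}, {"C"})]
two_steps = [({"A"}, {"C"}), ({"B"}, {"C"})]
same_projection = to_directed_edges(one_step) == to_directed_edges(two_steps)

print(dict(hyper_deg), sorted(edges), same_projection)
```

In this toy example the projection cannot tell whether A and B react together or separately to give C, whereas the hyperedge keeps both reactants in a single record. This is the sense in which a hypergraph can preserve reaction context.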