Zhenzhi Tan,a Qi Yang*ab and Sanzhong Luo*a
aCenter of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing, 100084, China. E-mail: luosz@tsinghua.edu.cn
bHaihe Laboratory of Sustainable Chemical Transformations, Tianjin, 300192, China. E-mail: yangqi@hlsct.cn
First published on 11th February 2025
Artificial intelligence (AI) is transforming molecular catalysis by addressing long-standing challenges in retrosynthetic design, catalyst design, reaction development, and autonomous experimentation. AI-powered tools enable chemists to explore high-dimensional chemical spaces, optimize reaction conditions, and accelerate novel reaction discovery with unparalleled efficiency and precision. These innovations are reshaping traditional workflows, transitioning from expert-driven, labor-intensive methodologies to intelligence-guided, data-driven processes. Despite these transformative achievements, significant challenges persist. Critical issues include the demand for high-quality, reliable datasets, the seamless integration of domain-specific chemical knowledge into AI models, and the discrepancy between model predictions and experimental validation. Addressing these barriers is essential to fully unlock AI's potential in molecular catalysis. This review explores recent advancements, enduring challenges, and emerging opportunities in AI-driven molecular catalysis. By focusing on real-world applications and highlighting representative studies, it aims to provide a clear and forward-looking perspective on how AI is redefining the field and paving the way for the next generation of chemical discovery.
10th anniversary statement

I am deeply honored to have published 10 research papers in *Organic Chemistry Frontiers* (OCF) since its inception in 2015. This journey has been immensely rewarding, as the journal's rapid publication process and global reach have significantly enhanced the visibility and impact of my work. Over the past decade, OCF has consistently served as a premier platform for cutting-edge research and innovation in organic chemistry. Its dedication to high-quality, interdisciplinary science has driven groundbreaking discoveries in synthesis, catalysis, and materials chemistry, profoundly shaping the future of the field.

I am thrilled to contribute my 11th paper, a perspective titled “AI molecular catalysis: where are we now?”, to this special issue celebrating OCF's 10th anniversary. This milestone reflects the journal's pivotal role in advancing organic chemistry and nurturing a dynamic scientific community. Here's to celebrating a decade of excellence and looking forward to many more years of innovation and discovery!
As a cornerstone of modern chemistry, molecular catalysis exemplifies this transformation. Historically, progress in catalysis relied on fundamental principles, experimental ingenuity, and serendipity.4,5 Classical models, such as linear free energy relationships (LFERs),6 provided elegant but simplified structure–activity relationships (SARs) based on limited datasets. These models, including the Brønsted catalysis law,7 Hammett equation,8 Taft equation9 and Mayr equation,10,11 guided decades of catalytic research and synthetic design. However, as chemical systems have grown more complex,12,13 these traditional tools have struggled to address the intricate interplay of reaction conditions, multi-scale dynamics, and diverse molecular interactions.
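For intuition, an LFER such as the Hammett equation can be evaluated in a few lines. The sketch below uses a reaction constant rho chosen purely for illustration and the commonly tabulated sigma value for a para-nitro substituent:

```python
def hammett_relative_rate(rho: float, sigma: float) -> float:
    """Hammett equation: log10(k_X / k_H) = rho * sigma,
    so the relative rate is k_X / k_H = 10 ** (rho * sigma)."""
    return 10 ** (rho * sigma)

# For a hypothetical reaction with rho = +2.0, a para-nitro substituent
# (tabulated sigma_p of about 0.78) predicts roughly a 36-fold acceleration
# relative to the unsubstituted parent compound:
print(hammett_relative_rate(2.0, 0.78))
```

The single substituent constant sigma is exactly the kind of low-dimensional descriptor that such classical models rely on, in contrast to the high-dimensional representations used by modern ML models.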
The emergence of AI has paralleled the rise of data-driven approaches in molecular catalysis, where the synergy between machine learning (ML) and chemical data presents unparalleled opportunities for discovery.14 Unlike traditional approaches that depend on experiment-derived heuristics or predefined theoretical frameworks,15 AI excels at identifying patterns and predicting outcomes directly from high-dimensional, complex datasets. This capability enables chemists to explore vast chemical spaces with increased efficiency and precision.16
As illustrated in Fig. 1, AI integration throughout the molecular catalysis workflow fosters innovation at every stage. Retrosynthetic analysis models can quickly propose optimal synthetic routes, helping chemists efficiently prepare catalysts and target molecules. AI-guided catalyst design, informed by chemical knowledge and historical data, facilitates the development of catalysts with enhanced performance. In reaction studies, AI accelerates the optimization of conditions and delineates the scope and limitations of reactions. Furthermore, advanced autonomous experimentation allows chemists to perform experiments with significantly greater efficiency and reproducibility. By seamlessly integrating datasets, models, robots, and experiments, AI is transforming traditional expert-driven, labor-intensive workflows into intelligence-guided, data-driven processes.
This review examines the current landscape of AI in molecular catalysis by addressing three key questions: What are the major advancements? What challenges remain? What opportunities lie ahead? We highlight representative studies in retrosynthetic design, catalyst design, reaction development, and autonomous experimentation, providing a focused perspective on how AI is transforming molecular catalysis. Rather than offering a comprehensive overview of AI methodologies, this review emphasizes their practical integration into chemical workflows and their implications for researchers. For readers seeking in-depth discussions on AI applications in chemistry, several notable reviews are recommended.17–20
The retrosynthetic analysis framework proposed by E. J. Corey provided chemists with a systematic and rational approach to deconstructing complex molecules into simpler precursors, significantly reducing the difficulty of synthesis.23 Around the same time, the field of computer-aided synthesis planning (CASP) began to emerge, promoting the use of computational tools for retrosynthetic analysis. With the advancement of AI technology, modern retrosynthetic analysis tools have also undergone substantial development, further enhancing the efficiency and feasibility of designing synthetic routes (Fig. 2).24
Fig. 2 Timeline of key milestones for CASP. Examples of reaction templates are illustrated within the diagram.
Before the advent of sophisticated retrosynthesis tools, chemists primarily relied on database search engines such as Reaxys and SciFinder to retrieve reaction information. These platforms remain widely used for retrosynthetic planning, offering comprehensive access to published reactions and experimental data. However, their utility is limited to recorded reactions, often failing to guide unreported or novel transformations.
AI-guided retrosynthesis planning can be categorized into single-step and multi-step retrosynthesis. Single-step retrosynthesis involves a single disconnection or transformation of a molecule; iterating this process until commercially available substrates are reached constitutes multi-step retrosynthesis. Although significant progress has been made in developing advanced algorithms for single-step disconnections,25–28 practical synthetic applications demand more than just effective disconnection strategies—directional guidance in choosing disconnections is equally or even more crucial.29 Therefore, transitioning from single-step to multi-step retrosynthesis requires models to adopt a broader, long-term perspective, enabling them to approach reaction pathway design with a more holistic, globally optimized strategy.
The introduction of reaction templates marked a significant step forward in retrosynthetic analysis by formalizing chemical reasoning into a structured framework. A reaction template encodes the core structural transformation of a reaction, capturing critical features such as bond changes, functional group compatibility, and mechanistic insights. As shown in Fig. 2, template-based methods rely on curated libraries of reaction templates, enabling retrosynthetic analysis to extend beyond the confines of recorded reactions in databases. By systematically applying these templates, chemists can design synthetic routes for complex molecules, including many natural products and pharmaceuticals.
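Applying templates iteratively until every precursor is a purchasable building block can be sketched as a toy recursive search. The template table, molecule labels, and building-block set below are entirely hypothetical placeholders; real systems operate on SMILES/SMARTS patterns and curated template libraries:

```python
# Hypothetical one-step "templates": each product label maps to candidate
# precursor tuples. Real tools encode these as SMARTS transformations.
TEMPLATES = {
    "amide": [("amine", "acid")],  # amide-coupling disconnection
    "amine": [("nitroarene",)],    # nitro reduction
    "acid":  [("nitrile",)],       # nitrile hydrolysis
}
BUILDING_BLOCKS = {"nitroarene", "nitrile"}  # "commercial" stop set

def retrosynthesize(target, seen=None):
    """Iterate single-step disconnections until every leaf is purchasable.
    Returns a nested route tree, or None if no route exists."""
    seen = seen or set()
    if target in BUILDING_BLOCKS:
        return target
    if target in seen:  # guard against cyclic template application
        return None
    for precursors in TEMPLATES.get(target, []):
        subroutes = [retrosynthesize(p, seen | {target}) for p in precursors]
        if all(s is not None for s in subroutes):
            return {target: subroutes}
    return None

route = retrosynthesize("amide")
# route == {"amide": [{"amine": ["nitroarene"]}, {"acid": ["nitrile"]}]}
```

Real planners replace this depth-first recursion with guided searches such as MCTS, since exhaustive template application quickly becomes intractable for realistic template libraries.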
The first major template-based retrosynthesis software, OCSS,30 was developed by Corey and Wipke and later evolved into tools like LHASA31 and SECS.32 Since then, a series of retrosynthesis programs have been developed, such as SYNLMA,33 SYNCHEM,34 SYNGEN,35 IGOR,36 WODCA37 and CHIRON.38 These pioneering systems laid the foundation for computer-aided retrosynthesis by automating reaction rule application.
By leveraging the network of organic chemistry (NOC), which comprises over 10 million compounds and their reaction relationships, Grzybowski et al. introduced synthesis optimization with constraints (SOCS).39 A constrained search algorithm was developed using cost and popularity functions, alongside a manually curated database of more than 20000 reaction templates specifying scope and conflict groups. This expert-designed template set enabled the creation of Chematica, a computer-aided tool for de novo synthetic design. Compared to earlier synthesis software, Chematica demonstrated efficacy comparable to that of human chemists, supported by extensive chemical reaction data. Complete synthetic routes for complex natural products, including (–)-Dauricine, Tacamonidine, and Lamellodysidine A, were successfully designed and experimentally validated (Fig. 3).40,41 The results of a Turing test demonstrated that even experienced chemists are unable to distinguish between synthesis routes generated by Chematica and those reported in the literature. Subsequent advancements incorporated molecular force field-level parameters to refine synthesis design and introduced the Stereofix module for improved stereochemical handling, establishing Chematica as one of the most robust retrosynthesis tools. Similarly, the InfoChem group developed ICSYNTH, demonstrating its innovative capabilities in the de novo design of complex spiro and aromatic heterocyclic compounds.42
Despite the success of template-based methods in retrosynthesis, concerns remain about the breadth and generality of manually curated templates. Kayala and Baldi et al. argued that expanding template libraries exponentially increases conflicts among templates, which in turn hinders the generalization of existing templates.43 To address this, Green and Jensen developed RDChiral, an automated template extraction tool based on RDKit.44 Using this toolkit, they derived 163723 reaction rules, including stereochemical details, from 12.5 million single-step reactions in the Reaxys database and USPTO. These templates were incorporated into a neural network model for template matching and integrated with Monte Carlo tree search (MCTS)45 for substrate identification in retrosynthetic planning. The resulting system, ASKCOS, was evaluated on a robotic flow chemistry platform (see section 5.2) using 15 small molecule examples, showcasing its robustness and efficiency in synthetic route design (Fig. 4).46 Similarly, Genheden and Bjerrum developed AiZynthFinder, an open-source retrosynthesis tool that demonstrated superior performance in specific synthetic scenarios compared to ASKCOS.47–49
Template-based retrosynthesis methods, though effective in many cases, are computationally intensive, particularly during route searches. As the molecular complexity of the target increases, the computation time scales up significantly. Furthermore, these methods are inherently constrained by their reliance on predefined reaction templates, making them unable to predict transformations beyond their established scope. This limitation is especially problematic when encountering novel or less-studied reactions, as they fall outside the boundaries of existing templates.
To address these limitations, template-free retrosynthesis approaches have emerged as a promising alternative. These data-driven methods leverage ML models trained on extensive reaction datasets, allowing the prediction of synthetic routes without relying on predefined templates. By directly extracting reaction knowledge from vast collections of literature data, template-free methods significantly enhance the scope and flexibility of retrosynthetic planning, offering the potential to design synthetic pathways for virtually any molecule.50
Simplified molecular input line entry system (SMILES)51 encoding is a widely used data representation in template-free retrosynthesis methods, enabling retrosynthesis planning to be recast as a natural language processing (NLP) problem.52,53 This allows the application of advanced NLP frameworks. Schwaller et al. utilized the reduced USPTO-50k dataset to frame retrosynthesis as a language translation task involving the SMILES strings of reactants, reagents, and products. This work led to the development of the Molecular Transformer model.54 Building on this model, they optimized evaluation metrics and introduced a hypergraph exploration strategy, resulting in the creation of a publicly available retrosynthesis platform, RoboRXN.55 Their study presented retrosynthesis route designs for several complex molecules, demonstrating the power of the model (Fig. 5A).
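A key preprocessing step in such sequence models is splitting a SMILES string into chemically meaningful tokens (so that, e.g., "Cl" is one token rather than "C" + "l"). A regex-based tokenizer in the spirit of the one popularized with the Molecular Transformer is sketched below; the exact pattern here is a simplified approximation covering only common organic-subset SMILES:

```python
import re

# Multi-character tokens (bracket atoms, Cl, Br, two-digit ring closures)
# must be matched before single characters.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOSPFI]|[bcnops]|[-=#$:/\\().+@*]|\d)"
)

def tokenize(smiles: str) -> list[str]:
    tokens = SMILES_TOKEN.findall(smiles)
    # Sanity check: the tokens must reassemble the original string.
    assert "".join(tokens) == smiles, "untokenizable characters present"
    return tokens

print(tokenize("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```

Once tokenized, reactant and product SMILES can be fed to standard sequence-to-sequence architectures exactly as sentences are in machine translation.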
Beyond sequence-based models, graph neural networks (GNNs) offer another promising direction for template-free reaction prediction. Compared with SMILES-based methods, the graph structure provides richer information for the model.29 Ke et al. developed the node-aligned graph-to-graph (NAG2G) model, which integrates 2D molecular graphs and 3D conformations to capture comprehensive molecular details.56 The model ensures reaction validity by establishing atom mappings between reactants and products. Its capability to design synthetic routes for pharmaceutical molecules highlights its potential for complex synthesis challenges (Fig. 5B).
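NAG2G's atom-mapping constraint can be illustrated, in a vastly simplified form, by a stoichiometric atom-balance check: a proposed transformation should conserve element counts across both sides. The sketch below works on plain molecular formulas rather than real atom mapping:

```python
import re
from collections import Counter

def atom_counts(formula: str) -> Counter:
    """Parse a simple molecular formula like 'C2H4O2' into element counts."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts

def balanced(reactants: list[str], products: list[str]) -> bool:
    """True if both sides of the transformation have identical atom counts."""
    total = lambda side: sum((atom_counts(f) for f in side), Counter())
    return total(reactants) == total(products)

# Fischer esterification: acetic acid + methanol -> methyl acetate + water
print(balanced(["C2H4O2", "CH4O"], ["C3H6O2", "H2O"]))  # True
```

Real atom mapping goes much further, assigning each product atom to a specific reactant atom, but mass balance of this kind is the minimal validity condition any predicted reaction must satisfy.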
Template-free methods can leverage the full potential of AI in processing chemical data, offering significant promise for future development. However, compared to template-based methods, their ability to accurately capture and predict complex reactions remains a challenge. Furthermore, most current template-free predictions lack experimental validation, leaving their reliability to be further substantiated.
Retrosynthetic methods have made remarkable progress in recent years, significantly advancing the field of chemical synthesis. However, several challenges hinder their broader application. One major limitation lies in the available datasets, which often lack critical details on reaction conditions—such as specific additives, solvents, and their precise quantities—thereby necessitating expert intervention to bridge the gap between computational predictions and practical execution. In this regard, template-based methods such as Chematica have demonstrated notable advantages over template-free methods, as they inherently capture detailed reaction conditions when encoding reaction templates. By contrast, template-free models struggle to accurately infer such information from existing data. Although large-scale reaction databases such as USPTO contain a fraction of entries with recorded catalysts and solvents, these entries typically lack crucial information on reactant quantities and categories. For example, RoboRXN occasionally includes additives and solvents in its output, but they are presented in the same manner as reactants, without additional details, which can complicate the interpretation of the recommended procedure.
Additionally, the datasets used for training retrosynthesis models are significantly smaller and less diverse than the extensive synthetic knowledge available in the literature. Their accuracy is also a significant concern, as Grzybowski et al. reported that 60% of reactions from patents are either incorrect or highly questionable, while 28% of literature-reported reactions are deemed unreliable.57 These data issues limit the robustness and generalizability of current models.
Transforming AI-assisted retrosynthesis into a truly practical tool for chemists will require addressing these challenges, particularly in improving dataset quality, diversity, and the incorporation of actionable reaction details. Despite these limitations, the potential demonstrated by current methods inspires confidence that AI-driven retrosynthesis will ultimately revolutionize chemical synthesis, addressing challenges beyond the capabilities of traditional approaches.
However, experience-based design is often limited by cognitive biases, which may potentially hinder the exploration of unconventional catalyst structures. Moreover, for complex reactions involving intricate electronic or steric interactions, traditional rational design methods frequently fail to identify optimal catalyst candidates, highlighting the need for more systematic and predictive strategies.
Fortunately, the availability of extensive literature data has enabled data-driven approaches in rational catalyst design. Among these, insights from physical organic chemistry have proven particularly impactful in guiding molecular catalysis. Electronic parameters such as pKa,59 hydrogen bond lengths,60,61 infrared vibrational frequencies62 and intensities, and non-covalent interaction parameters,63 along with steric parameters such as Taft parameters, Sterimol parameters, and %Vbur,64 have played a crucial role in guiding catalyst design. For example, Sigman et al. studied asymmetric propargylation reactions catalyzed by chiral chromium complexes and used a binary substitution matrix to analyze the steric and electronic effects of ligands.65
While physical organic chemistry has provided critical insights into catalyst design, its application in practical reaction development remains fraught with significant challenges. Chemists often require several weeks or even months to synthesize numerous catalysts in order to derive physical organic parameters of a certain reaction. However, these principles and parameters are often applicable to a limited range of catalysts, thereby reducing the overall efficiency of the discovery process and restricting the breadth of exploration in catalyst design. By integrating chemists’ expertise with AI's capacity for large-scale data exploration, more expansive catalyst candidates can be investigated. Coupled with optimization algorithms, AI facilitates the efficient identification of optimal catalyst structures, presenting significant potential to accelerate catalyst discovery. The general workflow for AI-assisted catalyst design and optimization is illustrated in Fig. 6. The process begins with applying clustering algorithms to the catalyst library, grouping catalysts into distinct categories based on their properties. From these clusters, representative candidates are selected for experimental testing. Optimization algorithms are then employed to refine both the catalyst structures and reaction conditions, followed by experimental validation to discover new catalysts.
Schoenebeck et al. applied a clustering algorithm to identify potential Pd(I) dimer catalysts from 348 phosphine ligand pairs in the LKB-P ligand library (Fig. 7A).66 Using the K-means clustering algorithm, the ligand set was first reduced to 25% of its original size, followed by further selection based on 42 problem-specific descriptors calculated via DFT. Experimental validation of the cluster containing previously reported Pd(I) dimers led to the identification of 8 Pd(I) dimer catalysts with minimal experimental effort. Similarly, Denmark et al. utilized clustering to optimize peptide catalysts for the reaction of aldehydes with nitroethylene.67 They employed an algorithm to select 161 tripeptides from a library constructed from 174 commercially available amino acids, which were then synthesized and experimentally tested. The optimal catalyst identified through this process improved the enantioselectivity from 86% to 91% ee.
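The cluster-then-select workflow can be sketched in plain Python: group catalysts by their descriptor vectors with k-means, then pick the member nearest each centroid as the representative to synthesize. The descriptor values and library below are invented placeholders:

```python
import random

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means; returns (centroids, clusters)."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist2(p, centroids[i]))].append(p)
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

def representatives(points, k):
    """One catalyst per cluster: the member closest to its centroid."""
    centroids, clusters = kmeans(points, k)
    return [min(c, key=lambda p: dist2(p, cen)) for cen, c in zip(centroids, clusters) if c]

# Toy 2D "descriptors" (e.g. one steric and one electronic parameter):
library = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
print(representatives(library, 2))
```

In practice the descriptor vectors come from DFT calculations or fingerprints, and the choice of representative within each cluster (rather than a random member) is exactly the step noted below as strongly influencing the outcome.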
Fig. 7 Examples of AI-assisted catalyst design. (A) Discovery of dinuclear Pd(I) complexes through data collection and clustering, with the figure depicting results from the second round of clustering. Reprinted (adapted) with permission from ref. 66, copyright 2021, American Association for the Advancement of Science. (B) Optimization of trisubstituted alkene selectivity using the Kraken virtual screening library. Reprinted (adapted) with permission from ref. 72, copyright 2024, American Chemical Society.
Clustering enables the rapid identification of structurally or functionally similar catalysts from literature data, facilitating the extraction of highly reactive candidates from databases. Catalyst libraries for clustering can be generated using templates, with molecular properties characterized through fingerprinting methods or DFT calculations. However, unsupervised clustering approaches often require experimental validation to refine results, and the random selection of representative molecules for synthesis from clusters can significantly influence the overall outcomes.
Moreover, employing a structure generation strategy to construct a comprehensive library of organocatalysts, together with their molecular properties, represents a pivotal advancement in accelerating catalyst discovery. Traditionally, manual screening relies on the design of potential catalyst candidates based on empirical knowledge or physical organic principles, resulting in catalyst libraries that are inherently prone to oversight and cognitive bias. In contrast, divergent structure generation methods start from a limited set of molecular scaffolds and systematically combine them with common functional groups to generate a large and structurally diverse set of molecules. This approach produces libraries with broad coverage, enabling a more extensive and unbiased exploration of the potential catalyst space.
The Kraken platform developed by Aspuru-Guzik et al. exemplifies such a catalyst library paradigm.68 They collected 1556 commercially available PR3 phosphine ligands and, from a set of 576 unique R substituents, constructed 331776 virtual catalysts containing at least two identical substituents as well as more than 1.9 million bearing different substituents. The properties of these molecules can be predicted from their proposed BoS and other molecular fingerprints, enabling predictions for new organic reactions. Several studies have demonstrated that Kraken can be used to optimize reaction performance69 or validate reaction mechanisms70,71 through virtual ligand screening. For example, Doyle and Sigman et al. utilized Kraken as a virtual screening library in their study on the nickel-catalyzed reduction of enol tosylates to selectively form trisubstituted alkene products (Fig. 7B).72 Kraken provides a large pool of monophosphine ligands as candidates for the optimal catalyst. By employing an optimization algorithm, they successfully identified phosphine ligands suitable for the selective formation of E- and Z-trisubstituted alkenes, significantly accelerating the screening process.
Despite the availability of comprehensive libraries for phosphine ligands, many other ligand and catalyst frameworks still lack corresponding screening libraries. Expanding virtual catalyst databases is therefore essential to explore a broader range of scaffolds and their derivatives. However, current catalyst design remains limited by manually defined scaffold structures rather than fully mechanism-driven approaches. AI-powered generative molecular design provides a promising solution for rational catalyst design.73 Significant progress in AI-assisted small molecule drug design, where AI analyses target sites to generate protein-binding molecules, has already revolutionized drug discovery.74,75 Given the structural parallels, organocatalysis, which similarly relies on “pocket” architectures, stands to benefit greatly from AI-driven strategies.
With the fast growth of reaction data, AI-driven approaches are showcasing remarkable capabilities in reaction prediction. By leveraging AI models and extensive literature data, these methods enable the accurate prediction of reaction conditions,76–78 products,79,80 regioselectivity,81,82 yields83–85 and stereoselectivity.86,87 However, in the discovery of novel catalytic reactions, the lack of related reaction data often limits the models' generality. To address the challenge of making predictions in zero- or low-data scenarios, active learning and active sampling strategies have been developed.88,89 This section considers two representative challenges in molecular catalysis research – optimizing reaction conditions and exploring substrate scopes – and highlights AI-assisted solutions that address them.
Among these algorithms, Bayesian optimization (BO) is widely used for its high sample efficiency, integration of prior knowledge, and strong search capability in high-dimensional spaces.91 As illustrated in Fig. 8, the general workflow for employing BO in organic reaction condition optimization begins with defining the screening space of reaction conditions and establishing the optimization target. Initial conditions are then selected, either algorithmically or based on chemists’ expertise. Experimental results are iteratively input into the BO model to refine predictions, allowing the process to progressively converge toward optimal conditions that meet the defined target.
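A deliberately simplified version of this loop is sketched below. The Gaussian-process surrogate is replaced by a distance-weighted average of observed yields, and the acquisition function by "predicted yield + exploration bonus for distance from tested points" (a crude stand-in for GP uncertainty); the reaction space and yield function are invented:

```python
import math

# Invented discrete condition space: (temperature in C, catalyst loading in mol%)
SPACE = [(t, l) for t in range(20, 101, 10) for l in (1, 2, 5, 10)]

def true_yield(cond):  # hidden "experiment"; optimum at (70 C, 5 mol%)
    t, l = cond
    return 95 - 0.02 * (t - 70) ** 2 - 1.5 * (l - 5) ** 2

def norm(cond):  # scale both axes to ~[0, 1] so distances are comparable
    t, l = cond
    return ((t - 20) / 80, (l - 1) / 9)

def dist(a, b):
    ax, ay = norm(a)
    bx, by = norm(b)
    return math.hypot(ax - bx, ay - by)

def acquisition(cond, observed, beta=30.0):
    # Surrogate: inverse-distance-weighted mean of observed yields.
    pred = sum(y / (dist(cond, c) + 0.05) for c, y in observed.items())
    pred /= sum(1 / (dist(cond, c) + 0.05) for c in observed)
    # Exploration bonus grows with distance from the nearest tested point.
    bonus = beta * min(dist(cond, c) for c in observed)
    return pred + bonus

observed = {c: true_yield(c) for c in [(20, 1), (100, 10)]}  # seed experiments
for _ in range(12):
    nxt = max((c for c in SPACE if c not in observed),
              key=lambda c: acquisition(c, observed))
    observed[nxt] = true_yield(nxt)  # "run" the suggested experiment

best = max(observed, key=observed.get)
print(best, round(observed[best], 1))
```

The structure (surrogate, acquisition, iterate) is the same as in real BO; production tools replace the surrogate with a Gaussian process and the bonus with principled acquisition functions such as expected improvement.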
In 2021, Aspuru-Guzik et al. developed Gryffin, a BO framework capable of handling both discrete and continuous variables.92 To demonstrate its effectiveness, they applied it to optimize Suzuki–Miyaura reaction conditions using 88 reactions and their turnover numbers (TON) as inputs. Bayesian neural networks were used to predict TON across the reaction space, demonstrating Gryffin's superiority over traditional BO methods in achieving higher yields and faster optimization.
To further extend the application of BO to real experimental scenarios, Doyle et al. developed EDBO (experimental design via Bayesian optimization) to guide the direct arylation of imidazoles.93 By comparing EDBO with manual optimization, the study demonstrated its superior efficiency in identifying optimal conditions. The authors noted that while machines may initially lack the “smartness” of human intuition due to limited prior knowledge, EDBO was able to converge on optimal conditions more rapidly. Subsequently, they introduced EDBO+, which enabled simultaneous optimization of ee and reaction yield, achieving a 10% yield improvement for the cross-coupling of styrene oxide with aryl iodides (Fig. 9A).94 More recently, they explored reaction condition optimization as a multi-arm bandit problem, applying the Bayesian UCB algorithm to efficiently identify optimal conditions for palladium-catalyzed C–H arylation, anilamide coupling, and phenol alkylation.95
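Framing condition screening as a multi-arm bandit can be sketched with the classic UCB1 rule, shown here on invented deterministic "yields"; the Bayesian UCB variant used in the work above differs in how uncertainty is estimated:

```python
import math

def ucb1(arm_yields, rounds=300):
    """UCB1: pull the arm maximizing mean reward + sqrt(2 ln t / n)."""
    n = [0] * len(arm_yields)        # times each condition set was tried
    total = [0.0] * len(arm_yields)  # cumulative observed yield per arm
    for t in range(1, rounds + 1):
        if t <= len(arm_yields):     # try every condition set once first
            arm = t - 1
        else:
            arm = max(range(len(arm_yields)),
                      key=lambda a: total[a] / n[a]
                      + math.sqrt(2 * math.log(t) / n[a]))
        n[arm] += 1
        total[arm] += arm_yields[arm]  # deterministic toy "experiment"
    return n

# Three hypothetical condition sets with true yields 90%, 60%, 30%:
pulls = ucb1([0.9, 0.6, 0.3])
print(pulls)  # the best condition set accumulates the most experiments
```

The confidence bonus shrinks as an arm is sampled, so experimental effort automatically concentrates on the most promising condition set while still occasionally revisiting the others.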
Fig. 9 Schematic representation of three approaches for reaction condition optimization. (A) BO only optimizes conditions for a single substrate. Condition optimization with BO can achieve rapid improvements in yield and ee within 7 iterations. Reprinted with permission from ref. 94, copyright 2022, American Chemical Society. (B) Uniform sampling of the reaction space, combined with experimental results for modeling, allows prediction across different substrates and conditions. Plots demonstrate that model-predicted optimal conditions often perform well in actual synthesis (blue structures indicate fragments not included in the training set). Reprinted (adapted) with permission from ref. 96, copyright 2023, American Association for the Advancement of Science. (C) TL improves efficiency by leveraging prior knowledge, but its performance is limited by domain discrepancy. A general working framework for transfer learning in condition optimization is shown in the figure. Reprinted (adapted) with permission from ref. 97, copyright 2022, Royal Society of Chemistry.
Although BO is widely used for optimizing organic reaction conditions, its current applications often focus narrowly on specific substrate templates, which can result in overfitting and limit the transferability of optimized conditions to broader substrate scopes. To address this, Bigler and Denmark et al. developed an ML-guided tool for predicting optimal conditions across a user-defined reaction space (Fig. 9B).96 They evaluated 121 substrates and 24 reaction conditions through 3300 carefully selected experiments. Using these data, a neural network model was trained to predict yields for different substrate groups. In out-of-sample validation, the model demonstrated a reasonable ability to predict optimal conditions, even though the exact predicted yields were not always accurate. However, this approach involves over 3000 experiments to evaluate the reaction scope. For more complex chemical reaction optimizations, it would be necessary to further reduce the experimental data-to-reaction space ratio.
Leveraging literature data can significantly reduce the experimental burden in reaction condition screening. Transfer learning (TL), an algorithm that applies patterns learned from existing experimental data to similar reaction systems, utilizes prior knowledge to minimize trial-and-error efforts. For example, Zimmerman et al. demonstrated the transferability of TL between Suzuki (C–C bond formation) and Buchwald–Hartwig (C–N bond formation) reactions under comparable conditions (Fig. 9C).97 However, their analysis revealed that condition predictions remain largely confined to relatively similar reaction types. To address this limitation, adaptive tree-based learning (ATL) models have been introduced to improve the predictive capabilities of TL, offering a promising strategy for extending its applicability to more diverse reaction spaces.
Although reaction optimization models have demonstrated their effectiveness in global optimization within complex chemical spaces, challenges remain in condition screening. Selecting the appropriate number of optimization cycles is particularly challenging, especially in large reaction spaces where too few cycles may obscure meaningful trends, while excessive cycles increase the experimental burden. The inefficiency of manual screening further underscores the need for autonomous experimentation to enhance optimization efficiency (see section 5). For reactions with large chemical spaces and unclear mechanisms, combining active learning (AL) with high-throughput experimentation (HTE) offers a promising solution, which is likely to shape the future of organic laboratory workflows.
Establishing reaction scope requires selecting a representative set of substrates. Over the past decades, the number of reaction examples required to establish reaction scope has nearly doubled, significantly increasing the workload for organic chemists.98 However, substrate selection has traditionally relied on subjective choices made by researchers, introducing potential biases in both the selection and reporting of results. To present reactions in a more favorable light, researchers may exclude incompatible substrates and selectively highlight optimal outcomes, resulting in skewed evaluations of reaction performance. The integration of intelligence-guided, data-driven approaches into reaction scope studies offers a powerful solution to mitigate human-induced biases. By leveraging ML and data analytics, these approaches can provide a more comprehensive and objective assessment of a reaction's true capabilities, ensuring a more accurate representation of its versatility and limitations.
One of the most commonly used substrate selection algorithms is clustering, which selects a few representative substrate molecules from a large pool of candidates based on their features.99 In the study of Ni/photoredox cross-coupling, Doyle et al.100 utilized a data-driven approach to explore the scope of aryl bromide substrates (Fig. 10A). They searched Reaxys for all aryl bromide substrates and, after a simple screening process, obtained a dataset of 2600 members. Using auto-qchem, they calculated the DFT descriptors of the molecules. These descriptors were then used for dimensionality reduction via uniform manifold approximation and projection (UMAP) and hierarchical clustering, resulting in 15 representative aryl bromide molecules selected for evaluation. The results demonstrated that, compared to traditional literature-based methods, the data science-driven approach could cover a broader substrate space with fewer experiments, providing a comprehensive overview of the reaction scope.
Fig. 10 (A) Clustering of aryl bromides based on DFT descriptors, enabling a reduction in the number of experimental points while achieving results comparable to literature scope exploration. Reprinted with permission from ref. 100, copyright 2022, American Chemical Society. (B) Exploration of diverse alkene categories using the DrugBank dataset as the background. The reactivity of the two reaction types is systematically compared. Reprinted (adapted) with permission from ref. 101, copyright 2024, American Chemical Society.
While AI-guided approaches effectively reduce selection and reporting biases, they often struggle to capture the complex interactions among functional groups in real chemical systems. To address these challenges, Glorius et al. developed a universal substrate screening strategy to minimize such biases (Fig. 10B).101 Using the DrugBank database as a representative dataset of diverse drug-like molecules, they described molecular structures with extended-connectivity fingerprints (ECFP) and applied K-means clustering to group the substrates into 15 spatially distinct classes, visualized in two dimensions via UMAP. To evaluate reaction scope, the substrate space of a given reaction was mapped onto this same descriptor framework, and 15 olefins closest to each cluster center were experimentally tested. This method was successfully demonstrated for the photochemical imino-carboxylation of olefins and osmium-catalyzed dihydroxylation, revealing the extent to which reported conditions are applicable in diverse chemical spaces.
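The K-means step of such a workflow can be illustrated with a minimal, dependency-free sketch. Toy 2-D points stand in for the ECFP bit vectors, seeding is deterministic for reproducibility, and, as in the workflow above, the substrate chosen from each cluster is the member closest to its cluster center.

```python
# Minimal K-means from scratch, followed by picking the member nearest
# each cluster center as the experimental representative.
def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(group):
    return tuple(sum(p[d] for p in group) / len(group)
                 for d in range(len(group[0])))

def kmeans(points, k, iters=20):
    centers = points[:k]                 # deterministic seeding for the demo
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: dist2(p, centers[c]))].append(p)
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers, groups

# Toy 2-D points standing in for ECFP fingerprint vectors.
points = [(0.0, 0.0), (0.2, 0.1), (4.0, 4.0), (4.1, 3.9), (0.1, 0.2)]
centers, groups = kmeans(points, 2)
# One candidate substrate per cluster: the member nearest its center.
reps = [min(g, key=lambda p: dist2(p, c)) for c, g in zip(centers, groups)]
print(reps)
```

Production pipelines would use an optimized K-means implementation and real fingerprints, but the "nearest-to-center" selection rule is the same.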
In practice, selecting the optimal number of candidate substrates presents a significant challenge. A small set of substrates may fail to represent the diversity of chemical space adequately, whereas an excessive number can render experimental validation impractical. Although methods like elbow or silhouette plots provide some guidance, their effectiveness in practical applications remains limited.102 Another challenge in reaction evaluation lies in reporting and selection bias. Researchers often prioritize favorable results for publication, while unbiased, algorithmically designed experiments may produce outcomes that appear less impressive. As a result, negative or suboptimal data are often underreported, leading to a skewed perception of reaction performance. To address this issue, it is crucial to emphasize the systematic inclusion and transparent reporting of unbiased experiments and negative data. This approach will enable a more accurate and comprehensive understanding of reaction robustness and generalizability.103–105
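The silhouette criterion mentioned above is easy to state concretely: for each point, let a be the mean distance to its own cluster and b the lowest mean distance to any other cluster; the point's score is (b - a)/max(a, b), and the overall score is the average. A small self-contained sketch (toy 2-D points, not real substrate descriptors):

```python
# Silhouette score from scratch: higher values indicate better-separated
# clusters; negative values flag points assigned to the wrong cluster.
import math

def euclid(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def silhouette(clusters):
    scores = []
    for ci, cluster in enumerate(clusters):
        for p in cluster:
            others = [q for q in cluster if q is not p]
            a = (sum(euclid(p, q) for q in others) / len(others)
                 if others else 0.0)
            b = min(sum(euclid(p, q) for q in other) / len(other)
                    for cj, other in enumerate(clusters) if cj != ci)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

good = [[(0, 0), (0, 1)], [(9, 9), (9, 10)]]   # well-separated clustering
bad  = [[(0, 0), (9, 9)], [(0, 1), (9, 10)]]   # same points, clusters mixed
print(silhouette(good), silhouette(bad))
```

Scanning this score over candidate values of k gives the silhouette plot; the caveat in the text stands, since real substrate spaces rarely show a single clean optimum.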
AI has demonstrated significant potential in accelerating reaction condition optimization and unbiased reaction scope analysis. However, for complex chemical systems, even active learning often fails to substantially reduce experimental workloads. Additionally, the limited interpretability of current models hinders their ability to generate new chemical insights. Despite these limitations, continued advances in AI algorithms and the growing adoption of AI technologies by researchers are expected to accelerate progress in reaction optimization and scope studies, enabling more efficient and insightful exploration of reaction space.
The automation of chemical synthesis offers a promising solution to these challenges. In 1965, Merrifield introduced automated solid-phase peptide synthesis, laying the foundation for automated synthesis.106 This milestone was followed by the development of automated analytical systems107 and algorithm-driven chemical synthesis platforms.108 However, the complexity of chemical reactions and the limitations of automation technologies have hindered the widespread adoption of these innovations.
With the advent of the AI era, the development of versatile mechanical systems and advanced algorithms has empowered chemists to design automated setups for complex organic reactions at significantly reduced costs. This progress has paved the way for autonomous experimentation, a process in which AI and automation systems independently plan, execute, analyze, and optimize experiments with minimal human intervention. By enhancing experimental efficiency, reducing human error, and enabling a systematic exploration of chemical space, autonomous experimentation is poised to revolutionize the field of chemistry, transforming how discoveries are made and accelerating the pace of innovation.
The use of multiparallel batch reactors is an effective approach to automating reactions and improving efficiency in HTE. Typically conducted in 96-, 384-, or 1536-well plates, HTE employs automated dosing systems to accelerate reaction screening through parallel processing and batch analysis. This approach is widely used to synthesize large screening libraries for drug discovery and has also been applied in molecular catalysis for condition screening109,110 and reaction dataset construction (Fig. 11A).111 Despite advantages such as low material consumption and high parallelism, HTE struggles with complex reaction conditions and, owing to analytical constraints, with screening structurally diverse substrates. Consequently, its adoption in organic synthesis laboratories remains limited.
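To make the plate-based screening format concrete, the sketch below lays out a full factorial condition screen (catalyst x base x solvent) on a 96-well plate. All reagent names are placeholders invented for illustration, not conditions from any of the cited studies.

```python
# Mapping a factorial condition screen onto a 96-well plate (8 rows x 12 cols).
from itertools import product

catalysts = ["Pd-1", "Pd-2", "Ni-1", "Ni-2"]     # hypothetical catalysts
bases     = ["K2CO3", "Cs2CO3", "DBU"]
solvents  = ["DMSO", "MeCN", "dioxane", "DMF"]   # 4 * 3 * 4 = 48 conditions

rows, cols = "ABCDEFGH", range(1, 13)
wells = [f"{r}{c}" for r in rows for c in cols]  # A1 ... H12

# zip truncates at 48 conditions, filling wells A1 through D12.
plate = {well: cond
         for well, cond in zip(wells, product(catalysts, bases, solvents))}
print(len(plate), plate["A1"])  # 48 ('Pd-1', 'K2CO3', 'DMSO')
```

Automated dosing systems consume exactly this kind of well-to-condition map, and batch analysis (e.g., plate-reader UPLC-MS) reports results back keyed by well.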
Fig. 11 Examples of autonomous experimental techniques. (A) HTE applied to photoreaction condition screening. Reprinted with permission from ref. 111, copyright 2023, Royal Society of Chemistry. (B) Flow chemistry systems employed for Buchwald–Hartwig coupling reactions. Reprinted with permission from ref. 116, copyright 2018, American Association for the Advancement of Science.
Flow chemistry represents another approach to automating experiments.112–114 Using syringe pumps for sample injection, reactions occur in flow loops or reactors. A key advantage is its compatibility with analytical techniques like HPLC, enabling efficient compound synthesis and analysis. By integrating multiple flow loops in series or parallel, multistep syntheses can also be achieved.
Innovations such as the “radial synthesizer” have further expanded the potential of flow chemistry, enabling multistep syntheses and optimization for target molecules without complex reconfigurations.115 For example, Jensen et al. developed a plug-and-play continuous-flow system, allowing users to control reagents, unit operations, and analytics through a graphical interface.116 This system has been successfully applied to catalytic reactions such as Buchwald–Hartwig cross-coupling, HWE olefination, reductive amination, photoredox reactions, and [2 + 2] cycloadditions (Fig. 11B). Additionally, photoelectrochemical reactions have been demonstrated using flow chemistry.117 However, flow chemistry has limitations. Its efficiency is not markedly superior to manual experimentation due to reaction and analysis times. Meanwhile, metal-based flow loops can influence reaction outcomes, leading to results that may not be easily reproducible in traditional synthetic laboratories.118,119
To better align with existing laboratory synthesis techniques, Cronin et al. reported a novel chemical synthesis framework called the Chemputer.120–122 They reimagined chemical synthesis as akin to computer programming, using sensors and control components to manage hardware operations, with pumps and valves interconnecting the system's modules. The χDL framework acts as the “code” that translates chemical procedures into executable steps. This approach successfully enabled the automation of various organic reactions and was extended to perform parallel reactions.123 While the ChemPU requires complex equipment, it partially overcomes the compatibility challenges often faced by flow chemistry systems.
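The "chemistry as code" idea can be illustrated with a toy interpreter: a procedure is plain data (a list of step records) executed by a dispatcher that maps operation names onto hardware drivers. The step schema and operation names below are invented for illustration and are not real χDL syntax.

```python
# Toy procedure interpreter: each step is a dict naming an operation and its
# arguments; the dispatcher routes it to the matching (here simulated) driver.
def add(reagent, volume_ml, log):
    log.append(f"ADD {volume_ml} mL {reagent}")

def stir(minutes, log):
    log.append(f"STIR {minutes} min")

def heat(temp_c, minutes, log):
    log.append(f"HEAT {temp_c} C for {minutes} min")

OPS = {"Add": add, "Stir": stir, "HeatChill": heat}

def run(procedure):
    log = []
    for step in procedure:
        op = OPS[step["op"]]          # unknown operations raise KeyError
        op(**step["args"], log=log)
    return log

procedure = [
    {"op": "Add",       "args": {"reagent": "aryl bromide", "volume_ml": 2.0}},
    {"op": "Add",       "args": {"reagent": "Pd catalyst",  "volume_ml": 0.5}},
    {"op": "HeatChill", "args": {"temp_c": 80, "minutes": 60}},
    {"op": "Stir",      "args": {"minutes": 30}},
]
for line in run(procedure):
    print(line)
```

Because the procedure is data rather than hard-wired control flow, the same description can be validated, versioned, and replayed on different hardware backends, which is the core of the Chemputer abstraction.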
Automated experimental techniques for laboratory synthesis have made significant strides, but challenges remain due to the complexity of organic reactions. Reactions demanding strict anhydrous or anaerobic conditions, or those involving insoluble or viscous substances, often exceed the capabilities of standard systems. Moreover, limited expertise in adapting reactions for automation presents another barrier to widespread adoption. Compact and affordable devices with lower automation levels but greater compatibility could help overcome these barriers, enabling traditional synthesis laboratories to transition toward modern workflows and expanding automation's accessibility in research.
Fig. 12 Various experimental setups for closed-loop experimentation. The four systems on the left illustrate the use of flow chemistry to achieve automated experimentation. The closed-loop workflow (Reprinted with permission from ref. 124, copyright 2022, American Association for the Advancement of Science) and RoboChem (Reprinted with permission from ref. 125, copyright 2024, American Association for the Advancement of Science) are designed for closed-loop condition optimization, while the robotic platform for flow synthesis (Reprinted with permission from ref. 46, copyright 2019, American Association for the Advancement of Science) and the Chemputer (Reprinted with permission from ref. 126, copyright 2019, American Association for the Advancement of Science) enable closed-loop molecular synthesis. The figure on the right illustrates a mobile robotic chemist (Reprinted with permission from ref. 128, copyright 2020, Springer Nature) capable of performing human-like tasks, offering high adaptability for both reaction condition optimization and molecular synthesis.
Integrating automation with optimization algorithms can significantly enhance the efficiency of reaction optimization. Grzybowski and Burke et al. successfully utilized a closed-loop system to optimize conditions for the Suzuki–Miyaura cross-coupling reaction, achieving a markedly broader substrate scope than previously published methods.124 Similarly, Noël et al. demonstrated the use of a robotic platform, RoboChem, for the self-optimization and scale-up of photochemical reactions.125 This system, which integrates components such as continuous-flow photoreactors and benchtop NMR spectrometers, optimized diverse photochemical reactions. The approach not only accelerated the identification of improved reaction conditions but also highlighted the potential of NMR as an alternative to traditional liquid chromatography for yield determination.
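The common skeleton of these systems is the propose-run-analyze-update cycle. The sketch below shows that loop structure only: a simulated yield function stands in for the robot plus inline analysis, and, to stay dependency-free, a simple explore-then-exploit rule stands in for the Bayesian optimization the actual platforms use.

```python
# Closed-loop optimization skeleton over a discrete grid of conditions
# (temperature in C, time in min). Everything hardware-related is simulated.
import random

random.seed(0)
conditions = [(T, t) for T in (25, 40, 60, 80) for t in (5, 15, 30, 60)]

def run_experiment(cond):
    """Simulated yield oracle: best near 60 C / 30 min, with noise."""
    T, t = cond
    return max(0.0, 90 - 0.02 * (T - 60) ** 2 - 0.03 * (t - 30) ** 2
               + random.uniform(-2, 2))

observed = {}
for round_ in range(8):
    untested = [c for c in conditions if c not in observed]
    if round_ < 4 or not observed:        # explore first...
        cond = random.choice(untested)
    else:                                 # ...then test near the current best
        best = max(observed, key=observed.get)
        cond = min(untested,
                   key=lambda c: (c[0] - best[0]) ** 2 + (c[1] - best[1]) ** 2)
    observed[cond] = run_experiment(cond)  # "robot runs it, NMR reads yield"

best = max(observed, key=observed.get)
print(best, round(observed[best], 1))
```

Replacing the selection rule with a surrogate-model acquisition function (expected improvement, upper confidence bound) turns this skeleton into the Bayesian-optimization-driven loops described above.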
Beyond reaction condition optimization, closed-loop systems have also been applied to molecular synthesis. A notable example is the robotic platform developed by Green and Jensen.46 This platform, enabled by ASKCOS software, facilitates the automated flow synthesis of small-molecule pharmaceuticals. Another is the Chemputer, developed by Cronin et al., which incorporates extensive sensor integration to replicate the human-like execution of routine organic operations, such as extraction and evaporation.126 Demonstrated through the synthesis of three pharmaceutical molecules, the Chemputer showcases the potential of automated synthesis for general applications. While the system currently relies on chemists to encode reaction steps for synthesis planning, it holds the promise of integrating automated retrosynthetic planning software, paving the way for a fully autonomous closed-loop reaction planning framework.
Robotic chemists, designed to handle samples much as human chemists do, are well-suited to traditional organic laboratories and experimental setups, offering greater scalability for addressing complex synthesis and optimization challenges.127 For example, Cooper et al. developed a mobile robotic chemist guided by Bayesian search algorithms, capable of autonomously conducting large-scale experiments and identifying optimal reaction conditions.128 By integrating mobile robots, automated synthesis platforms, and analytical tools such as LC-MS and NMR, they established an end-to-end workflow for autonomous reaction optimization and substrate identification.129 Despite these advances, robotic chemists are generally limited to performing one reaction at a time, with throughput comparable to human chemists, and they still rely on fixed programming methods. These limitations highlight the gap in efficiency compared to flow chemistry and HTE systems, as well as the need for greater adaptability to varying reaction conditions in future designs.
Closed-loop optimization can be applied not only to the selection of reaction conditions but also to molecular discovery. Li and Cooper et al. demonstrated a two-step data-driven approach to screen and synthesize organic conjugated photocatalysts (OCPs) from a virtual molecular library.130 However, the vastness of chemical space highlights a limitation of this approach: the inefficiency of human-driven synthesis hinders large-scale exploration, even with modern ML algorithms. To tackle these challenges, Burke and Aspuru-Guzik et al. proposed asynchronous cloud-based delocalized closed-loop (ACDC) optimization, applying it to discover organic solid-state laser gain materials.131 By screening 92800 molecules in silico and employing Bayesian optimization (BO) to recommend candidates, they synthesized and characterized 12 superior compounds using automated platforms.132 While human involvement remains crucial in molecular catalysis, such automated platforms are essential for advancing data-driven closed-loop discovery.
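The screen-then-select step of such campaigns follows a simple pattern: score the virtual library with a cheap surrogate, keep the high scorers, and pick a synthesis batch that is both good and diverse. In the sketch below the library, its features, and the scoring function are all toy stand-ins for the quantum-chemical and ML models used in the actual studies; batch diversity is enforced by a greedy max-min rule rather than BO.

```python
# Virtual screening sketch: surrogate scoring, shortlisting, then greedy
# max-min selection of a diverse synthesis batch.
import math, random

random.seed(1)
library = [(random.random(), random.random()) for _ in range(1000)]  # toy features

def surrogate(mol):
    """Hypothetical predicted figure of merit (higher is better)."""
    x, y = mol
    return -((x - 0.7) ** 2 + (y - 0.3) ** 2)

shortlist = sorted(library, key=surrogate, reverse=True)[:50]

def pick_diverse(pool, k):
    batch = [pool[0]]                      # seed with the top-scoring hit
    while len(batch) < k:
        # add the candidate farthest from everything already in the batch
        batch.append(max((m for m in pool if m not in batch),
                         key=lambda m: min(math.dist(m, b) for b in batch)))
    return batch

batch = pick_diverse(shortlist, 5)
print(len(batch))  # candidates recommended for automated synthesis
```

The max-min step guards against the failure mode where all recommended candidates are near-duplicates of the predicted optimum, wasting synthesis capacity on redundant molecules.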
While closed-loop reaction optimization strategies are promising, significant challenges remain in fully integrating these approaches into organic reaction research. The ultimate goal of closed-loop systems is the seamless integration of AI and automation for reaction discovery and optimization. Burke and Aspuru-Guzik et al. offer a glimpse of this goal.133 As shown in Fig. 13, phase I utilized Bayesian optimization to guide the synthesis of photostable molecules, with interpretable machine learning models based on DFT descriptors highlighting the critical role of solvents. In phase II, additional experiments confirmed the impact of solvents, and phase III focused on solvent optimization, further enhancing photostability. This workflow enables researchers to uncover new insights and iteratively apply them to molecular design, overcoming the limitations of traditional hypothesis-driven trial-and-error approaches.
Fig. 13 The closed-loop transfer (CLT) diagram. Phase I involves ML-driven reaction discovery and optimization. In phase II, ML-generated hypotheses are experimentally validated, leading to phase III, where physics-driven discovery generates new insights into molecular catalysis. Reprinted with permission from ref. 133, copyright 2024, Springer Nature.
In summary, autonomous experimentation is transforming molecular catalysis by integrating closed-loop optimization with AI, driving advances in reaction discovery and optimization. However, as previously discussed, limitations in AI modeling and automation technology constrain the broader application of closed-loop systems in molecular catalysis. Overcoming these barriers requires not only the adoption of AI by synthetic chemists but also the development of more user-friendly and cost-effective hardware and software solutions through interdisciplinary collaboration.
However, for AI-driven methods to be deeply integrated into molecular catalysis, several challenges must still be addressed. As highlighted in this review, current models have yet to match or surpass human chemists in terms of domain-specific knowledge of chemical systems. While data distribution and quality are critical factors, a deeper challenge lies in the unmatched ability of human chemists to systematically acquire and apply chemical domain knowledge. Although improving molecular representations and model performance offers straightforward technical gains, a more fundamental approach is to embed chemical domain knowledge into the development of AI models.18 Additionally, the high cost and limited applicability of automation devices further constrain their utility for a broad range of organic reactions, limiting widespread adoption in laboratories.
Despite the challenges ahead, the collaborative progress between AI and molecular catalysis is poised to usher in a new chapter in the development of organic chemistry. High-quality databases will serve as the foundation for robust and reliable AI-driven predictions. More intuitive and accessible AI tools will empower chemists from diverse backgrounds to integrate ML into their research, breaking down technical barriers. In parallel, cultivating a new generation of talent with expertise in both chemistry and AI will be crucial for bridging the gap between these fields and fostering transformative innovation. Finally, interdisciplinary collaborations, combining the strengths of chemistry, computer science, physics, and engineering, will drive the creation of novel AI-powered methodologies and paradigms, accelerating progress in molecular catalysis.
At the threshold of a new era, the integration of AI and chemistry holds the promise of transformative advancements in molecular catalysis. This paradigm shift is poised to redefine the landscape of chemical research, accelerating discoveries and innovations once deemed beyond reach. By delivering unprecedented efficiency, precision, and exploratory power, it will pave the way for a future where molecular catalysis achieves unparalleled heights of possibility and impact.
This journal is © the Partner Organisations 2025