Dan Guevarra *ab, Kevin Kan ab, Yungchieh Lai ab, Ryan J. R. Jones ab, Lan Zhou ab, Phillip Donnelly† a, Matthias Richter‡ ab, Helge S. Stein c and John M. Gregoire *ab
aDivision of Engineering and Applied Science, California Institute of Technology, Pasadena, CA 91125, USA. E-mail: guevarra@caltech.edu; gregoire@caltech.edu
bLiquid Sunlight Alliance, California Institute of Technology, Pasadena, CA, USA
cTUM School of Natural Sciences, Department of Chemistry, Munich Data Science Institute, Technical University of Munich, Munich, Germany
First published on 3rd October 2023
Advancements in artificial intelligence (AI) for science are continually expanding the value proposition for automation in materials and chemistry experiments. The advent of hierarchical decision-making also motivates automation of not only the individual measurements but also the coordination among multiple research workflows. In a typical lab or network of labs, workflows need to independently start and stop operation while also sharing resources such as centralized or multi-functional equipment. A new paradigm in instrument control is needed to realize the combination of independence with respect to periods of operation and interdependence with respect to shared resources. We present Hierarchical Experimental Laboratory Automation and Orchestration with asynchronous programming (HELAO-async), which is implemented via the Python asyncio package by abstracting each resource manager and experiment orchestrator as a FastAPI server. This framework enables coordinated workflows of adaptive experiments, which will elevate Materials Acceleration Platforms (MAPs) from islands of accelerated discovery to the AI emulation of team science.
Expanding upon the vision of interconnected workflows for AI emulation of team science, Bai et al.11 have envisioned world-wide coordination of self-driving labs driven by the rapidly evolving fields of knowledge graphs, semantic web technologies, and multi-agent systems. Ren et al.12 emphasize the critical need for interconnected laboratories to leverage resources and learn epistemic uncertainties. To realize this collective vision, experiment automation software that builds upon the state of the art13–24 must be developed to interconnect laboratories and their research workflows. A hallmark of human scientific research is on-the-fly adaptation of experimental workflows based on recent observations. Human scientists also interleave workflows spanning materials discovery to device prototyping.25 The interleaved execution of multiple workflows typically involves shared resources, which is often a practical necessity for minimizing the capital expense of establishing any experimental workflow. These considerations require “nimble” experiment automation, and in the present work we describe our approach to automating nimble, interconnected workflows via asynchronous programming.
Parallel execution of automated workflows can be realized via “interconnected workflows” wherein a human or machine Science Manager oversees multiple automated workflows that each run on their own central processing unit (CPU). This scheme is appropriate when each workflow can operate independently. If two workflows share an experiment resource, one strategy for their automation is to integrate them into a hybrid workflow that is executed from a single CPU. The shared runtime among all processes in a single CPU inherently limits operational flexibility, a runtime interdependence that is impractical for interconnected labs. In addition to addressing the needs of shared resources, asynchronous programming provides the requisite flexibility for experiment automation tasks such as passing messages, writing data to files, and polling devices. These needs can be partially fulfilled with multi-thread programming, although we find asynchronous programming to be a more natural solution. For a more technical discussion of the difference between asynchronous programming and threading, we refer the reader to ref. 26.
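To illustrate why a single event loop suits these tasks, consider the following minimal sketch, in which a coroutine polling a simulated device runs concurrently with one writing the readings to a file, without threads or locks. All names here are hypothetical illustrations and are not drawn from the HELAO-async codebase.

```python
import asyncio

async def poll_device(readings: asyncio.Queue):
    """Periodically poll a (simulated) device without blocking other tasks."""
    for step in range(5):
        await asyncio.sleep(0.1)  # stands in for a non-blocking device query
        await readings.put({"step": step, "value": step * 0.5})

async def write_data(readings: asyncio.Queue):
    """Consume readings and append them to a file as they arrive."""
    with open("readings.log", "a") as f:
        for _ in range(5):
            reading = await readings.get()
            f.write(f"{reading}\n")

async def main():
    queue = asyncio.Queue()
    # Both coroutines share one event loop, so polling and file writing
    # interleave without the locks a threaded design would require.
    await asyncio.gather(poll_device(queue), write_data(queue))

asyncio.run(main())
```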
As an example of a shared resource, consider a lab in which a central piece of equipment such as an X-ray diffractometer or reactive annealing chamber is used in several distinct workflows. Traditional methods of experiment automation would involve each workflow taking ownership of that equipment during the workflow's runtime. Combining all workflows in a single instance of automation software limits the ability of different workflows to start and stop as dictated by science management and/or equipment maintenance needs. Human researchers address this challenge by creating a system to schedule the requested usage of shared equipment, and the automation analogue is to have the shared equipment operated by a broker whose runtime is independent of all other workflow automation software. This is the central tenet of “Nimble interconnected workflows” as depicted in Fig. 1, wherein each resource family is controlled by an asynchronous Resource Manager. The runtime independence of Resource Managers and Workflow Orchestrators enables each Orchestrator to maintain focus on a single research workflow while empowering a Science Manager to coordinate efforts across many workflows in any number of physical laboratories.
The series of workflow automation capabilities illustrated by Fig. 1 has been largely mirrored by the evolution of reported automation software for materials chemistry. Seminal demonstrations of software for automating an experiment workflow include ARES,13 ChemOS,14 and Bluesky.15 Continual development of these platforms has resulted in new capabilities such as the generalized ARES-OS16 and remote operation with Bluesky.17 ChemOS14 and ESCALATE27 have increasingly incorporated ancillary aspects of automation such as encoding design of experiments and interfacing with databases. These efforts have built toward multi-workflow integration in HELAO18,19 and NIMS-OS20 as well as object-oriented, modular frameworks for co-development of multiple MAPs21,23 and multi-agent automation frameworks.22 Enabling independent operation of workflow components ultimately requires asynchronous programming, as envisioned by the present work and ChemOS2.0.24 The abstraction of lab equipment as asynchronous web servers is implemented in HELAO-async using FastAPI,28 a performant, standards-based web framework for writing Application Programming Interfaces (APIs) in Python. SiLA2 (ref. 29) is an alternative framework that may be used to realize the HELAO-style communication within and among multiple workflows. To the best of our knowledge, HELAO-async is the first instrument control software platform that realizes the “Nimble interconnected workflows” paradigm illustrated in Fig. 1, where workflows can be independently started and stopped while sharing resources such as centralized or multi-functional equipment. These capabilities are agnostic to whether workflow operation decisions are determined via human or artificial intelligence, while the independent execution of multiple workflows is critical for fully realizing the value of hierarchical active learning and multi-modal AI algorithms.
The implementation of “Nimble interconnected workflows” in HELAO-async is outlined in Fig. 2. A Science Manager (see Fig. 1) is implemented as an Operator for active science management and Observer for passive science management. The Orchestrator manages workflow-level automation, which generally involves launching a series of actions on the workflow's suite of Action Servers. In the parlance of Fig. 1, Action Servers are the Resource Managers, which execute actions via Device Drivers that comprise the Hardware/Software Resources. The design principle of this framework is to enable asynchronous launching of workflows via Operator–Orchestrator communication, as well as asynchronous execution of workflows via Orchestrator–Action Server communication. When multiple Orchestrators share a resource, queuing and prioritization are managed by the respective Action Server.
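As an illustration of this last point, the sketch below shows one way a shared Action Server could serialize requests from multiple Orchestrators using an asyncio priority queue. The endpoint name, priority scheme, and in-memory state are assumptions for illustration, not the HELAO-async implementation.

```python
import asyncio
from fastapi import FastAPI

app = FastAPI(title="Shared Action Server")
# Created on startup so the queue binds to the server's event loop.
state = {"queue": None, "counter": 0}

@app.on_event("startup")
async def start_worker():
    state["queue"] = asyncio.PriorityQueue()

    async def worker():
        # Serve one request at a time: lower number = higher priority,
        # and the counter preserves FIFO order among equal priorities.
        while True:
            priority, _, orchestrator_id = await state["queue"].get()
            await asyncio.sleep(1.0)  # stands in for the real device operation
            state["queue"].task_done()

    asyncio.create_task(worker())

@app.post("/enqueue_action")  # hypothetical endpoint name, for illustration only
async def enqueue_action(orchestrator_id: str, priority: int = 10):
    state["counter"] += 1
    await state["queue"].put((priority, state["counter"], orchestrator_id))
    return {"queued_for": orchestrator_id, "priority": priority}
```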
Fig. 2 The HELAO-async framework is outlined as a specific implementation of the “Nimble interconnected workflow” concept from Fig. 1. Orchestrators manage workflows by controlling Action Servers that manage resources via Device Drivers. By establishing an independent FastAPI server for each Orchestrator and Action Server, workflows have independent runtimes while sharing resources as needed. A human or AI Operator manages the collection of Orchestrators, and an Observer can consume data streams from any server. The use of FastAPI endpoints and websockets for these interactions creates flexibility for the implementation of Operators and Observers.
The HELAO-async implementation described herein is intended to be agnostic with respect to the type of Operator and Observer, which may involve any combination of a human researcher, an autonomous operator selecting experiments via an AI-based acquisition function, or a more general broker19 for coordinating experiments across many workflows. The scope of an Orchestrator is that of a single workflow, and by implementing each Orchestrator as a FastAPI server, the parameterized workflow is exposed to the Operator via custom FastAPI endpoints. For example, “global_status” is an endpoint of an Orchestrator FastAPI server that is programmed as follows to enable any program to request and receive the Orchestrator's status:
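The listing below is a minimal sketch consistent with this description; the endpoint's actual signature and return payload in the helao-async repository may differ, and the status fields shown are illustrative.

```python
from fastapi import FastAPI

app = FastAPI(title="Orchestrator")

# Illustrative in-memory status; the real Orchestrator tracks live dispatch state.
STATUS = {"orchestrator": "orch0", "loop_state": "started", "active_actions": []}

@app.post("/global_status")
async def global_status():
    # Any HTTP client (Operator, Observer, another server) can poll this endpoint.
    return STATUS
```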
The Orchestrator executes a workflow via Action Server FastAPI endpoints, for example this abridged version of the “acquire_data” endpoint, which includes parameters for the duration and rate of the acquisition:
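Again a minimal sketch: the duration and rate parameters follow the description above, but the parameter names, placeholder readings, and return payload are illustrative assumptions rather than the repository's code.

```python
import asyncio
from fastapi import FastAPI

app = FastAPI(title="Action Server")

@app.post("/acquire_data")  # abridged sketch of an acquisition endpoint
async def acquire_data(duration: float = 10.0, acquisition_rate: float = 1.0):
    # Acquire simulated readings at the requested rate for the requested time.
    n_points = int(duration * acquisition_rate)
    data = []
    for i in range(n_points):
        await asyncio.sleep(1.0 / acquisition_rate)  # non-blocking wait
        data.append({"t": i / acquisition_rate, "value": 0.0})  # placeholder reading
    return {"duration": duration, "rate": acquisition_rate, "data": data}
```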
Fig. 2 also depicts Observers, which can subscribe to the FastAPI websockets established by an Orchestrator and/or Action Server. Each Orchestrator and Action Server must be programmed to publish data of interest to websockets, which enables any number of Observers to listen in as needed. Our common implementation to date is a web browser-based Observer (a.k.a. Visualizer) that researchers can launch to monitor quasi-real-time data streams, a critical capability for experiment quality control. For example, the Orchestrator websocket “ws_status” enables any program to subscribe to the Orchestrator's status messages:
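The sketch below shows the publisher side of such a websocket; the per-subscriber queue pattern and the publish_status helper are illustrative assumptions, not the helao-async implementation.

```python
import asyncio
from fastapi import FastAPI, WebSocket

app = FastAPI(title="Orchestrator")
subscribers = set()  # one outbound queue per connected Observer

@app.websocket("/ws_status")
async def ws_status(websocket: WebSocket):
    await websocket.accept()
    queue = asyncio.Queue()
    subscribers.add(queue)
    try:
        while True:
            # Forward each published status message to this Observer.
            await websocket.send_json(await queue.get())
    finally:
        subscribers.discard(queue)

async def publish_status(message: dict):
    # Called by the Orchestrator loop whenever its status changes;
    # every subscribed Observer receives a copy of the message.
    for queue in subscribers:
        await queue.put(message)
```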
The asynchronous operation at the Orchestrator and Action Server levels was particularly motivated by hierarchical active learning schemes, for example human-in-the-loop30,31 or fully autonomous hierarchical active learning.32 When workflow-level decisions are made by a human with autonomous workflow execution, a community of asynchronous Orchestrators enables humans to execute any available workflow. We envision that this mode of operation will be critical for integrating physically separated laboratories and realizing cloud laboratories.11,12,33
HELAO-async is being actively developed in two public repositories: https://github.com/High-Throughput-Experimentation/helao-core encompasses the API data structures and https://github.com/High-Throughput-Experimentation/helao-async contains instrument drivers, API server configurations, and experiment sequences. These repositories contain drivers for our suite of experimental resources spanning motion control, liquid handling, electrochemistry, analytical chemistry, and optical spectroscopy. Our Orchestrator-level implementations include scanning droplet electrochemistry,34 scanning optical spectroscopy,35 electrochemical cells with scheduled electrolyte aliquots for monitoring corrosion,36 and several methods of coupling electrochemical transformations with analytical detection of the chemical products.37 These latter two examples share the need for liquid and/or gas aliquoting from operational electrochemical cells, for which we typically use a Tri Plus robotic sample handling system (CTC Analytics), which is a shared resource across multiple workflows.
Given our safety and security protocols, readers of this manuscript may not execute HELAO-async code with our laboratory equipment. While we encourage the duplication and adaptation of our hardware and/or software for operation in other labs, we have built a virtual demo for the present purpose of introducing HELAO-async. To create a minimal implementation of Fig. 2, the demo contains two independent Orchestrators, each with a dedicated Action Server that simulates the acquisition of electrochemistry data to characterize the overpotential for the oxygen evolution reaction (OER). We have packaged previously-acquired electrochemistry data with the demo.38 The two Orchestrators share a common resource, which in practice may be the robotic sample handling system. In the demo, the shared Action Server is an active learning agent that manages requests for new acquisition instructions from the two Orchestrators. This shared-resource Action Server runs independently and is unaffected by the runtime of each Orchestrator, which the demo demonstrates by independently starting and stopping the Orchestrators, representing the asynchronous operation of research workflows within one or across multiple laboratories. Running the demo batch script opens five browser-based user interfaces: two Operators that control the respective Orchestrators, two Visualizers (Observers) that show the data streams from the respective electrochemistry Action Servers, and a Visualizer (Observer) for the shared active learning Action Server. This last Visualizer shows the progress of the active learning campaign, including the contributions from each of the independent Orchestrators. A snapshot of these five web browser interfaces is shown in Fig. 3. Because the active learning uses a fixed random seed, the demo runs deterministically; the contents of Fig. 3 show the status approximately 17 minutes after launching the demo batch script.
While the HELAO-async schematic of Fig. 2 indicates the intended role and scope of each FastAPI server with respect to the universal research roles summarized in Fig. 1, there remains flexibility in how to implement HELAO-async for a given workflow or ensemble of workflows. Regarding the scope of a single Action Server, a set of resources may be bundled in a single FastAPI server based on (i) their intended use as a grouping of shared resources, (ii) the need for synchronization among the resources, or (iii) safety-related interdependencies. As examples, consider (i) an autosampler for a piece of equipment and the piece of equipment itself, which may have distinct drivers but will always be used together, so it is best to code their joint actions in an Action Server and abstract the joint action of sampling and measuring as a single FastAPI endpoint; (ii) an isolation valve and a pump where the isolation valve needs to be opened before the pump starts and the pump needs to stop before the isolation valve is closed, which are couplings of driver steps that are best hard coded within a single Action Server; (iii) a set of motors where the limits of the first motor depend on the position of the second motor, for which evaluating the safety of a given motor movement is best done within a single Action Server. While these examples illustrate why multiple resources should be bundled in a single Action Server, the primary counterexamples involve resource sharing and hardware/software modularity. Programming the action queuing and prioritization for an Action Server that is shared among multiple Orchestrators is best done by minimizing the set of resources in the Action Server. To best leverage Action Server code for multiple physical instantiations of resources, the scope of an Action Server should be limited to the set of resources that are always implemented collectively into a workflow. This practice also facilitates error handling, where code or instrument failure within a given Action Server will result in the Orchestrator losing access to these capabilities. However, this type of single-point failure does not inherently crash the Orchestrator, so other aspects of the workflow may continue operating. Once the failure is resolved, restarting the Action Server will make its endpoints available to the Orchestrator to restore full workflow operation. We note that programming Orchestrators for automated error recovery is nontrivial, and the present work focuses on providing the capabilities to implement such strategies via automation of workflows as networks of FastAPI servers.
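To make example (ii) concrete, the sketch below hard-codes the valve/pump ordering inside a single Action Server endpoint; the driver functions are hypothetical placeholders rather than real device drivers.

```python
import asyncio
from fastapi import FastAPI

app = FastAPI(title="Pump Action Server")

async def open_valve():  # hypothetical driver calls, simulated with sleeps
    await asyncio.sleep(0.1)

async def close_valve():
    await asyncio.sleep(0.1)

async def run_pump(duration: float):
    await asyncio.sleep(duration)

@app.post("/pump_liquid")  # one endpoint enforces the safe ordering
async def pump_liquid(duration: float = 5.0):
    # The valve must open before the pump starts and must not close until
    # the pump has stopped; bundling both drivers in one Action Server
    # lets this coupling be enforced in a single place.
    await open_valve()
    try:
        await run_pump(duration)
    finally:
        await close_valve()  # always restore the safe state, even on error
    return {"pumped_s": duration}
```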
The multi-Orchestrator demo described above additionally illustrates optionality with respect to the implementation of AI-guided design of experiments. If the AI agent is intended to be a Science Manager across multiple workflows, it should be implemented as an Operator in HELAO-async. However, in the demo, the active learning engine is implemented as an Action Server that is a shared resource for the two Orchestrators. In this case the Science Manager is a human who configured each Orchestrator to receive guidance from the active learning Action Server, which is a prudent mode of operation when the human will routinely switch an Orchestrator between executing experiments according to AI vs. human guidance. As such, this demo may be prescient in the context of instrument control using large language models that integrate human and AI design of experiments.39 While self-driving labs have traditionally been constructed with the AI acquisition function as the Operator, the future of MAPs will likely relegate active learning to be a resource that facilitates but does not govern operation of automated experimental workflows. The asynchronous programming and implementation of workflows as networked servers in HELAO-async are designed to enable development of individual automated workflows followed by their seamless interconnection with additional workflows that can be managed by any combination of human and artificial intelligence.
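As a sketch of this design choice, an active-learning-as-resource Action Server might expose endpoints for reporting results and requesting the next acquisition. The endpoint names, candidate library, and random acquisition function below are illustrative assumptions, not the demo's implementation.

```python
import random
from fastapi import FastAPI

app = FastAPI(title="Active Learning Action Server")
random.seed(0)  # a fixed seed makes the campaign deterministic, as in the demo

observed = {}  # composition label -> measured OER overpotential
CANDIDATES = ["A", "B", "C", "D"]  # placeholder composition library

@app.post("/report_result")  # hypothetical endpoint for completed measurements
async def report_result(composition: str, overpotential: float):
    observed[composition] = overpotential
    return {"n_observed": len(observed)}

@app.post("/acquire_next")  # hypothetical endpoint polled by each Orchestrator
async def acquire_next(orchestrator_id: str):
    # A trivial acquisition function: pick an unmeasured composition at random.
    # A real agent would rank candidates by, e.g., model uncertainty.
    remaining = [c for c in CANDIDATES if c not in observed]
    choice = random.choice(remaining) if remaining else None
    return {"requested_by": orchestrator_id, "next_composition": choice}
```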
• Project name: HELAO-async.
• Project home page: https://github.com/High-Throughput-Experimentation/helao-async.
• Operating system(s): Windows 7, Windows 10, Linux (limited driver functionality).
• Programming language: Python 3.8+.
• License: MIT.
• DOI: https://doi.org/10.22002/q2984-04886.
Footnotes
† Present address: Good Terms LLC, CO, USA.
‡ Present address: deepXscan GmbH, Dresden, Germany.