Eleni Karamasioti,†ab Claude Lormeau†ab and Jörg Stelling*a
aDepartment of Biosystems Science and Engineering and SIB Swiss Institute of Bioinformatics, ETH Zurich, 4058 Basel, Switzerland. E-mail: joerg.stelling@bsse.ethz.ch
bPhD Program Systems Biology, Life Science Zurich Graduate School, Zurich, Switzerland
First published on 18th August 2017
The rational design of synthetic gene circuits has led to many successful applications over the past decade. However, increasingly complex constructs also revealed that analogies to electronics design such as modularity and ‘plug-and-play’ composition are of limited use: biology is less well characterized, more context-dependent, and overall less predictable. Here, we summarize the main conceptual challenges of synthetic circuit design to highlight recent progress towards more tailored, context-aware computational design methods for synthetic biology. Emerging methods to guide the rational design of synthetic circuits that robustly perform desired tasks might reduce the number of experimental trial and error cycles.
Design, System, Application
This review focuses on the computational design of synthetic gene circuits based on mathematical models of different types, such as statistical, thermodynamic, and dynamic systems descriptions, and their (possible) integration. Desired systems functionalities are predictability under uncertainty and biological variability; a primary corresponding design constraint is circuit robustness to effects of biological context. The immediate application potential of emerging methods to guide the rational design of synthetic circuits that robustly perform desired tasks is to reduce the number of experimental trial and error cycles. In perspective, the methods can enable rational design for therapeutic applications through rational re-wiring of natural systems.
Electrical circuit analogies are potentially useful in abstracting from complicated biological phenomena, but they are not necessarily correct statements about biological reality. Synthetic gene circuits operate as (bio)chemical reaction networks in which, in principle, the network context matters for a specific part's (protein's or gene's) function. Unlike electrical circuits, they do not have physically and functionally separated parts and dedicated ‘wires’ for targeted communication between them. In particular, progress in synthetic biology towards more complex circuits has now revealed important limitations of the plug-and-play approach inspired by electrical engineering. A reliable design of gene circuits needs to account for the following: (i) parts may behave differently when other synthetic parts are acting upstream or downstream of them; (ii) large synthetic circuits may impose a substantial load on the cell's resources; (iii) synthetic circuits may cross-talk with endogenous pathways; and (iv) for functional assembly into complicated circuits, biological parts' behaviors may be too variable, either because parts are insufficiently characterized or because of intrinsic biological variability due to molecular noise.1,2,5–7 These are unintended design challenges imposed by context dependence and uncertainty. In addition, the use of synthetic circuits for the analysis and control of natural biological systems, for example, in therapeutic applications, requires an intentional integration of synthetic and natural circuits.8 Note that, although cell-free systems seem to circumvent many of these uncertainties, they do not offer the same range of applications as in vivo systems, and they face their own challenges such as costly resources and difficult scale-up.9
Computational methods for circuit design based on mathematical models have been integral to synthetic biology from its start, and with the increasing challenges of context dependence and uncertainty, their relevance is bound to increase. For example, while one could rely on trial and error to find a part configuration for simple synthetic circuits, the rational design of more sophisticated circuits has proven nearly impossible without the help of computational methods. Consequently, a plethora of computational design methods for synthetic gene circuits exists (see ref. 10–14 for detailed reviews). However, only recently have approaches emerged that systematically account for context dependence and uncertainty in the design process. Starting from methods for the simple assembly of well-characterized parts into (theoretically) functional circuits, we here review current methods that tackle at least one of the issues related to context and uncertainty. To achieve better overall predictability and robustness in synthetic gene circuit design, we then argue that concepts from neighboring disciplines such as systems biology and control engineering provide inspiration for next-generation computational methods. For complementary experimental resources and sources of quantitative data, we refer to recent reviews.15,16
The promoter, located directly upstream of the transcription initiation site, is one of the elements of a gene's sequence that controls transcription initiation and thus gene expression. Most computational models that predict gene expression from promoter sequence are statistical in nature, derived from in vivo gene expression data in one or a few experimental conditions, using machine learning methods such as Gaussian mixture models17 and support vector machines.18 In a slightly different approach, Rhodius et al. developed a computational model describing the strength of full-length constitutive promoters in E. coli; again it is a statistical model based on RNA polymerase binding motifs, but it accounts for effects of polymerase concentration on gene expression.19 Another model for inducible promoters in E. coli20 is conceptually different: it uses thermodynamic principles to predict the probability of RNA polymerase binding to the promoter. By assuming that the rate of gene expression is proportional to polymerase binding, the model could predict promoter efficiency in a different regulatory context, at least for simple repression of gene expression.20
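The thermodynamic picture behind such models can be condensed into a few lines. The sketch below (a generic textbook-style formulation; parameter names and the simple-repression form are illustrative assumptions, not the specific model of ref. 20) computes promoter occupancy from statistical weights and assumes that the transcription rate is proportional to polymerase binding:

```python
def p_bound(P, K_P, R=0.0, K_R=1.0):
    """Equilibrium promoter occupancy by RNA polymerase in a
    simple-repression thermodynamic model: polymerase (P copies,
    binding scale K_P) competes with a repressor (R copies, binding
    scale K_R) for the promoter."""
    w_pol = P / K_P  # statistical weight of the polymerase-bound state
    w_rep = R / K_R  # statistical weight of the repressor-bound state
    return w_pol / (1.0 + w_pol + w_rep)

def expression_rate(P, K_P, R=0.0, K_R=1.0, k_max=1.0):
    # Key modeling assumption: transcription rate is proportional to
    # polymerase occupancy of the promoter.
    return k_max * p_bound(P, K_P, R, K_R)
```

Increasing the repressor weight lowers occupancy and thus the predicted expression rate, which is how such a model transfers a part characterization to a new regulatory context.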
Computational models for translation cover the second major tuning knob for gene expression. Early methods focused on qualitative ways to maximize the efficiency of translation by minimizing factors known to reduce its efficiency. Such factors include the presence of secondary structures in the mRNA21 as well as a codon usage that is not adapted to the host.22 For example, Gaspar et al. showed that one can redesign mRNA sequences with less stable secondary structures by maximizing an mRNA molecule's minimum free energy.23 Second-generation methods seek more quantitative control of translation in order to achieve a specific protein expression level, primarily by assuming translation initiation as the rate limiting step.24 The ribosome-binding site (RBS) sequence in bacteria has been a main target for the development of models with increasing predictive power. While some models are purely data-driven with uncertain generalizability,25 others incorporate thermodynamic/biophysical principles such as binding energy contributions to interactions between mRNA and ribosome.26–28 A similar biophysical model allows for the rational design of riboswitches that regulate the expression of the corresponding gene at the translation level.29 In a complementary effort, Welch et al. studied the effect of codon bias on protein expression in E. coli and showed that a (bilinear regression) model could predict protein expression for designed genes.30
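The core of such thermodynamic translation models is a Boltzmann-like relation between the mRNA:ribosome binding free energy and the initiation rate. A minimal sketch follows; the proportionality constant BETA is treated here as an assumed, illustrative value rather than a statement about the fitted constants of refs. 26–28:

```python
import math

# Assumed proportionality constant between binding free energy and the
# logarithm of the initiation rate; illustrative value only.
BETA = 0.45  # mol/kcal

def initiation_rate(dG_total, rate_ref=1.0):
    """Relative translation initiation rate for a total mRNA:ribosome
    binding free energy dG_total (kcal/mol); more negative binding free
    energy means faster initiation."""
    return rate_ref * math.exp(-BETA * dG_total)
```

Designing an RBS for a target expression level then amounts to searching sequence space for a variant whose predicted dG_total yields the desired rate.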
Compared to protein production processes, however, computationally designed control of protein expression by degradation processes has received little attention. For example, synthetic post-translational control of protein abundances through targeted degradation is achievable by modifying the degradation tag in a protein's C-terminus.31 Recently, Cameron et al. built an inducible degradation system for E. coli proteins based on components of an orthogonal bacterial transfer-messenger RNA system, which allowed them to regulate protein degradation by controlling inducer molecule concentration,32 but quantitative experimental data were not yet used to establish predictive computational models.
Hence, there appears to be no shortage of computational methods for the design of genetic parts in general. However, as noted earlier,13 most (and an apparently increasing fraction) of the models are statistical, inferred from natural or engineered gene variants. Learning models that generalize beyond the immediate scope of a specific study faces the problem of combinatorial explosion, for example, to characterize the space of possible sequence variants for long sequences, or the space of (relevant) interactions with other genetic elements. Increasing experimental capabilities as well as possible design principles to be uncovered may help reduce this problem. For example, promoter activity in bacteria in complex environmental conditions appears to be well-described by the linear superposition of expression dynamics in simple conditions.33 However, we argue that biophysical models that relate sequence features to functional parameters such as binding affinities hold more promise because they integrate more easily into modeling frameworks for biochemical reaction networks to reflect context effects. For example, a recent large-scale analysis of gene expression determinants in E. coli revealed substantial competition between translation elongation and mRNA degradation processes, thereby relating codon usage effects and global cellular physiology.34 Such multi-dimensional dependencies are hard to cover in a purely data-driven fashion; conversely, for more mechanistic approaches, defining the relevant connections is critical.
Optimization-based methods fill this gap by searching for candidate circuit architectures and for parametrizations in the corresponding model's parameter space that provide functional solutions. They typically focus on the optimal selection of circuit components from libraries of characterized parts, with the objective of designing a circuit that best fits the desired behavior. Examples include a framework for optimal parts selection for a predefined circuit topology,35 and a more advanced framework that identifies an optimal circuit topology as well.36 A single design criterion such as optimal dynamics may not be sufficient in practice, which is why multi-objective optimization methods for system design were recently developed. They yield a set of optimal solutions, the Pareto front, where none of the objectives can be improved without degrading one of the others. This can reveal trade-offs between different design criteria such as precision and sensitivity, thus allowing the designer to choose the solution that best fits each problem.37–40
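Given a set of candidate designs scored on several objectives, the Pareto front can be extracted with a simple dominance check. A minimal sketch (all objectives are minimized; labels and scores are hypothetical):

```python
def pareto_front(designs):
    """Extract the Pareto front from scored candidate designs.

    `designs` is a list of (label, objectives) pairs, all objectives to
    be minimized.  A design is dominated if some other design is at
    least as good in every objective and differs in at least one."""
    front = []
    for label, obj in designs:
        dominated = any(
            other != obj and all(o <= s for s, o in zip(obj, other))
            for _, other in designs
        )
        if not dominated:
            front.append((label, obj))
    return front
```

For example, with objectives (imprecision, insensitivity), a design scoring (3, 3) is dominated by one scoring (2, 2) and drops out, while designs trading one objective against the other all remain on the front.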
Especially for gene circuits that implement digital logic operations, automated design in the plug-and-play mode has demonstrated successes in theory41 and in practice.4 However, it is important to note that uncertainties and mis-specifications at the parts level will propagate to the circuit level, which limits reliable design at larger scale. Promising approaches to this problem combine either phenomenological characterizations of parts42 or biophysical models of parts43 with dynamic mechanistic models for their interactions, but they are limited, for example, to circuits without feedback. In general, analyzing circuit behaviors experimentally and systematically incorporating this data into iterative cycles of computational design remains a key challenge.2
Fig. 1 Communication channels between the parts constituting a synthetic network, the host, and its environment.
First, the general assumption of parts' modularity (their theoretical ability to be connected at will without their internal functions affecting other parts) rarely applies in the strict sense posited. For example, unwanted communication between individual components in a gene circuit can occur because of retroactivity, the back-action that a downstream component exerts on its upstream component upon interconnection.44 In addition, stochastic fluctuations in gene expression might propagate to downstream components of a gene network, thus causing significant changes in a circuit's programmed behavior.45
Second, an in vivo synthetic circuit is integrated into some host and is thus part of a bigger picture. This entails several (unexpected) channels of communication between the circuit and the host, which could cause side effects. Such effects could emerge from unpredicted cross-talk between components of the circuit and endogenous components.46 In addition, a synthetic circuit relies on the host's limited and potentially varying resources, such as energy supply and gene expression machinery, which could affect not only the circuit itself but also the host's endogenous operation.47,48
Finally, considering a synthetic circuit as part of a host, one cannot ignore environmental stimuli such as temperature and pH that could directly affect the circuit's performance.49–51 Indirect effects of environmental factors mediated by the host are also relevant; any factor that alters the host's physiology could affect the circuit behavior. For example, changes in the available carbon sources modulate the growth rate in bacteria, which in turn affects the host's gene expression machinery and the growth-mediated dilution of intracellular components.52
Fig. 2 Mechanisms of context dependence. Any collection of well-characterized components will be part of a bigger system, which it will affect and be affected by. (A) Taking advantage of the time-scale separation principle, Mishra et al. designed a load driver operating on fast time scales that is able to attenuate retroactivity effects between components operating on slower time scales.73 (B) Excessive usage of host resources imposes a substantial burden on the host, which can lead to circuit failure. With the aid of a computational model of translation capturing resource usage, Ceroni et al. evaluated different circuit designs with respect to their output capacity and ribosome usage.69,70 (C) Rewiring circuits to natural pathways can provide interesting dynamic behaviors with very few parts. The yeast mating pathway has been rewired to a heterologous G-protein coupled receptor (GPCR), the mating genes have been replaced by a reporter, and the natural receptor has been knocked out.109 (D) Environmental factors are known to affect the physiology of model organisms and could consequently perturb any subsystem. Hussain et al. used computational modeling to understand the effect of temperature perturbations on their system of interest and counteracted it by substituting the wild-type LacI repressor with a temperature-sensitive mutant.82
Propagation of intrinsic noise, namely of stochastic fluctuations of molecular components internal to the circuit and independent of the circuit's context, is another important source of non-modularity. Broad experimental and theoretical evidence suggests that intrinsic noise can cause miscommunication between components, loss of modularity, and partial or complete collapse of the circuit's functionality.3,57 Many methods have been developed to account for and to control noise and its propagation in biological circuits, but examples for successful translations to improved circuit design are still rare.58 However, the potential is demonstrated by an application of stochastic principles that has recently led to modifications of the famous bacterial repressilator circuit,59 reducing noise propagation and thereby resulting in drastically more robust oscillatory behavior.60 For future design improvements, several recent theoretical concepts appear relevant. Adaptations of the concept of signal-to-noise ratio from electronics and its application to libraries of biological parts or whole circuits could help characterize their efficacy.61 One can decompose the noise propagated from an input signal to the output into a dynamical error (encoding how well dynamics of the input are tracked) and a mechanistic error (encoding the deviation of the signal magnitude) to find network motifs that influence signal fidelity and to evaluate trade-offs between the two types of errors.62 Finally, Oyarzun et al., focusing on a negative autoregulation system (one of the first motifs shown to attenuate stochastic fluctuations), studied the effect of repression parameters on the noise of the circuit's output, and expressed this noise as a function of the design parameters (promoter strength, promoter sensitivity, and repression strength). Such formalisms can enable the definition of conditions on the parameters with the objective of reducing noise.63
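The qualitative link between repression parameters and output noise can be explored with a standard Gillespie (stochastic simulation) algorithm for a self-repressing gene. This is a generic sketch with illustrative parameter values, not the analytical formalism of ref. 63:

```python
import random

def ssa_autoregulation(k_max, K, n, gamma, t_end=2000.0, seed=1):
    """Gillespie simulation of a self-repressing gene: production
    propensity k_max / (1 + (x/K)**n), degradation propensity gamma*x.
    Returns the time-averaged mean and Fano factor (variance/mean)
    of the copy number x."""
    rng = random.Random(seed)
    t, x = 0.0, 0
    s0 = s1 = s2 = 0.0  # accumulators: total time, sum x*dt, sum x*x*dt
    while t < t_end:
        a_prod = k_max / (1.0 + (x / K) ** n)
        a_deg = gamma * x
        a_tot = a_prod + a_deg
        dt = rng.expovariate(a_tot)  # waiting time to next reaction
        s0 += dt
        s1 += x * dt
        s2 += x * x * dt
        t += dt
        if rng.random() * a_tot < a_prod:
            x += 1  # production event
        else:
            x -= 1  # degradation event
    mean = s1 / s0
    fano = (s2 / s0 - mean * mean) / mean
    return mean, fano
```

With a sensitive repression function (Hill coefficient n = 2), the stationary Fano factor drops well below the value of one expected for a constitutive gene (recovered here by setting n = 0), illustrating noise attenuation by negative autoregulation.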
Specifically geared towards communication via the gene expression machinery, early work suggested minimizing the use of ribosomal resources by constructing orthogonal ribosomes specific for heterologous mRNA; rational computational design of orthogonal ribosomes aimed to minimize interference with the native translation machinery.68 Representing only the natural context, Algar et al. developed a model of gene expression accounting for ribosome resources to assist in the selection of candidate designs with minimal load,69 and model predictions were also validated experimentally.70 Similarly, models with explicit representation of RNA polymerase were instrumental in designing precisely controllable transcription factors.71 Qian et al. also recently proposed a more phenomenological model based on conservation laws to predict unexpected effective interactions between genes.72 These ideas extend to whole-cell models of resource sharing. Notably, Weiße et al. accounted for energy equivalents, free ribosomes, and proteins in a coarse-grained representation of yeast; by incorporating a synthetic circuit in this model, they were able to predict its dynamic behavior in communication with the host.73 Even more detailed whole cell computational models were proposed to design synthetic circuits that account for host dependences and study their effect on the host,74,75 but it is hard, if not impossible, to validate such detailed models. To deal with metabolic burden, alternative strategies that rely on closed-loop control are also emerging. For example, for metabolic engineering applications that aim to divert a natural pathway towards a synthetic product, Oyarzun et al. designed a synthetic controller capable of dampening perturbations in the host flux distribution while increasing the production of the synthetic product, and also identified trade-offs in the design of promoter and RBS strength.76
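The essence of such resource-sharing models can be illustrated with a quasi-steady-state ribosome pool shared by several transcripts; this hypothetical sketch (all names and values are illustrative, not a fitted model from the cited work) shows how inducing one gene effectively represses the others:

```python
def translation_rates(mrnas, k_cats, R_tot, K=1.0):
    """Effective translation rates of genes sharing one ribosome pool.

    Quasi-steady-state sketch: free ribosomes satisfy
    R_tot = R_free * (1 + sum(m_i / K)), so increasing any one mRNA
    lowers R_free and thereby the rate of every other gene, creating
    'effective interactions' between otherwise independent genes."""
    R_free = R_tot / (1.0 + sum(m / K for m in mrnas))
    return [k * m * R_free for k, m in zip(k_cats, mrnas)]
```

Raising the first gene's mRNA level increases its own translation rate but decreases that of the second gene, even though the two genes are not wired to each other.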
Interference with host signaling has received considerably less attention, despite its importance, for example, in potential therapeutic applications. The de novo design of orthogonal signaling components, or designs that exploit unnatural mechanisms, is one possible avenue to reduce interference. For example, Green et al. designed transcriptional riboregulators that rely on linear-linear RNA interactions engineered in vitro using thermodynamic models; such interactions are absent from natural systems.77 Yet, especially to interface natural and synthetic signaling, it seems safer to rewire already well-characterized endogenous signaling pathways (Fig. 2C). This approach is strongly encouraged by pioneering work on the mating pathway in yeast: the pathway can be rewired to non-natural inputs and outputs,78 and pathway dynamics can be reshaped by engineering scaffolds for host endogenous kinases.79 Rewiring natural pathways also led to first striking therapeutic and diagnostic applications in mammalian systems.80,81 However, the signaling area does not yet take full advantage of widely used computational approaches for the analysis of natural signaling pathways. Further developing these methods for the design of synthetic controllers or the prediction of optimal rewiring designs could provide substantial advances.
Temperature dependencies of chemical reactions, as captured by the Arrhenius law, are expected to translate into different behaviors of synthetic circuits at different temperatures, which requires more detailed computational models. For example, Hussain et al. engineered a synthetic gene oscillator that is robust to temperature alterations (Fig. 2D) by first incorporating temperature effects on the reaction parameters into a computational model and predicting the expected changes in the circuit's dynamic behavior due to temperature changes. They then identified and implemented a modification of the circuit with the opposite dependency on temperature to balance temperature effects on reaction rates.82 A similar approach was used to study temperature effects on a feed-forward circuit.83 This work identified inherent circuit properties associated with robustness to changes in temperature, such as similar temperature dependencies of production rates or similar degradation rates for different states, and proposed circuit modifications such as negative feedback to enhance temperature robustness. Here, an internal model of temperature dependencies helped design appropriate feedback loops to compensate for these dependencies. This is an instance of the internal model principle from control theory. Adaptations of the principle to biology are rare,84 but they constitute an interesting research direction for the design of synthetic circuits that are able to compensate for environmental perturbations.
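The compensation logic rests on a simple property of the Arrhenius law: two rates whose activation energies match have a temperature-independent ratio. A small sketch with made-up pre-factors and activation energies (not parameters from the cited circuits):

```python
import math

R_GAS = 8.314  # gas constant, J/(mol K)

def arrhenius(A, Ea, T):
    """Arrhenius rate constant for pre-factor A, activation energy Ea
    (J/mol), and absolute temperature T (K)."""
    return A * math.exp(-Ea / (R_GAS * T))

def balanced_ratio(T):
    # Two opposing processes with matched activation energies: their
    # temperature dependencies cancel in the ratio, the principle behind
    # compensating one temperature effect with another of opposite sign.
    k_production = arrhenius(1.0e6, 60.0e3, T)
    k_degradation = arrhenius(2.0e4, 60.0e3, T)
    return k_production / k_degradation
```

If the activation energies differed, the ratio, and hence any circuit property set by the balance of the two rates, would drift with temperature.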
Environmental perturbations (as well as intrinsic biological variability) also induce extrinsic noise, defined as fluctuations of circuit parameters that depend on the context of the circuit or the host cell in which it is integrated. Considering unknown external stimuli as potential sources of extrinsic noise, Zechner et al. developed a method for the estimation of dynamically changing noise, which, following principles of noise cancellation in electrical circuits, could be a prerequisite for counteracting it.85 In addition, first methods are available to decompose variation in biochemical circuits into potential sources of variation, such as intrinsic noise and specific properties of the intra- or extracellular environment.86 Noise decomposition approaches so far may suggest which reporters to add to the system to predict the magnitudes of different variation components, but they do not yet provide guidelines on ways to make circuits more robust to specific environmental perturbations. Overall, then, going from parts to host to environment, computational methods for predictable design become increasingly rare, indicating clear needs for future development.
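One classic decomposition of this kind, which the cited approaches generalize, is the dual-reporter method: two identical reporters in the same cell share extrinsic fluctuations but fluctuate independently in their intrinsic components. A sketch of the corresponding variance estimators (a textbook-style formulation, not the specific estimators of ref. 86):

```python
def decompose_noise(r1, r2):
    """Dual-reporter variance decomposition: r1 and r2 are per-cell
    levels of two identical, independently fluctuating reporters.
    Shared (extrinsic) fluctuations appear in the covariance between
    reporters; uncorrelated (intrinsic) fluctuations appear in half the
    mean squared reporter difference."""
    n = len(r1)
    m1 = sum(r1) / n
    m2 = sum(r2) / n
    extrinsic = sum((a - m1) * (b - m2) for a, b in zip(r1, r2)) / n
    intrinsic = sum((a - b) ** 2 for a, b in zip(r1, r2)) / (2 * n)
    return intrinsic, extrinsic
```

Applied to single-cell data, this separates cell-to-cell variability caused by the shared cellular context from the stochasticity of the expression process itself.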
Robustness of a dynamic behavior to perturbations cannot be assessed directly by differential sensitivity analysis. Several approaches for dynamic analysis have been proposed, but they are not widely used yet. For example, one can define a linear temporal logic property to translate the target dynamic behavior into a Boolean property. The size of the largest region of parameters around the optimal parameter set for which the property is still valid then indicates the robustness of the system.89 Dynamic performance criteria in the time domain can also be formulated as approximate formal control-like specifications in the frequency domain. The structured singular value, a tool for robustness analysis in control engineering, then evaluates the largest perturbations that the system can withstand while keeping the target behavior, as demonstrated, for example, for protein kinase signaling cascades.90 Iadevaia et al. give an example of local robustness analysis of a dynamic behavior towards topology perturbations, namely mutations of up to four network nodes.91 Robustness is measured by the ability of a network to achieve the desired dynamic behavior for at least one parameter set; the analysis is still local because it is performed around the initial topology. Thus, local robustness analysis can help identify which parameters might cause a loss of performance, and analyze post-hoc the circuit's robustness to small perturbations in parameters, inputs, or topology.
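In the spirit of the region-size criterion of ref. 89, a sampling-based stand-in can estimate the largest tested perturbation radius around an optimal parameter set for which a performance property still holds everywhere. The sketch below replaces the temporal logic property with a hypothetical threshold predicate:

```python
import random

def robustness_radius(performance, p_opt, threshold,
                      radii=(0.01, 0.02, 0.05, 0.1, 0.2, 0.5),
                      n_samples=200, seed=0):
    """Largest tested relative perturbation radius around the optimal
    parameter set p_opt for which every sampled perturbed parameter set
    still meets performance(p) >= threshold.  A Monte Carlo stand-in
    for an exact computation of the valid parameter region."""
    rng = random.Random(seed)
    largest = 0.0
    for r in radii:
        ok = all(
            performance([v * (1.0 + rng.uniform(-r, r)) for v in p_opt])
            >= threshold
            for _ in range(n_samples)
        )
        if not ok:
            break  # region around p_opt no longer entirely valid
        largest = r
    return largest
```

A circuit whose property survives only tiny perturbations signals fragile parameter dependence; a large radius suggests the design tolerates parameter mis-specification.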
More advanced methods account for the system's tolerance to local perturbations during the design optimization (Fig. 3B). One possibility is to augment the objective function with a term corresponding to the average or the variance of the performance over an uncertainty region around the currently assessed parameter set. For example, Rodrigo et al. defined a scaled cost function ϕ* = (1 − λ)ϕ + λ〈ϕ〉, where ϕ is the metric function (accounting for the distance between the simulated and the target dynamic behavior) and 〈ϕ〉 is the average of this function over pre-defined variations in all parameter values. The user specifies the robustness weight λ, which biases the optimization towards best performance or towards highest robustness.92,93 In local robustness design, a trade-off between the best fit of the circuit's behavior to the target behavior and the best local robustness is a general feature. The structured singular value approach could be extended to compute this precise trade-off using skewed μ, which gives the worst-case performance for a given parameter uncertainty.90 So-called μ-synthesis algorithms, which allow robustness and performance weights to be adjusted until the best trade-off is found, are common in engineering,94 but have not yet been transferred to gene circuit design.
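The scaled cost function of Rodrigo et al. is straightforward to prototype; in this sketch, 〈ϕ〉 is approximated by Monte Carlo sampling, and the ±10% parameter variation is an illustrative choice rather than the variation scheme of refs. 92 and 93:

```python
import random

def scaled_cost(phi, p, lam, rel_var=0.1, n_samples=100, seed=0):
    """Robustness-weighted objective phi* = (1 - lam)*phi(p) + lam*<phi>:
    <phi> is estimated by averaging the cost over random relative
    variations (+/- rel_var) of all parameter values around p.  The
    weight lam biases the search towards performance (lam -> 0) or
    towards robustness (lam -> 1)."""
    rng = random.Random(seed)
    avg = sum(
        phi([v * (1.0 + rng.uniform(-rel_var, rel_var)) for v in p])
        for _ in range(n_samples)
    ) / n_samples
    return (1.0 - lam) * phi(p) + lam * avg
```

Minimizing scaled_cost instead of phi alone penalizes parameter sets that perform well only at a single point of the parameter space.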
The most straightforward and commonly used approach to analyze the global robustness of a biological system consists of sampling parameter sets in the whole parameter space and checking for each set whether it is located within the feasible region (Fig. 3B). The ratio of samples in the feasible region over the total number of samples, often called the Q-value, approximates the relative area of the feasible region. For a given performance threshold, the extent of the feasible region, and thus the Q-value, is a property of the system's topology, which allows one to evaluate and compare the global robustness of network topologies. This method was originally used for studying the evolution of robustness in natural systems such as circadian oscillators.95,96 More recently, the method has been used to derive design principles for biochemical circuits able to achieve adaptation to sustained changes in stimuli.97 Another application of the Q-value to circuit design was given by Chau et al.: they identified combinations of network motifs that can achieve robust cell polarization, and cells with implemented circuits predicted to be more robust indeed showed a higher frequency of robust polarization (65% vs. 5% for topologies expected to be less robust).98
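A generic Q-value estimator only needs a feasibility predicate and parameter bounds; the log-uniform sampling below is a common, but here assumed, choice for biochemical parameters spanning several orders of magnitude:

```python
import math
import random

def q_value(is_feasible, bounds, n_samples=10000, seed=0):
    """Estimate the Q-value: the fraction of parameter sets, sampled
    log-uniformly within `bounds` (one (low, high) pair per parameter),
    that satisfy the performance predicate `is_feasible`.  This fraction
    approximates the relative volume of the feasible region and hence
    the global robustness of a given topology."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        p = [10.0 ** rng.uniform(math.log10(lo), math.log10(hi))
             for lo, hi in bounds]
        hits += bool(is_feasible(p))
    return hits / n_samples
```

Comparing Q-values across alternative topologies, with the same bounds and predicate, ranks them by global robustness without committing to any particular parametrization.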
Specifically for steady-state global robustness towards parameter variations, it is possible to use an analytical method to construct a so-called design space (Fig. 3C). Ordinary differential equations used for modeling biochemical systems at steady state can be reduced to an S-system (a set of first-order ordinary differential equations that are all a difference of products of power-law functions) by introducing new variables and neglecting non-dominant terms.99 This representation enables the dimensional compression of the parameter landscape into a design space that can be constructed and analyzed automatically.100,101 The design space is a two-dimensional log-space defined by two of the system's parameters, where the model's equations determine the bounds of phenotypic regions. The phenotypic regions enable a simple discrimination of circuit behaviors according to local robustness criteria, such as robustness of a certain flux to parameter variations. Importantly, region boundaries of the design space incorporate the influence of the whole parameter landscape on the circuit's steady states, not only the dependency of a certain performance on two parameters.102,103 Analytical expressions involving steady-state concentrations and kinetic rates describe the regions' boundaries; determining the extent of the regions that satisfy a certain robustness criterion, and hence global robustness design, is therefore straightforward. It is also possible to analyze the circuit's global robustness towards fluctuations of a particular parameter. Since performance criteria can also be plotted on the phenotypic regions, this method is very convenient for identifying trade-offs between local performance and robustness. Again, the approach was primarily developed to infer design principles of natural systems, but first design-oriented applications have been presented, for example, for the construction of synthetic oscillators.104
Finally, the Bayesian computation framework, used originally for parameter inference, offers the possibility of directly estimating the global robustness of a circuit (Fig. 3D).105,106 Bayesian inference is a probabilistic method based on Bayes' theorem. In the systems biology context, it enables the inference of a model topology and of parameters (specifically: of posterior distributions of parameter values) that caused a certain observed behavior as defined by data D. For circuit design, a similar method can find the optimal circuit topology and corresponding parameters by simply replacing the observed behavior with the target behavior. The marginalization of the posterior probability of a topology (or model) M with parameters θ, given the target behavior D, over all possible parameters, P(M|D) = ∫P(M,θ|D)dθ, is a measure of the global robustness of one model. For the simple case of two parameters, this marginalization corresponds to determining the volume below the surface of the posterior probability (Fig. 3D). In performing optimization over circuit (model) topologies with this criterion, the method naturally converges towards a flatter and, thus, more robust solution. However, approximate Bayesian computation incurs high computational costs for sampling and simulation, which currently prevents a systematic enumeration of topologies with more than four nodes. The mathematical or computational complexity of all methods for global robust design discussed in this section has so far prevented widespread adoption for synthetic circuit design (to our knowledge, there are no experimentally demonstrated proofs of principle), but these are clearly promising approaches for the field.
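In rejection-based approximate Bayesian computation, the marginal probability of a topology reduces to the acceptance rate under the parameter prior. A minimal sketch with a user-supplied simulator, prior sampler, and distance tolerance (all hypothetical interfaces for illustration):

```python
import random

def abc_evidence(simulate, prior_sample, target, tol, n=20000, seed=0):
    """Rejection-ABC estimate of the marginal probability P(D | M):
    the fraction of parameter draws from the prior whose simulated
    behavior lands within `tol` of the target behavior D.  Topologies
    with flatter (more robust) parameter dependence accept more draws
    and thus score higher."""
    rng = random.Random(seed)
    accepted = sum(
        abs(simulate(prior_sample(rng)) - target) <= tol
        for _ in range(n)
    )
    return accepted / n
```

As a toy comparison, a model whose output depends gently on its parameters accepts a larger prior fraction than one whose output varies steeply, mirroring the convergence towards flatter posteriors described above.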
First steps in the direction of methods integration have been taken, for example, with the Cello design environment for the automated design of synthetic Boolean circuits, which not only accounts for the burden imposed by the circuit on the host, but also aims to insulate designed components from their context.4 More integrated approaches will need to handle at least two kinds of trade-offs. First, trade-offs between circuit complexity and context dependence will manifest themselves. For example, while insulator devices can reduce retroactivity, they consume gene expression capacity and cellular energy, and their strategic placement will therefore be essential. Second, there are inherent trade-offs between complexity (computational complexity as well as requirement on experimental data for calibration) and accuracy of the mathematical models underlying the design methods. We need models that encompass the parts' details (sequence as well as function), the parts' main interfaces with the host (e.g., polymerase, ribosome, and energy requirements), and sufficiently abstract representations of the host. We argue that neither purely statistical models on the parts' end (because of limited generalization), nor detailed whole-cell models for the host (because of virtually impossible validation) hold particular promise. Rather, biophysically motivated, more mechanistic models for parts could be combined with ‘resource-oriented’, systems biology models of the host73 of manageable complexity.
For robust design in general, it will be important to leverage concepts from both systems biology and from ‘traditional’ engineering disciplines beyond electrical circuit design (analogies). We highlighted concepts from control engineering, such as the internal model principle and the structured singular value, on which tailored design methods for synthetic biology could be based.58 For example, control concepts could help establish feedback via previously proposed dynamic load sensors.66,67 In principle, global robustness methods from systems biology can provide guidelines on the type of topology to use for circuit design, without any prior knowledge of parameter values. In addition to addressing computational complexity and usability, we think that it will be crucial to augment existing methods with practical considerations, such as whether a predicted, robust circuit (parametrization) can be implemented experimentally, and what the associated effort is.
Finally, three specific areas and concepts appear particularly promising for future developments. Systematic computational methods are virtually absent for design that accounts for stochastic effects, for example, in gene expression, and for the (re)-wiring of cell signaling using synthetic components. Recent developments on feedback controllers that ensure robust adaptation also with gene expression noise,107 and first therapeutic applications of re-wired signaling pathways for diabetes treatment,81,108 respectively attest to the potential of these two areas. In addition, most of the current rational design methods operate in open-loop – predicted designs are tested experimentally, but without systematic feedback of data to models for design improvements. As noted earlier,1 closed-loop, iterative design is a major challenge. First approaches in this direction are restricted to circuits without (explicit) feedback.42 We suggest that Bayesian methods are particularly suitable for data assimilation, and eventual design and testing in closed-loop. This could increase robustness of synthetic circuits in the face of biological uncertainties, and substantially reduce experimental efforts associated with either (intuitive) trial-and-error, or with screening of design alternatives (only) predicted to work as synthetic systems.
Footnote
† These authors contributed equally.
This journal is © The Royal Society of Chemistry 2017