Yehuda Halfon a, Louie Aspinall a, Joshua White a, Isobel Jackson Hirst b, Yiheng Wang b, Michele C. Darrow cd, Stephen P. Muench b and Rebecca F. Thompson *a
aSchool of Molecular and Cellular Biology, Faculty of Biological Sciences & Astbury Centre for Structural and Molecular Biology, University of Leeds, Leeds, LS2 9JT, UK. E-mail: r.f.thompson@leeds.ac.uk
bSchool of Biomedical Sciences, Faculty of Biological Sciences & Astbury Centre for Structural and Molecular Biology, University of Leeds, Leeds, LS2 9JT, UK
cThe Rosalind Franklin Institute, Harwell Campus, Didcot, OX11 0QS, UK
dSPT Labtech Ltd, Melbourn Science Park, Melbourn, SG8 6HB, UK
First published on 15th July 2022
Cryo-electron microscopy (cryoEM) has been transformed over the last decade, with continual new hardware and software tools coming online, pushing the boundaries of what is possible and the nature and complexity of projects that can be undertaken. Here we discuss some recent trends and new tools which are creating opportunities to make more effective use of the resources available within facilities (both staff and equipment). We present approaches for the stratification of projects based on risk and known information about the projects, and the impacts this might have on the allocation of microscope time. We show that allocating different resources (microscope time) based on this information can lead to a significant increase in ‘successful’ use of the microscope, and reduce lead time by enabling projects to ‘fail faster’. This model results in more efficient and sustainable cryoEM facility operation.
There are a diverse set of cryoEM facilities globally, with differing microscope hardware, funding models, project mixes, user communities and core aims. All of these factors influence how a facility infrastructure should be resourced and structured to support its user communities. The operational model, data collection strategies employed and wider resources including staff and computing should be structured to match the aims of the projects within the facility.
The single particle structure determination pathway is often presented as a linear, stepwise process, although those with cryoEM experience will likely recognise the need for multiple retrograde steps in order to progress through the stages (Fig. 1). At the facility level, the efficiency of and support available for each step in the process should be considered and optimised to maximise the output from the most expensive (per day) of these steps: data acquisition.
Due to the cost per day, much emphasis is placed (including in this manuscript) on efficiency and throughput at the cryoEM data acquisition stage, conducted on the microscope. While this cost is of relevance when accessing high-end cryoEM instrumentation (whether it is incurred directly by the user or met through core/centralised funding), in our view microscope time is now generally not the most common ‘bottleneck’ – although efficient use of microscope resources is still vital given the overall demand for access to high-end cryoEM instruments.
Sample preparation remains a major challenge for many cryoEM projects. For the majority of single particle cryoEM researchers, the main approach used for cryoEM specimen preparation is filter paper-based blotting followed by plunging into a cryogen such as liquid ethane, as first pioneered by Jacques Dubochet and colleagues.5 Whilst countless high-resolution structures have been, and continue to be, obtained from grids prepared using blotting-based techniques,6,7 for many samples the production and/or reproduction of good quality grids using this approach is challenging. Research into the causes of variation in grid quality indicates that issues such as protein aggregation, denaturation, preferred particle orientation, subunit dissociation and particle concentration are caused or exacerbated by interactions between the sample and the air–water interface or the sample and the filter paper.8–11 This has led to the development of alternatives to blotting-based approaches that aim to minimise uncontrolled sample interactions and improve reproducibility.12–15 Most of these technologies do this by generating small droplets (using a variety of methods) which are deposited on the grid en route to the cryogen. By removing the blotting step, decreasing the time between sample deposition and vitrification, and automating more of the process (including sample deposition), some of the aforementioned issues can be reduced or even completely avoided for otherwise difficult samples.16–18 Although this new generation of technology increases the range of grid preparation tools available, we still have a poor understanding of many of the fundamental processes that occur during grid preparation and that make some proteins more amenable than others to downstream structural studies. The process of making cryoEM grids is very quick in the context of the structure determination pipeline; however, screening and working iteratively through the process to find a grid with suitable characteristics for data collection can be a major hurdle.
Grid screening to assess suitability for data acquisition typically involves a manual inspection of the particle distribution and ice thickness, and/or the acquisition and processing of a small test dataset. The process of screening can be time consuming and subjective, especially for those new to the field, but even highly experienced individuals are not always able to accurately predict the subsequent outcome of a data collection. Once a suitable grid is obtained, a single particle data acquisition session can be scheduled (or take place immediately). The microscope, detector, collection parameters (dose rate, total dose, magnification) and length of the collection can be chosen to try and match the needs of the project.
After data acquisition, single particle image processing approaches are used to reconstruct a three-dimensional (3D) EM density map of the specimen which can then be subject to model building and further interpretation. Leading software packages for single particle reconstruction such as RELION and cryoSPARC19,20 offer a pipelined approach to this workflow. Graphical user interfaces present users with a list of jobs ordered according to their respective position in the pipeline, with default parameters enabling non-expert users to complete the basic workflow and achieve informative results. However, most projects presently require an iterative approach in which sections of the pipeline, such as 3D classification, are revisited several times, with different parameters and intentions, to gain a better understanding of a dataset and its heterogeneity. Thus, processing of single particle datasets demands a significant amount of computational resources and hands-on computational time invested by the user, both of which are potential bottlenecks in progressing from sample to structure.
Notably, the steps at the beginning of this workflow – pre-processing of images including motion correction and CTF estimation – are routinely performed on-the-fly,21 enabling quality assessment of the data coming from the microscope in real-time. More recently, the application of machine learning software has facilitated the automation of steps that traditionally required extensive user input. Software packages such as crYOLO22 and TOPAZ23 permit accurate, unsupervised particle picking through the use of pre-trained, convolutional neural-network (CNN) models. As such, the identification of particles in micrographs can be incorporated into a fully automated processing pipeline. Extraction of these particles is then followed by 2D classification after which 2D class averages containing recognisable protein features must be identified and carried forwards, to select ‘good’ particles and discard those that are sub-optimal. This selection is traditionally subjective and carried out manually by the user. However, recent work has demonstrated the utility of a CNN model for unsupervised 2D class selection, overcoming subjectivity and expanding the section of the single particle workflow that is amenable to automation.24 For example, the Schemes framework within RELION 4.0 permits robust unsupervised processing up to, and including, the generation of a series of initial models, significantly reducing the time between image acquisition and 3D reconstruction.
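To make the idea of an unsupervised selection ‘gate’ concrete, a minimal sketch is given below. It is illustrative only: the data structure, function and thresholds are hypothetical and do not represent the RELION, cryoSPARC, crYOLO or TOPAZ interfaces; retaining 2D class averages that are sufficiently well resolved and well populated simply stands in for the CNN-based selection described above.

```python
# Minimal sketch (not the RELION/cryoSPARC API): an unsupervised "gate" that
# keeps 2D class averages passing simple quality thresholds, standing in for
# the CNN-based selection described in the text. Thresholds are illustrative.
from dataclasses import dataclass
from typing import List

@dataclass
class Class2D:
    class_id: int
    est_resolution_A: float   # estimated resolution of the class average (Å)
    particle_fraction: float  # fraction of picked particles assigned to this class

def select_classes(classes: List[Class2D],
                   max_resolution_A: float = 15.0,
                   min_fraction: float = 0.01) -> List[Class2D]:
    """Keep classes that are both well resolved and well populated."""
    return [c for c in classes
            if c.est_resolution_A <= max_resolution_A
            and c.particle_fraction >= min_fraction]

if __name__ == "__main__":
    classes = [Class2D(1, 8.2, 0.22), Class2D(2, 25.0, 0.05), Class2D(3, 11.5, 0.004)]
    kept = select_classes(classes)
    print([c.class_id for c in kept])  # -> [1]
```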
In this article we will discuss past trends in single particle data acquisition and their impact on the present and future operations of cryoEM research facilities, with a focus on how these might influence the efficiency and throughput of structure determination by cryoEM. We will discuss the challenges and opportunities that these emerging technologies present and propose guiding principles for efficient facility operation going forward.
The first electron microscopy data bank (EMDB) deposition of a single particle cryoEM structure from a Titan Krios microscope with a direct electron detector was in 2013 (EMD-2238), where data were collected on a Falcon I. In the >9 years since, direct electron detector technology has evolved considerably. All three of the major direct electron detector manufacturers (Direct Electron, Gatan, and TFS) have released multiple iterations of detector in this time. Looking across all submissions to the EMDB, we see the average resolution reported for single particle structures improving as new iterations of detector technology are released (Fig. 3). This trend is particularly linear for the Thermo Fisher Scientific Falcon series of direct electron detectors (Fig. 3A). As new detector technologies come online, they typically bring an uplift in the throughput of image acquisition as well as an improvement in the quality of the images obtained (sometimes presented through DQE measurements) (ESI Table S1†).31–40 This improvement in detector quality has, alongside improvements in protein biochemistry, cryoEM sample preparation, and image processing methods, led to an average improvement in the resolution of cryoEM structures. However, we and others have noted that single particle data collected on the same sample, and sometimes the same grid, using different camera technologies can yield significantly different resolutions, so the detector itself is likely to be a key contributing factor.
Alongside improvements in detector speed, fringe free illumination (FFI) and aberration free image shift collection (AFIS) have been introduced and implemented to further increase the speed of collection. AFIS enables the use of beam-image shift for collection of single particle data, whilst preventing consequential coma and astigmatism that would otherwise reduce data quality. This is achieved by compensatory adjustments to the deflection coils and objective stigmator when performing beam-image shift at different distances. Collecting with AFIS is typically performed within 6–12 μm (ref. 25) of the stage location and increases the throughput substantially by reducing the number of stage movements required. Each stage move takes time, alongside the associated stage settling wait time, and so reducing the number of stage movements increases the speed of a typical acquisition.
The diameter of the illuminated area determines how many images can be collected per hole while ensuring each acquisition area is not double exposed. Typically, when the condenser 2 aperture is imaged out-of-focus (while the sample is in focus), wave interference at the edge of the condenser beam appears as Fresnel fringes. As a result, a larger illuminated area must be used to exclude these fringes from the image. FFI involves an adjustment to the microscope to minimise the presence of Fresnel fringes in the recorded image at a specific magnification. With FFI implemented, both the C2 aperture and the sample will be in focus and no, or very few, Fresnel fringes will be visible in the recorded image. This allows the beam size to be reduced and more images to be acquired from a single hole, again increasing the number of images acquired per stage move (Fig. 4).
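The throughput gain from AFIS and FFI can be illustrated with a back-of-the-envelope calculation; the exposure times, stage settling times and beam-shift geometry below are assumed values for illustration only, not measurements from this work. The key point is that each stage move is amortised over all of the images acquired by beam-image shift at that stage position.

```python
# Back-of-the-envelope acquisition-rate estimate. All timings and geometry
# below are illustrative assumptions, not measured values from this paper.
def images_per_hour(exposure_s: float,
                    stage_move_s: float,
                    images_per_stage_position: int) -> float:
    """Images/hour when a stage move (plus settling) is amortised over the
    images acquired by beam-image shift at one stage position."""
    time_per_image = exposure_s + stage_move_s / images_per_stage_position
    return 3600.0 / time_per_image

# Without AFIS/FFI: one image per stage move.
baseline = images_per_hour(exposure_s=6.0, stage_move_s=25.0, images_per_stage_position=1)
# With AFIS (say 4 holes per stage position) and FFI (say 4 shots per hole):
faster = images_per_hour(exposure_s=6.0, stage_move_s=25.0, images_per_stage_position=4 * 4)
print(f"{baseline:.0f} vs {faster:.0f} images per hour")
```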
These advances mean that, on average, the number of images collected per hour in a single particle data acquisition session is far higher than in previous years. However, disentangling trends in average dataset size is challenging because EMDB submissions can contain a mixture of micrographs and particle stacks, amongst other data, and each EMDB entry can be associated with multiple EMPIAR entries and vice versa. These complications make it difficult to assess how much data was collected to yield each EMDB submission.
Both the specimen preparation and the microscope hardware can influence the time taken for, and the success of, the screening step. Newer specimen preparation technologies provide a view of the grid from the freezing process which can be used to judge the quality of the grid and ice without loading into a microscope.15,27–30 Screening for appropriate ice thickness, particle concentration and particle distribution can be done manually; however, this depends on microscope operator experience and is always a subjective judgement. Additionally, some issues, such as preferred orientation or partial denaturation, will only become obvious after processing at least a small dataset.
In the majority of facilities, the ‘working day’ (for the purposes of this paper we consider this an 8 h window) is ∼9am–5pm, when the majority of staff are on site. Automated data collection (which can run 24 h per day) is then used to collect data when staff are offsite. Due to these standard working patterns, the majority of facility scheduling occurs in 24 hour blocks. Historically this would have meant 24 h of collection on a single project. However, recently developed tools in programs such as EPU and SerialEM have opened possibilities to collect multiple datasets on different grids while the microscope is unattended. This enables users to collect datasets tailored to the needs of the project, based on the target resolution or aim, whilst also making better use of the imaging resources of the facility, especially over weekends or public holidays.
Multi-grid imaging also enables more efficient collection of small ‘proof-of-concept’ datasets, typically 0.25–4 hours, for the purpose of providing a better understanding of the condition of the grid and an indication of whether a longer data acquisition is required and/or warranted. This style of data collection is necessary because it is difficult to predict through manual assessment of the micrographs alone whether the complete complex is present, whether factors that may impact structure determination such as preferred orientation are present, and whether the desired resolution will be obtainable from the grids. Once a suitable small-scale data collection has been completed and processed, the relationship between the resolution of a 3D reconstruction and the number of contributing particles can be assessed. By running a series of 3D refinements on random subsets of particles from the particle stack, where the subset size is doubled after each refinement, a B-factor plot (or Rosenthal–Henderson plot)26 can be generated.
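As a sketch of how such a plot can be generated and used (the particle counts and resolutions below are illustrative, not data from this work), ln(N) is fitted against 1/d² for the series of refinements and the fit extrapolated to estimate the number of particles, and hence the collection size, needed for a target resolution:

```python
# Minimal sketch of a Rosenthal–Henderson (B-factor) plot: fit ln(N) against
# 1/d^2 for refinements on doubling particle subsets, then extrapolate the
# particle number needed for a target resolution. Example numbers are illustrative.
import math

def fit_bfactor(points):
    """points: (n_particles, resolution_A) pairs. Returns (B, intercept)
    from a least-squares fit of ln(N) = (B/2) * (1/d^2) + c."""
    xs = [1.0 / d**2 for _, d in points]
    ys = [math.log(n) for n, _ in points]
    n = len(points)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return 2.0 * slope, my - slope * mx  # B-factor (Å^2), intercept

def particles_for(target_res_A, B, intercept):
    """Extrapolate the fitted line to a target resolution."""
    return math.exp((B / 2.0) / target_res_A**2 + intercept)

subsets = [(5_000, 6.1), (10_000, 5.2), (20_000, 4.5), (40_000, 4.0)]  # illustrative
B, c = fit_bfactor(subsets)
print(f"B ≈ {B:.0f} Å², need ~{particles_for(3.5, B, c):,.0f} particles for 3.5 Å")
```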
With tools that remove the current confinement to 24 h ‘blocks’ and an increasing acknowledgement that manual screening is unlikely to confidently predict success at the data acquisition stage, there is an opportunity to change the standard pipelines for cryoEM data acquisition. A workflow based on manual assessment of micrographs (Fig. 5A) with a single ‘gate’ determining if the project should proceed to the next stage is likely to be replaced by a multi-step process (Fig. 5B) that makes use of short collections and more robust metrics of quality to more confidently predict success of data acquisition sessions.
Fig. 6 Examples of project categorisation and downstream suggested workflow with chance of success expressed as a % (data shown in ESI Table S2†). Retrograde steps are shown as red arrows. (A) Fixed pipeline, all projects receive 8 h of screening time followed by 24 h of data collection if manual inspection identifies a good grid. (B–E) Stratified approach to resources applied to each project. (B) Low chance of failure (e.g., well characterised icosahedral virus), project is immediately allocated a 24 h collection. (C) Medium chance of failure, 4 h screening is allocated with a collection session scheduled if the small processed dataset confirms the grid is suitable. (D) High chance of failure, 8 h screening is allocated with a collection session scheduled if the small processed dataset confirms the grid is suitable. (E) Variable chance of failure or where there are other time restrictions applied, 24 h is allocated in the first instance with further collection scheduled where initial processing confirms suitability.
The definition of ‘success’ varies largely from project to project, but essentially a ‘successful’ progression through the pipeline means no requirement for a retrograde step (i.e. the quality of data obtained was sufficient to answer the question in hand). For example, in the fixed pipeline, ‘successful’ screening might look like monodisperse particles at a good concentration in vitreous ice. For a ‘medium’ risk project, success would be defined as 2D classes from a pilot collection showing secondary structure detail, a complete complex and a range of views.
In the fixed pipeline, all projects are initially assigned the same microscope resources – an 8 h manual screening session. After this screening, if manual inspection of the micrographs looks promising (vitreous ice, good particle distribution as judged by eye) a 24 h collection is scheduled. For the variable pipelines, we have provided examples of ‘low’, ‘medium’, ‘high’, and ‘variable’ risk projects. A ‘low’ risk project would involve a well defined, homogeneous specimen in the hands of an experienced researcher. Examples might include icosahedral viruses. We assigned these projects an 80% chance of success and estimate that 15% of projects fall into this category. ‘Medium’ risk projects form the majority of projects seen in our facility, at 60% of projects. These are projects where there is a 50/50 chance the grids will be optimal during the screening session, either because the sample requires optimisation (biochemistry or cryoEM grid preparation) or because the user is developing their cryoEM skills, or both. Samples may include those not previously imaged by cryoEM but where preliminary data from negative stain look promising. ‘High’ risk projects are those where there are obvious challenges with the sample (for example, a medium risk project that has had 3+ imaging sessions would move into the high-risk category).
During standard workflows, there is usually a time gap between when grids are made and when they are imaged. For the majority of projects this is acceptable or workable, but there are specific cases where microscope time may need to be scheduled to provide immediate feedback on the grids so that more grids can be made. An example might be a challenging protein purification where the protein cannot be frozen or stored. We have termed these ‘variable’ risk projects.
Here we propose that each project is assigned different up-front microscope time based on its categorised risk profile. For example, low risk projects are directed immediately to a 24 h collection session, as the grids are likely to be suitable for collection. Medium risk projects are initially assigned 4 h of screening, high risk projects 8 h of screening, and variable risk projects 48 h of microscope time. Within this initial session, users would be expected to collect a small subset of data (even 20 minutes can be highly informative) on their most promising grid(s) and then process these data to yield at least 2D class averages. Only if these show evidence of a promising structure (secondary structure detail, range of views, whole complex present) will a 24 h data collection session be allocated.
The full calculations are shown in ESI Table S2,† and are based on the workflows and % chance of success shown in Fig. 6. They show that if a fixed pipeline is used for 100 projects, 2000 hours of microscope time are required, of which 1200 h fall within ‘working hours’ and 800 h are out of hours. If resources are instead allocated based on project risk, then for the same mix of 100 projects only 1656 h of microscope time are required, of which 570 h are ‘working hours’ and 1086 h are ‘out of hours’.
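The bookkeeping behind these figures can be sketched as follows. The fixed-pipeline totals quoted above are consistent with every project receiving an 8 h screening session (assumed to fall within the working day) and roughly half of the projects proceeding to a 24 h collection, of which 8 h fall within the working day; these assumptions are ours for illustration. Reproducing the 1656 h figure for the risk-stratified pipeline requires the full per-category workflows in Fig. 6 and ESI Table S2,† which are not restated here.

```python
# Minimal sketch of the microscope-hours bookkeeping. The inputs below are
# assumptions consistent with the fixed-pipeline figures quoted in the text;
# the risk-stratified total (1656 h) depends on the per-category workflows in
# Fig. 6 / ESI Table S2 and is not reproduced here.

def pipeline_hours(n_projects, screening_h, p_proceed, collection_h=24,
                   working_fraction=8 / 24):
    """Return (total, working, out_of_hours) microscope hours for a simple
    screen-then-collect workflow. Screening is assumed to fall entirely
    within the working day; collections run around the clock."""
    total = n_projects * (screening_h + p_proceed * collection_h)
    working = n_projects * (screening_h + p_proceed * collection_h * working_fraction)
    return total, working, total - working

print(pipeline_hours(100, screening_h=8, p_proceed=0.5))
# -> (2000.0, 1200.0, 800.0), matching the fixed-pipeline figures in the text
```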
Overall, taking a risk based approach means that for 100 projects with the same overall probability of success (0.5 for the fixed pipeline, 0.51 for the variable pipeline), less microscope time is used, reducing demand for the microscopes and decreasing wait times between sessions. Significantly, the number of ‘working hours’ required is also reduced in the risk-based approach. Given that scheduling of microscope time is typically based around when human operators are present to input into decision making, this further helps to reduce the wait time for microscopes. In the risk-based approach, more microscope hours lead to a ‘successful’ outcome and fewer microscope hours are wasted (or ‘failed’) compared with the fixed pipeline.
While the theoretical argument for the risk based approach is clearly compelling, there are barriers to its efficient implementation. The project mix and resources available, along with the workplace culture around out of hours and weekend working, will all affect how the framework is stratified and the potential benefits that result. Splitting projects into risk bins may be extremely challenging and requires the active and willing participation and cooperation of the user community. For the system to work optimally, facility staff should have a good working knowledge of the projects coming through the facility, and the user community should understand the population level benefits of engaging with the system (e.g. reduced time between requested microscope sessions) and provide the information needed to characterise each project as well as possible. With this in place, project categorisation should always be considered dynamically. Even with these considerations in mind, we feel the majority of facilities will likely benefit from project and resource allocation based on risk, compared with a ‘one-size fits all’ approach.
Another barrier to implementation of this model is facility staff time. Shorter microscope sessions may require loading of the microscope more frequently, and a larger number of shorter collections increases the amount of staff support required. One mechanism of tackling this is increasing users’ independence on the microscope; a second is increasing staffing within the facility. The model and financial constraints of each facility will impact which routes are taken to deliver facility operations.
In our experience, long waits for microscope time (especially screening time) are one of the biggest concerns for a cryoEM researcher, as it can be difficult to make any progress on a project when the outcome from initial screening is not yet known. Generally, microscope time ‘little and often’ is considered to be more useful than longer sessions with large wait times in between. Focusing facility operations on keeping the time users wait between sessions to a minimum allows us to meet this need of the cryoEM community. Another benefit of this system is that it allows a larger amount of microscope time to be allocated up front to the projects that will really stand to benefit, maximising the chances of successful structure determination for these more challenging systems.
For the purposes of this analysis we have allocated 24 h to each ‘data collection’ session. In reality, as discussed above, 24 h may not be the optimal amount of time to collect data; for some projects it may be far too much and for some projects not enough. It is highly likely that the model presented here for a risk based approach to allocation of resources can and should be extended to include a variable amount of data collection time based on the known properties of the macromolecule and the desired outcome of the project. This would likely further improve the efficiency of the framework.
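As an illustration of how collection time could be matched to a project (all rates below are assumed values, not measurements from this work), the particle requirement estimated from a pilot collection, for example via a Rosenthal–Henderson fit, can be converted into an acquisition length given the expected particles per micrograph, micrographs per hour and the fraction of particles that survive cleaning:

```python
# Illustrative sketch only: matching collection length to a target particle
# count rather than a fixed 24 h block. All rates are assumptions.
import math

def collection_hours(particles_needed: float,
                     particles_per_micrograph: float,
                     micrographs_per_hour: float,
                     usable_fraction: float = 0.7) -> float:
    """Hours of acquisition needed to reach a target particle count,
    allowing for the fraction of picked particles that survive cleaning."""
    usable_per_hour = particles_per_micrograph * micrographs_per_hour * usable_fraction
    return particles_needed / usable_per_hour

hours = collection_hours(particles_needed=300_000,
                         particles_per_micrograph=150,
                         micrographs_per_hour=300)
print(f"~{math.ceil(hours)} h of acquisition")  # ~10 h rather than a fixed 24 h
```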
Many institutions worldwide are looking to improve their environmental sustainability, and often have ‘net zero’ emission targets. Operating hardware such as electron microscopes has a carbon cost, but the data generated may have a significant and longer lasting carbon footprint. Data generated must be processed and then stored; many funders stipulate that these data must be kept past the end of the grant, for up to 10 years. Collecting more data than required therefore has not only a microscope time ‘cost’ but also a carbon cost, which scales with the amount of data, and larger datasets are more computationally expensive to process and store. A final benefit of the risk-based framework proposed here is that less ‘bad’ data will be produced overall, reducing the processing and storage burden.
A vision for a future cryoEM facility would integrate data from across the pipeline, from biochemical and biophysical analysis, cryoEM specimen preparation and on-the-fly analysis of micrographs into 2D and 3D data, as data are collected. These data could then be used to make dynamic, on-the-fly decisions about progressing or halting data acquisition, automatically moving onto the next sample.
To work towards this idealised vision and maximise the output from imaging biological specimens, much work is still outstanding on both the specimen preparation and image processing elements of the pipeline to support new approaches to microscope scheduling. Many variables impacting the success of specimen preparation are still not understood, requiring cryoEM grids to be empirically tested, which takes valuable time and resources. Work to better understand the factors influencing specimen preparation, and next generation specimen preparation devices, may transform this portion of the pipeline. During microscope data collection sessions, machine learning approaches may help users to identify optimal areas for data acquisition with minimal or no human intervention. On the image processing side, on-the-fly image processing, automation, and the implementation of deep learning mean that assessing the quality of a sample and its amenability to high-resolution structure determination has never been faster, and this will improve further in the future. Coupled with new automated, optimised acquisition capabilities across grids, this offers new opportunities to the user, such as investigating the effect of buffer conditions on the compositional heterogeneity of a sample, or assessing the influence of additives (such as surfactants and detergents) in improving particle behaviour (e.g. preferred orientation), all within a single session.
Only when information is integrated across the cryoEM pipeline, supported by fast and accurate image processing, will we be able to most efficiently distribute time on high-end cryoEM infrastructure. While this future is not quite here, recent exciting advances have and will continue to challenge facility managers to consider how best to organise the distribution of resources within their facility to maximise biological discovery.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2fd00129b