Agnel Praveen
Joseph
*,
Sony
Malhotra
,
Tom
Burnley
and
Martyn D.
Winn
*
Scientific Computing Department, Science and Technology Facilities Council, Didcot OX11 0FA, UK. E-mail: agnel-praveen.joseph@stfc.ac.uk; martyn.winn@stfc.ac.uk
First published on 2nd August 2022
Cryogenic electron microscopy (cryo-EM) has recently been established as a powerful technique for solving macromolecular structures. Although the best resolutions achievable are improving, a significant majority of data are still resolved at resolutions worse than 3 Å, where it is non-trivial to build or fit atomic models. The map reconstructions and atomic models derived from the maps are also prone to errors accumulated through the different stages of data processing. Here, we highlight the need to evaluate both model geometry and fit to data at different resolutions. Assessment of cryo-EM structures from SARS-CoV-2 highlights a bias towards optimising the model geometry to agree with the most common conformations, compared to the agreement with data. We present the CoVal web service which provides multiple validation metrics to reflect the quality of atomic models derived from cryo-EM data of structures from SARS-CoV-2. We demonstrate that further refinement can lead to improvement of the agreement with data without the loss of geometric quality. We also discuss the recent CCP-EM developments aimed at addressing some of the current shortcomings.
The need for cryo-EM map and model validation has been recognized over the last decade. Validation spans different aspects including quality of the map or model derived, fit-to-data, overfitting and bias introduced in processing. A validation task force for cryo-EM discussed the community needs and requirements for validation.7 This was followed by a number of developments and initiatives focused on validation.8–10 The EM data resource map and model challenges10,11 have played a very useful role in identifying new requirements and providing datasets for further developments.
An atomic model provides a more interpretable representation of the map reconstruction. However, atomic model building and refinement are increasingly difficult at low resolutions, and hence model validation becomes even more crucial. The geometrical arrangement of atoms in the model is expected to conform to commonly occurring conformations. A number of validation tools developed originally for X-ray crystallography compare the stereo-chemical properties of the atomic model against reference standards (MolProbity,12 WHAT-CHECK,13 O14).
Often, the expected geometry standards are either introduced as part of the function being optimised or as restraints in atomic model building and refinement. Depending on the weights used for these parameters and restraints, one might end up overfitting to expected geometric standards without improving model representation of the data. For example, some refinement approaches overfit the backbone phi/psi angles to the centroid of allowed Ramachandran space leading to ‘unusual’ phi/psi dihedral distribution in the model. Recent studies demonstrate that Ramachandran Z-score15 is very useful for detecting such anomalous distributions. Similarly, CaBLAM16 was developed to evaluate the quality of the model backbone and detect unusual secondary structure geometries, especially relevant to models built from low resolution data. It is advisable to fix geometry outliers where possible, prior to automated model refinement.17
The most common metric used to quantify agreement of the atomic model with the cryo-EM map is the cross correlation calculated either in real space18–20 or in different resolution shells in the Fourier space (Fourier Shell Correlation (FSC)).21 Several other metrics have also been tested and reviewed in these recent articles.8,22,23 With data resolution getting better, multiple methods have been developed to evaluate agreement to map at the residue level.8,23–27 The absolute values of most of these metrics vary with the map resolution.10
Overfitting to noise in the data is an important factor to consider when trying to optimise model fit-to-map. Over the years several approaches for cross-validation have been proposed to detect overfitting.21,28–30 However, the requirement of a sufficiently large independent dataset has been the primary factor limiting the development of a standardised cross-validation approach equivalent to the R-free employed for X-ray crystallography.31
Ideally, an atomic model is expected to provide the ‘best’ representation of features resolvable at the data resolution while maintaining a good overall geometry. As cryo-EM data often samples a wide range of resolutions within a single structure, it is important to assess features resolvable at different resolutions and multiple tools are required to evaluate features resolvable at low resolutions. As mentioned above, some of the metrics used for model assessment are intrinsically optimised by automated model refinement approaches and hence multiple and/or independent metrics are recommended for validation purposes. Relative weights for geometry and fit-to-data are often estimated automatically depending on the data quality.32,33 However, more often than not, the estimated weights need further adjustment to optimise the fit-to-data without distorting geometry.
Our previous study on a subset of atomic models derived from cryo-EM reconstructions from SARS-CoV-2 revealed a bias in the refinement approaches towards optimising model geometry compared to the agreement with data.34 Further automated refinement using REFMAC5 (ref. 33) with a relatively lower starting weight improved the agreement with the maps without significant loss in stereochemical quality. New developments in REFMAC5 (ref. 35) include better weight estimation within the range 0.2 to 18.0, depending on the resolution and ratio of model to map volumes. Here, we use Servalcat to re-refine a large dataset of atomic models (720 structures) of cryo-EM reconstructions from SARS-CoV-2 and discuss the quality of re-refined models. We also highlight other recent developments from the Wellcome Trust UK validation project. We discuss the CCP-EM model validation task developed as part of this project to provide access to multiple validation metrics to assess the geometry and agreement with data, ideally evaluating features resolvable at different resolutions. We also discuss other map and model validation tools available in the CCP-EM software suite.
The MolProbity score gives an indication of the quality of the model which is expected to vary with the data quality. There is no clear relationship between the data resolution and model geometric quality as reflected by MolProbity scores (Fig. 1A). 75% of the structures have MolProbity scores better than 2.0, which is comparable to better than 2.0 Å resolution structures.36 The mean of MolProbity scores for structures resolved at resolutions better than 3.5 Å resolution is 1.6 while the mean score of structures worse than 3.5 Å resolution is 1.8. This shows that the geometric quality is restrained to a similar extent irrespective of data resolution. Fig. 1B highlights that the geometric quality of the models is not related to their agreement with data, as reflected by FSCavg scores. 31.2% of the structures had FSCavg scores worse than 0.5, reflecting poor agreement with data.
To check whether the automated refinement helps to improve agreement with the data without significant decline in geometric quality, we re-refined the models with 20 iterations of Servalcat.35 The fit to data of 94% of structures in the dataset improved after re-refinement. The scores for nearly 20% of the structures improved by more than 5% (dFSC > 0.05) and 5% (35 structures) had more than 10% improvement (dFSC > 0.1). The improvement in FSCavg scores was not correlated with the map resolution (Fig. 1C). However the mean improvement in FSCavg was 6.5% for structures at resolution worse than 5 Å versus 2.6% for structures of resolution better than 5 Å. 44% of the dataset had improved MolProbity scores as well while the MolProbity scores of the rest of these structures worsened after re-refinement (Fig. 1D). The drop in MolProbity scores however was not large (less than 0.3 for all but 5 models).
Hence, the fit-to-data could be further improved in a significant majority of these cases without causing a significant loss in geometric quality. The re-refinement with Servalcat also improved the quality of subunit interfaces in the model, as reflected by improvement in PI-scores of 58.6% of subunit interfaces (ESI Fig. S1†). Note that further interactive refinement and error fixes may be required on a case-by-case basis after automated refinement.
As expected, automated estimation of refinement weights using Servalcat improved the fit (FSCavg scores) of a larger portion of the dataset (i.e. 94%), compared to improvement of 71% using user-defined initial weight for REFMAC5 in our previous study.34 Also 44% of the dataset had better Molprobity and FSCavg scores after re-refinement with Servalcat, compared to 34% from the previous study.
Fig. 2 (A) Atomic model of SARS-CoV-2 spike derived from a 3.4 Å resolution cryo-EM map (grey), colored by TEMPy SMOC scores calculated using the CCP-EM software interface. (B) The model colored by the FDR-backbone scores. (C) A segment of the model backbone colored by the FDR-backbone score highlighting areas of mis-trace. (D) The deposited model of the SARS-CoV-2 spike open form (red) (PDB ID: 6VYB) derived from a 3.2 Å resolution map (grey) (EMD-21457). (E) The model with the open RBD coordinates modeled (orange) based on the crystal structure of the RBD bound to the human ACE2 receptor (PDB ID: 6M0J). The LocScale map derived using this model as a reference is shown in grey. (F) The final extended model (green) built with the help of local map sharpening with LocScale (LocScale map shown in grey). |
This example demonstrates the importance of evaluating model agreement with the map. Although the geometric quality of the model is quite impressive, the model does not provide a good representation of the data due to its poor agreement with the map.
We optimised the fit of the modelled RBD domain using real-space refinement in Coot.40 The LocScale map also showed additional features corresponding to the N-terminal domain (NTD) and the C-terminus. Using related structures solved at higher resolutions (PDB IDs: 6VXX and 5X58) as guides, we traced additional residues in the locally sharpened map in Coot. In an iterative process, extended models were then used to make new LocScale maps. In the end, we were able to extend the model by 151 residues (Fig. 2F). Outliers (Ramachandran, Rotamer and CaBLAM) were fixed in Coot where possible. The REFMAC5 interface in CCP-EM33 was used to refine the extended and fixed models against the deposited map (not the locally sharpened map, as recommended). ProSMART restraints33 were used in refinement when higher resolution related structures were available. The extended model had a MolProbity score of 1.56 and the FSCavg improved slightly from 0.54 to 0.55 (local correlation from 0.86 to 0.88). The extended model is available from the Coronavirus Structure Task Force41 repository (https://www.github.com/thorn-lab/coronavirus_structural_task_force/tree/master/pdb/surface_glycoprotein/SARS-CoV-2/6vyb).
In this case, the open RBD and part of the NTD are relatively less resolved compared to the core of the spike. The locally sharpened map however enhanced features in these parts of the map, enabling extension of the model. There is an ongoing debate whether to build models in low-resolution areas of the map or if an ensemble of models (rather than a single model) should be deposited to represent the local variability.42 In the case discussed above, the extension of the model provides additional information on the exposed structural segment of the RBD which forms an interface with the ACE2 receptor and antibodies. Hence the modelled segment at low resolution is useful but the level of interpretation should be based on the features resolvable at that resolution.
The current implementation provides access to MolProbity (various stereochemical checks12), CaBLAM (backbone Cα geometry16), PI-score (subunit interface quality46) and JPred4 (agreement with sequence-based secondary structure prediction47). To quantify global agreement with data, REFMAC5 (model–map FSC48) and TEMPy (CCC and other real-space scores22) can be used. To evaluate per-residue fit, TEMPy (SMOC score24) and FDR backbone score (identify mis-traced residues27) are provided. The details of the validation tools currently available through this task are discussed in detail in ref. 34. Multiple validation tools not only add more confidence to some of the issues detected but also work in a complementary way by identifying unique issues. Multiple issues in the same structural neighbourhood usually point to more serious errors and often fixing one or more of them can help resolve others in their vicinity. The results from complementary validation tools are collated and sorted to highlight specific structural regions with the most serious issues (clustered by spatial proximity). The results are also linked to Coot40 where the issues can be fixed interactively and flagged as complete as and when each residue is fixed by the user.
3D-Strudel45 scores how well the map features around a certain residue resemble those observed in other structures at a similar resolution, and suggests alternative interpretations (residue types) of the map where the agreement is poor. It can thus identify register errors in model building.
The TEMPy Diffmap task identifies mis-fitted residues by calculating the difference between the experimental map and the theoretical map derived from the atomic model.50 This tool can also be used to detect conformational or compositional differences between two experimental maps.
The map to MTZ CCP-EM task applies an array of global sharpening factors for assessment of a post processed map. A Wilson plot is displayed, allowing inspection of potential pathologies arising from over-sharpening,52 and the task is linked to Coot for visual inspection.
The ProSHADE task33 allows identification of symmetry, given a map or an atomic model. ProSHADE can identify the point group of a map, and hence is useful during deposition as well as during molecular visualisation.
The confidence map task uses the false discovery rate (FDR) approach53 to quantify the confidence at each voxel for distinguishing molecular signals from the background. It can detect weak features in the map based on the statistical significance estimate.
EMDA is a toolkit with a range of functionalities for comparing either an atomic model against a map or multiple maps.44 The toolkit includes map–model and map–map local correlation, map–map superposition and map magnification correction. EMDA is currently distributed with the CCP-EM software suite and accessible through the command-line.
We are working on integrating tools that are part of the EMDB validation analysis.43 This will provide access to different validation tools used by EMDB to evaluate deposited maps and models. Hence, the user can assess their maps and models and fix any issues prior to deposition.
(1) Map the mutations onto 3D structures of viral macromolecules determined by cryo electron microscopy and X-ray crystallography.
(2) Link to external resources on protein domain and function annotations.
(3) Visualise the site of mutation on the 3D structures.
(4) List contacts involving the mutation site based on a selected structure.
For crystal structures, we fetch the validation metrics from PDBe using the REST API (https://www.ebi.ac.uk/pdbe/api/doc/search.html) for programmatic access. For the cryo-EM structures, we use the more extensive set of model validation tools implemented in the CCP-EM software suite (https://www.ccpem.ac.uk/)54 to calculate multiple metrics that evaluate the geometry of the model and the fit-to-data.
We demonstrate an example of mutation search for the spike protein based on genome samples from the UK to highlight the use of the CoVal database. Upon the search, the structure summary page provides a table of metrics appropriate to the experimental technique used, with poor scores highlighted (Fig. 3). Users can select a model based on the resolution and/or the validation scores and choose one of the chains where the mutation(s) is mapped.
For this example, we selected the structure PDB ID: 7c2l from the list of search results. The mutation site(s), its structural environment and polar contacts can be visualised on the selected structure and chain using the NGL applet55 (Fig. 3B). In Fig. 3B, the mutation sites are shown as space filled spheres and their structural interactions are shown in ball and stick and the bound antibody is shown as grey ribbon. PFAM and CDD domain definitions are used to annotate the chains in the model and the backbone of each chain is colored using unique colors for each domain. We also provide function annotations for each chain in the model. This includes annotations retrieved from PDBe56 and mappings to UNIPROT57 and GO.58 To provide further guidance, we include general remarks on the effect of mutation on the physico-chemical nature of the amino acid, and whether the mutation site is at the interface with the receptor or antibody or another subunit (chain) in the model (Fig. 3C). The interactions involving mutation sites in the selected structure are listed under the interactions tab.
For cryo-EM structures, we provide multiple validation metrics to highlight any potential errors or ambiguities associated with the model at the mutation site and the interacting residues, reflecting stereo-chemical quality and agreement with experimental data. Fig. 3D shows an example (PDB ID: 7c2l) of various validation metrics which are provided for the mutation site (L452) and its structural interactions. Residues associated with low validation scores (highlighted in yellow) are less reliable compared to others, and hence the user has to be cautious with interpretations based on the atomic details of this residue in the selected model. The clashes in and around the mutation site L452 are highlighted in Fig. 3E suggesting that the atomic coordinates at this site are potentially less reliable for downstream interpretation.
Further refinement of SARS-CoV-2 structures with Servalcat improved the agreement with maps without significant loss of geometric quality. In fact, the geometry also improved in nearly 44% of the cases and the drop in MolProbity scores for the rest of the cases was not large (less than 0.3 for all but 5 models). The improvement in fit to the maps was not correlated with the data resolution suggesting no clear trend for overfitting to geometry as the resolution worsens. Apart from the estimated or user-defined refinement weights, other user-defined parameters and restraints used in refinement, and the initial fit of the model in the map also influence the final geometric quality and agreement with the map. In this context, efforts like CERES59 and extension of PDB-REDO60 for models derived from cryo-EM will be important.
Clearly, there is a need to report metrics reflecting agreement with the map alongside geometry evaluations. EMDB has been developing a resource for validation analysis of deposited maps and models where multiple metrics for evaluating model agreement with the map are included.43 This will help downstream users of the deposited maps and models to detect reliable areas of the model to base their interpretation on. The model challenges organised by EMDataResource are another useful initiative in this context.10 A number of metrics have been proposed to evaluate local model agreement with the map, some of them shown to work in a complementary manner.10,27 As in the CCP-EM validation task,34 the use of multiple metrics helps to detect a range of potential issues and evaluate different features of the model in a complementary way. We plan to expand this to include tools that evaluate low-resolution features in the model and their agreement with the map.
Building atomic models from cryo-EM reconstructions is increasingly common given the improvement in data resolution. Last year, 2894 of 4483 entries (∼65%) released in EMDB had associated atomic models (https://www.ebi.ac.uk/emdb/statistics/emdb_entries_pdb_models). Hence it is crucial that structural biologists adopt standard practices for model building and refinement, and report validation metrics to reflect the stereochemical quality, fit-to-data and any test for overfitting (to noise in the data) where possible. Ideally, the field needs to work together to agree on a fit-to-data/cross-validation metric, equivalent to ‘Rwork’/‘Rfree’ in X-ray crystallography, that is simple and universally-recognised. It may be, however, that multiple complementary metrics are required, as discussed above. In this context, CCP-EM organises the Icknield workshop every year focused on training users on tools for model building and validation, and supports other workshops on best practices. Cryo-EM map and model validation is an area of development, rightly recognised and supported by different community developments and initiatives across the world.
The pipeline underlying the CCP-EM validation task is used to evaluate all cryo-EM structures from SARS-CoV-2 and the results are provided via the CoVal database. In the future, we plan to expand the validation task to provide access to multiple and complementary validation tools that work across a range of resolutions and evaluate different model features. Robust application of validation relies on good data management, and therefore the validation task will utilise the recent development of the Pipeliner framework in the CCP-EM suite which tracks data and metadata of all jobs that are run (or imported). In the end, processing workflows, validation and deposition are closely linked activities.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2fd00103a |
This journal is © The Royal Society of Chemistry 2022 |