Preserving structural integrity: fold reproducibility in computational design of proteins non-homologous to wild-type sequences†
Abstract
Despite remarkable progress, designing a protein that adopts a given structure remains a challenging task, and no general approach works for every design problem. Protein sequences with high sequence similarity usually have similar three-dimensional structures. This work focuses on designing non-homologous protein sequences that have low sequence similarity to the wild-type sequence while maintaining secondary structure integrity. The aim of the present study is to test whether dissimilar sequences can encode a similar structure. We employ a negative design approach in which non-native conformational ensembles are taken into account during sequence optimization. Three non-native conformational ensembles are created for each of the three chosen target structures. During the design of protein sequences using Monte Carlo simulation with the developed Cα distance-based statistical potentials, these ensembles are destabilized while the targets are stabilized. The structures of the designed sequences are predicted using AlphaFold2. Interestingly, the results suggest that secondary structure elements such as alpha helices and beta sheets can be conserved even in non-homologous sequences with low sequence similarity. The designed sequences also reproduce the folds of the three target proteins, viz. all-α, all-β and mixed αβ, despite very low sequence similarity to the wild-type sequences. This indicates that the employed design strategy preserves structural integrity even when sequence similarity is low.
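The negative design idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' protocol: the energy-gap objective (target energy minus mean non-native energy), the single-site mutation move, the temperature, the randomly generated potential table and the helper names `toy_potential`, `random_conformation`, `gap`, `design` and `sequence_identity` are all assumptions made for illustration, and the non-native ensembles are collapsed into a single list of decoy conformations.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_potential(n_bins, rng):
    """Hypothetical stand-in for a C-alpha distance-based statistical potential.

    Returns a symmetric table: (residue a, residue b) -> energy per distance bin.
    The values are random placeholders, not a fitted potential.
    """
    table = {}
    for a in AMINO_ACIDS:
        for b in AMINO_ACIDS:
            if a <= b:
                values = [rng.uniform(-1.0, 1.0) for _ in range(n_bins)]
                table[(a, b)] = values
                table[(b, a)] = values
    return table


def random_conformation(length, n_bins, rng):
    """Toy conformation: a random distance bin for every pair with |i - j| >= 2."""
    return [(i, j, rng.randrange(n_bins))
            for i in range(length) for j in range(i + 2, length)]


def energy(seq, conformation, potential):
    """Sum pair energies over the (i, j, distance_bin) triples of one conformation."""
    return sum(potential[(seq[i], seq[j])][b] for i, j, b in conformation)


def gap(seq, target, decoys, potential):
    """Target energy minus mean decoy energy; lower values favour the target."""
    e_target = energy(seq, target, potential)
    e_decoys = sum(energy(seq, d, potential) for d in decoys) / len(decoys)
    return e_target - e_decoys


def design(start, target, decoys, potential, steps=2000, temperature=0.5, seed=0):
    """Metropolis Monte Carlo over sequences; returns the best sequence and its gap."""
    rng = random.Random(seed)
    seq = list(start)
    current = gap(seq, target, decoys, potential)
    best_seq, best = "".join(seq), current
    for _ in range(steps):
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)  # propose a single-site mutation
        trial = gap(seq, target, decoys, potential)
        delta = trial - current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = trial  # accept the move
            if current < best:
                best_seq, best = "".join(seq), current
        else:
            seq[pos] = old  # reject the move and restore the residue
    return best_seq, best


def sequence_identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

In such a setup, lowering the gap stabilizes the target and destabilizes the decoys at the same time, and `sequence_identity` gives a simple way to compare a designed sequence against a wild-type sequence of the same length.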