A neural network learning approach for improving the prediction of residue depth based on sequence-derived features†
Abstract
Residue depth is a solvent exposure measure that quantifies how far a residue lies from the protein surface. It is an important parameter in protein structural biology and can be used in ab initio protein folding, protein function annotation, and protein evolution simulation. Accordingly, accurate prediction of residue depth is an essential step towards characterizing protein function and developing novel protein structure prediction methods with optimized sensitivity and specificity. In this work, we propose an effective method, termed NNdepth, for improved residue depth prediction. It uses sequence-derived features, including four types of sequence profiles, solvent accessibility, secondary structure and sequence length. Two sequence-to-depth neural networks (NNs) were first constructed by incorporating these sources of information. Subsequently, a simple depth-to-depth equation was used to combine the two NN models, and the combined model was shown to achieve improved performance. We designed and performed several experiments to systematically examine the performance of NNdepth. Our results demonstrate that NNdepth performs significantly better than our previous method (Student's t-test, p-value < 0.001). Furthermore, we performed an in-depth analysis of the effect and importance of the various features used by the models and present a case study to illustrate the utility and predictive power of NNdepth. To make the method available to the wider research community, the NNdepth web server has been implemented and incorporated as one of the components of our previously developed outer membrane prediction systems (available at http://genomics.fzu.edu.cn/OMP). In addition, a stand-alone software program is publicly accessible and downloadable at the website. We anticipate that NNdepth will be a powerful tool for high-throughput structural genomics and protein functional annotation.
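As a rough illustration of the idea of combining two sequence-to-depth models, the sketch below merges two per-residue depth predictions with a weighted average. The abstract does not state the actual depth-to-depth equation, so the weighted average, the function names `combine_depths` and `mean_absolute_error`, the `weight` parameter, and the toy depth values are all assumptions made for illustration only, not the NNdepth formulation.

```python
from typing import List, Sequence


def combine_depths(depth_a: Sequence[float],
                   depth_b: Sequence[float],
                   weight: float = 0.5) -> List[float]:
    """Combine two per-residue depth predictions by a weighted average.

    ASSUMPTION: a weighted average stands in for the paper's
    depth-to-depth equation, which is not given in the abstract.
    """
    if len(depth_a) != len(depth_b):
        raise ValueError("both predictions must cover the same residues")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(depth_a, depth_b)]


def mean_absolute_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """Average absolute difference between predicted and observed depth."""
    if len(pred) != len(true) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


# Toy values (hypothetical, in angstroms): the two models err in opposite
# directions, so averaging cancels their errors in this contrived case.
true_depth = [4.0, 6.0, 8.0]
model_a = [5.0, 5.0, 9.0]
model_b = [3.0, 7.0, 7.0]

combined = combine_depths(model_a, model_b)
mae_a = mean_absolute_error(model_a, true_depth)
mae_b = mean_absolute_error(model_b, true_depth)
mae_combined = mean_absolute_error(combined, true_depth)
```

In this contrived example each model alone has a mean absolute error of 1.0, while the combined prediction matches the observed depths exactly. Real predictors rarely have perfectly anticorrelated errors, so any gain in practice would be smaller and would depend on how the combination is fitted.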