SuccinSite: a computational tool for the prediction of protein succinylation sites by exploiting the amino acid patterns and properties†
Abstract
Lysine succinylation is an emerging protein post-translational modification that plays an important role in regulating cellular processes in both eukaryotic and prokaryotic cells. However, succinylation sites are difficult to identify because the experimental techniques involved are often time-consuming and costly. An accurate computational method for predicting succinylation sites could therefore help researchers design their experiments and understand the molecular mechanism of succinylation. In this study, we developed a novel computational tool, termed SuccinSite, that predicts protein succinylation sites by incorporating three sequence encodings: k-spaced amino acid pairs, binary encoding and amino acid index properties. A random forest classifier was then trained on these encodings to build the predictor. SuccinSite achieves an AUC of 0.802 in 5-fold cross-validation and performs significantly better than existing predictors on a comprehensive independent test set. Furthermore, informative features and predominant rules (i.e., feature combinations) were extracted from the trained random forest model to improve the interpretability of the predictor. Finally, we compiled a database covering 4411 experimentally verified succinylated proteins with 12 456 lysine succinylation sites. Taken together, these results suggest that SuccinSite can serve as a helpful computational resource for succinylation site prediction. The web server, datasets, source code and database are freely available at http://systbio.cau.edu.cn/SuccinSite/.
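To illustrate what two of the sequence encodings might look like in practice, the following is a minimal sketch, not SuccinSite's actual implementation. It extracts a peptide window centred on a candidate lysine, counts k-spaced amino acid pairs (normalised by the number of pairs at each gap), and appends a binary (one-hot) encoding of the window. The abstract does not specify the window length, the range of gaps k, the padding symbol or the normalisation, so the function names (`extract_window`, `kspaced_pair_features`, `binary_features`, `encode_site`), the `X` padding character and the parameter values are all hypothetical. The amino acid index encoding is omitted because it depends on external property tables.

```python
from itertools import product

# The 20 standard amino acids. "X" pads windows that run past either end of
# the sequence (an assumption; the abstract does not state how ends are handled).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"


def extract_window(sequence: str, position: int, half_width: int) -> str:
    """Return the (2*half_width + 1)-residue peptide centred on the lysine at
    0-based `position`, padding with PAD where the window overhangs."""
    if sequence[position] != "K":
        raise ValueError("succinylation candidates are lysine (K) residues")
    padded = PAD * half_width + sequence + PAD * half_width
    # In the padded string the lysine sits at position + half_width, so the
    # window starts at index `position`.
    return padded[position:position + 2 * half_width + 1]


def kspaced_pair_features(window: str, k_max: int) -> list[float]:
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    For each k, every pair (window[i], window[i + k + 1]) of standard residues
    is counted; counts are divided by the number of pairs at that gap.
    Pairs involving the padding symbol are ignored. Yields 400 values per k.
    """
    pairs = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        n_pairs = len(window) - k - 1
        for i in range(n_pairs):
            pair = window[i] + window[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        features.extend(counts[p] / n_pairs for p in pairs)
    return features


def binary_features(window: str) -> list[int]:
    """One-hot encoding: each position becomes a 20-bit vector
    (all zeros for the padding symbol)."""
    bits: list[int] = []
    for residue in window:
        bits.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return bits


def encode_site(sequence: str, position: int,
                half_width: int = 3, k_max: int = 1) -> list[float]:
    """Concatenate the two encodings for one candidate site.
    Default parameter values are illustrative only."""
    window = extract_window(sequence, position, half_width)
    return kspaced_pair_features(window, k_max) + binary_features(window)


if __name__ == "__main__":
    seq = "MKTAYIAK"
    print(extract_window(seq, 1, 3))      # XXMKTAY
    print(len(encode_site(seq, 1)))       # 2*400 + 7*20 = 940
```

Feature vectors built this way for labelled positive and negative lysine sites could then be passed to a random forest implementation (for example scikit-learn's `RandomForestClassifier`) and evaluated with 5-fold cross-validation; the specific training settings used by SuccinSite are not given in the abstract.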