CarSite: identifying carbonylated sites of human proteins based on a one-sided selection resampling method
Abstract
Protein carbonylation is one of the most important biomarkers of oxidative protein damage, which is linked to various diseases and to aging. Identifying carbonylation sites accurately is therefore vital. In this study, CarSite, a novel bioinformatics tool, was developed to identify carbonylation sites in human proteins. Balanced training datasets were constructed with the one-sided selection (OSS) resampling method, which outperformed a Monte Carlo resampling method in 10-fold cross-validation tests on the Jia dataset. In addition, a hybrid combination of four feature encodings was selected to optimize the performance of the predictor: position-specific amino acid propensity (PSAAP), composition of k-spaced amino acid pairs (CKSAAP), amino acid composition (AAC), and composition of hydrophobic and hydrophilic amino acids (CHHAA). In 10-fold cross-validation on the Jia dataset, CarSite achieved sensitivities for K-, P-, R-, and T-type peptides that were ∼21%, 22%, 19%, and 18% higher, respectively, than those of iCar-PseCp, which was previously considered the best predictor of carbonylation sites in human proteins. Furthermore, when tested on the same dataset, CarSite achieved much higher sensitivity and accuracy than other existing predictors.
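The abstract names one-sided selection (OSS) as the resampling step but does not spell it out. The following is a minimal Python sketch of the generic OSS procedure, not CarSite's implementation. It makes several assumptions. Samples are already encoded as numeric feature vectors (the PSAAP/CKSAAP/AAC/CHHAA encodings are not reproduced). Distance is Euclidean. Labels are binary. A single random majority sample seeds the reference set. The helper name `one_sided_selection` is hypothetical. In this generic form, OSS shrinks the majority class by discarding redundant and noisy samples rather than forcing an exact 1:1 ratio.

```python
import math
import random
from typing import List, Sequence


def one_sided_selection(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    minority: int = 1,
    seed: int = 0,
) -> List[int]:
    """Return the indices of the samples kept by one-sided selection (OSS).

    Hypothetical helper: generic OSS with Euclidean distance on feature
    vectors that are assumed to be already encoded.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    if not minority_idx or not majority_idx:
        return list(range(len(y)))  # nothing to rebalance

    # Step 1: C starts with every minority sample plus one random majority sample.
    kept = set(minority_idx)
    kept.add(rng.choice(majority_idx))

    # Step 2: 1-NN classify each remaining majority sample against the initial C.
    # A majority sample whose nearest neighbour is a minority sample is
    # misclassified (it lies near the class boundary), so it joins C.
    # Correctly classified majority samples are treated as redundant and dropped.
    reference = sorted(kept)
    for i in majority_idx:
        if i in kept:
            continue
        nn = min(reference, key=lambda j: math.dist(X[i], X[j]))
        if y[nn] == minority:
            kept.add(i)

    # Step 3: clean Tomek links inside C. A Tomek link is a pair of mutual
    # nearest neighbours with different labels; its majority member is removed.
    members = sorted(kept)
    nearest = {
        i: min((j for j in members if j != i), key=lambda j: math.dist(X[i], X[j]))
        for i in members
    }
    for i in members:
        j = nearest[i]
        if y[i] != minority and y[j] == minority and nearest[j] == i:
            kept.discard(i)
    return sorted(kept)


if __name__ == "__main__":
    # Toy 2-D data: label 1 = minority (e.g. carbonylated), 0 = majority.
    X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0),
         (5.0, 5.1), (5.1, 5.1), (0.2, 0.2), (2.0, 2.0), (2.05, 2.0)]
    y = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    print(one_sided_selection(X, y))
```

In the toy data, majority point 9 sits right next to minority point 8, so the two form a Tomek link and point 9 is removed. The borderline majority point 7 is kept. How many of the distant majority points (3 to 6) survive depends on which majority sample is drawn as the seed, which is one source of run-to-run variation in OSS.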