A matching algorithm with isotope distribution pattern in LC-MS based on support vector machine (SVM) learning model
Abstract
In proteomics, it is important to detect, analyze, and quantify complex peptide components and their differences. A key step is matching the elution time peaks (LC peaks) produced by the same peptide in replicate experiments. Warping functions are widely used to correct the mean time shift among replicates. However, they cannot resolve the ambiguity between corresponding and non-corresponding peak pairs, because the time shifts vary randomly for each extracted-ion chromatogram (XIC). In this paper, we consider isotope distribution pattern similarity in addition to the time feature. Unlike other feature-based methods that use isotope information, the algorithm relies on isotope distribution similarity rather than on the usual peak profile similarity. First, a training set of peptides, comprising corresponding and non-corresponding peak pairs, was selected from the MS/MS results. Second, the time difference and the isotope distribution pattern similarity were computed for each peak pair. Third, a support vector machine (SVM) classifier was trained on these two features. Finally, accuracy and coverage were measured. In a 10-fold cross-validation of the SVM learning model, the accuracy of correct matching reached up to 97%. The coverage obtained with the learning model ranged from 75% to 91% across datasets. Thus, the matching algorithm based on time and isotope distribution pattern features can provide high accuracy and coverage for identifying corresponding peaks.
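The two features used for each peak pair can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how isotope distribution similarity is measured, so cosine similarity between isotope intensity vectors is assumed here, and the function names and the `(retention_time, isotope_intensities)` peak representation are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Hypothetical peak representation: (retention time, isotope peak intensities).
Peak = Tuple[float, Sequence[float]]


def isotope_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two isotope intensity vectors (assumed measure)."""
    n = min(len(a), len(b))  # compare only the isotope peaks both patterns share
    dot = sum(x * y for x, y in zip(a[:n], b[:n]))
    norm_a = math.sqrt(sum(x * x for x in a[:n]))
    norm_b = math.sqrt(sum(y * y for y in b[:n]))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty or all-zero pattern carries no similarity information
    return dot / (norm_a * norm_b)


def pair_features(peak_a: Peak, peak_b: Peak) -> Tuple[float, float]:
    """Return (absolute time difference, isotope distribution similarity)."""
    rt_a, iso_a = peak_a
    rt_b, iso_b = peak_b
    return abs(rt_a - rt_b), isotope_similarity(iso_a, iso_b)


# Two peaks from replicate runs with proportional isotope patterns.
features = pair_features((1250.0, [100.0, 60.0, 20.0]),
                         (1262.5, [200.0, 120.0, 40.0]))
```

Each resulting feature pair would then form one two-dimensional input to the SVM classifier, with the label (corresponding or non-corresponding) taken from the MS/MS results.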