Multiscale peak detection in wavelet space
Abstract
Accurate peak detection is essential for analyzing high-throughput datasets generated by analytical instruments. Derivatives with noise reduction and matched filtration are frequently used, but they are sensitive to baseline variations, random noise and deviations in the peak shape. A continuous wavelet transform (CWT)-based method is more practical and popular in this situation, which can increase the accuracy and reliability by identifying peaks across scales in wavelet space and implicitly removing noise as well as the baseline. However, its computational load is relatively high and the estimated features of peaks may not be accurate in the case of peaks that are overlapping, dense or weak. In this study, we present multi-scale peak detection (MSPD) by taking full advantage of additional information in wavelet space including ridges, valleys, and zero-crossings. It can achieve a high accuracy by thresholding each detected peak with the maximum of its ridge. It has been comprehensively evaluated with MALDI-TOF spectra in proteomics, the CAMDA 2006 SELDI dataset as well as the Romanian database of Raman spectra, which is particularly suitable for detecting peaks in high-throughput analytical signals. Receiver operating characteristic (ROC) curves show that MSPD can detect more true peaks while keeping the false discovery rate lower than MassSpecWavelet and MALDIquant methods. Superior results in Raman spectra suggest that MSPD seems to be a more universal method for peak detection. MSPD has been designed and implemented efficiently in Python and Cython. It is available as an open source package at https://github.com/zmzhang/libPeak.