Optimization of Raman-spectrum baseline correction in biological application†
Abstract
In the last decade Raman spectroscopy has become an invaluable tool for biomedical diagnostics. However, manually rating the subtle spectral differences between normal and abnormal disease states is neither possible nor practical. It is therefore necessary to combine Raman spectroscopy with chemometrics in order to build statistical models that predict the disease state directly, without manual intervention. Within chemometric analysis, a number of corrections have to be applied to obtain robust models. Baseline correction is an important pre-processing step, which should remove the spectral contributions of fluorescence and improve the performance and robustness of the statistical models. However, selecting an optimal baseline correction method and its parameters for every new dataset is demanding, time-consuming, and dependent on expert knowledge. To circumvent this issue, we propose a genetic-algorithm-based method to optimize the baseline correction automatically. The investigation was carried out in three main steps. First, a quantitative numerical marker was defined to evaluate the quality of the baseline estimation. Second, a genetic-algorithm-based methodology was established to search for the optimal baseline estimation, using the defined marker as the evaluation function. Finally, classification models were used to benchmark the performance of the optimized baseline. For comparison, a model-based baseline optimization was carried out with the same classifiers. It was shown that our method can provide a semi-optimal and stable baseline estimation without requiring any chemical knowledge or any additional spectral information.
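The first two steps described above (a quantitative marker, then a genetic search that uses it as evaluation function) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from this work: the synthetic spectrum, the morphological baseline estimator with a single half-width parameter, the marker `quality_marker` (which simply penalizes residual area, negative corrected intensities, and baseline roughness), and the genetic operators. The classification benchmark of the third step is not covered.

```python
import math
import random


def synthetic_spectrum(n=300):
    """Toy Raman-like spectrum: three narrow peaks on a broad background."""
    peaks = [(80, 4.0, 1.0), (150, 5.0, 0.7), (220, 3.0, 1.2)]  # center, width, height
    spectrum = []
    for i in range(n):
        background = 2.0 * math.exp(-((i - 120) / 150.0) ** 2)  # fluorescence-like
        signal = sum(h * math.exp(-((i - c) / w) ** 2) for c, w, h in peaks)
        spectrum.append(background + signal)
    return spectrum


def rolling(values, half_width, func):
    """Apply func to a centered window (truncated at the edges) around each point."""
    n = len(values)
    return [func(values[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]


def estimate_baseline(spectrum, half_width):
    """Assumed estimator: morphological opening followed by a moving average."""
    eroded = rolling(spectrum, half_width, min)
    opened = rolling(eroded, half_width, max)
    return rolling(opened, half_width, lambda window: sum(window) / len(window))


def quality_marker(spectrum, baseline, roughness_weight=1000.0):
    """Hypothetical marker (higher is better); NOT the marker defined in the paper."""
    corrected = [s - b for s, b in zip(spectrum, baseline)]
    residual_area = sum(abs(c) for c in corrected)
    negative = sum(-c for c in corrected if c < 0)
    roughness = sum((baseline[i - 1] - 2 * baseline[i] + baseline[i + 1]) ** 2
                    for i in range(1, len(baseline) - 1))
    return -(residual_area + 10.0 * negative + roughness_weight * roughness)


def genetic_search(spectrum, low=1, high=60, pop_size=12, generations=15, seed=0):
    """Search the window half-width with a small genetic algorithm."""
    rng = random.Random(seed)
    cache = {}

    def fitness(half_width):
        if half_width not in cache:
            baseline = estimate_baseline(spectrum, half_width)
            cache[half_width] = quality_marker(spectrum, baseline)
        return cache[half_width]

    population = [rng.randint(low, high) for _ in range(pop_size)]
    history = []  # best marker value per generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[:pop_size // 2]      # truncation selection
        children = [ranked[0]]                # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) // 2              # crossover: blend two parents
            if rng.random() < 0.3:            # mutation: small random shift
                child += rng.randint(-5, 5)
            children.append(min(high, max(low, child)))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


if __name__ == "__main__":
    spectrum = synthetic_spectrum()
    best_half_width, history = genetic_search(spectrum)
    print("selected half-width:", best_half_width)
    print("marker, first vs. last generation:", history[0], history[-1])
```

Because the best individual is carried over unchanged (elitism), the best marker value never decreases from one generation to the next. In practice the search space would span several baseline methods and their parameters rather than a single half-width, and the marker would have to be chosen with far more care than in this toy example.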