Application of sparse linear discriminant analysis for metabolomics data
Abstract
A major goal of metabolomics data analysis is the discovery of potential biomarkers that may be closely related to disease. Effective methods are therefore needed to screen such biomarkers from large datasets. In this paper, we propose a strategy, sparse linear discriminant analysis (SLDA), that performs classification and variable selection simultaneously for the analysis of complex metabolomics datasets. Compared with two other approaches, namely partial least squares discriminant analysis (PLS-DA) and competitive adaptive reweighted sampling (CARS), SLDA obtains relatively better results and identifies informative metabolites that are consistent with those identified in biochemical studies. Furthermore, by building a model based on the selected features, SLDA can be applied to high-dimensional, small-sample cases in which linear discriminant analysis (LDA) fails. In summary, SLDA is a useful method for exploring and processing metabolomics data.
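
The failure of classical LDA in the high-dimensional, small-sample setting can be illustrated numerically. The sketch below is not the SLDA method itself. It only shows, on synthetic data, that the pooled within-class scatter matrix becomes rank-deficient (and hence non-invertible) when the number of variables exceeds the number of samples, which is the usual reason LDA cannot be fitted directly in that regime. The helper name `within_class_scatter`, the sample sizes, and the random data are assumptions made purely for illustration.

```python
import numpy as np


def within_class_scatter(X, y):
    """Return the pooled within-class scatter matrix S_w (p x p)."""
    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))
    for label in np.unique(y):
        X_class = X[y == label]
        centered = X_class - X_class.mean(axis=0)  # remove the class mean
        S_w += centered.T @ centered
    return S_w


# Synthetic "metabolomics-like" setting: far more variables than samples.
rng = np.random.default_rng(0)
n_samples, n_features = 20, 50
X = rng.normal(size=(n_samples, n_features))
y = np.repeat([0, 1], n_samples // 2)  # two classes of 10 samples each

S_w = within_class_scatter(X, y)
rank = np.linalg.matrix_rank(S_w)

# The rank of S_w is at most n_samples - n_classes, so it is below
# n_features here and S_w has no inverse.
print(f"rank(S_w) = {rank}, number of variables = {n_features}")
```

Restricting the model to a small subset of selected variables, fewer than the number of samples, is one way to obtain an invertible scatter matrix again, which is consistent with the motivation for combining variable selection with discriminant analysis.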