Guided principal component analysis (GPCA): a simple method for improving detection of a known analyte
Abstract
There is increasing interest in applying Raman spectroscopy in medical settings, ranging from supporting real-time clinical decisions, such as assessing surgical margins, to assisting pathologists with disease classification. However, a number of barriers to clinical adoption remain because of the added complexity of probing highly heterogeneous, dynamic biological materials. This inherent challenge can also limit the deployment of higher-level analytical approaches such as artificial intelligence (AI), including convolutional neural networks (CNNs), because complex clinical samples often lack the ground truth required for training. Principal component analysis (PCA) is an unsupervised data-reduction approach (an orthogonal linear transformation) that has been used extensively in spectroscopy for more than 30 years because it simplifies the analysis of complex spectroscopic data. However, because PCA is unsupervised, spectral features inherently appear mixed across principal components, and their rank may vary between experiments. Here we propose Guided PCA (GPCA), a simple approach in which a reference (guiding) spectrum of a key target moiety is included in the data set, guiding PCA so that this moiety appears at a consistent rank. This simplifies analysis, increases the robustness of PCA, improves quantification and limits of detection, and decreases the root-mean-square error (RMSE).
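The abstract does not state how the guiding spectrum is weighted or how the target component is identified, so the following is only a minimal sketch under explicit assumptions, not the authors' implementation. It assumes the reference spectrum is appended `n_copies` times (a hypothetical weighting parameter) to the spectra before ordinary PCA, computed here with NumPy's SVD. It also assumes the "target PC" is the loading with the largest absolute cosine similarity to the reference. The helper names (`pca_svd`, `guided_pca`, `gaussian_band`) and the synthetic Gaussian bands are illustrative only.

```python
import numpy as np


def pca_svd(X, n_components):
    """Ordinary PCA of an (n_spectra, n_channels) matrix via SVD of the mean-centred data."""
    Xc = X - X.mean(axis=0)
    # Economy-size SVD: rows of Vt are unit-length loadings, ordered by variance explained.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    return scores, Vt[:n_components]


def guided_pca(spectra, reference, n_components=3, n_copies=5):
    """Hypothetical GPCA sketch: append the guiding (reference) spectrum
    n_copies times, run ordinary PCA, and report which loading best matches
    the reference. n_copies is an assumed weighting knob, not from the paper."""
    spectra = np.asarray(spectra, dtype=float)
    reference = np.asarray(reference, dtype=float)
    augmented = np.vstack([spectra, np.tile(reference, (n_copies, 1))])
    scores, loadings = pca_svd(augmented, n_components)
    # Absolute cosine similarity, because PC signs are arbitrary; loadings are unit vectors.
    similarity = np.abs(loadings @ (reference / np.linalg.norm(reference)))
    target_pc = int(np.argmax(similarity))
    # Return scores for the original spectra only, not the appended guide copies.
    return scores[: len(spectra)], loadings, target_pc


def gaussian_band(x, centre, width):
    """Hypothetical Gaussian band used to build synthetic spectra."""
    return np.exp(-((x - centre) ** 2) / (2 * width**2))


# Synthetic demo: a strong, highly variable background band and a weak target band.
rng = np.random.default_rng(0)
x = np.arange(200, dtype=float)
background = gaussian_band(x, 60, 8)
target = gaussian_band(x, 140, 5)
a = rng.uniform(0, 10, size=20)   # background intensity per spectrum
b = rng.uniform(0, 0.5, size=20)  # target intensity per spectrum
spectra = np.outer(a, background) + np.outer(b, target) + rng.normal(0, 0.01, (20, 200))
reference = 20 * target  # guiding spectrum: pure target band (scaling is an assumption)

# n_copies=0 reduces to ordinary PCA; n_copies=5 adds the guide.
_, _, unguided_pc = guided_pca(spectra, reference, n_copies=0)
scores, loadings, guided_pc = guided_pca(spectra, reference, n_copies=5)
print("target PC without guide:", unguided_pc + 1)  # 2 on this synthetic data
print("target PC with guide:   ", guided_pc + 1)    # 1 on this synthetic data
```

In this toy example the weak target band sits behind the dominant background in unguided PCA, while appending the guide raises the variance along the target direction so that it moves to the first component. Real Raman data would need the weighting and matching choices to be validated, since they are assumptions here.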