Correlation analysis of modern analytical data – a chemometric dissection of spectral and chromatographic variables
Abstract
The Standard Practices for Infrared Multivariate Quantitative Analysis (ASTM E1655) provide a guide for determining physicochemical properties of materials using multivariate calibration techniques applied to chemical data that are highly multicollinear, that is, whose variables carry correlated information. Partial least squares (PLS) is the most widely used multivariate regression method due to its prediction performance and ease of optimization. Initially applied to chromatographic data, PLS has also performed well with near-infrared (NIR) and mid-infrared (MIR) spectroscopies. However, complex chemical matrices with low correlation among variables may not be modeled efficiently by PLS or by other multivariate methods that rely on grouping similar information (for example, into latent variables or principal components). This study therefore evaluates the multicollinearity of data from different analytical techniques: high-temperature gas chromatography (HTGC), NIR, MIR, hydrogen nuclear magnetic resonance (1H NMR), carbon-13 nuclear magnetic resonance (13C NMR), and Fourier transform ion cyclotron resonance mass spectrometry coupled to an electrospray ionization source in positive and negative ionization modes (ESI(±)FT-ICR MS). Descriptive statistics (coefficient of determination, R2) and principal component analysis (PCA) were used to identify the distribution of correlated information. NIR and MIR spectroscopies exhibited a higher percentage of correlated variables, while 13C NMR and ESI(±)FT-ICR MS had more discrete profiles. PLS may therefore be applied more effectively to NIR, MIR, and 1H NMR data, while 13C NMR and mass spectra may require other algorithms or variable selection methods in combination with PLS.
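The kind of multicollinearity screening described in the abstract can be illustrated with a minimal sketch. This is not the procedure used in the study: the data are synthetic, the R2 cutoff of 0.9 is an arbitrary assumption, and the function names `pairwise_r2_fraction` and `pca_explained_variance` are hypothetical. The sketch computes the share of variable pairs with a high squared Pearson correlation and the PCA explained-variance profile for two simulated blocks, one with smooth, overlapping bands (loosely spectrum-like) and one with independent variables (loosely resembling a discrete profile).

```python
import numpy as np

def pairwise_r2_fraction(X, threshold=0.9):
    """Fraction of distinct variable pairs whose squared Pearson
    correlation exceeds `threshold` (the 0.9 default is an assumed cutoff)."""
    r = np.corrcoef(X, rowvar=False)       # variables are the columns of X
    r2 = r ** 2
    iu = np.triu_indices_from(r2, k=1)     # count each pair once, skip the diagonal
    return float(np.mean(r2[iu] > threshold))

def pca_explained_variance(X):
    """Explained-variance ratio of each principal component,
    obtained from the SVD of the mean-centered data."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()

rng = np.random.default_rng(0)
n_samples, n_vars = 40, 60

# Synthetic "continuous" block: two latent factors expressed as broad,
# overlapping Gaussian bands, plus a small amount of noise.
axis = np.linspace(0.0, 1.0, n_vars)
loadings = np.vstack([
    np.exp(-((axis - 0.3) / 0.2) ** 2),
    np.exp(-((axis - 0.7) / 0.2) ** 2),
])
scores = rng.normal(size=(n_samples, 2))
X_smooth = scores @ loadings + 0.01 * rng.normal(size=(n_samples, n_vars))

# Synthetic "discrete" block: every variable varies independently.
X_discrete = rng.normal(size=(n_samples, n_vars))

frac_smooth = pairwise_r2_fraction(X_smooth)
frac_discrete = pairwise_r2_fraction(X_discrete)
ev_smooth = pca_explained_variance(X_smooth)
ev_discrete = pca_explained_variance(X_discrete)

print("pairs with R2 > 0.9 (smooth, discrete):", frac_smooth, frac_discrete)
print("variance in first two PCs (smooth, discrete):",
      ev_smooth[:2].sum(), ev_discrete[:2].sum())
```

In this toy setting, a block whose variance concentrates in a few components and whose variables are strongly pairwise correlated is the kind of data for which latent-variable methods such as PLS are well suited, whereas a block with little shared variance is the kind for which variable selection might be considered first.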