Wei Zhu^ab, Ruifang Yang*^b, Nanjing Zhao*^b, Gaofang Yin^b and Jianguo Liu^b
^a University of Science and Technology of China, Hefei, 230026, China
^b Key Laboratory of Environmental Optics and Technology, Anhui Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Hefei, 230031, China. E-mail: rfyang@aiofm.ac.cn; njzhao@aiofm.ac.cn
First published on 10th January 2024
Phenolic compounds are toxic chemical pollutants present in water. Three-dimensional fluorescence spectroscopy is an effective and rapid method for real-time phenol monitoring in aquatic environments. However, the similar chemical structures of phenols result in highly overlapping three-dimensional fluorescence spectra, so it is extremely difficult to quantify the individual components of a mixture that contains two or more phenolic compounds. In this article, we study a mixed phenol system containing phenol, o-cresol, p-cresol, m-cresol, catechol, and resorcinol using excitation-emission matrix (EEM) fluorescence data. A multivariate statistical method called best linear unbiased prediction (BLUP) is proposed to analyze the spectra quantitatively, and a trilinear decomposition algorithm called parallel factor analysis (PARAFAC) is used for comparison. Two experiments with different calibration samples were set up to validate the effectiveness of BLUP through recovery, ARecovery (Average Recovery), AREP (Average Relative Error of Prediction), and RMSE (Root Mean Square Error). Overall, the average recovery of each component ranged from 95.91% to 111.62% in experiment 1 and from 82.91% to 129.02% in experiment 2. These results indicate that the concentration of phenolic compounds in water can be quantitatively determined by combining three-dimensional fluorescence spectroscopy with the BLUP method.
Chemical analysis, gas chromatography (GC), gas chromatography-mass spectrometry (GC-MS), and high-performance liquid chromatography (HPLC) are some of the classical methods that can be used to determine phenolic compounds.4–8 However, because handling chemical reagents and pretreating samples is time-consuming, these techniques are not well suited to real-time monitoring. To solve this issue, three-dimensional fluorescence spectra, recorded as excitation-emission matrix (EEM) fluorescence data, are used. The spectral information it provides, its high sensitivity, and its low detection limits make this an effective technique for monitoring water pollutants.9–12
In recent years, many mathematical algorithms have been applied and improved to process three-dimensional fluorescence spectra.13–15 Trilinear decomposition algorithms are one such family, and parallel factor analysis (PARAFAC) is the member most commonly used for EEM fluorescence data.16,17 Provided that the signal-to-noise ratio is adequate and the number of components is estimated correctly, PARAFAC usually performs well in separating and recovering the three-dimensional fluorescence spectrum of each component in a mixture system.18,19 The main advantage of PARAFAC is the uniqueness of the decomposition when the dataset is linear in all three directions. However, PARAFAC may not be able to obtain accurate concentrations for each compound when the three-dimensional fluorescence spectra overlap seriously, as in the case of the six phenolic compounds quantitatively studied in this article. It is therefore important to explore other methods that estimate concentrations more precisely in such situations.
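For reference, the trilinear model that PARAFAC fits is commonly written as follows. The notation is the conventional one from the chemometrics literature and is an assumption here, not taken from this article: $x_{ijk}$ is the fluorescence intensity of sample $i$ at emission wavelength $j$ and excitation wavelength $k$, $F$ is the number of components, and $e_{ijk}$ is the residual.

$$x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + e_{ijk}$$

In this reading, $a_{if}$ is proportional to the concentration of component $f$ in sample $i$, while $b_{jf}$ and $c_{kf}$ are its emission and excitation profiles. Strongly overlapping profiles make the columns of $B$ and $C$ nearly collinear, which is the situation described above in which the estimated concentrations become unreliable.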
Multivariate statistical analysis is a comprehensive analysis method developed from classical statistical analysis. It can be used to analyze the statistical patterns of multiple objects and indicators when they are interrelated. Multivariate statistical analysis includes multiple regression analysis, cluster analysis, factor analysis, and canonical correlation analysis. Best linear unbiased prediction (BLUP) is a prediction method within multivariate statistical analysis. It is valuable for prediction because the resulting predictor is optimal within the class of linear unbiased predictors. BLUP has been widely used in various fields, such as life testing and genetic connectedness in genetic statistics.20,21 It is useful for simplifying prediction calculations in some cases and for constructing large-sample approximate predictors for scale and location–scale parameter distributions.22,23
In this study, we apply BLUP to EEM data. In two different experiments, BLUP is used to quantify five or six phenols directly from fluorescence excitation-emission matrices (EEMs). The results show that BLUP can provide accurate results for phenols with severe spectral overlap under different calibration set compositions.
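Let $X$ denote the observed vector and $Y$ the vector to be predicted, and suppose that they are jointly normally distributed:

$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right) \qquad (1)$$

For model (1), the conditional expectation of $Y$ given $X$ is:27

$$E(Y \mid X) = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(X - \mu_1)$$

In fact, this predictor is the best linear unbiased predictor (BLUP) under the normality assumption.
Note that $\mu_1$, $\mu_2$, $\Sigma_{11}$, and $\Sigma_{21}$ are all unknown in practice, so they must be estimated from the sample data. Suppose that the samples $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, are drawn from the population $(X, Y)$; then the maximum likelihood estimators of the parameters are

$$\hat{\mu}_1 = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \hat{\mu}_2 = \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i,$$

$$\hat{\Sigma}_{11} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^{T}, \qquad \hat{\Sigma}_{21} = \frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})^{T}.$$

Thus, $Y$ can be predicted by

$$\hat{Y} = \bar{Y} + \hat{\Sigma}_{21}\hat{\Sigma}_{11}^{-1}(X - \bar{X}).$$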
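The plug-in predictor described above (sample means and sample covariances substituted into the conditional expectation of a jointly normal pair) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fit_blup` and `predict_blup` are hypothetical, each row of `X` is assumed to be an unfolded EEM, each row of `Y` a concentration vector, and a pseudo-inverse replaces the matrix inverse because the sample covariance of `X` is singular whenever there are more spectral variables than calibration samples.

```python
import numpy as np

def fit_blup(X, Y):
    """Estimate the plug-in BLUP from calibration data.

    X: (n, p) array, one unfolded spectrum per row (assumed layout).
    Y: (n, q) array, one concentration vector per row (assumed layout).
    Returns (x_mean, y_mean, coef) with coef = S21 @ pinv(S11).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xc = X - x_mean
    Yc = Y - y_mean
    S11 = Xc.T @ Xc / n          # maximum likelihood estimate of Sigma_11
    S21 = Yc.T @ Xc / n          # maximum likelihood estimate of Sigma_21
    # Assumption: pseudo-inverse instead of inverse, since S11 is
    # rank-deficient when p > n.
    coef = S21 @ np.linalg.pinv(S11)
    return x_mean, y_mean, coef

def predict_blup(model, X_new):
    """Predict Y for new rows of X: y_mean + coef @ (x - x_mean)."""
    x_mean, y_mean, coef = model
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    return y_mean + (X_new - x_mean) @ coef.T
```

On noise-free data that follow an exact linear relation, with more samples than variables, this predictor reproduces the relation exactly, which gives a quick sanity check before applying it to real spectra.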
In this work, we prepared two experiments to determine the calculation accuracy of the BLUP compared with PARAFAC. In experiment 1, the calibration set comprised 9 samples which were mixed with phenol, o-cresol, p-cresol, catechol, and resorcinol in deionized water. Similar to the calibration set, the test set contained different concentration ratios of 5 phenols. All the concentration ratios of 5 phenols in the calibration and test sets are shown in Table 1.
Table 1 Concentration ratios of the 5 phenols in the calibration and test sets of experiment 1

| Calibration sample | Phenol | o-Cresol | p-Cresol | Catechol | Resorcinol | Test sample | Phenol | o-Cresol | p-Cresol | Catechol | Resorcinol |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.08 | 0.16 | 0.40 | 0.56 | 0.64 | 1 | 0.10 | 0.64 | 0.10 | 0.30 | 0.20 |
| 2 | 0.16 | 0.32 | 0.08 | 0.40 | 0.56 | 2 | 0.50 | 0.30 | 0.38 | 0.44 | 0.30 |
| 3 | 0.24 | 0.48 | 0.48 | 0.24 | 0.48 | 3 | 0.32 | 0.70 | 0.40 | 0.70 | 0.40 |
| 4 | 0.32 | 0.64 | 0.16 | 0.08 | 0.40 | 4 | 0.72 | 0.60 | 0.18 | 0.62 | 0.70 |
| 5 | 0.40 | 0.08 | 0.56 | 0.64 | 0.32 | 5 | 0.08 | 0.56 | 0.56 | 0.10 | 0.60 |
| 6 | 0.48 | 0.24 | 0.24 | 0.48 | 0.24 | 6 | 0.20 | 0.26 | 0.70 | 0.08 | 0.50 |
| 7 | 0.56 | 0.40 | 0.64 | 0.32 | 0.16 | 7 | 0.60 | 0.10 | 0.22 | 0.50 | 0.10 |
| 8 | 0.64 | 0.56 | 0.32 | 0.16 | 0.08 | 8 | 0.64 | 0.50 | 0.44 | 0.60 | 0.46 |
| 9 | 0.72 | 0.72 | 0.72 | 0.72 | 0.72 | 9 | 0.48 | 0.48 | 0.30 | 0.72 | 0.68 |
|  |  |  |  |  |  | 10 | 0.24 | 0.36 | 0.60 | 0.20 | 0.34 |
In experiment 2, a calibration set of 21 samples was built from six 2-component mixed samples, four 3-component mixed samples, six 4-component mixed samples, and five 5-component mixed samples. The test set is also a mixed system, but all ten samples contain 6 phenols. The purpose of experiment 2 was to test the situation in which an additional, spectrally similar component is present and the calibration samples contain fewer components than the test samples. Table 2 and Table 3 list the concentration values of the 6 phenols in the calibration and test samples.
Table 2 Concentrations of the 6 phenols in the calibration set of experiment 2

| Calibration sample | Phenol | o-Cresol | m-Cresol | p-Cresol | Catechol | Resorcinol |
|---|---|---|---|---|---|---|
| 1 | 0.8 | 1.5 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0.6 | 1.1 | 0 | 0 | 0 |
| 3 | 0 | 0 | 1.6 | 0.8 | 0 | 0 |
| 4 | 0 | 0 | 0 | 1.2 | 2 | 0 |
| 5 | 0 | 0 | 0 | 0 | 1.8 | 2.4 |
| 6 | 1.5 | 0 | 0 | 0 | 0 | 1.8 |
| 7 | 1 | 1.2 | 0.4 | 0 | 0 | 0 |
| 8 | 0 | 0.5 | 0.8 | 0.6 | 0 | 0 |
| 9 | 0 | 0 | 1 | 1 | 1.5 | 0 |
| 10 | 0.6 | 0 | 0 | 0 | 2.6 | 2 |
| 11 | 0.4 | 1 | 0.7 | 0.5 | 0 | 0 |
| 12 | 0 | 1 | 0.7 | 0.5 | 0 | 0 |
| 13 | 0.5 | 0.4 | 0.6 | 0.4 | 0.24 | 0 |
| 14 | 0.7 | 0 | 0 | 0.7 | 0.4 | 2.2 |
| 15 | 0 | 0.8 | 0.5 | 0 | 0 | 0.7 |
| 16 | 0 | 0.6 | 0 | 1.5 | 1 | 0 |
| 17 | 0.16 | 0.7 | 0.16 | 0.3 | 1.2 | 0 |
| 18 | 0.9 | 0 | 1.1 | 0.9 | 0.9 | 0.16 |
| 19 | 1.1 | 0.5 | 0 | 0.16 | 0.5 | 0.9 |
| 20 | 0.3 | 0.3 | 0.8 | 0 | 0.7 | 0.3 |
| 21 | 0.8 | 0.9 | 0.3 | 0.6 | 0 | 0.6 |
Table 3 Concentrations of the 6 phenols in the test set of experiment 2

| Test sample | Phenol | o-Cresol | m-Cresol | p-Cresol | Catechol | Resorcinol |
|---|---|---|---|---|---|---|
| 1 | 0.64 | 0.64 | 0.66 | 0.24 | 0.6 | 0.4 |
| 2 | 0.5 | 0.3 | 0.32 | 0.38 | 1 | 1 |
| 3 | 0.44 | 0.7 | 0.4 | 0.4 | 0.7 | 1.2 |
| 4 | 0.72 | 0.6 | 0.48 | 0.2 | 0.9 | 0.7 |
| 5 | 0.4 | 0.56 | 0.72 | 0.56 | 0.5 | 0.6 |
| 6 | 0.2 | 0.26 | 0.44 | 0.7 | 1.4 | 0.5 |
| 7 | 0.6 | 0.2 | 0.2 | 0.5 | 1.2 | 1.1 |
| 8 | 0.36 | 0.5 | 0.7 | 0.44 | 0.66 | 1.3 |
| 9 | 0.48 | 0.48 | 0.36 | 0.3 | 0.84 | 1.5 |
| 10 | 0.26 | 0.36 | 0.52 | 0.6 | 1.6 | 0.8 |
Fig. 2 Three-dimensional fluorescence spectra combined with contour plots of six phenolic compounds.
|  | Phenol | o-Cresol | p-Cresol | Catechol | Resorcinol | m-Cresol |
|---|---|---|---|---|---|---|
| Phenol | 1 | 0.9728 | 0.8027 | 0.7417 | 0.8761 | 0.9512 |
| o-Cresol | — | 1 | 0.8633 | 0.8302 | 0.9517 | 0.9851 |
| p-Cresol | — | — | 1 | 0.9288 | 0.9127 | 0.8404 |
| Catechol | — | — | — | 1 | 0.9133 | 0.7882 |
| Resorcinol | — | — | — | — | 1 | 0.9485 |
| m-Cresol | — | — | — | — | — | 1 |
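In these experiments, CORCONDIA (core consistency diagnostic) was used as an efficient method to determine the number of components. It determines the number of factors from the value of the core consistency coefficient, which in its standard form is

$$\mathrm{CC} = 100 \times \left( 1 - \frac{\sum_{d=1}^{F}\sum_{e=1}^{F}\sum_{f=1}^{F} (g_{def} - t_{def})^{2}}{\sum_{d=1}^{F}\sum_{e=1}^{F}\sum_{f=1}^{F} t_{def}^{2}} \right)$$

where $F$ is the number of components, $g_{def}$ are the elements of the least-squares core array computed from the PARAFAC loadings, and $t_{def}$ are the elements of the superdiagonal array (ones on the superdiagonal, zeros elsewhere).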
In experiment 1, the core consistency coefficient is above 50% with five components and drops to 5.24% when the number of components is increased from 5 to 6. Five components are therefore taken as the correct number for experiment 1. In the same way, 6 is determined to be the suitable number of components in experiment 2.
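The core consistency check can be sketched as follows. This is an illustrative sketch under assumptions, not the code used in the study: the function name `core_consistency` is hypothetical, the loading matrices `A`, `B`, `C` are assumed to come from an already fitted PARAFAC model with full column rank, and the coefficient is computed in its standard form, 100 × (1 − Σ(g − t)² / F), where g is the least-squares core and t the superdiagonal array of ones.

```python
import numpy as np

def core_consistency(X, A, B, C):
    """Core consistency (in percent) of a fitted three-way PARAFAC model.

    X: data array of shape (I, J, K).
    A, B, C: loading matrices of shape (I, F), (J, F), (K, F),
             assumed to have full column rank.
    """
    F = A.shape[1]
    # Least-squares core: project X onto the loadings in every mode.
    G = np.einsum(
        "ijk,pi,qj,rk->pqr",
        X,
        np.linalg.pinv(A),
        np.linalg.pinv(B),
        np.linalg.pinv(C),
    )
    # Ideal PARAFAC core: ones on the superdiagonal, zeros elsewhere.
    T = np.zeros((F, F, F))
    idx = np.arange(F)
    T[idx, idx, idx] = 1.0
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / F)
```

For data that are exactly trilinear in the supplied loadings the function returns 100. In practice the model would be refitted for each candidate number of components, and a sharp drop in the coefficient, such as the drop to 5.24% reported for six components in experiment 1, signals that too many components have been requested.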
After the number of components is determined, the next step is to calculate the concentration of each component in the test samples using the algorithms with appropriate parameters. The two test sets (twenty samples in total), containing five phenolic compounds in experiment 1 and six phenolic compounds in experiment 2, are quantified by BLUP and PARAFAC. Recovery, ARecovery (Average Recovery), AREP (Average Relative Error of Prediction), and RMSE (Root Mean Square Error) are the four indicators used to evaluate the calculation results.
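The four indicators can be computed as sketched below. The article's own formulas are not reproduced in this text, so the definitions here are the conventional ones and should be read as assumptions: recovery is the predicted concentration as a percentage of the actual one, ARecovery is its mean over the test samples, AREP is the mean absolute relative error in percent, and RMSE is the root mean square of the prediction errors. The function names are hypothetical.

```python
import numpy as np

def recovery(predicted, actual):
    """Per-sample recovery in percent (assumes actual > 0)."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return 100.0 * predicted / actual

def summarize(predicted, actual):
    """ARecovery (%), AREP (%) and RMSE for one compound."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    errors = predicted - actual
    return {
        "ARecovery": float(np.mean(recovery(predicted, actual))),
        "AREP": float(100.0 * np.mean(np.abs(errors) / actual)),
        "RMSE": float(np.sqrt(np.mean(errors ** 2))),
    }

# Made-up numbers for illustration only (not data from the article).
example = summarize(predicted=[0.55, 0.90], actual=[0.50, 1.00])
```

Note that ARecovery can sit close to 100% even when individual predictions are poor, because over- and under-estimates cancel; AREP and RMSE do not have this property, which is why the indicators are reported together.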
Considering the different construction of the calibration sets, two experiments are discussed separately.
Fig. 3 The calculated concentration using BLUP, PARAFAC algorithm, and the actual concentration of 5 phenolic components in all test samples in experiment 1.
For phenol, BLUP performed better than PARAFAC in all test samples except samples 4 and 8. Although the recovery rates calculated by BLUP for samples 4 and 8 are not as accurate as those obtained by PARAFAC, they still reached 96.42% for sample 4 and 110.92% for sample 8. Similarly, for o-cresol, p-cresol, and resorcinol there were no more than two samples each in which PARAFAC was superior to BLUP: sample 6 for o-cresol, sample 5 for p-cresol, and samples 7 and 8 for resorcinol, with corresponding recovery rates of 126.19%, 102.89%, 135.7%, and 118.91%, respectively. The BLUP results for catechol were poorer than those for the other 4 phenolic compounds in experiment 1: in three samples (2, 3, and 8), BLUP was less accurate than PARAFAC. The respective recovery rates were 117.79%, 87.61%, and 74.52%, so the calculated concentrations were still not far from the actual ones.
As can be seen in Fig. 3, BLUP performs better than PARAFAC overall in both accuracy and stability, judged by how closely the calculated lines follow the actual lines in these plots. This conclusion is also supported by the data in Table 5, in which average recovery and AREP represent the accuracy of the overall calculation results and RMSE represents the dispersion of the data. Table 5 shows that, with the BLUP algorithm, the average recovery rates of all 5 phenolic compounds were closer to 100% and the AREP and RMSE values were smaller.
Table 5 ARecovery, AREP, and RMSE obtained with BLUP and PARAFAC for the 5 phenolic compounds in experiment 1

| Compound | ARecovery/% (BLUP) | ARecovery/% (PARAFAC) | AREP/% (BLUP) | AREP/% (PARAFAC) | RMSE/mg L−1 (BLUP) | RMSE/mg L−1 (PARAFAC) |
|---|---|---|---|---|---|---|
| Phenol | 111.31 | 176.14 | 18.38 | 93.06 | 0.0512 | 0.1856 |
| o-Cresol | 110.11 | 127.28 | 13.33 | 53.32 | 0.0401 | 0.2052 |
| p-Cresol | 108.22 | 117.31 | 16.14 | 24.12 | 0.0668 | 0.1040 |
| Catechol | 95.91 | 141.96 | 10.88 | 57.31 | 0.0656 | 0.1119 |
| Resorcinol | 111.62 | 165.14 | 17.69 | 83.52 | 0.0708 | 0.4184 |
As can be seen in Fig. 4, of the 60 dots in total there are only 3 for which PARAFAC is closer to the actual value than BLUP: sample 7 of phenol, sample 6 of o-cresol, and sample 4 of m-cresol. Based on BLUP, their recovery rates were 101.62%, 72.62%, and 75.5%, within the acceptable ranges. The range of the recovery rates of each component was also calculated to show the prediction performance of BLUP in experiment 2. For phenol, the recovery ranged from 101.62% to 157.77%; for o-cresol, from 72.62% to 121.9%; for m-cresol, from 72.25% to 92.72%; for p-cresol, from 98.02% to 121.9%; for catechol, from 80.71% to 111.8%; and for resorcinol, from 95.55% to 120.7%.
Fig. 4 Calculated concentrations using BLUP, PARAFAC algorithm, and actual concentration of 6 phenolic components in all the test samples from experiment 2.
As can be seen in Table 6, BLUP gives better quantitative results than PARAFAC in terms of average recovery, AREP, and RMSE. Compared with Table 5, the gap between BLUP and PARAFAC in these three indicators widens rapidly as the components of the mixture system and the composition of the calibration set change from experiment 1 to experiment 2. Meanwhile, it is noteworthy that the accuracy of the BLUP results in experiment 2 is not significantly affected by these changes, according to the values of average recovery, AREP, and RMSE.
Table 6 ARecovery, AREP, and RMSE obtained with BLUP and PARAFAC for the 6 phenolic compounds in experiment 2

| Compound | ARecovery/% (BLUP) | ARecovery/% (PARAFAC) | AREP/% (BLUP) | AREP/% (PARAFAC) | RMSE/mg L−1 (BLUP) | RMSE/mg L−1 (PARAFAC) |
|---|---|---|---|---|---|---|
| Phenol | 129.02 | 180.62 | 29.03 | 77.13 | 0.1308 | 0.3989 |
| o-Cresol | 85.65 | 200.94 | 18.73 | 103.88 | 0.0899 | 0.4672 |
| m-Cresol | 82.91 | 57.67 | 17.09 | 45.22 | 0.0920 | 0.2562 |
| p-Cresol | 106.47 | 244.45 | 7.70 | 149.89 | 0.0287 | 0.5747 |
| Catechol | 97.75 | 280.90 | 6.31 | 190.49 | 0.0799 | 1.5195 |
| Resorcinol | 110.64 | 248.28 | 11.59 | 139.63 | 0.1176 | 1.1577 |
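As a quick check of the comparison between the two experiments, the AREP values reported in Tables 5 and 6 can be averaged over the compounds for each method. The numbers below are copied from those tables; averaging across compounds is a summary chosen here for illustration and is not an indicator defined in the article.

```python
# AREP (%) per compound, copied from Tables 5 and 6.
arep = {
    "experiment 1": {
        "BLUP": [18.38, 13.33, 16.14, 10.88, 17.69],
        "PARAFAC": [93.06, 53.32, 24.12, 57.31, 83.52],
    },
    "experiment 2": {
        "BLUP": [29.03, 18.73, 17.09, 7.70, 6.31, 11.59],
        "PARAFAC": [77.13, 103.88, 45.22, 149.89, 190.49, 139.63],
    },
}

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

# Mean AREP across compounds, per experiment and method.
mean_arep = {
    experiment: {method: mean(values) for method, values in methods.items()}
    for experiment, methods in arep.items()
}

for experiment, methods in mean_arep.items():
    for method, value in methods.items():
        print(f"{experiment}, {method}: mean AREP = {value:.2f}%")
```

The mean AREP of BLUP stays at roughly 15% in both experiments, while that of PARAFAC rises from roughly 62% to roughly 118%, consistent with the observation that the gap between the two methods widens from experiment 1 to experiment 2.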
This journal is © The Royal Society of Chemistry 2024