Rui Liu a, Jieqiang Liu b, Zhengwei Huang c and Qingbo Li *a
a School of Instrumentation and Optoelectronic Engineering, Precision Opto-Mechatronics Technology Key Laboratory of Education Ministry, Beihang University, Beijing, China. E-mail: qbleebuaa@buaa.edu.cn
b Beijing Orient Institute of Measurement & Test, Youyi Road No. 104, Haidian District, Beijing, 100094, China
c Beijing BAIF-Maihak Analytical Instrument Co., Ltd, Beijing, 100095, China
First published on 7th November 2024
Photoplethysmography (PPG) signals offer a new means of non-invasive blood glucose detection, but quantitative blood glucose models built on them still suffer from poor time adaptability and low prediction accuracy. Few studies discuss prediction accuracy when the interval between modeling and prediction is long. This paper proposes an automatic optimal threshold baseline removal algorithm based on variational mode decomposition (AOT-VMD), which adaptively eliminates high-frequency noise and baseline interference in each decomposed intrinsic mode function (IMF) component and reduces the baseline difference between PPG signals from different days. Furthermore, a fuzzy integral multi-model decision fusion algorithm based on error weight is proposed. The fuzzy integral operator is introduced so that features with large contributions in each sub-model keep a high weight in the overall prediction mechanism, which improves the prediction accuracy of blood glucose. A self-developed portable PPG glucose meter is used to collect PPG signals, and reference blood glucose values for 8 consecutive days are collected by continuous glucose monitoring (CGM). The proposed algorithm builds a model with the first day's data and predicts the blood glucose values for the remaining 7 days. The experimental results show that the AOT-VMD preprocessing algorithm and the fuzzy integral multi-model decision fusion regression algorithm outperform traditional methods in measurement accuracy and time adaptability. In addition, the proposed method requires fewer invasive calibration samples in the modeling stage while achieving high-precision prediction over a long period. In this experiment, 100% of the samples fall in zones A and B of the Clarke error grid, and the algorithm shows strong time generalization ability. This method can promote the development of home non-invasive blood glucose detectors.
In this paper, self-developed hardware with multiple-wavelength LEDs as light sources is used to collect PPG signals. In addition, a low-cost, high-accuracy non-invasive blood glucose detection method based on PPG signals is proposed, which needs fewer invasive calibrations and covers a wide prediction time range.
In terms of data preprocessing, this paper proposes an automatic optimization threshold baseline noise removal algorithm based on variational mode decomposition. It effectively removes the signal baseline and high-frequency noise and, as far as possible, eliminates interference from changes over time in the instrument state, the physiological state of the subject, the test pressure, the detection site, and other measurement conditions. In addition, this paper proposes a weighted fuzzy integral multi-model decision fusion algorithm. It adaptively adjusts the nonlinear integral fusion weight parameters, and each sub-model selects its best features for training, which reduces the sample size the model requires. This improves the precision of blood glucose prediction when modeling with small samples.
Most traditional denoising methods based on variational mode decomposition either discard the high-frequency IMF components directly or screen the IMFs with a correlation coefficient threshold: an IMF is retained only when its correlation coefficient with the original signal exceeds the threshold. However, the correlation coefficient measures only linear relationships, so information carried by nonlinear relationships is lost. To effectively eliminate the respiration-induced baseline interference and the high-frequency noise components in the PPG signal while fully retaining the original effective information, this paper proposes an automatic optimal threshold baseline noise removal algorithm based on variational mode decomposition (VMD). In traditional VMD-based denoising, the threshold has to be set manually. The key denoising parameters are estimated from statistical and experimental analyses, and the resulting threshold is fixed and cannot easily be adapted to different IMF components. To solve these problems, an evaluation function is introduced to iteratively determine the parameters that set the threshold, so that the denoising effect is more pronounced and the signal information contained in each order of IMF component is retained. The algorithm process is as follows:
(1) The original signal is decomposed by variational mode decomposition into k mode functions and a drift baseline (residual) term, so that the original signal and the IMFs of each order are related by the following equation.
$$x(t) = \mathrm{imf}_1(t) + \mathrm{imf}_2(t) + \cdots + \mathrm{imf}_k(t) + \mathrm{res}(t)$$
(2) The Pearson correlation coefficient between the mode function of each layer and the original PPG signal is calculated, and the IMFs are sorted in descending order of correlation coefficient.
(3) The signal energy E of the mode function with the largest correlation coefficient is calculated.
(4) The correlation coefficient threshold C is set, and it is assumed that there are m mode functions with a correlation coefficient greater than or equal to C. Finally, the signal without baseline noise is Y(t). Then Y(t) can be obtained by using the following equation.
(5) The optimal parameter β needs to be determined through several iterations, and the iterative evaluation function f is as follows:
An increase in r11 means that the useful component is larger relative to the noisy component. If r22 and hr21 decrease, the two signals are becoming more independent, so the original signal and the noise are gradually being separated. In general, denoising performs better when r11 is larger and r22 and hr21 are smaller, so f is expected to be as small as possible. Guided by the evaluation function, an appropriate step size is selected for iterating the parameter until the optimal value is found.
The overall process of the automatic optimization threshold baseline noise removal algorithm based on variational mode decomposition is shown in Fig. 1.
Fig. 1 Automatic optimization threshold baseline noise removal algorithm based on variational mode decomposition.
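As a rough illustration of steps (2) to (4), the following Python sketch ranks a set of already-computed IMFs by their Pearson correlation with the original signal, computes the energy of the best-correlated mode, and keeps the modes whose correlation reaches a threshold C. Several things here are assumptions rather than the paper's method: the IMFs are taken as given (no VMD library is called), the function names are hypothetical, and the retained modes are simply summed because the equation for Y(t) is not reproduced in this text. The iterative search for β guided by f is likewise not sketched, since neither equation appears here.

```python
import numpy as np

def rank_imfs_by_correlation(signal, imfs):
    """Return (order, corr): IMF indices sorted by descending Pearson r with the signal."""
    corr = np.array([np.corrcoef(signal, imf)[0, 1] for imf in imfs])
    order = np.argsort(-corr)  # largest correlation first
    return order, corr

def reconstruct(signal, imfs, threshold_c):
    """Keep IMFs whose correlation with the signal is >= threshold_c and sum them.

    Assumption: plain summation stands in for the paper's equation for Y(t),
    which is not available in this text.
    """
    imfs = np.asarray(imfs, dtype=float)
    order, corr = rank_imfs_by_correlation(signal, imfs)
    keep = [int(i) for i in order if corr[i] >= threshold_c]
    # Energy E of the mode with the largest correlation coefficient (step 3).
    energy_top = float(np.sum(imfs[order[0]] ** 2))
    y = imfs[keep].sum(axis=0) if keep else np.zeros_like(signal, dtype=float)
    return y, keep, energy_top

# Synthetic stand-ins for decomposed modes (hypothetical data, not PPG recordings).
t = np.linspace(0.0, 10.0, 1000)
pulse = np.sin(2 * np.pi * 1.2 * t)            # cardiac-like component
drift = 0.3 * (t - 5.0) / 5.0                  # slow baseline drift
noise = 0.05 * np.sin(2 * np.pi * 40.0 * t)    # high-frequency interference
x = pulse + drift + noise
y, kept, energy = reconstruct(x, np.vstack([noise, pulse, drift]), threshold_c=0.5)
```

With a fixed threshold, this is essentially the traditional correlation screening that the paper sets out to improve; the proposed algorithm replaces the hand-picked threshold with one found by iteration.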
The commonly used multi-sub-model fusion structure uses the weighted average method, but the weighted average fusion structure needs to meet a condition, that is, the contribution of each sub-model to the final result is completely linear, and there is no interaction between the features. However, the interaction between features exists in blood glucose value prediction. In order to improve the accuracy of multiple nonlinear regression of the blood glucose value and better realize the accurate prediction of blood glucose, this paper uses the fuzzy integral multi-model decision fusion algorithm based on weight and chooses random forest, extreme learning machine, and gradient boosting as the sub-models with high accuracy. The fuzzy integral operator adopts the Choquet integral integration operator, which is a nonlinear integral method. The fuzzy measure can describe the interaction between each attribute, and then it is used to determine the weight to improve the accuracy of nonlinear regression. The specific algorithm is shown in Fig. 2.
First, as mentioned above, the temporal and spatial characteristics of human PPG signals were obtained, and the features of the two dimensions were fused as the input of the above three quantitative regression models, and a total of three kinds of blood glucose prediction values were obtained. Then a fuzzy integral is used to fuse these three blood glucose values.
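Of the three sub-models, the extreme learning machine is compact enough to sketch directly. The following is a minimal, generic ELM regressor in NumPy, not the authors' implementation: the class name, the tanh activation, the number of hidden nodes, and the random seed are all assumptions chosen for illustration. Hidden-layer weights are drawn once at random and left fixed, and only the output weights are solved by least squares.

```python
import numpy as np

class ExtremeLearningMachine:
    """Single-hidden-layer ELM regressor (illustrative sketch, hypothetical settings)."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random nonlinear feature map; W and b are never trained.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights from the Moore-Penrose pseudo-inverse (least squares).
        self.beta = np.linalg.pinv(H) @ np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta
```

Because training reduces to a single linear solve, an ELM can be fitted on a small calibration set quickly, which fits the small-sample setting described above.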
According to the concept of fuzzy measure, if X = {X1,X2,…,Xn} is a group of n information sources, then the fuzzy measure g: 2^X → R⁺ is a set function that can more effectively describe the joint contribution rate of feature attributes to the target attribute. According to Choquet's requirements, this function should satisfy the following boundary conditions and monotonicity:
(a) Boundary conditions: g(∅) = 0 and g(X) = 1.
(b) Monotonicity: if A,B⊆X and A⊆B, then g(A) ≤ g(B).
The Choquet integral of the observation h over X is expressed as follows.
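The paper's own equation is not reproduced in this text. For orientation, the standard discrete form of the Choquet integral sorts the observations so that h(x_(1)) ≥ … ≥ h(x_(n)), sets A_(i) = {x_(1),…,x_(i)} with g(A_(0)) = 0, and sums h(x_(i))·[g(A_(i)) − g(A_(i−1))] over i. The Python sketch below implements that textbook form together with a check of conditions (a) and (b). The function names, the measure values, and the three sub-model predictions are hypothetical, and the notation may differ from the paper's.

```python
from itertools import combinations

def is_fuzzy_measure(g, sources, tol=1e-12):
    """Check g(empty set) = 0, g(X) = 1, and monotonicity over all subsets."""
    subsets = [frozenset(c) for r in range(len(sources) + 1)
               for c in combinations(sources, r)]
    if abs(g[frozenset()]) > tol or abs(g[frozenset(sources)] - 1.0) > tol:
        return False
    return all(g[a] <= g[b] + tol for a in subsets for b in subsets if a <= b)

def choquet_integral(h, g):
    """Discrete Choquet integral of h (dict: source -> value) w.r.t. measure g."""
    ordered = sorted(h, key=h.get, reverse=True)  # h(x_(1)) >= ... >= h(x_(n))
    total, previous, coalition = 0.0, 0.0, set()
    for source in ordered:
        coalition.add(source)
        current = g[frozenset(coalition)]
        total += h[source] * (current - previous)
        previous = current
    return total

# Hypothetical, non-additive measure over the three sub-models.
sources = ("RF", "ELM", "GB")
g = {
    frozenset(): 0.0,
    frozenset({"RF"}): 0.4, frozenset({"ELM"}): 0.3, frozenset({"GB"}): 0.3,
    frozenset({"RF", "ELM"}): 0.8, frozenset({"RF", "GB"}): 0.6,
    frozenset({"ELM", "GB"}): 0.7,
    frozenset(sources): 1.0,
}
# Hypothetical sub-model predictions in mmol/L.
h = {"RF": 6.2, "ELM": 5.8, "GB": 6.5}
fused = choquet_integral(h, g)
```

When g is additive, the integral reduces to an ordinary weighted average; the non-additive subset values are what allow the fusion to express interaction between sub-models.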
Each experimental period includes 8 days, and the distribution of time and blood glucose values for each experimental period is shown in Fig. 6 and 7.
Compared with the first experimental period, the second experimental period reduced the amount of invasive data collected on the first day. For the first experimental period, Day 1 was used as the training set and Day 2–Day 8 as the test set. Because the blood glucose range on the first day of the second experimental period was small, Day 1 and Day 2 were combined as the training set for that period, and Day 3–Day 8 formed the test set.
The experimental results are presented in Table 1.
Training set date | Test set date | RMSE (mmol L−1), ours | RMSE (mmol L−1), VMD | A%, ours | A%, VMD | A + B%, ours | A + B%, VMD
---|---|---|---|---|---|---|---
Day 1 | Day 2 | 0.50 | 0.61 | 100% | 100% | 100% | 100%
Day 1 | Day 3 | 0.81 | 0.85 | 79% | 70% | 100% | 100%
Day 1 | Day 4 | 0.89 | 1.07 | 100% | 47% | 100% | 78%
Day 1 | Day 5 | 0.85 | 0.98 | 82% | 56% | 100% | 85%
Day 1 | Day 6 | 0.84 | 1.43 | 96% | 57% | 100% | 78%
Day 1 | Day 7 | 1.32 | 1.44 | 67% | 62% | 100% | 79%
Day 1 | Day 8 | 1.17 | 1.71 | 79% | 60% | 100% | 72%
The results show that the signal denoised by the proposed preprocessing algorithm gives better blood glucose prediction. This is because traditional VMD denoising simply removes all the high-frequency IMF components in the signal and ignores the biological information that the high-frequency signal may contain. In the proposed algorithm, by contrast, the clean signal and the noise signal are separated within each order of component. The separation index parameter is determined by the similarity between the IMF component and the original signal, and it is iterated adaptively. Each layer of IMF component of each PPG signal therefore has its own optimal separation parameter, which removes the noise effectively while retaining the useful information. In addition, the proposed algorithm effectively removes the baseline and eliminates the influence of respiratory drift. The results also show that the measurement error of the traditional preprocessing algorithm grows as the time span increases, and the proportion of results falling in zones A and B decreases, which does not meet the accuracy requirements of blood glucose prediction. The algorithm proposed in this paper improves the time adaptability of noninvasive blood glucose detection from the perspective of data preprocessing (Fig. 8 and 9).
Training set date | Test set date | RMSE (mmol L−1), ours | RMSE (mmol L−1), RF | RMSE (mmol L−1), GB | RMSE (mmol L−1), ELM
---|---|---|---|---|---
Day 1 | Day 2 | 0.50 | 0.61 | 0.65 | 0.57
Day 1 | Day 3 | 0.81 | 1.07 | 0.95 | 0.91
Day 1 | Day 4 | 0.89 | 0.91 | 1.43 | 1.44
Day 1 | Day 5 | 0.85 | 1.28 | 1.12 | 1.01
Day 1 | Day 6 | 0.84 | 0.91 | 1.21 | 1.02
Day 1 | Day 7 | 1.32 | 1.65 | 1.47 | 1.92
Day 1 | Day 8 | 1.17 | 1.65 | 2.61 | 1.82
The experimental results show that the integral fusion algorithm proposed in this paper can adaptively adjust the weights according to the contribution of different sample features and the contribution of sub-model prediction results, and map the sub-model prediction results to the fuzzy operator space. The features and sub-models with high accuracy can be better selected in the fusion strategy. Then, the prediction results of each sub-model are fused at the level of fuzzy operator, and finally the blood glucose prediction results are inverted. The proposed algorithm corrects the prediction bias of sub-models at the fuzzy operator level. Compared with the three sub-models in the fusion algorithm, the proposed algorithm improves the accuracy of blood glucose prediction results (Fig. 10).
The quantitative accuracy evaluation results of the second experimental cycle are shown in Table 3.
Training set date | Test set date | Root mean square error | Correlation index | Proportion located in zone A | Proportion located in zone A + B
---|---|---|---|---|---
Day 1 + Day 2 | Day 3 | 0.96 | 0.74 | 93.8% | 100%
Day 1 + Day 2 | Day 4 | 1.34 | 0.87 | 78.9% | 100%
Day 1 + Day 2 | Day 5 | 1.37 | 0.66 | 76.9% | 100%
Day 1 + Day 2 | Day 6 | 0.94 | 0.60 | 80.0% | 100%
Day 1 + Day 2 | Day 7 | 0.56 | 0.96 | 100% | 100%
Day 1 + Day 2 | Day 8 | 1.23 | 0.72 | 66.7% | 100%
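The accuracy metrics reported in the tables can be computed along the following lines. This is a sketch under stated assumptions, not the authors' evaluation code: the "correlation index" is assumed to be the Pearson correlation coefficient, the mmol/L to mg/dL conversion uses the rounded factor 18, and zone A uses the commonly stated Clarke error grid rule (prediction within 20% of the reference, or both values below 70 mg/dL). Zone B is left out because its boundaries are more involved. All function names are hypothetical.

```python
import numpy as np

MGDL_PER_MMOLL = 18.0  # rounded glucose conversion factor (assumption)

def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - reference) ** 2)))

def pearson_r(reference, predicted):
    """Pearson correlation coefficient (assumed meaning of 'correlation index')."""
    return float(np.corrcoef(reference, predicted)[0, 1])

def clarke_zone_a_fraction(reference_mmol, predicted_mmol):
    """Fraction of points in Clarke zone A, using the commonly stated rule."""
    ref = np.asarray(reference_mmol, dtype=float) * MGDL_PER_MMOLL
    pred = np.asarray(predicted_mmol, dtype=float) * MGDL_PER_MMOLL
    within_20_percent = np.abs(pred - ref) <= 0.2 * ref
    both_hypoglycemic = (ref < 70.0) & (pred < 70.0)
    return float(np.mean(within_20_percent | both_hypoglycemic))

# Hypothetical reference (CGM) and predicted values in mmol/L.
reference = [5.0, 6.0, 8.0, 10.0]
predicted = [5.5, 6.0, 10.0, 9.0]
error = rmse(reference, predicted)
r = pearson_r(reference, predicted)
zone_a = clarke_zone_a_fraction(reference, predicted)
```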
The experimental results show that the blood glucose range of the system is 4.1–10.6 mmol L−1, the span is up to 9 mmol L−1, the effective detection time is 6 days, and the maximum root mean square error is 1.37 mmol L−1.
When the interval between the training set and the test set is one week, the root mean square error and correlation coefficient remain good, which meets the system design requirements. The preprocessing stage adaptively removes the respiratory baseline and high-frequency noise without manually set parameters, retains the blood glucose information as far as possible, and improves the signal-to-noise ratio. It also reduces the interference of changes in the subject's physiological state on the model prediction.
In the quantitative regression prediction part of blood glucose, the idea of fuzzy integral fusion is used. The regression model with high prediction accuracy is selected in the sub-model and the prediction error of the sub-model is corrected in the fusion stage. The weights are given according to the prediction results of each sub-model. The features with large contributions in each sub-model also maintain a large contribution in the overall prediction mechanism, which alleviates the impact of sub-model prediction error on the overall accuracy.
Over time, the hardware system itself drifts, and the subject's physical state also changes. The proposed algorithm effectively suppresses the influence of these changes in the human body and the hardware system, and so enhances time adaptability. The method has low instrument cost, improves prediction accuracy and time adaptability, and needs few modeling samples, which is of great significance for realizing home non-invasive blood glucose detection.
First, the AOT-VMD preprocessing algorithm is proposed, which introduces the evaluation function to automatically determine the threshold parameter, so that the baseline removal and denoising effect are more obvious compared with the traditional method. Moreover, the useful information contained in each IMF component can be retained, and the baseline and high-frequency noise generated by the instrument over time can be eliminated. Therefore, the consistency of PPG signals collected on different days can be improved. In order to improve the prediction accuracy, the ensemble modeling strategy is used to assign weights according to the prediction results of each sub-model, and then the prediction results of each sub-model are mapped to the fuzzy operator level for nonlinear integral fusion. This makes the features with large contribution in each sub-model maintain a high weight value in the overall prediction mechanism, reduces the impact of sub-model prediction error on the overall accuracy, decreases the demand for training sample number, and realizes high-precision prediction of small-sample modeling.
The experimental results show that the accuracy of the proposed algorithm is significantly improved compared with each sub-model. In the long-term prediction experiment, the blood glucose detection range of this system is 4.1–10.6 mmol L−1, and the effective detection time is 7 days. All prediction results fall in zones A and B of the Clarke error grid, and only a small calibration sample set with invasive blood glucose reference values is required. Therefore, the proposed algorithm shows promise for future non-invasive home blood glucose detection, and may promote the development of affordable, portable, non-invasive, high-precision, and time-adaptive blood glucose meters. In addition, its applicability is not limited to blood glucose detection, but may be extended to the quantitative detection of blood parameters such as hemoglobin and blood lipids in the future.
This journal is © The Royal Society of Chemistry 2025