Using a supervised machine learning approach to predict water quality at the Gaza wastewater treatment plant
Abstract
This paper applies four machine learning algorithms, namely Gaussian process regression (GPR), random forest (RF), extreme gradient boosting (XGB), and light gradient boosting machine (LightGBM), to predict one day ahead the concentrations of total suspended solids (TSS), chemical oxygen demand (COD), and biochemical oxygen demand (BOD) in the effluent of the Gaza wastewater treatment plant. Data were collected from 360 wastewater samples taken at the plant, and five input parameters were used in the proposed method: pHinf, temperature (Tempinf), BODinf, TSSinf, and CODinf. Four error measures were used to evaluate the prediction accuracy of the models. On the testing datasets, GPR was the best model for predicting effluent TSS, COD, and BOD, achieving the highest correlation coefficients (CC) of 0.964, 0.950, and 0.975, against RF (0.932, 0.910, 0.943), XGB (0.916, 0.901, 0.954), and LightGBM (0.890, 0.892, 0.883). The importance of the input parameters was also assessed, and temperature and pH were found to be the most important parameters for wastewater quality prediction across the four models. The study concludes that GPR is the most representative model. The model may help users select an optimal wastewater treatment based on the original wastewater characteristics and the applicable standards.
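The abstract ranks the models by correlation coefficient (CC) between measured and predicted effluent values. As a minimal sketch of how such a score might be computed, the snippet below assumes that CC denotes the Pearson correlation coefficient, which the abstract does not state explicitly. The function name `correlation_coefficient` and the sample values are hypothetical and are not taken from the study's data.

```python
import math


def correlation_coefficient(observed, predicted):
    """Pearson correlation between two equal-length numeric sequences.

    Assumption: the paper's CC is the Pearson coefficient.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)

    if var_obs == 0 or var_pred == 0:
        raise ValueError("correlation is undefined for a constant sequence")

    return cov / math.sqrt(var_obs * var_pred)


# Hypothetical effluent TSS values (mg/L), for illustration only.
observed_tss = [18.0, 22.5, 20.1, 25.3, 19.4]
predicted_tss = [17.2, 23.0, 19.5, 24.8, 20.3]

cc = correlation_coefficient(observed_tss, predicted_tss)
print(f"CC = {cc:.3f}")
```

A value close to 1 indicates that predictions rise and fall with the measurements. CC does not capture systematic bias on its own, which is one reason a study like this would report it alongside other error measures.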
- This article is part of the themed collection: Machine learning and artificial neural networks: Celebrating the 2024 Nobel Prize in Physics