Minkyu Park (a), Tarun Anumol (a, b) and Shane A. Snyder* (a, c)
(a) Department of Chemical & Environmental Engineering, University of Arizona, 1133 E James E Rogers Way, Harshbarger 108, Tucson, AZ 85721-0011, USA. E-mail: snyders2@email.arizona.edu
(b) Agilent Technologies Inc., 2850 Centerville Road, Wilmington, DE 19808, USA
(c) National University of Singapore, NUS Environmental Research Institute (NERI), 5A Engineering Drive 1, T-Lab Building, #02-01, 117411 Singapore
First published on 16th July 2015
Realized and potential threats of water scarcity, due in part to global climate change, have increased interest in the potable reuse of municipal wastewater. Recalcitrant trace organic compounds (TOrCs), including pharmaceuticals and endocrine disrupting compounds in wastewater, are often not efficiently removed by conventional wastewater treatment processes. Ozone application has been demonstrated to be a highly efficient oxidation process to attenuate TOrCs. However, operation of ozone oxidation can be challenging in wastewater because variations in water quality can impact critical control points through fluctuations in ozone demand/decay. Therefore, this study implemented three explanatory modeling techniques, namely multiple linear regression (MLR), artificial neural network (ANN), and principal component (PC)-ANN, to predict TOrC removal by ozone oxidation in a secondary wastewater effluent. All the developed models showed good agreement between the predicted and observed TOrC removal, using ozone dose, total organic carbon (TOC) concentration, and the rate constants of ozone and ˙OH as explanatory (input) variables. PC-ANN displayed the highest predictive power in the external validation step (R² = 0.934), followed by ANN (R² = 0.914) and MLR (R² = 0.758). Based on the MLR model equation and the sensitivity analysis of the ANN model, TOC was found to have a negligible effect on TOrC removal in the given water quality. Despite the predictive power of the ANN model, possible overfitting remains to be solved, since the cross validation coefficient (q²) calculated by leave-one-out cross validation was not sufficient to ensure model predictive power. In contrast, the PC-ANN model was found to be robust across the scenarios applied. This study provides a guideline for software sensors to control ozone treatment processes with regard to TOrC oxidation and can likely be adapted to monitor disinfection as well.
Water impact
Trace organic compounds (TOrCs) have increasingly drawn attention, particularly regarding their occurrence in wastewater effluents and, consequently, in applications involving potable water reuse. Ozone oxidation has been demonstrated as an efficient process for transforming the majority of TOrCs and can be a viable treatment option for reuse applications. However, real-time online monitoring is nearly impossible at the moment because monitoring of TOrCs requires highly sensitive analytical techniques such as mass spectrometry. This paper presents several modeling approaches, including multiple linear regression, an artificial neural network (ANN), and principal component (PC) analysis combined with ANN (PC-ANN), to predict the removal of TOrCs in a secondary wastewater effluent. In addition, this paper attempts to elucidate procedures for selecting a robust model for the prediction of TOrC removal, eventually providing a guideline for the application of the implemented modeling techniques to real-time model-based software sensors.
Engineered potable water reuse systems employ advanced treatment technologies and can produce water of nearly any desired quality.5 However, the efficiency and efficacy of water treatment technologies are important to the continued acceptance and advancement of reuse of municipal wastewater for augmenting potable water supplies.6 Of key interest is the efficacious attenuation of chemical contaminants that are recalcitrant in conventional wastewater treatment technologies.7 Of the vast number of trace organic compounds (TOrCs) reported to occur in wastewater, bioactive and highly potent substances such as certain pharmaceuticals and endocrine disrupting compounds (EDCs) are a potential threat to ecological and public health.8
Thus, most potable water reuse programs utilize a multi-barrier treatment regime.3 Advanced oxidation processes (AOPs) are often implemented in potable water reuse applications as powerful oxidants for transformation of many organic constituents9,10 and for disinfection of essentially any biological organism.11,12 For instance, ozone is a strong oxidant and has been well proven to remove the majority of TOrCs with high efficacy.13 In water, ozone readily decomposes, and OH radicals (˙OH) are formed in a chain reaction.14 Ozone is a selective oxidant that reacts rapidly with electron-rich moieties (ERM) such as aromatic compounds, organosulfur compounds, and deprotonated amines, whereas ˙OH is a relatively non-selective oxidant with high reactivity toward the majority of organic structures.15
For efficient operation of ozone treatment processes in reuse applications, the prediction of TOrC attenuation is valuable. In general, two kinds of modeling techniques can be considered for predicting treatment efficacy: deterministic models and numerical models. Recently, Lee et al. deterministically predicted TOrC removal in wastewater based on a kinetic equation as follows:
(1) |
Numerical modeling techniques, particularly those based on exploratory methods, can have several benefits for the prediction of TOrC attenuation in wastewater. For example, no apparatus is required to measure the characteristics of ozone decomposition kinetics expressed by the integral exposures of ozone and ˙OH. In addition, a model generated or trained on actual data enables facile prediction of TOrC removal under the seasonal variation of ozone decay/demand characteristics.
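For orientation, the integral exposures of ozone and ˙OH mentioned above typically enter a second-order kinetic description of the following general form. This is a sketch of the commonly used expression, with symbols chosen here for illustration; it is an assumption and may differ in notation or detail from the equation used by Lee et al.

```latex
\ln\frac{[\mathrm{TOrC}]}{[\mathrm{TOrC}]_0}
  = -k_{\mathrm{O_3}} \int [\mathrm{O_3}]\,\mathrm{d}t
    \;-\; k_{\cdot\mathrm{OH}} \int [\cdot\mathrm{OH}]\,\mathrm{d}t ,
\qquad
\text{Removal (\%)} = 100\left(1 - \frac{[\mathrm{TOrC}]}{[\mathrm{TOrC}]_0}\right)
```

In this form the two integrals are the ozone and ˙OH exposures, which is why a deterministic prediction needs measured decay kinetics while a data-driven model does not.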
Hence, the objective of this study is to develop exploratory models to predict the removal of TOrCs in a secondary wastewater effluent by ozone processes. Multiple linear regression (MLR), artificial neural network (ANN), and principal component (PC)-ANN models were developed, and their predictive power and robustness were compared. A discussion regarding the internal and external validation is provided, along with the application of the developed models to software sensors.
TOrC | Application | kO3,pH7 [M⁻¹ s⁻¹] | k˙OH [M⁻¹ s⁻¹] | Ref. |
---|---|---|---|---|
Group I: high reactivity with both ozone and ˙OH (kO3,pH7 > 1 × 10⁵ M⁻¹ s⁻¹ and k˙OH > 5 × 10⁹ M⁻¹ s⁻¹) | ||||
Sulfamethoxazole | Antibiotic | 5.7 × 10⁵ | 8.5 × 10⁹ | 19 |
Group II: moderate reactivity with ozone and high reactivity with ˙OH (10 < kO3,pH7 ≤ 1 × 10⁵ M⁻¹ s⁻¹ and k˙OH > 5 × 10⁹ M⁻¹ s⁻¹) | ||||
Atenolol | β-blocker | 2.0 × 10³ | 8 × 10⁹ | 11 |
Group III: low reactivity with ozone and high reactivity with ˙OH (kO3,pH7 < 10 M⁻¹ s⁻¹ and k˙OH > 5 × 10⁹ M⁻¹ s⁻¹) | ||||
Ibuprofen | Nonsteroidal anti-inflammatory drug | 9.6 | 7.4 × 10⁹ | 20, 21 |
Primidone | Anticonvulsant | 1 | 6.7 × 10⁹ | 22 |
Group IV: low reactivity with ozone and moderate reactivity with ˙OH (kO3,pH7 < 10 M⁻¹ s⁻¹ and 1 × 10⁹ < k˙OH ≤ 5 × 10⁹ M⁻¹ s⁻¹) | ||||
DEET | Insect repellent | <10 | 5 × 10⁹ | 23 |
Meprobamate | Anti-anxiety drug | <1 | 3.7 × 10⁹ | 24 |
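The grouping rules in the table can be written as a small lookup. The following is a minimal sketch: the function name and the `"unclassified"` fallback are illustrative assumptions, and the numeric values used for DEET and meprobamate are placeholders, because the table reports their ozone rate constants only as upper bounds.

```python
def classify_torc(k_o3, k_oh):
    """Return the reactivity group (I-IV) for rate constants in M^-1 s^-1.

    Thresholds follow the group definitions in the table; values that fall
    outside every listed range are reported as "unclassified" (an assumption,
    since the table does not say how such cases are treated).
    """
    high_oh = k_oh > 5e9
    moderate_oh = 1e9 < k_oh <= 5e9
    if high_oh and k_o3 > 1e5:
        return "I"
    if high_oh and 10 < k_o3 <= 1e5:
        return "II"
    if high_oh and k_o3 < 10:
        return "III"
    if moderate_oh and k_o3 < 10:
        return "IV"
    return "unclassified"


# (k_O3 at pH 7, k_OH) taken from the table.
compounds = {
    "Sulfamethoxazole": (5.7e5, 8.5e9),
    "Atenolol": (2.0e3, 8e9),
    "Ibuprofen": (9.6, 7.4e9),
    "Primidone": (1, 6.7e9),
    "DEET": (9.9, 5e9),         # table gives "<10"; 9.9 is a placeholder
    "Meprobamate": (0.9, 3.7e9),  # table gives "<1"; 0.9 is a placeholder
}

groups = {name: classify_torc(*k) for name, k in compounds.items()}
for name, group in groups.items():
    print(f"{name}: Group {group}")
```

Any placeholder below the stated upper bound yields the same group, so the choice of 9.9 and 0.9 does not affect the classification.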
The six representative TOrCs, as well as other TOrCs not included in the model development, were analyzed using a fully automated online solid phase extraction (SPE) system (Flexcube, Agilent Technologies, Santa Clara, CA). This module was connected inline to a 1290 Agilent liquid chromatograph (LC) coupled to a tandem mass spectrometer (MS/MS, Agilent Technologies 6460). Method optimization parameters, reproducibility and sensitivity have been described previously.25 Briefly, the method required a sample volume of only 1.7 mL to achieve method detection limits between 0.4 and 3 ng L⁻¹ for the target analytes. The method used a PLRP-s (2.1 × 12.5 mm) online SPE cartridge for retention of target analytes and an Agilent Poroshell EC-120 C-18 (2.1 × 50 mm) column for gradient elution. The samples were spiked with a mixture of isotopically labeled surrogate standards to account for matrix effects.
Total organic carbon (TOC) was measured using a Shimadzu TOC-L CSH Total Organic Carbon Analyzer (Shimadzu Corp., Japan). Before analysis, samples were acidified to pH 3 or lower using HCl (ACS grade, 37%, Sigma Aldrich).
(2) |
(3) |
Fig. 2 The occurrence of the TOrCs selected for modeling and TOC. Error bars indicate the standard deviations of each TOrC and TOC.
MLR is considered a transparent model since the model can express the relation between explanatory variables and output variables mathematically.42 This feature of MLR enables physically meaningful interpretations of modeling result. The regression equation obtained by MLR is as follows:
$$\text{Removal (\%)} = -32.77 + 10.22\,C_{\mathrm{O_3}} + 1.130\,\mathrm{TOC} + 6.912 \times 10^{-5}\,k_{\mathrm{O_3,TOrC}} + 6.023 \times 10^{-9}\,k_{\cdot\mathrm{OH,TOrC}} \tag{4}$$
The estimated coefficient of the ozone dose term (CO3) in eqn (4) is 10.22, which indicates that each 1 mg L⁻¹ increment in ozone dose achieves ~10% more TOrC removal regardless of the type of TOrC within the given water quality. This agrees with the expectation that an increase in ozone dose increases TOrC removal. However, the positive coefficient estimated for TOC is not physically plausible, because it implies that an increase in TOC leads to higher TOrC removal. Such a physically non-interpretable result can be an indicator of failure in model development. In MLR modeling, the statistical significance of each regression coefficient needs to be checked. That is, the null hypothesis that a regression coefficient is equal to zero needs to be tested. Table S1 in the ESI† shows the result of the significance test. The p-value of the regression coefficient of TOC is greater than 0.05, which indicates that the coefficient is not statistically distinguishable from zero. In addition, the standard deviation of TOC in all the sampling campaigns was less than 1 mg L⁻¹ (Fig. 2), which means that the change in TOrC removal attributed to TOC by the MLR model is ~1% over the variation of the given water quality (i.e., 1.130 × 1 mg L⁻¹). Therefore, TOC was excluded from the MLR model and a new MLR equation was obtained as follows:
$$\text{Removal (\%)} = -25.93 + 10.27\,C_{\mathrm{O_3}} + 6.900 \times 10^{-5}\,k_{\mathrm{O_3,TOrC}} + 6.052 \times 10^{-9}\,k_{\cdot\mathrm{OH,TOrC}} \tag{5}$$
The regression result of the three-parameter MLR model is tabulated in the ESI† (Table S1). The values of the regression coefficients remain similar to those in eqn (4), which again indicates that TOC does not significantly influence the estimated TOrC removal. One interesting aspect of the model is that it depends linearly on the ozone concentration and the rate constants of each TOrC. Since each TOrC has a unique pair of rate constants, the predicted removal of a given TOrC depends entirely and linearly on the ozone dose. According to the data shown in Lee et al.'s work,24 meprobamate and DEET (Group IV, compounds with low reactivity with ozone and moderate reactivity with ˙OH) showed relatively linear trends of their removals with respect to O3/DOC (O3 dose normalized by dissolved organic carbon concentration in mg/mg), whereas the TOrCs with high reactivity with ozone and ˙OH displayed logarithmic trends with respect to O3/DOC. This non-linear trend of the TOrCs with relatively higher reactivity with ozone and ˙OH cannot be explained by the MLR model and may cause a relatively small deviation of the MLR model from the observed data. In addition, the fact that the MLR model does not include the effect of TOC on TOrC removal may also influence the predictability of the model. DOC (the dissolved fraction of TOC) is a key factor for ozone decomposition in water, since both DOC concentration and composition affect ozone decomposition.43 Therefore, complete exclusion of DOC may lower the predictive power of MLR.
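The three-parameter model of eqn (5) is simple enough to evaluate directly. The sketch below assumes the ozone dose is given in mg L⁻¹ and the rate constants in M⁻¹ s⁻¹, as in the text and table; the function name and the optional clipping to the 0 to 100% range are illustrative assumptions, not part of the published model.

```python
def predict_removal_mlr(ozone_dose, k_o3, k_oh, clip=True):
    """Predicted TOrC removal (%) from eqn (5).

    ozone_dose : applied ozone dose in mg/L
    k_o3, k_oh : second-order rate constants with ozone and OH radicals (M^-1 s^-1)
    clip       : if True, limit the result to 0-100 % (an added assumption;
                 a linear model can otherwise return values outside that range)
    """
    removal = -25.93 + 10.27 * ozone_dose + 6.900e-5 * k_o3 + 6.052e-9 * k_oh
    if clip:
        removal = max(0.0, min(100.0, removal))
    return removal


# Atenolol (Group II) at an illustrative dose of 3 mg/L
atenolol = predict_removal_mlr(3.0, 2.0e3, 8e9)
print(f"Atenolol, 3 mg/L ozone: {atenolol:.1f} % predicted removal")
```

Because the model is linear, the unclipped prediction for a highly reactive compound such as sulfamethoxazole exceeds 100% at moderate doses, which illustrates the limitation for Group I compounds discussed above.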
Compared to MLR, the ANN resulted in better predictability of TOrC removal, with R² = 0.935 and 0.914 for the training and the external validation, respectively. The hidden neurons of an ANN enable the prediction of nonlinear relations between explanatory variables and output variables.44 However, this modeling technique is often considered a “black box” and requires careful investigation of overfitting.45 One of the most important criteria for a model is its reproducibility in the domain of interest. Overfitting causes inaccurate prediction even though a high goodness of fit can be achieved in the model training step. This study employed the data sets from four sampling campaigns for the model training step and the data set from one other sampling campaign for the external validation step. The data set for the external validation step can be considered independent of the data sets used in the model training step. Hence, the external validation can verify the reproducibility of the ANN model. In addition, the q² value from the LOO cross validation procedure for the ANN model is high (i.e., 0.843), as shown in Fig. 3. High values of q² generally indicate the predictive power of a model; q² > 0.5 is considered good and q² > 0.9 excellent.46
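The leave-one-out procedure behind q² can be illustrated with a deliberately simple model. The sketch below assumes the common definition q² = 1 − PRESS/SS_total, where PRESS is the sum of squared errors of the held-out predictions; a one-variable linear fit stands in for the ANN, and the dose and removal values are hypothetical, not data from this study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def loo_q_squared(xs, ys):
    """Refit the model n times, each time predicting the one sample left out."""
    held_out = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        held_out.append(slope * xs[i] + intercept)
    return r_squared(ys, held_out)


# Hypothetical ozone doses (mg/L) and observed removals (%)
doses = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
removals = [12.0, 20.0, 33.0, 41.0, 49.0, 62.0]

slope, intercept = fit_line(doses, removals)
r2 = r_squared(removals, [slope * d + intercept for d in doses])
q2 = loo_q_squared(doses, removals)
print(f"R2 = {r2:.3f}, q2 = {q2:.3f}")
```

Every held-out sample still comes from the same campaigns as the training data, so q² measures internal consistency and is not a substitute for validation against an independent data set.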
Several studies, however, have reported that q² alone is insufficient to ensure model predictive power.47–49 As mentioned earlier, MLR can provide a physically meaningful interpretation from the obtained model equation. Like MLR, an ANN can also give insightful interpretation through a sensitivity analysis. The LH-OAT sensitivity analysis method provides the relative effects of model input parameters on output variables. Two ANN models were selected to explain the overfitting problem that can arise during training procedures. One is the ANN model shown in Fig. 3, an exemplary model with high goodness of fit for both the training and external validation steps (Case 1). The other ANN model (Case 2) has a high goodness of fit for the model training procedure but an extremely low coefficient of determination (zero) in the external validation. Fig. 4 shows the sensitivity indices of each explanatory variable for the two model cases. Case 1 agreed with the modeling result of MLR. That is, TOC has minimal impact on TOrC removal while ozone dose plays a significant role. The effects of kO3 and k˙OH are also relatively significant, which implies that the removal of each TOrC relies on oxidation kinetics. It was also found that the effect of k˙OH is slightly more significant than that of kO3. On the other hand, the most influential explanatory variable in Case 2 was TOC, which may be an indicator of overfitting. Hence, it is noteworthy that the q² value alone would not be sufficient to assess predictive power for the data sets used. A more detailed investigation is presented in the following section.
Fig. 4 The result of LH-OAT sensitivity analysis of the two ANN models. The model selected for Case 1 was the one shown in Fig. 3. Case 1 showed high R² values for both the training and external validation steps as well as a high q² value. The model chosen for Case 2 was one with a high R² value for the training step, but a zero value of R² for the external validation step.
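The idea behind an LH-OAT analysis can be sketched as follows: base points are drawn by Latin hypercube sampling, each input is then perturbed one at a time by a small fraction, and the resulting relative change in model output is averaged over all base points. This is a simplified sketch rather than the exact procedure used in the study: each perturbation starts from the base point, the partial-effect formula is an assumed common form, eqn (4) stands in for the trained ANN, and the ozone dose and TOC ranges are hypothetical.

```python
import random


def latin_hypercube(n_points, bounds, rng):
    """One sample per stratum and per variable, with strata shuffled independently."""
    columns = []
    for low, high in bounds:
        strata = list(range(n_points))
        rng.shuffle(strata)
        width = (high - low) / n_points
        columns.append([low + (s + rng.random()) * width for s in strata])
    return [list(point) for point in zip(*columns)]


def lh_oat(model, bounds, n_points=20, fraction=0.05, seed=0):
    """Mean absolute partial effect (% output change per unit relative input change)."""
    rng = random.Random(seed)
    totals = [0.0] * len(bounds)
    for base in latin_hypercube(n_points, bounds, rng):
        y_base = model(base)
        for i in range(len(bounds)):
            moved = list(base)
            moved[i] = base[i] * (1 + fraction)  # perturb one input at a time
            y_moved = model(moved)
            mean_output = (y_base + y_moved) / 2
            if mean_output != 0:
                totals[i] += abs(100 * (y_moved - y_base) / mean_output / fraction)
    return [total / n_points for total in totals]


def mlr_eq4(p):
    """Eqn (4), used here only as a stand-in for a trained model."""
    dose, toc, k_o3, k_oh = p
    return -32.77 + 10.22 * dose + 1.130 * toc + 6.912e-5 * k_o3 + 6.023e-9 * k_oh


# Ozone dose and TOC ranges (mg/L) are hypothetical; rate constants span the table.
bounds = [(1.0, 6.0), (5.0, 7.0), (1.0, 5.7e5), (3.7e9, 8.5e9)]
names = ["ozone dose", "TOC", "k_O3", "k_OH"]
for name, index in zip(names, lh_oat(mlr_eq4, bounds)):
    print(f"{name}: {index:.1f}")
```

Any callable that maps an input vector to a predicted removal can be passed as `model`, so a trained network could be analysed with the same function.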
As depicted in Fig. 6, PC-ANN yielded excellent agreement between the predicted and observed TOrC removal for both the model training and external validation steps. When the same procedure used for the ANN modeling was applied to evaluate robustness, PC-ANN showed slightly better R² and q² values for the model training step, as shown in Fig. 5. Moreover, the distribution of R² for the external validation showed that the PC-ANN modeling technique yielded excellent predictive power even for the external validation step (the median value is 0.896), which suggests collinearity among the explanatory variables, in particular between ozone dose and TOC. Organic carbon is often considered when using the O3/DOC ratio as an operating parameter of ozone oxidation, which needs to be reflected during the modeling procedure.24 Hence, a possible reason for the better robustness of PC-ANN compared to ANN is that the PCA reduced the dimensionality (i.e., the number of explanatory variables) while eliminating collinearity between the explanatory variables.
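How PCA removes collinearity before the network is trained can be shown on two correlated inputs. The sketch below standardizes the inputs and extracts only the first principal component by power iteration; it is a minimal illustration, not the PCA implementation used in the study, and the paired ozone dose and TOC values are hypothetical.

```python
import math


def standardize(rows):
    """Scale every column to zero mean and unit sample variance."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(m)]
    return [[(r[j] - means[j]) / stds[j] for j in range(m)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of columns that are already centred."""
    n, m = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / (n - 1) for b in range(m)]
            for a in range(m)]


def first_component(cov, iterations=200):
    """Leading eigenvector and eigenvalue of cov by power iteration."""
    m = len(cov)
    v = [1.0 / (j + 1) for j in range(m)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * cov[a][b] * v[b] for a in range(m) for b in range(m))
    return v, eigenvalue


def first_component_summary(rows):
    """Loadings, explained-variance ratio and scores of the first component."""
    z = standardize(rows)
    cov = covariance(z)
    loadings, eigenvalue = first_component(cov)
    explained = eigenvalue / sum(cov[j][j] for j in range(len(cov)))
    scores = [sum(l * x for l, x in zip(loadings, row)) for row in z]
    return loadings, explained, scores


# Hypothetical (ozone dose, TOC) pairs in mg/L
samples = [(1.5, 5.2), (3.0, 5.9), (4.5, 6.4), (6.0, 7.1), (2.0, 5.5)]
loadings, explained, scores = first_component_summary(samples)
print(f"PC1 explains {100 * explained:.1f} % of the variance")
```

The scores, rather than the raw collinear inputs, would then be passed to the network. A full PC-ANN would usually retain several components, which requires a complete eigendecomposition instead of a single power iteration.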
In a real ozone facility, the optimal operation of ozone oxidation processes is crucial since it can reduce operational cost and maximize the removal efficacy of TOrCs, hence providing a safe barrier against TOrCs in potable reuse applications. To this end, the best practice would be to apply online sensors to directly measure, or predict, TOrC attenuation. Recently, the correlation of TOrC removal with surrogate indicators such as spectroscopic parameters, including UV absorbance at 254 nm and total fluorescence (i.e., the integral of fluorescence intensity over the excitation and emission wavelengths), has been extensively studied in physico-chemical processes such as activated carbon adsorption and advanced oxidation processes.11,52,53 These approaches would be practical and useful in a real plant since spectroscopic sensors require minimal pretreatment, allow a high frequency of data collection, and possess high sensitivity.54 Despite these benefits, however, they cannot be mechanistically interpreted.55 Therefore, employing multiple sensing techniques can lower the possibility of sensor failure, and software sensors can support such analytical monitoring techniques for the prediction of TOrC attenuation by ozonation. To this end, the advantages and disadvantages of the three models employed in this study need to be elucidated.
Four input parameters, namely TOC, applied ozone dose, and the rate constants of O3 and ˙OH for the TOrCs of interest, are necessary for the development of the models. Due to the inherent nature of exploratory modeling approaches, in which models are built upon data, regular monitoring of TOrCs is essential for building a robust model, along with online TOC sensors. For the selection of a TOC sensor, chemical-based sensors such as catalytic combustion and UV/persulfate oxidation types are recommended over optical-based TOC sensors, since optical-based TOC sensors can display a bias in TOC measurement in oxidation processes.56 A benefit of MLR is that it is intuitive and easily understandable, so it may be preferred for software sensor applications. In addition, the minimal influence of TOC on the model prediction enables the exclusion of TOC as an input parameter, so that TOC sensors do not need to be implemented. ANN displayed good predictive power, but it may not be suitable for software sensor applications because a developed model can be overfitted, thereby losing its predictive power. In addition, internal validation such as the LOO cross validation method cannot ensure the robustness of a model. PC-ANN predicted TOrC removal with high predictive power and showed robustness, making it suitable as a precise software sensing technique. In addition, the possible reduction of noise in the data by PCA can enhance the predictability of highly nonlinear data, which renders the modeling technique more attractive for TOrCs whose analysis is sensitive and possibly variable.
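As an illustration of how the MLR model could serve a software sensor, eqn (5) can be rearranged to estimate the ozone dose needed for a target removal of a given TOrC. This is a hypothetical use of the published coefficients, not a procedure described in the study; the function name and the lower bound of zero are assumptions, and the estimate is only meaningful within the dose range covered by the training data.

```python
# Coefficients of eqn (5)
INTERCEPT = -25.93
COEF_DOSE = 10.27    # % removal per mg/L of ozone
COEF_K_O3 = 6.900e-5
COEF_K_OH = 6.052e-9


def required_ozone_dose(target_removal, k_o3, k_oh):
    """Ozone dose (mg/L) at which eqn (5) predicts the target removal (%).

    Returns 0.0 when the linear model already predicts at least the target
    removal without ozone (an artefact of the linear form, not a physical claim).
    """
    dose = (target_removal - INTERCEPT - COEF_K_O3 * k_o3 - COEF_K_OH * k_oh) / COEF_DOSE
    return max(0.0, dose)


# Primidone (Group III): k_O3 = 1, k_OH = 6.7e9 M^-1 s^-1, target 80 % removal
primidone_dose = required_ozone_dose(80.0, 1.0, 6.7e9)
print(f"Estimated ozone dose for 80 % primidone removal: {primidone_dose:.2f} mg/L")
```

In practice the least reactive compound of concern would set the dose, since compounds that react faster with ozone or ˙OH reach the same target at lower doses.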
- The MLR model showed relatively good predictive power (R² values for the model generation and external validation were 0.835 and 0.758, respectively). Based on the model equation from the MLR, the effect of TOC was found to be negligible for the given water qualities.
- Better predictive power was achieved by ANN than by MLR (R² values for the training and external validation were 0.935 and 0.914, respectively). However, careful evaluation is required to avoid overfitting, since the cross validation coefficient (q²) from LOO cross validation, a general indicator of model predictive power, was not sufficient to ensure model predictability.
- PC-ANN displayed the highest predictability (R² values for the training and external validation were 0.946 and 0.934, respectively) among the three models while maintaining robustness, as confirmed by external validation.
Each implemented model has its pros and cons and can be flexibly applied to various software sensors according to the aims of operation. Therefore, this study is expected to contribute to the real-time optimization of ozone dose in terms of TOrC removal.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c5ew00120j |
This journal is © The Royal Society of Chemistry 2015 |