Q. H. Le (a,b), P. Carrera (a,b), M. C. M. van Loosdrecht (c) and E. I. P. Volcke* (a,b)
(a) Department of Green Chemistry and Technology, Ghent University, 9000 Ghent, Belgium. E-mail: Eveline.Volcke@UGent.be; Tel: +32 (0) 9 264 61 29
(b) Centre for Advanced Process Technology for Urban REsource recovery (CAPTURE), Frieda Saeysstraat 1, 9052 Gent, Belgium
(c) Department of Biotechnology, Delft University of Technology, 2600 AA Delft, The Netherlands
First published on 14th January 2025
Sensor availability and costs no longer limit data gathering at wastewater treatment plants (WWTPs). However, a larger amount of measured data does not necessarily imply that more information is obtained. In this light, this contribution assesses the general applicability and the added value of a structured experimental design approach for planning measurement campaigns at WWTPs, in view of mass-balance-based data reconciliation. To this end, the results from full-scale WWTP case studies available in the literature were compared to those obtained with the developed structured experimental design procedure. Planning measurement campaigns comprises the selection of (additional) measurements to meet a pre-set main goal. The need for a structured experimental design procedure replacing past expert judgment approaches became clear from the fact that three out of five case studies available in the literature failed to meet the main goal and/or performed unnecessary additional measurements. Translating the main goal into specific key variables was found essential in this respect. The general applicability of the procedure was demonstrated by three outcomes. First, the procedure, involving well-defined steps, could be applied to different WWTP layouts. Second, it ensured the fulfilment of various main goals. Third, it provided useful outcomes, i.e., optimal measurement campaigns, which reduced the need for additional measurements (40–70% less) compared to expert knowledge approaches, so that more information could be obtained with less analytical data. Overall, the experimental design procedure proved a fast and useful tool ensuring the success of subsequent mass-balance-based data reconciliation.
Water impact
Nowadays, large amounts of data are generated in wastewater treatment plants (WWTPs) but data-rich does not always mean information-rich. In this light, this contribution assesses the general applicability and the added value of a structured experimental design approach for planning measurement campaigns at WWTPs in view of mass-balance-based data reconciliation for reliable plant data gathering.
In order to guarantee that key variables can be identified through data reconciliation, it is vital that the available measurements satisfy redundancy and steady-state conditions. The data redundancy requirement means that one or more variables in the data set can be calculated from other (measured) variables using the available set of constraints (including mass balances), and are therefore identifiable,5 in the sense that they can be reconciled (identified). If no or insufficient initially measured data are available, additional measurements need to be carried out to ensure the required degree of redundancy and thus the possible identification of key variables. Experimental design involves the determination of sets of (additional) measurements to fulfil this goal. This concept has been proven useful for optimal sensor placement for different applications, including chemical processes13,14 or water networks.15,16 Nevertheless, experimental design formulated as a structured optimization problem has so far received limited attention in wastewater treatment processes.17,18 Instead, in available case studies from practice (Table A1, ESI†), providing sufficient redundancy has been interpreted quite intuitively, by ensuring that the number of constraints (independent mass balances) was higher than the number of unknown variables, i.e. aiming at an overdetermined system. In this way, redundancy was considered a ‘global property’ of the system while in reality, it is a property of individual variables.5 It is therefore unclear whether the experimental design approaches followed in the case studies previously reported in the literature guarantee the identifiability of all specified key process variables.17
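The difference between counting equations and checking each variable individually can be illustrated with a small linear example. The sketch below is not the redundancy analysis of Le et al.17; it applies a standard null-space test to a hypothetical system of four independent balances with three unmeasured variables (u1, u2, u3), and the function name and all coefficients are assumptions made for illustration only.

```python
import numpy as np

def identifiable_unmeasured(A_u, tol=1e-9):
    """Flag which unmeasured variables can be computed from the balances.

    A_u holds the balance coefficients of the unmeasured variables only
    (rows = balances, columns = unmeasured variables). A variable is
    identifiable when every null-space vector of A_u has a zero entry
    in that variable's position.
    """
    A_u = np.asarray(A_u, dtype=float)
    _, s, vt = np.linalg.svd(A_u)
    rank = int(np.sum(s > tol))
    null_basis = vt[rank:]  # rows spanning the null space of A_u
    return [bool(np.all(np.abs(null_basis[:, j]) < tol))
            for j in range(A_u.shape[1])]

# Hypothetical system: 4 balances, 3 unmeasured variables (u1, u2, u3).
# Balance 1:  a - b - u1       = 0
# Balance 2:  u1 - c - d       = 0
# Balance 3:  b - e - f        = 0   (measured variables only)
# Balance 4:  c - u2 - u3      = 0
A_u = [
    [-1.0,  0.0,  0.0],
    [ 1.0,  0.0,  0.0],
    [ 0.0,  0.0,  0.0],
    [ 0.0, -1.0, -1.0],
]
flags = identifiable_unmeasured(A_u)
print(dict(zip(["u1", "u2", "u3"], flags)))
```

Although this hypothetical system has more balances (4) than unknowns (3), only u1 is identifiable: u2 and u3 appear only as a sum in a single balance, so neither can be determined separately.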
In order to overcome the shortcomings of previous studies, Le et al.17 presented a more formal, structured experimental design procedure, including a comprehensive redundancy analysis to unambiguously check the identifiability of all key variables. The search for optimal sets of additional measurements is solved as a multi-objective optimisation problem minimising the cost of additional measurements and maximising the accuracy of the improved estimates of key variables. The results are visualized in a Pareto-optimal front, which represents the optimal solutions (= sets of measurements) taking into account the trade-off between their cost and accuracy. This is a valuable outcome for measurement planning, as it allows for compliance with the main goal with an optimal use of resources. However, so far the results obtained with the experimental design procedure of Le et al.17 have not yet been compared with those obtained in previously published studies.
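The notion of a Pareto-optimal front can be sketched with a simple dominance filter. The example below is an illustration only and not the optimisation algorithm of Le et al.17; it assumes that each candidate set of measurements is summarised by a cost and an accuracy indicator for which lower values are better, and the candidate values and the function name are hypothetical.

```python
def pareto_front(solutions):
    """Return the non-dominated (cost, accuracy_indicator) pairs.

    Both objectives are minimised. A solution is dominated when another
    solution is at least as good in both objectives and strictly better
    in at least one of them.
    """
    front = []
    for i, (cost_i, acc_i) in enumerate(solutions):
        dominated = any(
            cost_j <= cost_i and acc_j <= acc_i
            and (cost_j < cost_i or acc_j < acc_i)
            for j, (cost_j, acc_j) in enumerate(solutions)
            if j != i
        )
        if not dominated:
            front.append((cost_i, acc_i))
    return sorted(front)

# Hypothetical candidate measurement sets: (cost, accuracy indicator).
candidates = [(70, 2.0), (90, 1.6), (90, 1.9), (120, 1.3), (170, 1.0), (170, 1.38)]
front = pareto_front(candidates)
print(front)
```

In this hypothetical set, a candidate with the same cost as another but a worse accuracy indicator is a valid solution yet does not belong to the front.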
In this work, the added value and general applicability of the structured experimental design procedure of Le et al.17 in view of mass-balance-based data reconciliation were scrutinized by comparing its outcomes with those of previous expert judgment approaches for WWTP measurement campaign planning. In particular, the procedure was assessed in terms of applicability to different layouts and main goals, redundancy and identifiability of key variables, relevance of the mass balances and the number and type of additional measurements. To this end, the experimental design procedure of Le et al.17 was applied to five full-scale WWTP case studies available in the literature dealing with experimental design in view of mass-balance-based data reconciliation.19–22 Going beyond the mere detection of mistakes from the past, this work demonstrates why a rigorous experimental design approach is needed for future measurement campaigns and how this can be performed.
# | WWTP | Type/capacity | Configuration | Main goal of the study | Were key variables specified? | How was the measurement campaign carried out? |
---|---|---|---|---|---|---|
a p.e. = population equivalent.b SCADA = supervisory control and data acquisition system. | ||||||
1 | WWTP Katwoude,20 average data of one year | Municipal WWTP 86![]() |
A2/O process with limited biological phosphorus removal and mainly chemical phosphorus removal | Reliable data for model calibration | Partially | Measurement campaign was not implemented |
Only total oxygen consumption, and the amount of nitrified nitrogen and denitrified nitrogen were explicitly defined as key variables | Average data of one year from SCADAb and routine lab analysis of the plant was used | |||||
Variables involved in SRT calculation were not defined as key variables but implied to be so | ||||||
2 | WWTP Katwoude,20 8 day measurement campaign | Same as previous | Same as previous | Reliable data for model calibration | Yes | 8 day measurement campaign was carried out with 24 h-composite samples (where available) and grab samples (at peak flow) combined with data from SCADAb and routine lab analyses |
Seven process flow rates | ||||||
3 | WWTP Deventer22 | Municipal WWTP 182![]() |
Modified UCT-process according to the BCFS-concept | Reliable data for calculating sludge retention time and operational conditions for benchmarking | Partially | Intensive measurement campaign was carried out on three separate days with 24 h-composite samples (where available) and grab samples combined with data from SCADA and routine lab analyses |
Only total oxygen consumption, and the amount of nitrified nitrogen and denitrified nitrogen were explicitly defined as key variables | ||||||
Variables involved in SRT calculation were not defined as key variables but implied to be so | ||||||
4 | WWTP Houtrust21 | Municipal WWTP 330![]() |
A2/O process with primary and secondary sludge fermentation | Reliable data for model validation and calibration | Yes | The plant was monitored for six weeks. Collected comprehensive data set consisting of 24 h-composite samples (where available), grab samples, data from SCADA and routine lab analyses |
15 flow variables and six mass flows of COD and total phosphorus | ||||||
5 | WWTP Tabriz19 | Petrochemical WWTP 4800 m3 per day | Following steps: oil separation coagulation & flocculation - activated sludge - sand filter | Reliable data for evaluating the performance of individual unit processes | No | Four sampling runs were carried out and combined with data from SCADA and routine lab analyses |
Only flow measurements were balanced by data reconciliation | ||||||
Mass flows of COD were reported to be balanced, but they were calculated from balanced flows and measured COD concentrations |
First of all, experimental design was conducted independently of what was proposed in the previous studies. This involves the translation of the main goal of the measurement campaign into key variables and the determination of optimal sets of additionally measured variables (besides initially available ones) that guarantee the identifiability of these key variables. Typical examples of key variables concern influent and effluent mass flow rates (e.g., total phosphorus, nitrogen) of the activated sludge process or the waste sludge mass flow rate. The oxygen requirements for carbon and nitrogen removal are usually important as well.17 These would be appropriate key variables if one wants to get reliable data for monitoring plant performance or perform model simulations. The sets of additional measurements obtained by solving the multi-objective optimization problem are also referred to as (optimal) solutions and belong to the Pareto optimal front.
The application of the experimental design procedure required three main types of input information: (a) main goal and associated key variables, (b) mass balances and (c) initially measured data set and potential additionally measured variables, with estimated cost and variance of the measured variables. These inputs were obtained from the five case studies.
a. Key variables. The main goal of each case study was translated into key variables, the identification of which ensured that this goal was fulfilled. Key variables can be initially measured or not; their identification means that their value can be calculated from other measured variables through which they are related by mass balances. As a result, the value of this variable can be reconciled by using mass balances. This implies that key variables need to be conservative quantities, i.e., fulfil material conservation laws (mass balances). For the case studies from the literature in which the key variables were not specified explicitly, they were deduced in this study from the given main goal.
b. Mass balances. The mass balances used in this work correspond with the incidence matrices from previous studies. They were represented for each case study in equation form. It can be noted that all the studies used steady-state mass balances and calculated average operating conditions for data reconciliation. For dynamic processes, other approaches such as moving-time window data reconciliation could be used,23 but they were out of the scope of this study.
c. Initially measured data set and potential additionally measured variables, with estimated cost and variance of measured variables. The additionally measured variables were proposed relative to a set of initially available data to ensure the fulfilment of the main goal. From the previous studies, however, it was not always clear which of the presented measured data were initially available and which ones were proposed additionally. For the previous studies in which the initially measured data were not clearly specified, the initially measured data set was assumed. Since the costs of additionally measured variables (flows and concentrations) were not specified in any of the previous studies, the costs of all measured variables were assumed equal. This assumption implies that the cost is proportional to the number of additionally measured variables. The use of measurement-specific costs would not limit the applicability of the procedure but may deliver different optimal solutions in terms of cost. The uncertainty of the measured variables was expressed in terms of their standard deviation, the magnitude of which could be derived from data provided in previous studies.
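How these inputs come together in data reconciliation can be sketched with a single flow balance. The example below is a minimal weighted least-squares reconciliation, assuming that all three flows of the overall balance Qin − Qef − Qex = 0 are measured; the measured values, standard deviations and the function name are hypothetical and do not stem from any of the case studies.

```python
import numpy as np

def reconcile(measured, std_dev, balances):
    """Weighted least-squares reconciliation for linear balances A x = 0.

    Measurements are adjusted as little as possible, weighted by their
    variance, so that the reconciled values close all balances.
    """
    x = np.asarray(measured, dtype=float)
    V = np.diag(np.asarray(std_dev, dtype=float) ** 2)  # covariance matrix
    A = np.atleast_2d(np.asarray(balances, dtype=float))
    residual = A @ x  # balance errors of the raw measurements
    correction = V @ A.T @ np.linalg.solve(A @ V @ A.T, residual)
    return x - correction

# Hypothetical measurements (m3 per day): Qin, Qef, Qex
measured = [1000.0, 980.0, 30.0]
std_dev = [50.0, 50.0, 3.0]
A = [[1.0, -1.0, -1.0]]  # Qin - Qef - Qex = 0

reconciled = reconcile(measured, std_dev, A)
print(reconciled)
```

The flows with the larger assumed standard deviation receive the larger correction, which is why the variance of each measured variable is a required input besides the mass balances themselves.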
The set(s) of additionally measured variables proposed by applying the experimental design procedure from Le et al.17 was subsequently compared to those actually carried out in the previous studies. The previous studies defined the experimental design based on expert knowledge, and not explicitly as a multi-objective optimization problem like Le et al.17 Thus, the outcome was a single set of additional measurements. This was compared with the optimal solutions obtained in this study in terms of the number of additionally measured variables and in terms of the accuracy of key process variables. Only additionally measured variables that were used for data reconciliation were considered in the comparison. A detailed analysis of previous approaches was made by answering the following questions:
– Main goal and key variables: Were the key variables defined in previous studies? If yes, to what extent did they reflect the main goal of the measurement campaign?
– Mass balance setup: Were the mass balances relevant?
– Experimental design results: Was the set of (additional) measured variables implemented in the previous study relevant – did it allow the identification of key variables? Are there any alternative sets of additionally measured variables which may be better in terms of the number of required additionally measured variables and/or resulting accuracy of key variables?
Number of | Case study 1 (ref. 20) average data of one year | Case study 2 (ref. 20) 8 day data | Case study 3 (ref. 22) | Case study 4 (ref. 21) | Case study 5 (ref. 19) | |
---|---|---|---|---|---|---|
a Only considering flows. b As checked in step 3 of the experimental design procedure.17 c No additional measurements were performed in this case study. d M = (G + I − K)/(G + I). Note: in case the additionally measured variables proposed in a previous study (G) were not sufficient for the identification of key variables, the number of missing essential additionally measured variables (I) was added. | ||||||
Main goal and key variables | ||||||
A | Key variables (number of which defined in a previous study) | 6 (3) | 7 (7) | 11 (3) | 21 (21) | 9 (0)a |
B | Key variables identified in a previous study | 3 | 0 | 11 | 17 | 9 |
Mass balance setup | ||||||
C | Mass balances set up by previous studies | 8 | 12 | 14 | 20 | 8 |
D | Relevant mass balances among C | 8 | 6 | 14 | 19 | 8 |
Experimental design results | ||||||
E | Initially available measured variables | 20 | 8 | 9 | 11 | 2 |
F | Potential additionally measured variables, i.e. initially unmeasured variables in mass balances related to key variablesb | 2 | 12 | 25 | 29 | 17 |
G | Additionally measured variables obtained (measurement campaign) in a previous study | 0c | 21 | 25 | 27 | 17 |
H | Relevant additionally measured variables obtained in a previous study, i.e. contributing to the identification of key variables | NAc | 4 | 25 | 24 | 17 |
I | Missing essential additionally measured variable in a previous study, i.e., required for the identification of all key variables | 2 | 2 | 0 | 2 | 0 |
J | Number of Pareto-optimal solutions found using an experimental design procedure | 1 | 6 | 8 | 12 | 10 |
K | Minimum number of additionally measured variables needed to identify all key variables (i.e. for the Pareto-optimal solution with minimum number of additionally measured variables) | 2 | 7 | 11 | 18 | 7 |
L | Additionally measured variables for the most accurate Pareto-optimal solution (i.e. for the optimal solution with a maximum number of additionally measured variables) | 2 | 12 | 25 | 29 | 17 |
M | Maximum potential reduction of additionally measured variables, compared with a previous study with a Pareto-optimal solution with a minimum number of additionally measured variablesd | NAc | 70% | 56% | 38% | 59% |
Fig. 1 WWTP Katwoude, adapted from Meijer et al.20 Unit processes are indicated in grey: R1 = mixed non-aerated selector, R2 = completely mixed anoxic reactor, R3 = aerated carrousel reactor, CL12, CL34 = four clarifiers operated in pairs. fd1, fd2, fd3 = flow dividers, TH = sludge thickener and CE = centrifuge. The black boxes refer to the name of the streams. The measured variables are indicated by name at their respective positions.
# | Unit process | Mass balances | Unit |
---|---|---|---|
Q = flow, mTP = total phosphorus mass flow, mTKN = Kjeldahl nitrogen mass flow, mCOD = COD mass flow, mNOx = NO3 mass flow rate. OCnet = net oxygen consumption (kg per day), OCcod = oxygen for COD removal (kg per day), NITR = nitrified nitrogen (kg per day), DENI = denitrified nitrogen (kg per day). | |||
1 | WWTP | Qin − Qef − Qex | Flow (m3 per day) |
2 | CE | Qce − Qcent − Qex | |
3 | WWTP | mTPin − mTPef − mTPex | Total phosphorus (kg per day) |
4 | CE | mTPce − mTPcent − mTPex | |
5 | WWTP | mCODin − mCODef − mCODce + mCODcent − OCcod − 2.87·DENI | COD and nitrogen (kg per day) |
6 | WWTP | mTKNin − mTKNef − mTKNce + mTKNcent − NITR | |
7 | WWTP | DENI − NITR + mNOxef + mNOxin − mNOxex | |
8 | WWTP | OCnet − OCcod − 4.57·NITR |
By discarding the effluent flow (Qef) and centrifuge outflow (Qcent) from the set of potential additionally measured variables and by performing again a feasibility evaluation (i.e., checking variable identifiability without Qef and Qcent as additional measurements), it was concluded in this study that they were essential. Qef was required to identify the flow of excess sludge (Qex) and the mass flow of total phosphorus in the influent (mTPin). Qcent was required to identify the amount of denitrified nitrogen (DENI), the amount of nitrified nitrogen (NITR) and the total oxygen consumption (OCnet).
Unit process | Mass balance | Unit | ||
---|---|---|---|---|
Q = flow, mTP = total phosphorus mass flow, mNH = ammonium mass flow. | ||||
1 | R1 | Selector | Qin + Qrt34 + Qover + Qcent − Qr1 | Flow (m3 per day) |
2 | R2 | Denitrification reactor | Qr1 + Qrc + Qrt12 − Qr2 | |
3 | R3 | Aerated carousel | Qr2 − Qrc − Qr3 | |
4 | CL | Clarifiers | Qr3 − Qef − Qrt12 − Qrt34 − Qth | |
5 | TH | Thickeners | Qth − Qover − Qce | |
6 | CE | Centrifuge | Qce − Qcent − Qex | |
7 | R1 | Selector | mTPin + mTPrt34 + mTPover + mTPcent − mTPr1 | Total phosphorus (kg per day) |
8 | R2 | Denitrification reactor | mTPr1 + mTPrc + mTPrt12 − mTPr2 | |
9 | R3 | Aerated carousel | mTPr2 − mTPrc − mTPr3 | |
10 | CL | Clarifiers | mTPr3 − mTPef − mTPrt12 − mTPrt34 − mTPth | |
11 | R1 | Selector | mNHin + mNHrt34 + mNHover + mNHcent − mNHr1 | Ammonium (kg per day) |
12 | R2 | Denitrification reactor | mNHr1 + mNHrc + mNHrt12 − mNHr2 |
Fig. 2 Pareto optimal solutions for the setup of Meijer et al.20 determined by the experimental procedure of Le et al.17 and expressed in terms of accuracy and costs. The line with the filled circles (black) denotes the Pareto-optimal front. ‘x’ = a solution.
For comparison, Meijer et al.20 used 24 measured variables for data reconciliation, 21 of which were measured additionally during an 8-day measurement campaign. The set of additionally measured variables of Meijer et al.20 was not presented in Fig. 2 since it did not satisfy the defined main goal and therefore was not a solution. In fact, our findings suggest that none of the key variables could be identified with the proposed set of additionally measured variables. In order to identify the seven key variables, the proposed set of measured data (Table C2a, Appendix C2, ESI†) should be complemented with the flow from R3 to R2 (Qr2) and the centrifuge output flow (Qcent), making up two additionally measured variables.
From the experimental design procedure, it is clear that only flow measurements can help in identifying flows.17 However, Meijer et al.,20 with the aim of identifying only total flows, included 6 mass balances for total phosphorus (TP) and total Kjeldahl nitrogen (TKN) in the system of mass balances (Table 4) and 17 corresponding concentration measurements of TP and TKN, 15 of which were measured additionally. The measurements of TP and TKN, however, will not contribute to the identification of flow variables as there is no direct relation to total flows in the mass balances. Therefore, setting up total phosphorus and ammonium mass balances and performing 17 concentration measurements of TP and TKN, as proposed by Meijer et al.,20 were irrelevant for the reconciliation of flow rates. The result of the flow balancing would be the same with or without the TP and TKN mass balances and measurements.
In brief, only 6 of the 12 mass balances set up by Meijer et al.20 were relevant. In addition, Meijer et al.20 proposed 21 additionally measured variables, of which only 4 were relevant, while 2 essential ones were still missing. Using the experimental design procedure from Le et al.,17 the minimum number of additionally measured variables was 7. As a result, the potential reduction in the number of additionally measured variables could be up to 70% (= [21 + 2 − 7]/[21 + 2]) compared to the proposed set of Meijer et al.20
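The same calculation can be repeated for the other case studies. The short sketch below evaluates M = (G + I − K)/(G + I) with the values of rows G, I and K from Table 2; the function and variable names are illustrative, and case study 1 is left out because no additional measurements were performed there.

```python
def max_reduction(g, i, k):
    """Maximum potential reduction M = (G + I - K) / (G + I).

    g: additionally measured variables in the previous study (row G)
    i: missing essential additionally measured variables (row I)
    k: minimum number needed to identify all key variables (row K)
    """
    return (g + i - k) / (g + i)

# (G, I, K) per case study, taken from Table 2
cases = {
    "Case study 2": (21, 2, 7),
    "Case study 3": (25, 0, 11),
    "Case study 4": (27, 2, 18),
    "Case study 5": (17, 0, 7),
}
reductions = {name: round(100 * max_reduction(*gik)) for name, gik in cases.items()}
print(reductions)
```

The rounded percentages reproduce row M of Table 2.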
Fig. 3 Flow diagram of the Deventer WWTP, The Netherlands (adapted from Puig et al.22). R1 and R2 = two anaerobic reactors, R3 = a contact tank, R4 = an anoxic reactor, R5 = an alternatively aerated reactor, R6 = aerated reactor, C1 = six secondary settlers (in parallel) and PS = stripping reactor. Measured variables are indicated.
Four methods for SRT calculation were considered. The first one was the classical SRT calculation obtained as the ratio of the sludge mass TSS in the reactor to the sludge mass TSS flow rate leaving the reactor through the waste sludge stream (‘was’), the secondary settler effluent (‘ef’) and the stripping reactor effluent streams (‘se’) (eqn (1)).
[Eqn (1)–(4): expressions for the four SRT calculation methods]
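The classical SRT calculation described for eqn (1) can be written as a short function. This is a sketch based only on the verbal definition given above (sludge mass in the reactor divided by the TSS mass flow leaving through the waste sludge, settler effluent and stripper effluent streams); the function name, argument names and numerical values are hypothetical.

```python
def classical_srt(sludge_mass_kg, m_tss_was, m_tss_ef, m_tss_se):
    """Classical SRT (days) = sludge mass in the reactor (kg TSS)
    divided by the total TSS mass flow leaving the system (kg per day)."""
    leaving = m_tss_was + m_tss_ef + m_tss_se
    if leaving <= 0:
        raise ValueError("total TSS mass flow leaving must be positive")
    return sludge_mass_kg / leaving

# Hypothetical values: 60 000 kg TSS in the reactors and
# 3500 + 400 + 100 kg TSS per day leaving via 'was', 'ef' and 'se'.
srt_days = classical_srt(60000.0, 3500.0, 400.0, 100.0)
print(srt_days)
```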
Unit process | Mass balance | Unit | |
---|---|---|---|
Q = flow, mTP = total phosphorus mass flow, mTKN = Kjeldahl nitrogen mass flow, mCOD = COD mass flow, mNOx = NO3 mass flow. OCnet = total oxygen consumption (kg per day), OCcod = oxygen for COD removal (kg per day), NITR = nitrified nitrogen (kg per day), DENI = denitrified nitrogen (kg per day). | |||
1 | WWTP | Qin − Qse − Qef − Qwas | Flow (m3 per day) |
2 | R1 + R2 | Qin − Qse − Q2 + Qa | |
3 | R3 + R4 | Q2 + Qras − Qa − Q6 + Qb | |
4 | R5 | Q6 − Q7 + Qc | |
5 | R6 | Q7 − Qc − Qb − Q8 | |
6 | C1 | −Qras + Q8 − Qef − Qwas | |
7 | WWTP | mTPin − mTPse − mTPef − mTPwas | Total phosphorus (kg per day) |
8 | R1 + R2 | mTPin − mTPse − mTP2 + mTPa | |
9 | R3 + R4 | mTP2 + mTPras − mTPa − mTP6 + mTPb | |
10 | C1 | mTP8 − mTPras − mTPef − mTPwas | |
11 | WWTP | mTKNin − mTKNse − mTKNef − mTKNwas − NITR | COD and nitrogen (kg per day) |
12 | WWTP | NITR − DENI − NOef − NOse − NOwas | |
13 | WWTP | mCODin − mCODse − mCODef − mCODwas − 2.87 × DENI − OCcod | |
14 | WWTP | −OCcod − 4.57 × NITR + OCnet |
The key variables corresponding with the defined main goals were not specified by Puig et al.22 but were deduced in this study. Eleven variables related to SRT (eqn (1)–(4)) and operating conditions (NITR, DENI and OCnet in mass balances #11–#14, Table 5) were conservative and therefore defined as key variables, namely: the flow rates and mass flows of total phosphorus in the influent (Qin, mTPin), effluent (Qef, mTPef), excess sludge (Qwas, mTPwas), stripped effluent (Qse, mTPse), amount of nitrified nitrogen (NITR, kg per day), denitrified nitrogen (DENI, kg per day) and total oxygen consumption of WWTP (OCnet, kg per day).
From the SRT calculations (eqn (2)–(4)), it is clear that there are more variables related to the main goal, namely the mass flows of total suspended solids (TSS), orthophosphate (PO), total particulate phosphorus TSS (TPTSS) and particulate COD (CODTSS). They could not, however, be defined as key variables since they are not conservative quantities, which means that no mass balances can be set up for these compounds. As a result, the SRT calculations were based on both the measured and reconciled variables.
Fig. 4 Pareto optimal solutions for the setup of Puig et al.22 determined using the experimental procedure of Le et al.17 and expressed in terms of accuracy and costs. The line with the filled black circles denotes the Pareto-optimal front. ‘x’ = a solution.
The set of measured data used by Puig et al.22 for data reconciliation contained 25 additionally measured variables (indicated in Fig. 4 and detailed in Table C3a, Appendix C3, ESI†). This solution satisfied the main goal and, moreover, belonged to the Pareto-optimal front. More specifically, it was the most accurate but also the most expensive Pareto-optimal solution. In this case study, the minimum number of additionally measured variables was 11. As a result, the potential reduction in the number of additionally measured variables was 56% compared to the proposed set of Puig et al.22 Thus, the set of additional measurements was relevant and allowed the identification of all the key variables. Nevertheless, alternative solutions were found in this study involving fewer measurements and lower cost.
Fig. 5 Flow diagram of the Houtrust WWTP, The Netherlands, adapted from Meijer et al.21 Measured variables are indicated.
Process unit | Mass balance | ||
---|---|---|---|
Q = flow, mTP = total phosphorus mass flow, mCOD = COD mass flow, mTSS = mass flow of total suspended solids. | |||
1 | WWTP | Q4 + Q40 − Q17 − Q35 | Total flow (m3 per day) |
2 | Water line | Q7 − Q17 − Q26 | |
3 | Sludge line | Q26 + Q31 − Q35 − Q37 − Q38 | |
4 | Rejected water line | Q5 − Q37 − Q38 − Q39 − Q40 | |
5 | Activated sludge units | Q7 + Q23 − Q15 | |
6 | Primary settler | Q4 + Q5 − Q7 − Q28 | |
7 | Primary thickening | Q28 − Q31 − Q39 | |
8 | Secondary clarifier | Q15 − Q23 − Q26 − Q17 | |
9 | Waste sludge thickener | Q26 − Q27 − Q37 | |
10 | Digestor | Q27 + Q31 − Q34 | |
11 | Dewatering | Q34 − Q35 − Q38 | |
12 | WWTP | mTP4 − mTP17 − mTP35 | Total phosphorus (kg per day) |
13 | Water line | mTP7 − mTP17 − mTP26 | |
14 | Secondary clarifier | mTP15 − mTP23 − mTP26 − mTP17 | |
15 | Primary settler | mTP4 + mTP5 − mTP7 − mTP28 | |
16 | Sludge line | mTP26 + mTP28 − mTP5 − mTP35 | |
17 | Primary settler | mCOD4 + mCOD5 − mCOD7 − mCOD28 | COD (kg per day) |
18 | Sludge line | mCOD26 + mCOD28 − mCOD5 − mCOD35 − mCOD43 | |
19 | Digester | mCOD27 + mCOD31 − mCOD34 − mCOD43 | |
20 | Waste sludge thickener | mTSS26 − mTSS37 + mTSS27 | TSS (kg per day) |
Meijer et al.21 used 34 measured variables for data reconciliation, 27 of which were measured additionally. However, only 24 additionally measured variables actually contributed to the identification of key variables. The TSS mass flow balance around the waste sludge thickener (#20 in Table 6) did not contribute to the identification of any key variables. Therefore, this mass balance and the three associated TSS measurements (TSS26, TSS37 and TSS27) were not necessary in this case study.
Moreover, 4 key variables could not be identified with the measured data from Meijer et al.:21 the total influent flow rate (Q4), the return activated sludge flow rate (Q23), the influent COD mass flow (mCOD4) and the influent mass flow of total phosphorus (mTP4). So, the main goal was not entirely achieved by their measurement campaign. Still, Meijer et al.21 reported that Q4 and Q23 were balanced by data reconciliation – no results were reported for balancing mCOD4 and mTP4.
The set of measured data applied for data reconciliation by Meijer et al.21 missed two crucial additionally measured variables: the settled influent flow rate (Q7) to balance mCOD4 and mTP4 and the inflow rate to the secondary clarifiers (Q15) to balance Q23. These two variables were found essential to identify all the key variables during the redundancy analysis performed in this study. The addition of these two variables (Q7 and Q15) to the measured data set used by Meijer et al.21 would have resulted in a solution (indicated by ‘x’ in Fig. 6 and detailed in Table C4b, Appendix C4, ESI†), i.e., would have allowed the identification of all key variables. However, the latter solution is not a Pareto-optimal solution since it has the same number of additionally measured variables but about 38% lower accuracy than the most expensive Pareto-optimal solution (accuracy fv = 1.38). In this case study, the minimum number of additionally measured variables was 18. The potential reduction in the number of additionally measured variables could therefore be up to 38% compared to the proposed set of Meijer et al.21
Fig. 6 Pareto optimal solutions for the setup of Meijer et al.21 determined using the experimental procedure of Le et al.17 and expressed in terms of accuracy and costs. The line with the filled grey circles denotes the Pareto-optimal front. ‘x’ = a solution.
Overall, 19 of the 20 mass balances set up by Meijer et al.21 were considered relevant. However, the set of additional measurements did not allow the identification of all key variables. On the one hand, unnecessary measurements were performed. On the other hand, crucial variables were missing.
Fig. 7 Flow diagram of the Tabriz petrochemical WWTP, Iran.19 Measured variables are presented.
Process unit | Mass balance | ||
---|---|---|---|
Q = total mass flow (density is assumed to be the same for all streams). | |||
1 | Screening | Qin1 − Q1 + Q11 | Flow (m3 per day) |
2 | API | Q1 − Q2 − Q9 − Q10 | |
3 | Equalization | Q2 + Qin2 − Q3 | |
4 | DAF | Q3 − Q4 − Q11 − Q12 | |
5 | Aeration | Q4 − Q5 + Q13 + Q17 | |
6 | Clarifiers | Q5 − Q6 − Q13 − Q14 | |
7 | Clarifiers | Q6 − Q7 − Q15 | |
8 | Sand filter | Q7 − Q8 + Q16 − Q17 |
Fig. 8 Pareto optimal solutions for the setup of Behnami et al.19 determined using the experimental procedure of Le et al.17 and expressed in terms of accuracy and costs. The line with the filled circles denotes the Pareto-optimal front. ‘x’ = a solution.
Behnami et al.19 used 19 measured flow variables in data reconciliation, 17 of which were considered measured additionally compared to the initially measured flows in the measurement campaign (detailed in Table C5a, Appendix C5, ESI†). This data set satisfied the main goal to identify all key variables. This set also belongs to the Pareto-optimal front obtained using the experimental design procedure. The set of additionally measured variables implemented by Behnami et al.19 provided the highest accuracy (accuracy fv = 1) but with the highest cost (cost fc = 170) (Fig. 8 and detailed in Table C5b, Appendix C5, ESI†). In this case study, the minimum number of additionally measured variables given by the experimental design procedure would be 7. Therefore, the maximum potential reduction in the number of additionally measured variables could be up to 59%. Overall, a similar conclusion to the case study from Puig et al.22 could be drawn. The set of additional measurements was relevant and allowed the identification of all the key variables, but alternative solutions involving fewer measurements and lower cost were found in this study.
First, only conservative variables can possibly be identified using data reconciliation and therefore qualify as key variables. Some variables related to the main goal cannot appear in the mass balances because they are not expressed in conservative quantities, so they cannot be put forward as key variables. For example, the mass flows of orthophosphate and total suspended solids in case study 3 (ref. 22) and case study 1 (ref. 20) were non-conservative.
A second important requirement to keep in mind during key variable selection is that all key variables must be identifiable for the set of mass balances considered. This is checked through a redundancy analysis of the system of mass balances, which is an integral part (step 4, Fig. B1, Appendix B, ESI†) of the experimental design procedure of Le et al.17 In case one or more key variables are not identifiable, the set of mass balances needs to be reviewed first. In some cases, the problem can be solved by adding mass balances. However, it could also be that some variables related to the main goal, even if they are conservative, cannot be identified because practical constraints make it impossible to close the corresponding mass balances (e.g. measurement of the gas phase components of an open tank with a large surface area).
Redundancy analysis is an essential part of experimental design, as it removes the dependent mass balances and checks that all the key variables are present in the independent mass balances (the detailed mathematical procedure can be found in the study by Le et al.17). This ensures the identifiability of all the key variables. Without this analysis, unnecessary mass balances may be set up, with dependent constraints that are irrelevant for the identification of the key variables and that entail irrelevant additional measurements. None of the case studies reported in the literature so far performed such a redundancy analysis. As a result, irrelevant mass balances were set up in case studies 2 and 4 of Meijer et al.,20,21 and unnecessary additional measurements were subsequently performed.
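As an illustration of the redundancy analysis described above, the following minimal Python sketch removes linearly dependent flow balances and then checks whether each key variable appears in the remaining independent balances. It is not the implementation of Le et al.17; the three-balance plant, the variable names Q1 to Q6, the choice of key variables and the function names are hypothetical, and only linear (flow) balances are assumed.

```python
from fractions import Fraction

def independent_balances(rows):
    """Return indices of balances that are linearly independent of earlier ones."""
    basis = []  # (pivot column, reduced row) for every balance kept so far
    kept = []
    for idx, row in enumerate(rows):
        r = [Fraction(v) for v in row]
        # Eliminate the pivot entries of all balances already kept.
        for pivot, b in basis:
            if r[pivot] != 0:
                f = r[pivot] / b[pivot]
                r = [x - f * y for x, y in zip(r, b)]
        pivot = next((j for j, v in enumerate(r) if v != 0), None)
        if pivot is not None:  # something is left: the balance adds information
            basis.append((pivot, r))
            kept.append(idx)
    return kept

def missing_key_variables(rows, kept, names, key_variables):
    """Key variables that do not appear in any independent balance."""
    return [k for k in key_variables
            if all(rows[i][names.index(k)] == 0 for i in kept)]

# Hypothetical plant: columns are the flows Q1..Q6 (Q6 is in no balance).
names = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"]
balances = [
    [1, -1, 0, 1, 0, 0],    # reactor:  Q1 - Q2 + Q4 = 0
    [0, 1, -1, -1, -1, 0],  # settler:  Q2 - Q3 - Q4 - Q5 = 0
    [1, 0, -1, 0, -1, 0],   # overall:  Q1 - Q3 - Q5 = 0 (sum of the two above)
]

kept = independent_balances(balances)
missing = missing_key_variables(balances, kept, names, ["Q5", "Q6"])
print("independent balances:", kept)
print("key variables absent from the balances:", missing)
```

In this hypothetical example the overall balance is the sum of the two unit balances and is therefore dropped, while the key variable Q6 is flagged because it does not appear in any independent balance, which signals that the set of mass balances needs to be reviewed.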
Besides avoiding the use of mass balances which are irrelevant (not related to key variables) and/or redundant (linearly dependent on other mass balances), another point of attention is to take into account the maximum available number of independent mass balances containing key variables. For instance, in case study 5 of Behnami et al.,19 more mass balances could have been defined and additional associated variables could have been identified. More specifically, in the latter study, only 15 flows were actually reconciled and used for further calculation and process evaluation, while more than 100 variables were additionally measured. These additional measurements were not exploited to their full potential. If the mass balances of COD, phosphorus and nitrogen had also been set up, more key variables could have been defined and identified (reconciled) for this case study.
In the studies of Puig et al.22 and Behnami et al.,19 additional measurements were collected for all unknown variables that appear in the set of mass balances. The redundant data sets collected in these case studies corresponded with the most expensive (but most accurate) Pareto-optimal solutions identified with the experimental design procedure in this study. The study of Meijer et al.21 involved a relatively complex set of mass balances, which made it challenging to find the right additionally measured variables without a structured experimental design approach. As a consequence, the additional measurements performed in the latter case study missed two crucial variables and not all key variables could be identified.
As a side note, the accuracy of the measurements will also influence their usefulness and added value. In the experimental design procedure in this study, measurement accuracy is taken into account through the measurement variance, which is incorporated in the objective function. Adding redundant sensors for variables which are already observable may lead to improved precision of the reconciled values or to improved sensor fault isolation, as demonstrated by Villez et al.18 through the concept of ‘surplus redundancy’. However, such a procedure was considered beyond the scope of the present study.
Overall, it is clear that more measurements do not necessarily lead to more information. Data should only be gathered if one knows what the data will be used for, i.e. once the main goal and key variables have been defined. Rather than measuring more, one needs to measure the right things. Balancing the number of measurements (costs) and the obtained accuracy of identified variables by staying on the Pareto-optimal front will avoid excess costs for additional measurements that do not add information. Besides the measurement costs as such, overhead and costs associated with data management cannot be neglected either: the costs for sensors are just the tip of the iceberg. Digitalisation of the water industry, a topic which has attracted a lot of interest,25 should therefore never be the goal as such.
The application of the design procedure is straightforward. It consists of seven steps, the first three of which require inputs from the user to organize all the collected information in one preformatted input file: to translate the main goal(s) into key variables (step 1), to set up mass balances relating key variables to other, measured variables (step 2) and to inventory the initially available data (step 3). Steps 4 to 7 are fully automated and are directed at finding the (optimal) solutions for any problem that can be formulated in the first three steps. For all case studies considered in this contribution, these last four steps took at most 10 seconds. This fast implementation allows the user to iteratively rework the set of mass balances and recheck the input data and/or the key variable definition in case one or more key variables cannot be balanced/estimated.
The experimental design procedure is very flexible in providing alternative additional measurement sets. For a user-defined set of potential additionally measured variables, the experimental design procedure proposes several alternatives, all of which are Pareto-optimal. The user can then select a solution from the Pareto-optimal front based on the available budget and expected accuracy. Note that other approaches can be used to define the optimization problem in accordance with the main goal of the study. For instance, Villez et al.18 defined the sensor placement procedure as a trade-off between observability and cost in WWTPs, while other studies in the chemical engineering field also considered objectives such as reliability (= low probability of faults), precision or estimability.26,27 Nevertheless, the Pareto-optimal front is considered an excellent option to visualize the solutions.
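The selection of a measurement set from the Pareto-optimal front can be sketched as follows. This is a minimal Python illustration rather than the procedure of Le et al.17: the candidate solutions, their cost and accuracy values and the function names are hypothetical, and the accuracy score is assumed to be better when higher while the cost is better when lower.

```python
def pareto_front(solutions):
    """Keep the solutions that no other solution beats on both cost and accuracy."""
    front = []
    for s in solutions:
        dominated = any(
            o["cost"] <= s["cost"] and o["accuracy"] >= s["accuracy"]
            and (o["cost"] < s["cost"] or o["accuracy"] > s["accuracy"])
            for o in solutions
        )
        if not dominated:
            front.append(s)
    return sorted(front, key=lambda s: s["cost"])

def best_within_budget(front, budget):
    """Most accurate Pareto-optimal solution that fits the budget, or None."""
    affordable = [s for s in front if s["cost"] <= budget]
    return max(affordable, key=lambda s: s["accuracy"]) if affordable else None

# Hypothetical candidate sets of additional measurements (illustrative values only).
solutions = [
    {"name": "A", "cost": 70, "accuracy": 0.60},
    {"name": "B", "cost": 110, "accuracy": 0.80},
    {"name": "C", "cost": 120, "accuracy": 0.70},  # dominated by B
    {"name": "D", "cost": 170, "accuracy": 1.00},
]

front = pareto_front(solutions)
choice = best_within_budget(front, budget=120)
print([s["name"] for s in front])
print(choice["name"] if choice else "no affordable solution")
```

Solution C is discarded because B is both cheaper and more accurate; with a budget of 120, B is the most accurate option the user could afford.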
If problems are expected with the measurement of specific streams, e.g. because of safety issues or difficult access, these can be avoided upfront by discarding them from the set of potential additionally measured variables. Application of the experimental design procedure will then indicate whether the discarded variables are essential (in step 4: feasibility evaluation, see Fig. B1, Appendix B, ESI†) and if not, will propose alternative solutions. For example, in case studies 2 and 3, measuring internal recycling flows, which may be problematic, could be avoided by excluding them from the list of potential additionally measured variables.
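The idea of discarding a problematic stream upfront and then checking feasibility can be illustrated with a minimal Python sketch. It is not the feasibility evaluation of Le et al.17 itself: the two-unit plant, the flow names Q1 to Q5 and the function names are hypothetical, only linear flow balances are assumed, and a key variable is taken to be identifiable when it is uniquely fixed by the balances and the measured flows.

```python
from fractions import Fraction

def rref(matrix):
    """Reduced row echelon form (exact arithmetic) and the pivot columns."""
    m = [[Fraction(v) for v in row] for row in matrix]
    pivots, r = [], 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def calculable(balances, names, measured):
    """Unmeasured variables uniquely fixed by the balances and the measured ones."""
    unmeasured = [n for n in names if n not in measured]
    cols = [names.index(n) for n in unmeasured]
    a_u = [[row[c] for c in cols] for row in balances]
    reduced, pivots = rref(a_u)
    fixed = set()
    for k, c in enumerate(pivots):
        # Fixed only if its row involves no other unmeasured variable.
        if all(v == 0 for j, v in enumerate(reduced[k]) if j != c):
            fixed.add(unmeasured[c])
    return fixed

# Hypothetical plant: Q4 is an internal recycle that is hard to measure.
names = ["Q1", "Q2", "Q3", "Q4", "Q5"]
balances = [
    [1, -1, 0, 1, 0],    # reactor:  Q1 - Q2 + Q4 = 0
    [0, 1, -1, -1, -1],  # settler:  Q2 - Q3 - Q4 - Q5 = 0
]
key_variable = "Q5"
initially_measured = {"Q1"}
candidates = ["Q2", "Q3"]  # Q4 discarded from the candidates upfront

feasible = key_variable in calculable(
    balances, names, initially_measured | set(candidates))
print("feasible without measuring Q4:", feasible)
print("fixed when only Q3 is added:",
      calculable(balances, names, {"Q1", "Q3"}))
```

In this hypothetical layout the key variable Q5 follows from Q1 and Q3 alone, so the recycle flow Q4 is not essential and can safely be left out of the candidate list.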
While this study deals with measurement campaigns for WWTPs, similar experimental design methodologies for application to water distribution networks and sewer systems could be developed in the future.
– The application of the experimental design procedure was straightforward and could easily be adapted for different WWTP configurations and different main goals.
– Translating the main goal of a study into key variables is essential to find appropriate additionally measured variable sets. The key variables should be conserved quantities and need to appear in the set of mass balances considered for the system under study, in order to be identifiable during future data reconciliation. In three out of the five case studies from the literature applying expert judgement approaches, the main goal was not translated well into specific key variables and thus they were not well identified with additional measurements.
– A redundancy analysis, to check the identifiability of key variables for the considered set of mass balances, is an essential part of the proposed experimental design procedure. The optimal sets of additionally measured variables proposed using the procedure thus guarantee the identifiability of all the key variables through subsequent data reconciliation. This was not always the case in the literature case studies. This showed that more measurements do not necessarily lead to more information.
– Even where the expert judgement approach resulted in adequate additional measurements, more measurements were often performed than needed. With the structured experimental design procedure, about 40% to 70% fewer measurements were needed.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4ew00315b
This journal is © The Royal Society of Chemistry 2025