M. Gabrielli a, F. Trovò b and M. Antonelli *a
aDipartimento di Ingegneria Civile e Ambientale (DICA), Politecnico di Milano, Piazza Leonardo da Vinci 32, Milano, 20133, Italy. E-mail: manuela.antonelli@polimi.it
bDipartimento di Elettronica, Informazione e Bioingegneria (DEIB), Politecnico di Milano, Piazza Leonardo da Vinci 32, Milano, 20133, Italy
First published on 12th April 2022
The semi-arbitrary selection of water monitoring frequencies and sampling instants conducted by water utilities and regulatory agencies does not guarantee the identification of the maximum contaminant concentration or the extent of the daily variations present in fast-responding water systems, potentially leading to erroneous evaluations of process performances or human health risk. Hence, this work proposes two novel methods to optimize temporal monitoring schemes dealing with daily contaminant concentration patterns, selecting the sampling instants characterized by the maximum concentration or the maximum daily variation while concurrently limiting the number of samples analysed. The corresponding algorithms, based on the multi-armed bandit framework, were termed Seq(GP-UCB-SW) and Seq(GP-UCB-CD). While the first algorithm passively adapts to daily pattern changes, the other actively monitors the sampled concentrations, providing change detection alerts. The algorithms' application to the monitoring of drinking water distribution systems was compared against traditional schemes on two synthetic scenarios, derived from full-scale monitoring campaigns regarding chemical or microbiological contaminants, and on a real-world scenario directly employing high-frequency flow-cytometry data. Compared to traditional schemes, the algorithms demonstrate better performances, providing smaller differences between the observed and true target values (i.e., maximum concentration or maximum concentration variation) with a reduced number of samples per day, while also being resilient to pattern changes. Following a sensitivity analysis, we provide practical guidance for their usage, discuss their applicability to other water matrices, and highlight possible modifications to handle different usage scenarios and other pattern types. The application of the developed algorithms results in lower monitoring costs while providing a detailed characterization of water contamination.
Water impact: This study proposes two automatic online algorithms to optimize temporal monitoring schemes, targeting the maximum concentration or the maximum concentration variation of daily contaminant concentration patterns while concurrently limiting sampling costs. These algorithms overcome the constraints of current monitoring schemes, which provide no guarantees on the monitored concentrations. The algorithms' validation on full-scale drinking water data proved their robust applicability.
Several studies showed how the concentrations of several contaminants in water can change smoothly throughout the day, resulting in stochastic but reproducible daily patterns.3–5 Notably, such daily patterns also change over longer time scales, likely due to variations of the surrounding environmental conditions and/or of the anthropic activities responsible for their occurrence.6,7 These daily concentration patterns arise from several causes in fast-responding water systems such as surface water, shallow groundwater, and water distribution and collection systems. For instance, in drinking water production and distribution, patterns can be caused by variations in the source water of the drinking water treatment plants or even by the daily variation of operating conditions in the treatment plants and drinking water distribution systems.3,8–12 Such evidence highlights how monitoring schemes should take into account the possible presence of daily contaminant concentration patterns.3
Monitoring the temporal variability of water contaminant concentrations has become more accessible thanks to the recent advancements of online analytical instrumentation (e.g., flow cytometers, gas chromatographs, ATP meters), which have broadened the range of measurable chemical and microbiological parameters.8,11,13–15 However, compared to electrochemical sensors, these new instruments are characterized by non-negligible capital and operating costs and by the need for increased maintenance in the case of high sampling frequencies.5,11
While high monitoring frequencies using such instruments have uncovered relevant contaminant concentration fluctuations,3,9,16,17 such intensive campaigns are not sustainable by water utilities or environmental protection agencies for long periods due to budget constraints. Hence, sampling frequencies are arbitrarily reduced by the operators to limit costs, with legislative compliance as the only constraint for the sampling frequency selection. Together with the fact that sampling instants are chosen arbitrarily, this results in monitoring schemes which do not guarantee the effectiveness of the monitoring campaign, potentially causing relevant fluctuations to be missed.5,6,18 Moreover, different contaminants might require different monitoring schemes. For instance, the identification of maximum concentrations should be the focus when monitoring contaminants linked with a direct human or environmental risk, to ensure that no concentration exceeds the acceptability thresholds and that the connected risks are not underestimated.9 In cases where no direct risk is present, e.g., when measuring total bacterial concentrations, monitoring should focus on detecting variability to obtain information on process stability, as legislative compliance is often based on such variability.3,19
The use of event-based sampling, already proposed for transient events,20 constitutes an efficient monitoring strategy when the causes of contaminant concentration patterns are easily identifiable and measurable (e.g., well abstraction rates21). Conversely, this approach is not feasible when the daily patterns either arise from the sum of several minor events (e.g., domestic water uses4) or have no explicit direct cause.6 In this case, the solution proposed by Gabrielli et al.6 could be adopted. However, this method requires the manual selection of the monitoring scheme based on an initial high-frequency monitoring period of arbitrary duration, used to gather information on the pattern present. Therefore, as the daily concentration pattern might vary with time, periodic checks are required to evaluate whether the initial calibration is still adequate for the current pattern. Notably, general guidelines have already been proposed for the selection of sampling times for calibrating hydrologic models.22 However, such guidelines cannot be applied in the case of daily contaminant concentration patterns, as they focus on collecting a few samples from transient events to calibrate water discharge models.
The absence of prior information on the process of interest and the capability to gather information during the operating life of the system, adapting to possible changes, are commonly addressed in the Machine Learning field by Online Learning techniques.23 Specifically, the problem of determining the optimal sampling time can be modelled with the Multi-Armed Bandit (MAB) framework, a decision-making approach commonly used in advertising, internet routing, and other applications.24 While active sampling approaches have already been used for environmental monitoring applications (e.g., to improve hydrologic model calibration25 and to identify anomalous sensor data26), such methodologies do not fully exploit the guarantees provided by the MAB framework.
Within the MAB framework, a learner is presented with a set of available options, which can be selected each time over a finite time horizon. The learner starts with no prior information on the available options and can observe only the realization of the options selected each time.27 Over the time horizon, the learner balances between the characterization of the available options (exploration) and the selection of the one believed to be optimal (exploitation), either to identify the optimal option with high probability or to minimize the loss accumulated over time due to sub-optimal decisions. Several algorithms have been proposed to achieve such goals while, at the same time, providing theoretical guarantees.27–30 While classical MAB techniques assume that the processes are stationary, i.e., that they have constant behaviour over time, a new set of techniques for non-stationary MAB settings has recently been proposed and has shown promising results in a wide range of applications in the Internet advertising and dynamic pricing fields, but not yet in environmental monitoring.31–34 This framework is usually described as a slot machine game with several arms characterized by different rewards, which in the non-stationary case might change as the game progresses. At the beginning of the game, the player pulls the arms randomly, not having any previous knowledge of the rewards, while, as the game progresses, they focus on the most promising arm, pulling the others less frequently. The exploration/exploitation dilemma derives from the fact that the player has to decide whether to pull the arm they consider the best or a more uncertain one, possibly discovering a better performing arm, especially in the non-stationary case, as the arms' rewards might change over time.
In this work, we propose two novel methods to optimize temporal monitoring schemes, targeted at monitoring campaigns using advanced online instrumentation and dealing with daily contaminant concentration patterns. The algorithms, based on the MAB framework and termed Seq(GP-UCB-SW) and Seq(GP-UCB-CD), aim to sample the instants characterized by either the maximum daily concentration of a target contaminant or its maximum concentration variation, without the need for external information (e.g., no available measurements of the causes of the daily pattern).
The proposed algorithms frame temporal sampling within the MAB framework: starting with no information on the monitored process, over time (i.e., as the monitoring campaign progresses), the proposed algorithms have to select an action (i.e., sampling at a specific time instant) among a set of available options (i.e., all the possible sampling instants). In terms of the above-mentioned toy example, the proposed algorithms assign each arm of the slot machine to the action of taking a sample at a specific time of the day. Every time an arm is available (i.e., the time of the day corresponds to the specific sampling time), the algorithms decide whether to pull that arm or not (i.e., whether to sample at that time instant). Over the monitoring period, the algorithms estimate a probability distribution for the various arms using the contaminant concentrations measured in the previous samples and, depending on the target, select the most appropriate sampling instant. Indeed, to optimize the actions performed (i.e., sampling the time instants presenting the target contaminant concentrations), they balance the trade-off between sampling the instants believed to correspond to the target concentration (exploitation) and getting measurements from promising sampling instants whose concentration estimate is not accurate enough (exploration). Thanks to these approaches, it is possible to sample the contaminant concentration only at the time instants likely to be useful to address the specific goal of the monitoring campaign, realizing a cost-effective and informative water quality monitoring system. Remarkably, this approach does not require any assumption on the monitored contaminant and, therefore, can be applied to any contaminant or water matrix of interest. In what follows, we describe the two novel algorithms and their components, and apply them in the field of drinking water distribution systems on: (i) two synthetic scenarios derived from full-scale monitoring campaigns, and (ii) a real-world scenario directly employing high-frequency flow-cytometry monitoring data, in order to show their exploitation for addressing daily concentration patterns representative of different water contaminants and two specific monitoring targets (i.e., the detection of the maximum daily concentration of a given contaminant, or of its maximum daily variation). Then, we compare their performance against traditional monitoring schemes. Finally, after a sensitivity analysis of the algorithms' performances and a discussion of their use in different water matrices, we provide guidance on their use in other real-world scenarios.
To better identify the sampling instants characterized by the target contaminant concentrations, both proposed algorithms take advantage of the temporal correlation present among contaminant concentrations at close sampling instants. Such a correlation is exploited by the combination of Gaussian Processes (GPs) and the Upper Confidence Bound (UCB), namely GP-UCB, proposed by Srinivas et al.:35 GPs are used for modelling purposes and the UCB as a selection criterion.27
GPs allow unknown functions to be estimated starting from a set of noisy samples through a collection of Gaussian random variables governed by a predefined covariance function (also known as a kernel).36 In the developed algorithms, a Matérn kernel (ν = 2.5), together with a white noise kernel, has been used to capture the autocorrelation among sampling instants and their stochasticity. Moreover, the GP was adapted to properly capture the temporal proximity of samples taken at the end (e.g., 23:00) and at the beginning (e.g., 01:00) of the day.
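As an illustration, a minimal sketch of this GP component is shown below using scikit-learn; it is not the authors' implementation, and the kernel hyperparameters, the example data, and the circular time encoding used to link the end and the beginning of the day are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

def encode_time_of_day(hours):
    """Map hours in [0, 24) onto the unit circle so that the kernel treats
    23:00 and 01:00 as neighbouring instants."""
    angle = 2.0 * np.pi * np.asarray(hours, dtype=float) / 24.0
    return np.column_stack([np.cos(angle), np.sin(angle)])

# Matern(nu=2.5) captures the smooth autocorrelation among sampling instants;
# WhiteKernel models the stochasticity (measurement noise) of the daily pattern.
kernel = Matern(nu=2.5, length_scale=0.5) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)

# Fit on the sampling hours and concentrations collected so far (toy values).
sampled_hours = [2.0, 8.0, 14.0, 20.0, 23.0]
concentrations = [0.9, 1.4, 1.1, 2.3, 1.0]
gp.fit(encode_time_of_day(sampled_hours), concentrations)
```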
The UCB criterion, a commonly used policy in MAB algorithms, selects sampling instants based on the principle of “optimism in the face of uncertainty”. Following this criterion, the sampling instants are chosen based on a predefined statistical confidence bound,35 targeting instants in which the expected concentration is either highly promising or highly uncertain. When the algorithms are used for targeting maximum contaminant concentrations, only the time instants with the highest confidence bound are selected. Conversely, when targeting maximum daily variations, the time instants are chosen based on both the highest and the lowest confidence bounds.
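Continuing the sketch above, the selection step can be written as follows; the constant beta is an illustrative simplification of the confidence-bound schedule of GP-UCB (Srinivas et al.35), and the 15 min candidate grid is an assumption.

```python
# All admissible sampling instants (here a 15 min grid).
candidate_hours = np.arange(0.0, 24.0, 0.25)
mu, sigma = gp.predict(encode_time_of_day(candidate_hours), return_std=True)

beta = 2.0                    # confidence-bound width (illustrative constant)
ucb = mu + beta * sigma       # upper confidence bound
lcb = mu - beta * sigma       # lower confidence bound

# Target: maximum daily concentration -> instant with the highest UCB.
next_instant_max = candidate_hours[np.argmax(ucb)]
# Target: maximum daily variation -> also track the instant with the lowest LCB.
next_instant_min = candidate_hours[np.argmin(lcb)]
```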
To exploit the possibility, provided by advanced online instruments, of collecting and analysing multiple samples per day, the Seq() meta-algorithm37 was adopted, allowing multiple actions to be selected per day. Indeed, as soon as a sample is analysed, its concentration is used to re-estimate the contaminant concentration pattern provided by the GP and to identify the new sampling instant as the time with the highest (and, when targeting variations, lowest) confidence bound.
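The resulting sample-refit loop can be sketched as below, reusing the helpers from the previous sketches; `analyse_sample` is a hypothetical stand-in for the online instrument, and, for brevity, the sketch ignores the constraint that only instants still to come in the current day can be sampled.

```python
def seq_gp_ucb_day(gp, history_hours, history_conc, analyse_sample,
                   candidate_hours, beta=2.0, samples_per_day=3):
    """One day of the Seq() loop: pick the instant with the largest UCB,
    measure it, refit the GP, and repeat."""
    for _ in range(samples_per_day):
        mu, sigma = gp.predict(encode_time_of_day(candidate_hours),
                               return_std=True)
        t_next = candidate_hours[np.argmax(mu + beta * sigma)]
        c_next = analyse_sample(t_next)          # measure at the chosen instant
        history_hours.append(t_next)
        history_conc.append(c_next)
        gp.fit(encode_time_of_day(history_hours), history_conc)  # re-estimate
    return history_hours, history_conc
```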
Fig. 1 illustrates the outcome of combining the three components of the algorithms (GP estimation, UCB criterion, and Seq() meta-algorithm) when targeting the maximum daily concentration on two consecutive sampling days. On day d, based on the concentrations measured in samples collected during previous days, the selected sampling time is at around 20:00, since it corresponds to the time having the highest confidence bound. Once the new measurement is available, the confidence bound is re-estimated by the GP, reducing the uncertainty regarding the concentration at that time of the day. After such a reduction, the next sampling instant is selected as the new time corresponding to the largest confidence bound. In Fig. 1, this happens at around 11:00 of the next day d + 1, but, had the largest UCB occurred at a later time (e.g., 22:00), that instant would have been sampled during the same day d.
To adapt to concentration pattern changes, Seq(GP-UCB-SW) trains the GP only on the samples collected in the last n days, where n is the length of the sliding window, similar to what was proposed by Garivier and Moulines.31 The pseudo-code for Seq(GP-UCB-SW) is shown in Algorithm S1.†
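In code, the sliding window simply restricts the GP training set, as in the following sketch (naming is ours, not the authors'):

```python
def sliding_window_training_set(sample_days, sample_hours, sample_conc,
                                current_day, n_days):
    """Keep only the samples collected in the last n_days days, so that
    older daily patterns are progressively forgotten."""
    keep = [i for i, d in enumerate(sample_days) if current_day - d < n_days]
    return ([sample_hours[i] for i in keep],
            [sample_conc[i] for i in keep])
```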
Seq(GP-UCB-CD), instead, similar to what was proposed by Liu et al.,32 performs change detection through an online change-point method38 using the non-parametric scale-location Lepage test.39 Such a test, being non-parametric, does not require prior information on the monitored process and allows changes in both the variability and the central value of the monitored objective to be controlled. Furthermore, this change detection test provides already-defined thresholds to limit the occurrence of false positive change detection alarms by controlling the average number of observations (i.e., the number of measured target contaminant concentrations) between two consecutive false alarms (commonly referred to as ARL0). The test was applied either to the measured daily maximum concentration or to the measured daily maximum, daily minimum, and daily maximum variation, depending on the monitoring objective. Seq(GP-UCB-CD) requires an initial training period (TW), during which the samples are assumed to be independent and identically distributed, to let the GP learn the daily pattern appropriately and correctly identify the instant to sample before starting the detection of target value changes. To favour the detection of changes occurring throughout the whole day, after each sampling event Seq(GP-UCB-CD) randomly selects the next sampling instant with probability α, called the exploration percentage. Note that, due to its self-starting capabilities, before detecting any change Seq(GP-UCB-CD) requires a minimum number of observations after the initial TW, which are assumed to contain no pattern changes. The pseudo-code for Seq(GP-UCB-CD) is shown in Algorithm S2.†
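A sketch of the Lepage statistic at the core of this test is given below, combining a location test (Wilcoxon rank-sum) and a scale test (Ansari–Bradley) via scipy; the standardized Ansari–Bradley value is recovered from its two-sided p-value, which suffices because only its square enters the statistic. The thresholding against ARL0 of the online change-point method38 is not reproduced here.

```python
from scipy import stats

def lepage_statistic(reference, current):
    """Lepage statistic for two samples of measured target values (e.g., the
    daily maxima before and after a suspected change). Under no change it is
    asymptotically chi-squared distributed with 2 degrees of freedom."""
    z_location = stats.ranksums(reference, current).statistic  # Wilcoxon part
    p_scale = stats.ansari(reference, current).pvalue          # Ansari-Bradley part
    z_scale = stats.norm.isf(p_scale / 2.0)                    # |z| from p-value
    return z_location ** 2 + z_scale ** 2
```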
Notice that both Seq(GP-UCB-SW) and Seq(GP-UCB-CD) provide an unbiased estimate of the maximum (or maximum and minimum) contaminant concentration and of its temporal location. Such estimates are obtained for each monitoring day through a Monte Carlo approach, drawing 100 GP realizations to estimate the probability that each time instant corresponds to the maximum (or minimum) contaminant concentration and using those probabilities to perform a weighted average over the concentrations used to train the GP, similar to what was proposed by D'Eramo et al.40
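A sketch of this estimator, reusing the GP helpers above, could read as follows; for simplicity the probability weights are applied to the GP posterior mean rather than to the raw training concentrations.

```python
def estimate_daily_maximum(gp, candidate_hours, n_draws=100, seed=0):
    """Monte Carlo estimate of the daily maximum concentration and of its
    temporal location from n_draws GP realizations."""
    X = encode_time_of_day(candidate_hours)
    draws = gp.sample_y(X, n_samples=n_draws, random_state=seed)  # (T, n_draws)
    # Probability that each candidate instant hosts the daily maximum.
    p_max = np.bincount(np.argmax(draws, axis=0),
                        minlength=len(candidate_hours)) / n_draws
    conc_max = float(np.dot(p_max, gp.predict(X)))  # probability-weighted value
    t_max = candidate_hours[np.argmax(p_max)]       # most likely location
    return conc_max, t_max
```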
The first performance metric is the relative difference between the target values observed by a monitoring scheme and their true values occurring each day (RDOT). Formally:
RDOT = (v_obs − v_true)/v_true, where v_obs is the target value observed by the monitoring scheme on a given day and v_true is the corresponding true value.
The second metric is the number of samples per day, namely SPD [day−1], requested by the monitoring scheme. Such a metric is used as a proxy for the operating costs due to both reagents used for the sample analyses and instrument maintenance. Therefore, the smaller the number of samples requested, the better the algorithm performs in terms of operational costs, but, in general, the worse the estimation task is fulfilled.
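The two metrics translate directly into code; the following helper functions only restate the definitions above.

```python
def rdot(v_obs, v_true):
    """Relative difference between the observed and true daily target value:
    0 is ideal, negative values indicate underestimation of the target."""
    return (v_obs - v_true) / v_true

def spd(n_samples_total, n_days):
    """Average number of samples per day, a proxy for operating costs."""
    return n_samples_total / n_days
```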
The first scenario was derived from the hourly Intact Cell Count (ICC) measurements provided by Nescerescka et al.43 The measured pattern shows a constant baseline concentration with two short-lived ICC peaks, which were modelled using a constant baseline and two Gaussian-shaped peaks (Fig. S1†). An uncertainty equal to the analytical uncertainty specified by the authors of the study was used to introduce stochasticity in the simulated patterns. An abrupt shift of 1 h in the occurrence of the ICC peaks was manually imposed on the daily pattern after 90 simulation days, to mimic a possible change caused by variations in the pump scheduling, water demands or drinking water treatment plant operations3,11,21 (Fig. S1†). In this scenario, the monitoring schemes were evaluated targeting the maximum variation in terms of concentration, as the ICC is not linked to consequences for human health44 and legislation often focuses purely on its variations.19 This scenario can be considered representative of real-world situations involving complex daily concentration patterns, presenting rapid concentration variations, multiple contaminant peaks throughout the day and abrupt pattern changes. Such characteristics occur commonly in microbiological concentrations in drinking water due to treatment plant management changes and peaks in water demands.3,4,6,11,12,43
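A possible generator for one day of this synthetic ICC scenario is sketched below; all numeric values (baseline, peak positions, amplitudes, noise level) are illustrative placeholders, not those of the original study.

```python
def icc_pattern(hours, day, baseline=2.0e5,
                peaks=((7.0, 1.5e5, 0.8), (19.0, 2.5e5, 1.0)),
                shift_day=90, shift_h=1.0, noise_cv=0.05, rng=None):
    """Constant baseline plus two Gaussian-shaped ICC peaks, with the peaks
    abruptly shifted by shift_h hours from shift_day onwards."""
    rng = rng or np.random.default_rng()
    hours = np.asarray(hours, dtype=float)
    conc = np.full_like(hours, baseline)
    for centre, amplitude, width in peaks:
        c = centre + (shift_h if day >= shift_day else 0.0)  # abrupt change
        conc += amplitude * np.exp(-0.5 * ((hours - c) / width) ** 2)
    return conc * (1.0 + noise_cv * rng.standard_normal(hours.shape))
```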
In the second synthetic scenario, trihalomethanes (THMs) are considered as the target contaminant. The stochastic daily concentration pattern used in this scenario was generated based on the model formulated by Chaib and Moscandreas,9 derived from 7 weeks of THM analyses performed every 4 hours in a full-scale system. This daily pattern presents a continuous variation of the THM concentration throughout the day, with a single broad peak around midday (Fig. S2†). Stochasticity in the daily concentration pattern was obtained considering both the uncertainty regarding the amplitude of the daily THM fluctuations and that regarding their periodicity, as indicated in the original study. Furthermore, a gradual seasonal change in the daily pattern shape was simulated by shifting the THM concentration peak gradually by 6 h between the 70th and 120th simulation days, in accordance with the seasonal differences found by Wang et al.45 (Fig. S2†). Due to the existence of a legislative limit on THM concentrations and the presence of a direct human health risk,46,47 in this scenario the monitoring schemes were evaluated for the identification of the sampling instant revealing the maximum concentration. This latter scenario, characterized by more gradual concentration changes, can be considered representative of simple contaminant concentration patterns resulting from the variation of environmental conditions (e.g., temperature, light).7,9,45
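A companion sketch for the THM scenario, with a single broad midday peak shifting gradually by 6 h between days 70 and 120; again, the numeric values are illustrative, not those of the original model.9

```python
def thm_pattern(hours, day, mean=30.0, amplitude=8.0, peak_h=12.0,
                shift_h=6.0, shift_days=(70, 120), noise_sd=1.5, rng=None):
    """Smooth daily THM pattern whose peak drifts gradually (not abruptly)."""
    rng = rng or np.random.default_rng()
    hours = np.asarray(hours, dtype=float)
    d0, d1 = shift_days
    frac = np.clip((day - d0) / (d1 - d0), 0.0, 1.0)   # 0 before, 1 after
    peak = peak_h + frac * shift_h
    conc = mean + amplitude * np.cos(2.0 * np.pi * (hours - peak) / 24.0)
    return conc + noise_sd * rng.standard_normal(hours.shape)
```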
Similar to Fig. 2, the results obtained in the THM synthetic scenario are shown in Fig. 3 and S5 and S6,† with the RDOT being evaluated against the maximum daily concentration (RDOTmax). Different from the previous synthetic scenario, the daily THM pattern is subjected to a gradual change, representing a possible seasonality,45 and, for this reason, the evolution of the evaluation metrics obtained by each monitoring scheme during the whole period is shown. In general, compared to the previous scenario, a higher RDOT (i.e., a more accurate estimate of the target value) is achieved by all monitoring schemes. In addition, fixed-time sampling instant combinations at higher SPD values show a reduced variation of the RDOTmax values throughout the simulations, due to the broadness of the concentration peak. However, the results of this scenario agree with what was observed previously: (i) the performance of the traditional monitoring schemes increases with larger SPD values, (ii) random sampling provides average performances, but is resilient to pattern changes, and (iii) for fixed-time sampling, RDOTmax is not resilient to pattern dynamicity and varies significantly among different sampling instant combinations (Fig. S5 and S6†). The proposed algorithms obtain very similar performances in terms of both RDOTmax and SPD values, achieving an RDOTmax which is matched by traditional schemes only when using more than twice the number of samples per day. It is possible to see how, during the gradual pattern change, both algorithms suffer a slight decrease in RDOTmax and temporarily increase their SPD in order to readapt the pattern estimate performed by the GP to the new pattern. However, while Seq(GP-UCB-SW) results in a smooth change of the RDOTmax and SPD values during the simulation, Seq(GP-UCB-CD) adapts to the gradual change only after detecting its presence (Fig. S7†), resulting in a stepwise adaptation to the pattern change.
Focusing exclusively on the two proposed algorithms, the sliding window approach implemented in Seq(GP-UCB-SW) achieved a higher RDOTdelta (by approximately 7%) than the active change detection test adopted by Seq(GP-UCB-CD), employing on average only 19.5 more samples over the entire 5-month period. On the other hand, Seq(GP-UCB-CD) provides pattern change alerts, detecting their occurrence in most simulations (Fig. S8†) before and after the monitoring gaps, as confirmed by the inspection of the full original dataset (Fig. S3†).
| Scenario | Seq(GP-UCB-SW), SW = 10 d | SW = 15 d | SW = 30 d | α | Seq(GP-UCB-CD), TW = 10 d | TW = 30 d |
|---|---|---|---|---|---|---|
| ICC scenario | −0.575 (0.005) | −0.453 (0.006) | −0.337 (0.006) | α = 0.05 | −0.235 (0.007) | −0.223 (0.007) |
| | | | | α = 0.075 | −0.250 (0.009) | −0.236 (0.007) |
| | | | | α = 0.15 | −0.297 (0.007) | −0.284 (0.007) |
| THM scenario | −0.055 (0.001) | −0.053 (9 × 10−4) | −0.058 (0.001) | α = 0.05 | −0.049 (9 × 10−4) | −0.048 (8 × 10−4) |
| | | | | α = 0.075 | −0.049 (9 × 10−4) | −0.048 (8 × 10−4) |
| | | | | α = 0.15 | −0.051 (0.001) | −0.051 (9 × 10−4) |
Focusing on the results of Seq(GP-UCB-SW), a significantly different behaviour can be seen in the two scenarios, as they represent pattern changes with different complexities and change types. In fact, the performance of Seq(GP-UCB-SW) continues to increase as the sliding window length increases in the ICC scenario, while it peaks with a sliding window of 15 days in the THM scenario.
Regarding Seq(GP-UCB-CD), short training windows reduce the obtained RDOT, since they affect the estimation of the pattern shape, leading to an increased number of false-positive change detections (Fig. S9†). Similar to what was observed for Seq(GP-UCB-SW), this effect is less evident for the THM scenario, due to its lower complexity. Instead, an increase in the value of α is associated with worse performances, as Seq(GP-UCB-CD) more frequently chooses a sampling instant not connected to either the maximum or the minimum concentration.
The sensitivity analysis of the algorithm parameters in the real-world scenario is shown in Fig. 6. As discussed beforehand, an overly long or overly short Seq(GP-UCB-SW) sliding window impacts both the achieved RDOTdelta and the number of samples analysed per day. Regarding Seq(GP-UCB-CD), an excessively long training window TW results in decreased performance, as different patterns might be included in this window. In addition, as no change detection is performed during this initial period, an excessive training period also limits the possibility of detecting changes and adapting accordingly. As already noted, a clear decrease in the average RDOTdelta is obtained in the case of an excessively large α value. However, an appropriate percentage of exploratory samples is needed to improve the worst-case performance of the algorithm and to properly control the concentration throughout the whole day. Indeed, while the difference between the average RDOTdelta with α equal to 0.05 and 0.075 is small due to the limited α variation, the worst-case performance, represented by the 5th quantile, shows a larger difference (i.e., −0.56 with TW = 20 d and α = 0.05; −0.54 with TW = 20 d and α = 0.075).
Selecting every possible time instant with equal probability, as done by random sampling, provides an estimate of the target values that is resilient to pattern changes; however, it does not allow their true value to be properly characterized, as noted by Gabrielli et al.6 and highlighted by the mediocre RDOT values in Fig. 2–5. In practical terms, although changes in the target contaminant concentrations are detected by a monitoring scheme implementing random sampling, by looking at the average values of the analysed samples, it is not possible to accurately observe the contaminants' target value every day. Consequently, erroneous evaluations of process stability and water quality could be drawn, for example regarding the temporal stability of ICC concentrations affected by treatment or distribution.3
Focusing exclusively on selected sampling instants and neglecting the others, as done by fixed-time sampling, might lead to the true target contaminant concentration being missed, due to: (i) misspecified sampling instants (e.g., fixed-time sampling instant combinations with poor performances both before and after the pattern change in Fig. 3), or (ii) inconclusive information, where an observed variation cannot be attributed to a change in the maximum and minimum daily concentrations rather than to a mere change in the time of their occurrence (e.g., fixed-time sampling is unable to catch the shift of the maximum THM concentrations due to differences in water retention times and temperature profiles, as in Fig. 3 and S2, S5 and S6†).9,45 Such erroneous evaluations might result in inadequate, or even harmful, interventions. For example, erroneously-observed reductions in THMs, as highlighted in Fig. 3, might lead drinking water treatment plant managers to relax the treatment steps dedicated to their removal, potentially increasing consumers' health risk. Similar issues could occur in the case of increases in THM concentrations at times different from those sampled, which might go undetected. In fact, selecting the sampling combination with the best performance during one period (e.g., selected using a preliminary sampling campaign, as proposed by Gabrielli et al.6) does not solve this problem, as daily patterns might change unpredictably. Furthermore, these issues are particularly relevant when employing low sampling frequencies (i.e., in our scenarios, SPD < 6 d−1), as the increasing number of possible sampling instant combinations reduces the probability of selecting the best combination without a priori information.
The proposed algorithms, instead, actively make use of the collected samples to select the successive sampling instants, resulting in performances resilient to pattern changes while ensuring lower operating costs (with SPD as a proxy, see section 2.2). For example, in the investigated scenarios a comparable RDOT could be achieved only with more than twice the SPD (i.e., operating costs). Notably, such performances are obtained without any a priori or external information on the monitored process, removing the need for explicit human intervention. In the case of pattern changes, their performance temporarily drops, as shown in Fig. 3 and 4, but the new pattern is successfully learned with a limited number of samples, again resulting in high performances which, in the tested scenario, allow the total cell concentration to be effectively monitored and anomalous variations, which could have been missed otherwise, to be properly assessed. Comparing the two algorithms, the better RDOTdelta obtained by Seq(GP-UCB-SW) in the real-world scenario highlights the flexibility of the sliding window approach for the adaptation to generic changes in the data.53 In fact, active approaches, such as the one used by Seq(GP-UCB-CD), are usually not well suited for gradual or complex pattern changes and can lead to a significant delay before the change detection and the subsequent adaptation.38 However, such a loss in RDOTdelta is compensated for by the ability to actively detect changes in the daily concentration pattern and to provide alerts (e.g., Fig. S8†), which could trigger additional investigations to reveal the cause of the change, aiding the management of the infrastructure. Nonetheless, care must be taken to avoid an excessive number of false alarms, as such events could be problematic for water utilities and environmental protection agencies, both due to the costs of verifying the change origin and due to the decreased trust in the events' detection.54
The training window length TW must be set according to the complexity of the expected daily pattern, in order to allow the Seq(GP-UCB-CD) algorithm to properly learn the pattern and to avoid excessive false positive alarms, as highlighted by the sensitivity analysis on the ICC scenario. It should be stressed that any operation which might affect the monitored contaminant or its pattern (e.g., changing filters and/or their backwash schedules) should be avoided during this period, since uncontrolled conditions during the initial training might limit the algorithm's ability to learn the water quality pattern and impair the change detection performance.38
The value of α should reflect the degree of stochasticity in the pattern occurrence and should not be set too small, to prevent excessively low worst-case performances. Hypothetically, if the time instants of the maximum (and minimum) concentration were known to be fixed, the best performances would be obtained with α = 0. On the other hand, in the case of a completely random concentration pattern, the most appropriate value would be 1, as no single time instant could be considered as having the maximum (or minimum) concentration. Such considerations explain the different results of the sensitivity analysis conducted on α: the optimal value of α lies below 0.05 in the synthetic scenarios due to their lower pattern stochasticity (i.e., the best sampling locations are more repetitive due to the simpler pattern changes) (Table 1), while a value of 0.075 is needed to properly account for the stochasticity of the real-world data (Fig. 6).
Regarding the choice between the two algorithms, in our opinion, Seq(GP-UCB-SW), thanks to its continuous pattern adaptation, is more suited when complex pattern dynamicity might be present, or when it is not possible to provide controlled conditions during the initial training phase required by Seq(GP-UCB-CD). Furthermore, the misspecification of the sliding window length appears to affect Seq(GP-UCB-SW) performance less than the use of suboptimal change detection parameters affects Seq(GP-UCB-CD). On the other hand, Seq(GP-UCB-CD) is more suited to more controlled situations, e.g., in drinking water treatment plants, where deviations from normal conditions must be actively identified and notified as soon as possible to minimize possible negative outcomes (e.g., the distribution of contaminated water).
In any case, basic knowledge of the expected concentration pattern aids the algorithms' parametrization. Changes in the environmental conditions (e.g., day/night cycles) generally lead to smooth and simple concentration patterns of chemical contaminants (e.g., THM scenario, Wang et al.45), which likely vary gradually throughout the year, thus requiring shorter sliding and training windows. On the other hand, concentrations of microorganisms and of chemicals linked with intermittent human activities (e.g., ICC and real-world scenarios, Besmer and Hammes,3 Favere et al.,11 Buysschaert et al.12) can result in complex patterns (i.e., presenting drastic daily fluctuations), which might also change abruptly (e.g., within a few days), requiring the use of longer sliding and training windows. A general indication of the best algorithm and of the suggested parameter values as a function of the target value, pattern complexity and change type can be found in Table 2. These parameter values should be considered a general indication, to be adapted to the characteristics of each specific case study. In particular, in the case of high pattern stochasticity, the sliding window of Seq(GP-UCB-SW) should be slightly decreased, i.e., by 2–3 days, in order to sample all the possible sampling instants more often. The same effect can be obtained in Seq(GP-UCB-CD) by increasing the value of α, i.e., by 0.025–0.05. To obtain the best-performing, case-specific parameter values, it is advised to test the algorithms' performances using different parametrizations on synthetically generated time series based on historical data.
| Target value | Pattern complexity | Pattern change type | Algorithm | Suggested parameters' values |
|---|---|---|---|---|
| Max concentration | Simple | Abrupt | Seq(GP-UCB-CD) | TW = 10 d, α = 0.05 |
| | | Gradual | Seq(GP-UCB-SW) | SW = 10 d |
| | Complex | Abrupt | Seq(GP-UCB-CD) | TW = 17 d, α = 0.075 |
| | | Gradual | Seq(GP-UCB-SW) | SW = 15 d |
| Max concentration variation | Simple | Abrupt | Seq(GP-UCB-CD) | TW = 15 d, α = 0.05 |
| | | Gradual | Seq(GP-UCB-SW) | SW = 13 d |
| | Complex | Abrupt | Seq(GP-UCB-CD) | TW = 20 d, α = 0.075 |
| | | Gradual | Seq(GP-UCB-SW) | SW = 17 d |
As other water matrices might be more affected by environmental conditions, the developed algorithms could be extended to include the use of external information to handle their aperiodic effects. In case a triggering event is known to affect the concentration of the monitored contaminant, Seq(GP-UCB-SW) or Seq(GP-UCB-CD) could be coupled with event-based sampling.20,41 In such a case, the proposed algorithms would indicate the sampling times during normal conditions (e.g., dry weather), while external information could trigger a threshold for event-based sampling (e.g., rainfall), possibly still calibrated using MAB strategies. In other cases, where a triggering event is not easily identifiable, a possible alternative is the use of contextual bandit techniques,61 which infer the relationship between external information (e.g., meteorological conditions and/or other easily-monitorable water parameters) and the targeted contaminant concentration.
In any case, even though Seq(GP-UCB-SW) and Seq(GP-UCB-CD) have been developed to tackle the presence of daily contaminant concentration patterns, they can also be used when no apparent pattern is present (yet) and adapt to its onset, regardless of the water matrix. In such a case, Seq(GP-UCB-SW) results in a mostly uniform sampling of all the available sampling instants (Fig. S10†). On the other hand, Seq(GP-UCB-CD) focuses most of the samples on a single sampling instant, exploring the remaining ones based on the specified α (Fig. S11†).
Regarding manual sampling, as already noted by Ekklesia et al.,63 sampling at the same location more than once per day might not be practical. For monitoring campaigns targeting the maximum variability, a practical workaround is to sample the time instant corresponding to the maximum concentration on a given day and to wait for the next sampling day to sample the time instant corresponding to the minimum. Finally, it is worth noticing that, as routine manual sampling is restricted to working hours (e.g., 8:00–17:00), no information can be obtained for the rest of the day, possibly neglecting relevant events. Autosamplers, instead, can be programmed to collect samples at any time of the day for multiple days.64 However, the analyses are performed only later, limiting the update of the algorithms. For this reason, the frequency of the analysis of the collected samples needs to be adjusted to avoid errors due to the use of outdated information. While autosamplers could also be used to collect composite samples, this technique would lead to the collection of averaged concentrations, without the possibility of identifying short-lived concentration peaks.14
Finally, it can be of interest to simultaneously monitor different contaminants, possibly characterized by different best sampling times (e.g., different concentration peak times). As the algorithms have been designed to focus on a single contaminant (either a single compound or the sum of compounds from the same chemical family, e.g., THMs), two options are available depending on the aim of the monitoring campaign. In case the concentration of every single contaminant is of interest, the solution would be to use one algorithm for each contaminant and to take a sample every time one is suggested by any of the algorithms, as sketched below. Even though in each sample the target value is expected for only a few, or even only one, of the monitored contaminants, it is advisable to carry out the analysis of the entire set of monitored contaminants on each sample. In fact, this aids the estimation of the daily concentration patterns of all the targeted contaminants, resulting in a quicker identification of the best sampling instants and, overall, a lower number of samples analysed. To further reduce the monitoring costs linked with the use of different analytical instrumentation, the analyses could be limited to the contaminants requiring the same analytical method as the contaminant expected at its target value. The other option consists in the use of the developed algorithms on an aggregated index estimated from the concentrations of the contaminants of interest. While the sampling instants selected will likely not be characterized by the target concentration of any specific contaminant, such a strategy would be suitable for monitoring campaigns focused on properties which arise from mixtures of contaminants, such as, for example, the cumulative risk.
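The first option could be orchestrated as in the sketch below, where `next_instant`, `update`, and `analyse_all` are hypothetical interfaces for the per-contaminant algorithm instances and for an instrument analysing the full contaminant set in one sample.

```python
def union_sampling_day(algorithms, analyse_all):
    """One monitoring day with one algorithm instance per contaminant:
    sample whenever any instance requests an instant, and feed every measured
    concentration back to its own instance."""
    requested = {algo.next_instant() for algo in algorithms.values()}
    for hour in sorted(requested):
        measured = analyse_all(hour)          # analyse the entire contaminant set
        for name, algo in algorithms.items():
            algo.update(hour, measured[name]) # each learner gets its own signal
```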
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2ew00089j
This journal is © The Royal Society of Chemistry 2022