Yiqi Liu (a), Yarong Song (b), Jurg Keller (b), Philip Bond (b) and Guangming Jiang* (b)
(a) School of Automation Science & Engineering, South China University of Technology, Wushang Road, Guangzhou 510640, China. E-mail: aulyq@scut.edu.cn
(b) Advanced Water Management Centre, The University of Queensland, St. Lucia, Brisbane, QLD 4072, Australia. E-mail: g.jiang@awmc.uq.edu.au
First published on 15th June 2017
Concrete corrosion is a major concern for sewer authorities because it significantly shortens the service life of sewers, which is governed by the corrosion rate and the corrosion initiation time. This paper proposes a hybrid Gaussian Process Regression (GPR) model to describe the evolution of the corrosion rate and corrosion initiation time, thereby supporting the calculation of the service life of sewers. A major challenge in practice is the limited availability of reliable corrosion data obtained in well-defined sewer environments. To enhance the predictive ability of the hybrid GPR model, an interpolation technique was implemented to extend the limited dataset. The trained model estimated corrosion initiation times and corrosion rates very close to those measured in Australian sewers.
The development of corrosion on concrete sewers mainly results from the H2S in the sewer air.4–6 To control corrosion in concrete sewers, many technologies have been developed to remove or reduce hydrogen sulfide. Chemicals, such as nitrate or iron salts, are dosed to reduce the formation or emission of H2S.7–9 Other alternatives are to construct new sewers with corrosion-resistant pipe materials or to repair corroded concrete surfaces using corrosion-resistant mortar or polymer materials. To facilitate the planning of sewer maintenance and rehabilitation, proper estimation of the sewer service life is critical in prioritizing limited resources. The sewer service life (L, year) is typically determined by the corrosion initiation time (ti, month) and the corrosion rate (r, i.e., concrete depth loss over time, mm per year).
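As a rough illustration of how ti and r could combine into a service life, the sketch below assumes that concrete is lost at a constant rate once corrosion has initiated, and that the service life ends when a given sacrificial depth has been consumed. The depth parameter `cover_mm`, the function name and the input values are assumptions made for illustration; they are not taken from the text.

```python
def service_life_years(t_i_months: float, r_mm_per_year: float, cover_mm: float) -> float:
    """Years until `cover_mm` of concrete is lost (assumed linear loss after initiation)."""
    if r_mm_per_year <= 0:
        raise ValueError("corrosion rate must be positive")
    # Initiation time is given in months, so convert it to years before adding.
    return t_i_months / 12.0 + cover_mm / r_mm_per_year


# Hypothetical inputs: 12-month initiation, 2 mm per year, 50 mm of sacrificial concrete.
print(service_life_years(12.0, 2.0, 50.0))  # 26.0
```

With these inputs the initiation period adds one year to the 25 years of active corrosion, which is consistent with the later remark that ti usually contributes little to the total service life.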
Generally, both phenomenological and data-driven models can be used to predict the sewer service life. Phenomenological models are constructed from first principles, whereas data-driven models are empirical models derived from historical data collected from the process. Phenomenological models have received significant attention recently. The well-known Pomeroy model was used to calculate the deterioration rate of concrete sewer pipes.10 Such models were widely used although they fail to take into account recent findings on the corrosion process and the associated influencing factors. It was recently shown that both the corrosion initiation time and the corrosion rate depend on various sewer environmental factors, including the H2S concentration, relative humidity and temperature.11,12 Additionally, it was recently discovered that corrosion development can be facilitated by internal cracking caused by the formation of corrosion products, including iron oxides precipitating in concrete.13–15 Also, corrosion initiation involves a combination of physical, chemical and biological processes.16 Data-driven models offer another way to build a proper relationship between input variables and responses. Data-driven models,17 including but not limited to partial least squares (PLS),18,19 principal component regression (PCR),20 nonlinear PLS,21 support vector regression22 and artificial neural networks,23,24 are widely studied as prediction tools. Even if a good model can be developed successfully, its estimation performance can deteriorate under the effect of uncertainties.25–27 To account for uncertainty in input data, probabilistic models capable of making full use of prior knowledge are more suitable for the prediction of uncertain measurements.28 Probabilistic models can lead to conclusions comparable to those of models based on fuzzy logic.
The Gaussian Process Regression (GPR) model is a recently proposed distribution-driven methodology that is able not only to model dynamic processes of both linear and nonlinear systems, but also to generate a predictive distribution (interval prediction) rather than a point prediction, which facilitates decision making for service life prediction.29 Traditional models give a bare prediction without any associated confidence values and hence have to rely on previous experience or relatively loose theoretical upper bounds on the probability of error to gauge the quality of a given prediction. In contrast, the GPR model is more flexible because it attaches confidence intervals to the predicted values. Also, through the choice of the covariance function, a wide range of modeling assumptions can be expressed to account for unexplained environmental factors of concrete corrosion.
A recent long-term project reported corrosion data collected over 4.5 years in laboratory corrosion chambers with well-controlled conditions simulating real sewers.8,9 These are so far the most comprehensive corrosion data covering both corrosion rate and corrosion initiation time obtained under a full range of environmental conditions, including H2S concentration, temperature and relative humidity. The models in this paper were developed from these data. The corrosion chamber studies investigated the effect of location within the sewer on corrosion by exposing concrete to the sewer atmosphere (simulating the pipe crown) or by partially submerging it in sewage (simulating the sewer tidal region at the sewage/air interface). Since the gas-phase (GP) and partially-submerged (PS) parts exhibited significantly different corrosion features, this study constructed separate GPR models for the two corrosion hot-spots. Furthermore, hybrid automata were used to coordinate the predictions of the two GPR models, forming the hybrid GPR model. The discrete changes (GP or PS) were modelled using a form of transition diagram similar to state charts, while the continuous changes were modelled using the GPR model.
For concrete corrosion, one hindering factor for the GPR model is the limited availability of historical data and the incompleteness of the dataset. In this paper, we filled the missing positions in the data with samples estimated from observed historical data with similar characteristics, thus making more information available to improve the prediction accuracy of the model. We then used this extended dataset to build a hybrid GPR model to predict the corrosion initiation time (ti) and corrosion rate (r), which can be used to estimate the service life for a specific sewer condition. Because GPR is used as the predictive model, both the nonlinear relationships among variables and the uncertainty resulting from unexplained factors can be handled properly. The performance and application of the proposed GPR model were further evaluated by comparison with a classical regression model, a neural network model and observations in real sewers across Australia.
Fig. 1 Side-view of a corrosion chamber with the H2S concentration, relative humidity, and gas temperature controlled by a program logic controller (PLC). |
Thirty-six parallel corrosion chambers were established to simulate the real sewer environment under combinations of different factors, including three gas-phase temperatures (17 °C, 25 °C and 30 °C), two levels of relative humidity (RH) (100% and 90%) and six H2S levels (0 ppm, 5 ppm, 10 ppm, 15 ppm, 25 ppm and 50 ppm). These factors were chosen based on an extensive literature review of concrete corrosion processes in sewers.30 Each chamber, 550 mm (L) × 450 mm (D) × 250 mm (H) in dimensions, contained 2.5 L of domestic sewage that was collected from a local sewer pumping station and replaced every two weeks. Further details of the construction and installation of the corrosion chambers were described previously.8,9
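The chamber count follows from a full factorial design: 3 temperatures × 2 RH levels × 6 H2S levels = 36 conditions. A minimal sketch that enumerates these combinations is given below; the variable and key names are chosen here for illustration only.

```python
from itertools import product

# Factor levels as listed in the text.
temperatures_c = [17, 25, 30]
relative_humidity_pct = [100, 90]
h2s_ppm = [0, 5, 10, 15, 25, 50]

# One dictionary per chamber condition (full factorial design).
chambers = [
    {"temperature_c": t, "rh_pct": rh, "h2s_ppm": c}
    for t, rh, c in product(temperatures_c, relative_humidity_pct, h2s_ppm)
]

print(len(chambers))  # 36
```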
During the period of chamber operation, which lasted up to 4.5 years from 2009, one set of coupons (one pair of gas-phase and one pair of partially-submerged coupons) was periodically retrieved at intervals of between 6 and 10 months. A standard step-by-step procedure of sampling and analysis was employed to measure surface pH, followed by sampling for sulfur species and then photogrammetry analysis (thickness change), as described in previous studies.8,9 Accordingly, the time to reach a detectable level of sulfate on the fresh concrete surface was regarded as the corrosion initiation time, ti. Based on the previous experiments, the critical levels of sulfate were arbitrarily set to 1 g S m⁻² and 10 g S m⁻² for the gas-phase and partially-submerged concrete coupons respectively, taking into consideration the location of the coupons and their actual sulfide oxidation rates.12 The corrosion rate was calculated from the mass loss data of corroded coupons as the thickness change per year (mm per year).
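$$y_i = f(x_i) + \varepsilon_i \tag{1}$$
$$\varepsilon \sim N(0, \sigma_n^2) \tag{2}$$
$$f(\cdot) \sim GP(0, k(\cdot,\cdot)) \tag{3}$$
$$y \sim N(0, K_y) \tag{4}$$
$$(K_y)_{ij} = \operatorname{cov}(y_i, y_j) = k(x_i, x_j) + \sigma_n^2 \delta_{ij} \tag{5}$$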
(6) |
In summary, the parameters to be identified for the aforementioned GPR model are θ = (σf², l, σn²), where l is the width of the kernel k(xi,xj). The optimal θ is obtained by optimizing the corresponding likelihood function, which is:
(7) |
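To make the identification of θ concrete, the sketch below evaluates the log marginal likelihood implied by the Gaussian model y ∼ N(0, Ky) of eqn (4) and picks the kernel width from a coarse grid. It assumes a squared-exponential kernel parameterised by (σf², l, σn²); this kernel form, the function names, the toy data and the grid search are all assumptions made for illustration and are not the authors' implementation.

```python
import numpy as np


def se_kernel(xa, xb, sigma_f2, length):
    """Squared-exponential covariance between the rows of xa (n, d) and xb (m, d)."""
    sq_dist = ((xa[:, None, :] - xb[None, :, :]) ** 2).sum(axis=-1)
    return sigma_f2 * np.exp(-0.5 * sq_dist / length**2)


def log_marginal_likelihood(x, y, sigma_f2, length, sigma_n2):
    """log p(y | X, theta) for y ~ N(0, K_y), where K_y = K + sigma_n2 * I."""
    n = len(y)
    k_y = se_kernel(x, x, sigma_f2, length) + sigma_n2 * np.eye(n)
    chol = np.linalg.cholesky(k_y)                              # K_y = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))   # K_y^{-1} y
    return (-0.5 * (y @ alpha)
            - np.log(np.diag(chol)).sum()                       # equals 0.5 * log|K_y|
            - 0.5 * n * np.log(2.0 * np.pi))


# Toy one-dimensional data (made up) and a coarse search over the kernel width l.
x_toy = np.linspace(0.0, 5.0, 8).reshape(-1, 1)
y_toy = np.sin(x_toy).ravel()
widths = [0.1, 0.5, 1.0, 2.0, 5.0]
scores = [log_marginal_likelihood(x_toy, y_toy, 1.0, w, 0.01) for w in widths]
best_width = widths[int(np.argmax(scores))]
print(best_width)
```

In practice a gradient-based optimiser over all three hyperparameters would normally replace the grid search; the grid is used here only to keep the example short.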
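The final step is to use the derived θ for prediction. Since $(y_1, y_2, \ldots, y_n, f(x_*))^{\mathsf T}$ also follows a Gaussian distribution, the prediction at the location $x_*$ is obtained with the mean and variance:
$$E(f(x_*) \mid D) = k_*^{\mathsf T} K_y^{-1} y \tag{8}$$
$$\operatorname{var}(f(x_*) \mid D) = k(x_*, x_*) - k_*^{\mathsf T} K_y^{-1} k_* \tag{9}$$
where $k_* = (k(x_*, x_1), \ldots, k(x_*, x_n))^{\mathsf T}$ is the vector of covariances between $x_*$ and the training inputs.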
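The predictive mean and variance of eqn (8) and (9) can be sketched in a few lines of NumPy. The example assumes a squared-exponential kernel, for which k(x*, x*) equals σf²; the kernel choice, function names and toy data are illustrative assumptions rather than the authors' code.

```python
import numpy as np


def se_kernel(xa, xb, sigma_f2, length):
    """Squared-exponential covariance between the rows of xa (n, d) and xb (m, d)."""
    sq_dist = ((xa[:, None, :] - xb[None, :, :]) ** 2).sum(axis=-1)
    return sigma_f2 * np.exp(-0.5 * sq_dist / length**2)


def gpr_predict(x_train, y_train, x_star, sigma_f2, length, sigma_n2):
    """Posterior mean and variance of f(x*) at each row of x_star."""
    n = len(x_train)
    k_y = se_kernel(x_train, x_train, sigma_f2, length) + sigma_n2 * np.eye(n)
    k_star = se_kernel(x_train, x_star, sigma_f2, length)   # (n, m): one column per x*
    mean = k_star.T @ np.linalg.solve(k_y, y_train)         # k*^T K_y^{-1} y
    solved = np.linalg.solve(k_y, k_star)                   # K_y^{-1} k*
    var = sigma_f2 - np.sum(k_star * solved, axis=0)        # k(x*, x*) - k*^T K_y^{-1} k*
    return mean, var


# Toy data (made up): one query at a training input and one far from all data.
x_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([0.0, 1.0, 0.5])
x_star = np.array([[1.0], [100.0]])
mean, var = gpr_predict(x_train, y_train, x_star, sigma_f2=1.0, length=1.0, sigma_n2=1e-6)
print(mean, var)
```

Near a training input the variance shrinks towards the noise level, while far from the data the mean reverts to the zero prior mean and the variance returns to σf². This is the behaviour that lets the model widen its confidence band under conditions outside the training range.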
By using hybrid automata, the transition behaviors of GP and PS can be accounted for properly as shown in Fig. 2. More details for hybrid automata definition can be seen in the ESI.†
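The idea of coordinating two continuous models through discrete modes can be sketched as a two-state automaton that dispatches each input to a mode-specific predictor. The guard condition (a `partially_submerged` flag), the class name and the stand-in predictors below are hypothetical; the actual automaton definition is given in the ESI and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

Predictor = Callable[[Sequence[float]], float]


@dataclass
class HybridPredictor:
    """Two discrete modes (GP, PS), each with its own continuous model."""
    predictors: Dict[str, Predictor]
    mode: str = "GP"

    def step(self, inputs: Sequence[float], partially_submerged: bool) -> float:
        # Assumed guard: the exposure type of the concrete selects the discrete mode.
        self.mode = "PS" if partially_submerged else "GP"
        # The continuous dynamics are delegated to the model of the active mode.
        return self.predictors[self.mode](inputs)


def gp_model(inputs: Sequence[float]) -> float:
    """Stand-in for a trained gas-phase GPR model (returns a placeholder value)."""
    return 1.0


def ps_model(inputs: Sequence[float]) -> float:
    """Stand-in for a trained partially-submerged GPR model (placeholder value)."""
    return 2.0


hybrid = HybridPredictor(predictors={"GP": gp_model, "PS": ps_model})
print(hybrid.step([10.0, 25.0, 100.0], partially_submerged=False), hybrid.mode)
print(hybrid.step([10.0, 25.0, 100.0], partially_submerged=True), hybrid.mode)
```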
Predicted variable | Data type | Training set | Testing set | Laboratory data validation | Field data validation |
---|---|---|---|---|---|
Corrosion initiation time, ti (months) | Original data | 20 | 10 | 10 | 4 |
Corrosion initiation time, ti (months) | Extended data | 200 | 80 | 10 | 4 |
Corrosion rate, r (mm per year) | Original data | 26 | 10 | 10 | 17 |
Corrosion rate, r (mm per year) | Extended data | 208 | 100 | 10 | 17 |
To distinguish the cases in the comparative study, the models trained with extended data are denoted MLR-ex, RBF-ex and GPR-ex, and the models trained with original data are denoted MLR-or, RBF-or and GPR-or.
The number of inputs for all models is set to 3. A linear × SE × Per (periodic) kernel is used for the GPR model due to the shape similarity of SVI. The root mean square error (RMSE) and correlation coefficient (rc) were used to assess the prediction performance of the inferential models. The RMSE, used to compare the quality of the different models, is defined as follows:
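$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{10}$$
where $y_i$ and $\hat{y}_i$ are the measured and predicted values, respectively, and $n$ is the number of samples.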
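Both performance measures are straightforward to compute. The sketch below uses the standard RMSE definition and assumes that rc denotes the Pearson correlation coefficient between measured and predicted values; the function names and sample numbers are illustrative.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def correlation(measured, predicted):
    """Pearson correlation coefficient (assumed meaning of rc)."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


# Made-up example values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(measured, predicted), correlation(measured, predicted))
```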
Fig. 3 Prediction of corrosion initiation time of lab and actual data of concrete corrosion using MLR, RBF and GPR for both GP (left) and PS (right) sewers. |
Fig. 4 Comparison of RMSE and rc under lab and actual data testing using MLR, RBF and GPR for both GP (left) and PS (right) sewers. |
Secondly, RBF models were trained for the GP and PS processes using the same data sets. The final RBF models for GP and PS trained with extended data have 23 and 21 neurons in the hidden layer, respectively. In contrast, relatively fewer neurons were obtained for the RBF models trained with original data (10 neurons for GP and 8 neurons for PS). In all scenarios, the activation functions are radial basis functions. Even though an RBF model can approximate the nonlinear relationship between the explanatory variables and the response variables, its need for a large amount of training data makes it inadequate here, leading to even worse performance in terms of RMSE and rc (Fig. 4).
Following the MLR and RBF models, GPR models were used to analyze the same data set. The additive squared-exponential (SE) covariance function was selected for all GPR models. Even though different GPR models were generated for the GP and PS processes, all parameters were identified automatically without resorting to trial and error. Of the three models, GPR achieved the best performance for extended data testing and laboratory data validation.
After developing the hybrid GPR model to predict ti from the laboratory data, a further step was carried out to validate its performance using field data. The corrosion initiation time ti measured at the field sites, comprising two Perth sewers and two Melbourne sewers, varied from site to site but was in the range of 9 to 24 months. We also compared the predictions of ti for the field sites among the MLR, RBF and GPR models. Comparing the predicted and measured ti for the four field sites shows that the GPR-or model predicted ti more accurately than the other models trained with the original data. It is worth noting that the models trained with extended data performed better than their counterparts trained with original data in the laboratory data validation, but slightly worse in the field data validation. This is mainly because some conditions at the field sites were far beyond the ranges used in the laboratory corrosion chambers.30
In particular, the Perth sewer site had very high H2S concentrations, up to 830 ppm in the gas phase, and high temperatures (up to 36.6 °C), and in that situation the MLR model predicted a negative ti. In terms of calculating the sewer service life, ti normally contributes little to the lifespan of a sewer pipe that is designed to last 50–100 years. However, the prediction of ti is important for evaluating and optimizing the effectiveness of prevention strategies, such as sewer gas ventilation and chemical dosing in sewage, which are employed to prevent the initiation of corrosion. It is also important for predicting the initiation of corrosion based on the operation of new sewer systems.
Fig. 5 Prediction of corrosion rate for laboratory and field data using MLR, RBF and GPR for both GP (left) and PS (right) sewers. |
Fig. 6 Comparison of RMSE and rc obtained with laboratory and field data testing using ANN, MLR and GPR models. |
To improve the prediction of corrosion rate (r), an RBF model was developed using an approach similar to that used for the prediction of corrosion initiation. The best architecture was determined by exhaustive search based on a minimum error criterion (4-22-1 for RBF-ex under GP, 4-21-1 for RBF-ex under PS, 4-8-1 for RBF-or under GP and 4-8-1 for RBF-or under PS). The model was trained using the original corrosion data obtained in the laboratory corrosion chambers and the corresponding extended (interpolated) data. The RBF model showed unacceptable performance in the extended data validation as well as in the laboratory data validation for the GP processes. In contrast, the predictions under PS were relatively better than those under GP in terms of RMSE and rc (Fig. 6). The RBF model gave improved predictions of corrosion rates when proper data points were interpolated into the laboratory data, whereas the models trained with original data under-predicted the corrosion rates (Fig. 5).
Finally, the GPR models were evaluated for the prediction of corrosion rate based on the original laboratory data and the corresponding extended data. The modeling procedure was the same as for the RBF model. In all scenarios, GPR-ex and GPR-or achieved the best performance for both GP and PS sewers, with RMSE lower than 0.2 and rc higher than 0.8 (Fig. 6).
Comparing the predicted and measured corrosion rates for the seventeen field sites shows that the GPR-or model predicted the corrosion rate relatively more accurately than the other models trained with original data. It is worth noting that the models trained with extended data performed better than their counterparts trained with original data in both the laboratory data and field data validation. The ability to predict the corrosion rate can be used to evaluate and optimize corrosion prevention strategies.
Such ignorance further introduces uncertainties that frustrate decision-making. Unlike the RBF and MLR models, which give point predictions without taking uncertainties into account, GPR generates a variance that envelops the uncertainties. Such envelopes represent different levels of confidence in the prediction results. Fig. 7 shows that the confidence bounds of the GPR-or models envelop most of the variation in the measured values. Even though some predicted values crossed the 90% confidence limit, the predicted results for both GPR-ex models (GP and PS sewers) are acceptable. To quantify how reliable the obtained predictive regions are, we counted the percentage of wrong predictive intervals (outside the envelope); in other words, how many times the GPR model fails to give a predictive region that contains the real output of a test sample. The results in Table S1 in the ESI† show that the GPR models are valid at the 90% confidence level: the rate of successful predictions is at least equal to the desired accuracy. Fig. 7 complements part of the information given in Table S1† for predicting with 90% confidence. It shows that prediction uncertainty is an important issue for a model updated with new data. During the transition stage, some of the input variables are adjusted to bring the process to a new steady state. The confidence interval widens because the adjusted process variables deviate from their steady-state values.
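The validity check described above, counting how often the measured value falls inside the predictive region, can be sketched as follows. The sketch assumes Gaussian predictive distributions, so a two-sided 90% interval is the mean plus or minus about 1.645 standard deviations; the function name and sample numbers are illustrative and are not the values reported in Table S1.

```python
from statistics import NormalDist


def interval_coverage(measured, means, variances, confidence=0.90):
    """Fraction of measured values inside the two-sided Gaussian predictive interval."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # about 1.645 for 90%
    inside = sum(
        abs(y - mu) <= z * var ** 0.5
        for y, mu, var in zip(measured, means, variances)
    )
    return inside / len(measured)


# Made-up predictions: the third sample lies far outside its interval.
measured = [0.0, 1.0, 5.0]
means = [0.0, 0.0, 0.0]
variances = [1.0, 1.0, 1.0]   # for noisy observations, add the noise variance here
print(interval_coverage(measured, means, variances))
```

A model is regarded as valid at the chosen level when this fraction is at least as large as the requested confidence.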
Fig. 7 Uncertainty analysis of GPR models based on laboratory and field corrosion data for both of GP and PS sewers. |
Unlike traditional black-box models, the GPR model generates confidence levels that describe the uncertainty arising from the model parameters as well as from external unexplained factors. Also, due to the limited availability of historical data, traditional black-box models are not capable of capturing the actual nonlinear variations of concrete corrosion. Thus, it is imperative to analyze the derived data and extract more features to facilitate model building. Through detailed examination of the derived data, the temperature and H2S data were interpolated to ensure that a sufficient amount of historical data was available for model building. In contrast, because too few RH data were available, RH interpolation could distort the original data and was therefore not considered in this paper.
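The idea of extending the dataset along one factor can be illustrated with simple linear interpolation over the H2S levels at fixed temperature and RH. The text does not state which interpolation scheme was used, so linear interpolation is an assumption here, and the corrosion rate values below are hypothetical placeholders rather than measured data.

```python
import numpy as np

# H2S levels used in the chambers (from the text).
h2s_ppm = np.array([0.0, 5.0, 10.0, 15.0, 25.0, 50.0])
# HYPOTHETICAL corrosion rates (mm per year) at those levels, for illustration only.
rate_mm_per_year = np.array([0.0, 0.4, 0.7, 0.9, 1.2, 1.6])

# Extend the six samples to a 1 ppm grid by linear interpolation.
h2s_dense = np.linspace(h2s_ppm.min(), h2s_ppm.max(), 51)
rate_dense = np.interp(h2s_dense, h2s_ppm, rate_mm_per_year)

print(len(h2s_ppm), "->", len(rate_dense))  # 6 -> 51
```

Interpolated samples stay within the range of the original observations, which is one reason a model trained on extended data may still extrapolate poorly to field conditions outside the laboratory ranges.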
Although most of the data used for modelling came from laboratory experiments instead of real sewer networks, the results are still convincing and suitable for model building, since the experiments were long-term (4.5 years) and conducted in purpose-built corrosion chambers that have well-controlled conditions. Indeed, the controlled factors, i.e. H2S gas-phase concentration, RH and temperature, have been demonstrated to significantly affect the corrosion processes and are therefore indispensable for corrosion modelling. However, corrosion data obtained in real sewers might be subject to more complicated situations. For instance, the H2S concentration is likely to show high variation due to sewage flow dynamics and fluctuations of other environmental factors. Similarly, other critical but undefined variables may exist that have not been considered because of the limited knowledge of corrosion. To achieve efficient modelling of concrete corrosion, further research is needed to determine the specific effects of those unexplained factors on the corrosion of concrete sewers.
In this study, we demonstrated the advantage of a new modelling approach for the prediction of sewer service life to support better sewer corrosion management. While we used an advanced model that gives realistic representations of corrosion, the model requires further verification through application to different sewer networks. Furthermore, although the corrosion of GP and PS sewers was modeled separately, some GP sewers might also be subjected to short-term submersion (similar to PS conditions). The prediction performance in such circumstances requires further investigation and improvement through field studies, although the model has been shown to predict corrosion initiation time and corrosion rate well in some sewers. This will be an important and interesting question for future research.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c7ra03959j |
This journal is © The Royal Society of Chemistry 2017 |