Identification of the causes of drinking water discolouration from machine learning analysis of historical datasets
Abstract
Understanding the processes and interactions occurring within complex, ageing drinking water distribution systems is vital to ensuring the supply of safe drinking water. Although many water quality samples are taken for regulatory compliance, the resulting data are often simply archived rather than interrogated for deeper understanding, because they are sparse in time and space and difficult to integrate with other data sources. This paper opens a new direction of research into distribution system water quality by mining large, historical drinking water quality datasets using machine learning techniques, in this case self-organizing maps (SOMs). Applying the methodology to national-scale datasets from three different UK water companies demonstrates its ability to identify the dominant mechanisms of iron release. Factors leading to discolouration, such as low disinfectant residual, nitrification, and corrosion of unlined cast iron mains, were identified at scales ranging from city to country, thereby enabling targeted interventions to ensure drinking water quality.
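For readers unfamiliar with SOMs, the following is a minimal sketch of the general technique and not the implementation used in this paper. It trains a small rectangular map on purely synthetic samples. The three features (iron, free chlorine, turbidity, each scaled to [0, 1]), the two sample groups, the map size, and all function names and training parameters are assumptions chosen for illustration only.

```python
import math
import random


def nearest_node(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    best, best_d2 = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d2 = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d2 < best_d2:
                best, best_d2 = (r, c), d2
    return best


def train_som(data, rows=4, cols=4, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rows x cols SOM with a Gaussian neighbourhood (illustrative settings)."""
    rng = random.Random(seed)
    n_features = len(data[0])
    weights = [[[rng.random() for _ in range(n_features)]
                for _ in range(cols)] for _ in range(rows)]
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # learning rate decays linearly
        sigma = 0.5 + sigma0 * (1.0 - frac)  # neighbourhood radius shrinks
        x = rng.choice(data)
        br, bc = nearest_node(weights, x)    # best matching unit for this sample
        for r in range(rows):
            for c in range(cols):
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                w = weights[r][c]
                for i in range(n_features):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


def make_toy_samples(n_per_group=100, seed=1):
    """Synthetic data only: two hypothetical groups of (iron, chlorine, turbidity)."""
    rng = random.Random(seed)
    centres = [(0.2, 0.8, 0.2),   # low iron, high residual, low turbidity
               (0.8, 0.2, 0.8)]   # high iron, low residual, high turbidity
    samples = []
    for centre in centres:
        for _ in range(n_per_group):
            samples.append([min(1.0, max(0.0, rng.gauss(m, 0.03))) for m in centre])
    return samples


samples = make_toy_samples()
som = train_som(samples)
node_a = nearest_node(som, [0.2, 0.8, 0.2])
node_b = nearest_node(som, [0.8, 0.2, 0.8])
print("group A maps to node", node_a, "and group B maps to node", node_b)
```

In this toy setting the two groups should map to different nodes, which is the property that lets a trained map be inspected for co-occurring conditions such as high iron alongside low disinfectant residual. Real compliance data would also need handling of missing values and feature scaling, which this sketch omits.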