Outlier detection and gap filling methodologies for low-cost air quality measurements
Abstract
Air pollution is a major environmental health problem worldwide and needs to be monitored. In recent years, a new generation of low-cost air pollution sensors has emerged. However, poor or unknown data quality has, among other factors, prevented their widespread adoption. This stems both from the intrinsic properties of the sensors and from the lack of consensus on data processing methodologies for them. To contribute to building this consensus, we reviewed the available methodologies for quality control, outlier detection and gap filling. We then applied two outlier detection methodologies and five gap filling methodologies to a case study: an 11-month air quality data set from a low-cost sensor. We showed that erroneous data can be detected in a fully automated way, and that point and contextual outlier detection methodologies can be applied to low-cost air pollution data and yield meaningful results. Linear interpolation showed the best gap filling performance for low-cost air pollution sensors. In conclusion, data cleaning procedures are important, and the presented methods can form part of a generalised data processing methodology for low-cost air pollution sensors.
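Linear interpolation is reported above as the best-performing gap filling method. The following is a minimal sketch of that technique, assuming a regularly sampled series in which missing readings are stored as `None`. The function name `fill_gaps_linear` and the `max_gap` parameter are hypothetical illustrations, not the authors' implementation. Each interior run of missing values is replaced by points on the straight line between its nearest valid neighbours. Gaps at the start or end of the series have only one neighbour and are left unfilled.

```python
from typing import List, Optional


def fill_gaps_linear(values: List[Optional[float]],
                     max_gap: Optional[int] = None) -> List[Optional[float]]:
    """Fill interior runs of None by linear interpolation.

    Assumes equally spaced timestamps. Leading/trailing gaps are left as None.
    max_gap is a hypothetical safeguard (not from the paper): runs longer
    than max_gap samples are left unfilled.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        # Scan to the end of this maximal run of missing values.
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # first index after the gap (equals n if the gap is trailing)
        gap_len = end - start
        if start == 0 or end == n:
            continue  # only one valid neighbour: cannot interpolate
        if max_gap is not None and gap_len > max_gap:
            continue  # gap too long to fill under this assumption
        left, right = filled[start - 1], filled[end]
        step = (right - left) / (gap_len + 1)
        for k in range(gap_len):
            filled[start + k] = left + step * (k + 1)
    return filled


print(fill_gaps_linear([10.0, None, None, 16.0, None]))
# [10.0, 12.0, 14.0, 16.0, None]
```

For interior gaps, pandas offers comparable behaviour through `Series.interpolate(method="linear", limit_area="inside")`. Note, however, that its `limit` argument partially fills long gaps instead of skipping them as `max_gap` does here.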