Achieving unbiased predictions of national-scale groundwater redox conditions via data oversampling and statistical learning
Science of the Total Environment, https://doi.org/10.1016/j.scitotenv.2019.135877
Wilson, S.R., Close, M.E., Abraham, P., Sarris, T.S., Banasiak, L., Stenger, R., Hadfield, J.
An important policy consideration for integrated land and water management is understanding the spatial distribution of nitrate attenuation in the groundwater system, for which redox condition is the key indicator. This paper proposes a methodology that accommodates the computational demands of large datasets, and presents national-scale predictions of groundwater redox class for New Zealand. Our approach applies statistical learning methods to relate the redox class determined on groundwater samples to spatially varying attributes. The trained model uses these spatial variables to predict redox status in areas without sample data. We assembled the groundwater sample data from regional authority databases and assigned each sample a redox class. A key achievement was overcoming the influence of sample selection bias on model training via oversampling. We removed additional bias imposed by imbalances in the predictor variables by applying a conditional inference random forest classifier. The unbiased trained model uses eight predictors and achieves a high validation performance (accuracy 0.81, kappa 0.71), providing good confidence in model predictions. National maps are provided for redox class and probability at specified depths.
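The abstract does not spell out the oversampling scheme. As a minimal sketch, assuming plain random oversampling of under-represented redox classes (a generic technique, not necessarily the authors' exact procedure), rows from each smaller class can be duplicated at random until every class matches the size of the largest one. The function name `random_oversample`, its `seed` parameter, and the class labels in the usage example are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Generic random oversampling (hypothetical helper, not the paper's code).

    Randomly duplicates rows of each smaller class until every class has as
    many rows as the largest class. Original rows are kept in order at the
    start of the output; duplicated rows are appended after them.
    """
    rng = random.Random(seed)
    target = max(Counter(labels).values())

    # Group row indices by class label.
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    out_x, out_y = list(samples), list(labels)
    for y, idx in by_class.items():
        # Draw (with replacement) from this class's rows to fill the gap.
        for _ in range(target - len(idx)):
            j = rng.choice(idx)
            out_x.append(samples[j])
            out_y.append(y)
    return out_x, out_y


# Usage with made-up predictor values and hypothetical redox labels.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
y = ["oxic", "oxic", "oxic", "oxic", "reduced", "reduced"]
X_bal, y_bal = random_oversample(X, y, seed=1)
print(Counter(y_bal))
```

Because duplicated rows are exact copies, oversampling of this kind belongs on the training split only, after validation data are held out; otherwise copies of the same sample can appear in both sets and inflate the reported validation scores.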
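Assuming the reported kappa is Cohen's kappa, it measures agreement beyond chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (the accuracy) and p_e is the agreement expected from the row and column totals of the confusion matrix. Treating the reported accuracy as p_o, the values 0.81 and 0.71 imply a chance agreement of roughly (0.81 - 0.71) / (1 - 0.71) ≈ 0.34. The sketch below computes both metrics from a confusion matrix; the function name and the toy matrix are hypothetical illustrations, not the paper's validation data.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Kappa is undefined (division by zero) when the
    expected chance agreement equals 1, e.g. when all samples share a class.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)

    # Observed agreement: fraction of samples on the diagonal.
    p_o = sum(confusion[i][i] for i in range(k)) / n

    # Expected chance agreement from the marginal (row and column) totals.
    row_tot = [sum(confusion[i]) for i in range(k)]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)

    return p_o, (p_o - p_e) / (1 - p_e)


# Toy two-class example (made-up counts): 85 of 100 samples agree,
# chance agreement is 0.5, so kappa is about 0.7.
acc, kappa = accuracy_and_kappa([[40, 10], [5, 45]])
print(round(acc, 2), round(kappa, 2))
```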
Feature importance rankings indicate that reducing conditions are associated with poorly-drained soils, and to a lesser extent, high hydrological variability, low elevation, and low-permeability lithology. These conditions are common in New Zealand's coastal and lowland plains, where artificial drainage is required to make land suitable for production. The spatial extent of reduced groundwater increases with depth, suggesting a shallow influence of soil infiltration or mobile organic carbon, and a deeper influence of lithological electron donors.
Our model provides unbiased predictions at a scale relevant for environmental policy development and legislation. Identifying where the ecosystem service provided by denitrification can be utilised will enable spatially targeted interventions that can achieve the desired environmental outcome in a more cost-effective manner than non-targeted interventions.