Water quality estimates using machine learning techniques in an experimental watershed Article

Costa, D, Bayissa, Y, Barbosa, KV et al. (2024). Water quality estimates using machine learning techniques in an experimental watershed. Journal of Hydroinformatics, 26(11), 2798-2814. 10.2166/hydro.2024.132

cited authors

  • Costa, D; Bayissa, Y; Barbosa, KV; Dias Villas-Boas, M; Bawa, A; Lugon, J; Silva Neto, AJ; Srinivasan, R

abstract

  • This study aims to identify the best machine learning (ML) approach for predicting concentrations of biochemical oxygen demand (BOD), nitrate, and phosphate. Four ML techniques (Decision Tree, Random Forest, Gradient Boosting, and XGBoost) were compared for estimating these water quality parameters from biophysical inputs (population, basin area, river slope, water level, and streamflow) and physicochemical inputs (conductivity, turbidity, pH, temperature, and dissolved oxygen). The innovation lies in combining on-the-spot variables with additional characteristics of the watershed. Model performance was evaluated using the coefficient of determination (R²), the Nash-Sutcliffe efficiency (NSE) coefficient, the root mean squared error (RMSE), and the Kling-Gupta efficiency (KGE) coefficient. Robust five-fold cross-validation, along with hyperparameter tuning, yielded R² values of 0.71, 0.66, and 0.69 for phosphate, nitrate, and BOD; NSE values of 0.67, 0.65, and 0.62; and KGE values of 0.64, 0.75, and 0.60, respectively. XGBoost yielded good results and showed the best performance across all analyses performed, but its performance was closely matched by the other algorithms. The overall modeling design and approach, which includes careful consideration of data preprocessing, dataset splitting, statistical evaluation metrics, feature analysis, and learning curve analysis, is just as important as algorithm selection.
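
The four evaluation metrics named in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, R² is assumed here to be the squared Pearson correlation between observed and simulated values, and KGE is assumed to follow the common formulation based on correlation, variability ratio, and bias ratio. The record does not state which variants the study used.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def pearson_r(obs, sim):
    """Pearson correlation between observed and simulated values."""
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def rmse(obs, sim):
    """Root mean squared error (same units as the data; 0 is perfect)."""
    return math.sqrt(_mean([(o - s) ** 2 for o, s in zip(obs, sim)]))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches predicting the observed mean."""
    mo = _mean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst


def r2(obs, sim):
    """Coefficient of determination, ASSUMED to be the squared Pearson correlation."""
    return pearson_r(obs, sim) ** 2


def kge(obs, sim):
    """Kling-Gupta efficiency, ASSUMED form: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    mo, ms = _mean(obs), _mean(sim)
    r = pearson_r(obs, sim)
    std_o = math.sqrt(_mean([(o - mo) ** 2 for o in obs]))
    std_s = math.sqrt(_mean([(s - ms) ** 2 for s in sim]))
    alpha = std_s / std_o  # variability ratio
    beta = ms / mo         # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


if __name__ == "__main__":
    # Made-up illustrative values, not data from the study.
    observed = [1.0, 2.0, 3.0, 4.0]
    simulated = [2.0, 2.0, 3.0, 3.0]
    for name, fn in [("R2", r2), ("NSE", nse), ("RMSE", rmse), ("KGE", kge)]:
        print(f"{name}: {fn(observed, simulated):.3f}")
```

Reporting several metrics side by side is useful because they penalize different things: NSE and RMSE are dominated by squared errors, while KGE separates correlation, variability, and bias, so a model can rank differently under each.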

publication date

  • November 1, 2024

published in

  • Journal of Hydroinformatics

Digital Object Identifier (DOI)

  • 10.2166/hydro.2024.132

start page

  • 2798

end page

  • 2814

volume

  • 26

issue

  • 11