Rajesh Maddu

Rajesh Maddu, supervised by Rehana Shaik, received his doctorate in Civil Engineering (CE). Here’s a summary of his research work on Prediction of River Water Quality Variables using Hybrid Machine Learning Models under Data Uncertainties:

The impact of climate change on water quality variables is a major environmental concern worldwide and an essential topic for sustainable river water quality management in a warming environment. River Water Quality (RWQ) models aim to simulate the behaviour of water quality variables in response to pollutants, land use changes, and climate change. However, these models suffer from sparse data, which leads to data uncertainty. Over the past decades, different models have been used successfully for RWQ modelling at different spatial and temporal scales. Physically based water quality models can simulate RWQ variables, but they require large amounts of detailed site-specific data, including stream geometry, meteorological variables, and hydraulic properties of the river, which are unavailable for many river systems globally. Statistical models, unlike process-based models, do not require a large number of input variables, which are unavailable for many ungauged river systems. Their significant shortcoming is that they cannot accurately describe the nonlinear characteristics of a data series. To overcome this limitation, artificial intelligence algorithms, i.e., Machine Learning (ML) techniques, are widely used to address a range of nonlinear prediction problems. Such models are suited to extracting information from sequential data in RWQ modelling, and they can be built from a reduced number of variables while giving more accurate simulations. ML has been increasingly adopted because of its ability to model complex, nonlinear relationships between RWQ variables and their predictors (e.g., Air Temperature (AT) and streamflow). Most of these ML approaches, however, have been applied without a detailed sensitivity analysis to identify the most influential variables for the prediction of RWQ variables.
Furthermore, the development of systematic models combined with ML using a minimum of input variables has not been intensively studied for the prediction of RWQ variables. To address these gaps, the present study first demonstrates how ML approaches such as Ridge Regression (RR), the K-nearest neighbours (KNN) regressor, the Random Forest (RF) regressor, and Support Vector Regression (SVR) can be coupled with Sobol’ global sensitivity analysis (GSA) to produce accurate estimates of RWQ variables. River Water Temperature (RWT) is the primary variable that influences water quality, and changes in Air Temperature (AT) under anthropogenic climate change can affect it. The present study therefore selected RWT as the water quality variable to predict, with a tropical river system of India, the Tunga-Bhadra River, as a case study. Further, the proposed ML approaches were combined with the Ensemble Kalman Filter (EnKF) data assimilation (DA) technique to improve the predicted values based on the measured data. SVR was found to be the most robust ML model, when coupled with the global sensitivity algorithm and the DA technique, for predicting RWT, particularly at the monthly time scale compared with the daily and seasonal scales. Another source of data uncertainty is the lack of long time series that capture inter-annual variability and of consistent water quality measurement datasets for RWQ modelling. Globally, RWQ data are available at monthly scales, cover limited durations, and contain a large number of missing values. In this context, the selection of appropriate model inputs, the development of models under limited data, the processing of non-stationary data, seasonality scenarios, and relevant lags of variables have not been intensively investigated in the literature, especially for the estimation of RWQ variables.
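The data assimilation step mentioned above can be illustrated with a minimal sketch of a stochastic EnKF analysis update for a single RWT value. This is only an illustration under stated assumptions, not the thesis's implementation: the state is a scalar that is observed directly, the ML predictions are treated as the forecast ensemble, and the function name `enkf_update`, the ensemble size, and all numerical values are hypothetical.

```python
import numpy as np

def enkf_update(forecast, obs, obs_var, rng):
    """Stochastic EnKF analysis step for a scalar state that is observed directly."""
    forecast = np.asarray(forecast, dtype=float)
    # Forecast error variance estimated from the ensemble spread.
    p_f = forecast.var(ddof=1)
    # Kalman gain for a direct observation (observation operator H = 1).
    gain = p_f / (p_f + obs_var)
    # Each member assimilates its own perturbed copy of the observation.
    perturbed = obs + rng.normal(0.0, np.sqrt(obs_var), size=forecast.shape)
    return forecast + gain * (perturbed - forecast)

rng = np.random.default_rng(0)
# Hypothetical ensemble of ML-predicted RWT values (deg C); not thesis data.
ml_ensemble = rng.normal(26.0, 1.0, size=500)
# Hypothetical measured RWT of 24.0 deg C with error variance 0.25.
analysis = enkf_update(ml_ensemble, obs=24.0, obs_var=0.25, rng=rng)
print(ml_ensemble.mean(), analysis.mean())
```

The update pulls the ensemble mean towards the measurement and reduces the ensemble spread, which is the sense in which assimilation "improves the predicted values based on the measured data".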
Given the missing, limited, and non-stationary data scenarios, the present thesis developed hybrid models for the prediction of RWQ variables using Long Short-Term Memory (LSTM) networks integrated with (i) a k-nearest neighbour (k-NN) bootstrap resampling algorithm (kNN-LSTM) to address the data limitations and (ii) a discrete wavelet transform (WT) approach (WT-LSTM) to capture time-frequency localised features. To demonstrate the prediction of RWQ variables and to assess the impact of climate change on river water quality parameters, this study considered the two most important water quality variables, River Water Temperature (RWT) and saturated Dissolved Oxygen (DO) concentration, with AT and lagged variables as predictors. When the WT and k-NN bootstrap resampling algorithms were included, LSTM outperformed the conventional models; hence, these hybrid models are promising new frameworks for RWQ prediction in data-sparse regions. Bayesian optimization was applied to tune the hyperparameters of all ML models. The hybrid kNN-LSTM effectively predicted RWT at monthly time scales under data limitations for five catchment sites (Narmada, Cauvery, Musi, Godavari, and Ganga) out of seven (Narmada, Cauvery, Sabarmati, Tunga-Bhadra, Musi, Godavari, and Ganga), and outperformed the standalone LSTM, the WT-LSTM, and the hybrid 3-parameter version of the Air2Stream model (a physically based RWT prediction model). This thesis also presents the combined effects of streamflow and AT on the prediction of RWT using the kNN-LSTM model, the LSTM model, a modified nonlinear regression model, and an 8-parameter version of Air2Stream, applied to three major river systems of India (Tunga-Bhadra, Musi, and Ganga). The results revealed that the kNN-LSTM model predicted RWT more accurately than the LSTM model, the modified nonlinear regression model, and the 8-parameter Air2Stream model at all three catchment sites.
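The k-NN bootstrap resampling idea used to address data limitations can be sketched as follows. This is a minimal lag-1 nearest-neighbour bootstrap of a kind commonly used in hydrology; it is offered as an assumption about the general technique, since the summary does not give the thesis's exact formulation. The function name `knn_bootstrap`, the choice k = sqrt(n), the 1/j rank weights, and the synthetic monthly series are all illustrative.

```python
import numpy as np

def knn_bootstrap(series, n_samples, rng, k=None):
    """Generate a synthetic series by lag-1 k-NN conditional bootstrap."""
    series = np.asarray(series, dtype=float)
    n = len(series) - 1                        # values that have a successor
    k = min(k or max(1, int(np.sqrt(n))), n)   # common heuristic: k = sqrt(n)
    # Rank-based kernel: the j-th nearest neighbour gets weight proportional to 1/j.
    weights = 1.0 / np.arange(1, k + 1)
    weights /= weights.sum()
    current = series[rng.integers(0, len(series))]
    out = np.empty(n_samples)
    for t in range(n_samples):
        # Distance from the current value to every historical value with a successor.
        dist = np.abs(series[:-1] - current)
        neighbours = np.argsort(dist)[:k]
        chosen = rng.choice(neighbours, p=weights)
        current = series[chosen + 1]           # resample the chosen neighbour's successor
        out[t] = current
    return out

rng = np.random.default_rng(42)
months = np.arange(60)
# Synthetic monthly "RWT" record (deg C) for illustration only; not thesis data.
observed = 25 + 4 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 0.5, 60)
synthetic = knn_bootstrap(observed, n_samples=120, rng=rng)
print(synthetic[:12])
```

Every resampled value is drawn from the observed record, so the synthetic series extends a short record without inventing values outside it; a hybrid model could then be trained on the extended series.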
Overall, the study concluded that hybrid models consistently outperformed standalone models in addressing the uncertainty due to data sparsity. The study assessed climate change impacts on river water quality variables using an ensemble of National Aeronautics and Space Administration (NASA) Earth Exchange Global Daily Downscaled Projections (NEX-GDDP) under Representative Concentration Pathway (RCP) scenarios 4.5 and 8.5 for seven major polluted river catchments of India. For this assessment, the best-performing hybrid kNN-LSTM model was used for future predictions. The RWT increases for the Tunga-Bhadra, Musi, Ganga, and Narmada basins are predicted to be 3.0, 4.0, 4.6, and 4.7 °C, respectively, for 2071–2100. Overall, RWT over Indian catchments is likely to rise by more than 3.0 °C for 2071–2100. While river water temperatures (RWTs) are increasing under climate change, how climate change affects Dissolved Oxygen (DO) saturation levels in response to RWT has not been intensively studied. This thesis examined the direct effect of rising RWTs on saturated DO concentrations for seven major polluted river catchments of India at a monthly scale. RWT reaches close to 35 °C and decreases DO saturation capacity by 2%–12% for 2071–2100. The thesis also evaluated the effect of climate change on DO saturation levels with respect to RWT and streamflow using the kNN-LSTM model forced with nine hypothetical climate change scenarios for three polluted catchments of India (Tunga-Bhadra, Musi, and Ganga). The largest DO decrease (13.22%) was found in the Ganga catchment for the selected climate change scenarios relative to the historical values. Overall, for every 1 °C increase in RWT, DO saturation levels decrease by about 2.3% over Indian catchments under climate change.
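The direct dependence of DO saturation on water temperature can be illustrated with a short calculation. The sketch below uses the widely cited Benson and Krause empirical equation for oxygen solubility in fresh water at 1 atm; this is an assumption made for illustration, as the summary does not state which solubility relation the thesis used. It captures the temperature effect only, so its percentages need not match the thesis results, which come from the thesis's own models and scenarios. The function names and baseline temperatures are hypothetical.

```python
import math

def do_saturation(temp_c):
    """Saturated DO (mg/L) in fresh water at 1 atm for a water temperature in deg C."""
    t = temp_c + 273.15  # convert to kelvin
    ln_do = (-139.34411
             + 1.575701e5 / t
             - 6.642308e7 / t**2
             + 1.243800e10 / t**3
             - 8.621949e11 / t**4)
    return math.exp(ln_do)

def percent_change(temp_c, warming_c):
    """Percentage change in saturated DO when the water warms by warming_c deg C."""
    base = do_saturation(temp_c)
    return 100.0 * (do_saturation(temp_c + warming_c) - base) / base

# Illustrative baseline temperatures (deg C); not values from the thesis.
for baseline in (20.0, 25.0, 30.0):
    print(baseline,
          round(do_saturation(baseline), 2),
          round(percent_change(baseline, 1.0), 2))
```

Because saturated DO falls monotonically as temperature rises, any projected RWT increase translates directly into a lower DO saturation capacity, before changes in streamflow or pollutant loads are considered.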
Overall, the study demonstrates how hybrid ML methods can be coupled with a global sensitivity algorithm, DA techniques, bootstrapping algorithms, and wavelets to generate accurate predictions of RWQ variables under data uncertainties. Although the focus of the study has been limited to climate change impacts on RWT and DO saturation, the proposed hybrid ML modelling frameworks are generic and have the potential to incorporate other water quality parameters, supporting better decisions in river water quality management.

April 2024