Chepuri Shivani, supervised by Dr. Harikumar K, received a Master of Science – Dual Degree in Computer Science and Engineering (CSD). Here’s a summary of the research work on the Synthesis of Response Shaping Reward Functions for Deep Reinforcement Learning based feedback control:
A control system regulates the behavior of a dynamical system to achieve a desired output by adjusting its input. Design specifications for dynamical systems are typically stated in terms of the desired transient and steady-state response. Performance measures such as rise time, settling time, steady-state error, and control effort help us assess how effective the designed controller is in practice. In traditional approaches, based on these specifications, we choose methods like root locus and Bode plots, and algorithms such as Proportional-Integral-Derivative (PID) control, the Linear Quadratic Regulator (LQR), and Model Predictive Control (MPC). After tuning within the allowed constraints, we may arrive at a control system that meets our specifications. However, the mathematical models of many real-world systems are not fully known.
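As an illustrative sketch (not taken from the thesis), the snippet below estimates rise time, settling time, and steady-state error from a sampled step response; the function name, the 10%–90% rise-time convention, and the 2% settling band are our own assumptions.

```python
import numpy as np

def step_response_metrics(t, y, setpoint, settle_tol=0.02):
    """Estimate rise time, settling time, and steady-state error
    from a sampled step response (illustrative sketch only)."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)

    # Steady-state error: gap between the setpoint and the final value.
    ss_error = setpoint - y[-1]

    # Rise time: time to go from 10% to 90% of the setpoint.
    t10 = t[np.argmax(y >= 0.1 * setpoint)]
    t90 = t[np.argmax(y >= 0.9 * setpoint)]
    rise_time = t90 - t10

    # Settling time: last instant the response leaves the +/- 2% band.
    outside = np.abs(y - setpoint) > settle_tol * abs(setpoint)
    settling_time = t[np.where(outside)[0][-1]] if outside.any() else t[0]

    return rise_time, settling_time, ss_error

# Example: first-order response y(t) = 1 - exp(-t) tracking a unit step.
t = np.linspace(0.0, 10.0, 1000)
y = 1.0 - np.exp(-t)
print(step_response_metrics(t, y, setpoint=1.0))
```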
Meeting control design specifications for unknown dynamical systems is a challenging task. Relying on partial knowledge or an approximation of the system’s mathematical model is known to lead to inaccurate results due to internal modeling errors, which are not always avoidable or rectifiable. Therefore, there is a need to explore control strategies that do not require knowledge of a mathematical model at all. In this context, model-free Reinforcement Learning (RL) methods have been shown to perform well. Their performance is comparable to that of traditional control design methods equipped with system knowledge, and sometimes better, owing to their ability to learn optimal policies through interaction with the environment and to adapt to changes in real time. Techniques such as the Deep Q-Network (DQN), Proximal Policy Optimization (PPO), and the Twin Delayed Deep Deterministic (TD3) policy gradient have demonstrated the potential to handle complex control tasks without explicit system models. This thesis introduces a new approach to control system design for dynamical systems whose mathematical models are not known, using a state-of-the-art model-free reinforcement learning technique (TD3). We show that, given a suitable reward function, an RL agent can be trained to learn and shape the error dynamics and system response of unknown systems in trajectory tracking, particularly in state regulation scenarios. The effectiveness of the method is verified on first-order linear systems, second-order linear systems, and a nonlinear pendulum system. The resulting controller significantly influences both the transient and steady-state behavior of an unknown system while adhering to control and input constraints.
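As a hedged illustration of this idea (not the thesis’s actual code), the sketch below wraps a toy first-order plant in a Gymnasium environment whose shaped reward penalizes the tracking error, its rate, and the control effort, and trains it with Stable-Baselines3’s TD3. The environment class, plant dynamics, and reward weights w1, w2, w3 are all our own assumptions.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import TD3

class FirstOrderTrackingEnv(gym.Env):
    """Toy first-order plant x' = -a*x + b*u regulated to a setpoint.
    The shaped reward penalizes tracking error, error rate, and effort;
    the weights w1, w2, w3 are illustrative assumptions."""
    def __init__(self, a=1.0, b=1.0, setpoint=1.0, dt=0.02,
                 w1=1.0, w2=0.1, w3=0.01, horizon=500):
        super().__init__()
        self.a, self.b, self.setpoint, self.dt = a, b, setpoint, dt
        self.w1, self.w2, self.w3, self.horizon = w1, w2, w3, horizon
        # Observation: [tracking error, error rate]; action: bounded input u.
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float32)
        self.action_space = spaces.Box(-2.0, 2.0, shape=(1,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.x, self.prev_err, self.steps = 0.0, self.setpoint, 0
        return np.array([self.prev_err, 0.0], dtype=np.float32), {}

    def step(self, action):
        u = float(action[0])
        # Euler-integrate the plant dynamics (unknown to the agent).
        self.x += self.dt * (-self.a * self.x + self.b * u)
        err = self.setpoint - self.x
        err_rate = (err - self.prev_err) / self.dt
        # Shaped reward: drive the error and its rate to zero with low effort.
        reward = -(self.w1 * abs(err) + self.w2 * abs(err_rate) + self.w3 * u**2)
        self.prev_err = err
        self.steps += 1
        obs = np.array([err, err_rate], dtype=np.float32)
        return obs, reward, False, self.steps >= self.horizon, {}

model = TD3("MlpPolicy", FirstOrderTrackingEnv(), verbose=0)
model.learn(total_timesteps=20_000)
```

The agent only ever observes the error and its rate, so the same training loop applies to any plant whose model is unknown.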
By modifying the reward parameters, we explore the relationship between the reward shaping (training) parameters and the test performance metrics. This analysis helps tune the reward parameters so that the required performance is reached during real-time testing. We present methods, based on Particle Swarm Optimization, for finding reward parameter solutions that satisfy the performance specifications. We achieve exponential convergence of the tracking error dynamics in linear systems without requiring a system model.
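For intuition only, here is a minimal hand-rolled PSO sketch (not the thesis’s implementation) that searches over reward weights to minimize a penalty on violated performance specifications; `evaluate_specs` is a hypothetical stand-in for the expensive loop of training an agent with those weights and measuring its rise time, settling time, and steady-state error.

```python
import numpy as np

def evaluate_specs(weights):
    """Hypothetical placeholder: train an agent with these reward weights,
    test it, and return a penalty for violated performance specifications.
    A smooth toy function stands in for that expensive loop here."""
    target = np.array([1.0, 0.1, 0.01])  # assumed 'good' weights
    return float(np.sum((weights - target) ** 2))

def pso(cost, dim, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5,
        lo=0.0, hi=2.0, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_cost = pos.copy(), np.array([cost(p) for p in pos])
    gbest = pbest[np.argmin(pbest_cost)].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Standard PSO update: inertia + cognitive + social terms.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        costs = np.array([cost(p) for p in pos])
        improved = costs < pbest_cost
        pbest[improved], pbest_cost[improved] = pos[improved], costs[improved]
        gbest = pbest[np.argmin(pbest_cost)].copy()
    return gbest, float(pbest_cost.min())

best_weights, best_cost = pso(evaluate_specs, dim=3)
print("reward weights:", best_weights, "spec penalty:", best_cost)
```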
November 2025

