
Kinal Mehta – Multi-Agent Reinforcement Learning

Kinal Mehta, supervised by Dr. Pawan Kumar, received his Master of Science in Computer Science and Engineering (CSE). Here’s a summary of his research work on Understanding Learning in Multi-Agent Reinforcement Learning:

Reinforcement Learning (RL) has witnessed remarkable advancements in both algorithms and engineering, enabling a wide range of exciting applications. Multi-agent Reinforcement Learning (MARL) in particular has made significant progress, enabling multiple learning entities to interact effectively. Among the challenges that remain are learning under sparse rewards, achieving social generalisation by adapting to the changing behaviours of other agents, and reproducibility. This thesis tackles two important challenges in MARL: (1) learning in multi-agent sparse reward environments and (2) reproducibility with social generalisation.

In the first part, we address the issue of learning a reliable critic in multi-agent sparse reward scenarios. The exponential growth of the joint action space with the number of agents, coupled with reward sparsity and environmental noise, poses significant hurdles for accurate learning. To mitigate these challenges, we propose regularising the critic with spectral normalisation (SN). Our experiments demonstrate that the regularised critic exhibits improved robustness, enabling faster learning even in complex multi-agent scenarios. These findings highlight the importance of critic regularisation for stable learning.

In the second part, we introduce marl-jax, a powerful software package for MARL that focuses on training and evaluating the social generalisation of agents. Built on DeepMind’s JAX ecosystem and leveraging their RL framework, marl-jax supports cooperative and competitive environments with multiple agents acting simultaneously. It provides an intuitive command-line interface for training agent populations and evaluating their generalisation capabilities. Researchers interested in exploring social generalisation in MARL can use marl-jax as a reliable baseline.
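To give a flavour of the regularisation technique mentioned above: spectral normalisation rescales a layer's weight matrix by an estimate of its largest singular value, which bounds the layer's Lipschitz constant and stabilises the critic's targets. The following is a minimal NumPy sketch of the standard power-iteration approach, not the thesis's actual implementation; the function names are hypothetical.

```python
import numpy as np

def spectral_norm(w, n_iter=50, seed=0):
    """Estimate the largest singular value of w via power iteration."""
    u = np.random.default_rng(seed).normal(size=w.shape[0])
    for _ in range(n_iter):
        v = w.T @ u
        v /= np.linalg.norm(v)
        u = w @ v
        u /= np.linalg.norm(u)
    # Rayleigh-quotient-style estimate of the top singular value
    return u @ w @ v

def normalise_weights(w):
    """Divide w by its estimated spectral norm, making the layer ~1-Lipschitz."""
    return w / spectral_norm(w)
```

In practice, frameworks apply this rescaling to each critic layer at every forward pass, reusing the power-iteration vectors across steps so the extra cost is negligible.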
In conclusion, this thesis addresses two crucial challenges in MARL: learning in multi-agent sparse reward scenarios and reproducibility for social generalisation. By introducing spectral normalisation as a regularisation technique and providing the marl-jax software package, this research contributes to enhancing stability, robustness, social generalisation, and reproducibility in RL.


October 2023