[month] [year]

Aishwarya S – Mitigating Negative Side Effects

Aishwarya Srivastava received her Master of Science – Dual Degree in Computer Science and Engineering (CSE). Her research was supervised by Dr. Praveen Paruchuri. Here’s a summary of her research on mitigating negative side effects:

Autonomous systems perform various tasks across different industries ranging from finance to healthcare to space applications. However, these systems are often deployed in the open world, where it is hard to obtain complete specifications of the objectives and constraints. Operating based on an incomplete model can produce undesired effects, i.e., Negative Side Effects (NSEs). Negative side effects affect the system’s safety and reliability and can be of two types: Markovian and non-Markovian.

In this thesis, we aim to mitigate negative side effects in environments modelled as Markov decision processes (MDPs). Unlike previous work in this area, which associates negative side effects with state-action pairs, our framework associates them with entire trajectories. This is more general and captures non-Markovian dependence on states and actions: a non-Markovian negative side effect is produced when the agent executes a certain sequence of actions in the deployed environment. Prior work mitigates Markovian negative side effects and cannot be easily extended to the non-Markovian case.
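To make the distinction concrete, here is a minimal sketch in Python. It is not taken from the thesis: the household-robot scenario, the state and action names, and the penalty values are all assumptions chosen for illustration. The first function scores each state-action pair on its own, while the second needs the history of the trajectory to decide whether a step is harmful.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (state, action); names are illustrative assumptions


def markovian_nse(trajectory: List[Step]) -> float:
    """Per-step penalty: depends only on the current (state, action) pair."""
    penalty = {("rug", "drive"): 1.0}  # hypothetical penalty table
    return sum(penalty.get(step, 0.0) for step in trajectory)


def non_markovian_nse(trajectory: List[Step]) -> float:
    """History-dependent penalty: driving on the rug is harmful only
    if the robot has passed through mud earlier in the trajectory."""
    muddy = False
    total = 0.0
    for state, action in trajectory:
        if state == "mud":
            muddy = True
        if muddy and (state, action) == ("rug", "drive"):
            total += 1.0
    return total


clean = [("hall", "drive"), ("rug", "drive")]
muddy = [("mud", "drive"), ("rug", "drive")]
print(markovian_nse(clean), markovian_nse(muddy))
print(non_markovian_nse(clean), non_markovian_nse(muddy))
```

The per-step table cannot tell the two trajectories apart, because the final step is identical in both. Only the trajectory-level function can.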

We build a framework, Controller-Assisted Safe Planning (CASP), for mitigating non-Markovian negative side effects. Our primary contributions are:

  1. We design a model based on Finite State Controllers (FSCs) that can predict the severity of negative side effects for a given trajectory.
  2. We learn the model parameters from observed data containing state-action trajectories and the severity of the associated negative side effects. The model is learned such that it generalises well to unseen data. Information about negative side effects is gathered through oracle feedback and compactly represented as a finite state controller.
  3. We develop a constrained MDP model that uses information from both the underlying MDP and the learned model for planning while avoiding negative side effects.
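As a rough illustration of the first contribution, the sketch below shows how a finite state controller could map a trajectory to a severity level. It is an assumption-laden toy, not the CASP model: the class name `SeverityFSC`, the hand-written transition table, the rule of reporting the worst severity seen, and the mud-and-rug scenario are all hypothetical. In the thesis the controller is learned from data rather than written by hand.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Step = Tuple[str, str]  # (state, action); names are illustrative assumptions


@dataclass
class SeverityFSC:
    """Toy finite state controller (hypothetical structure).

    Each controller node summarises the relevant history of the trajectory;
    each node carries a severity label.
    """
    start: int
    transitions: Dict[Tuple[int, str, str], int]  # (node, state, action) -> node
    severity: Dict[int, int]                      # node -> severity level

    def predict(self, trajectory: List[Step]) -> int:
        """Return the worst severity reached while reading the trajectory."""
        node = self.start
        worst = self.severity.get(node, 0)
        for state, action in trajectory:
            # Unlisted inputs leave the controller in its current node.
            node = self.transitions.get((node, state, action), node)
            worst = max(worst, self.severity.get(node, 0))
        return worst


# Node 0: wheels clean, node 1: wheels muddy, node 2: rug soiled.
fsc = SeverityFSC(
    start=0,
    transitions={(0, "mud", "drive"): 1, (1, "rug", "drive"): 2},
    severity={0: 0, 1: 0, 2: 2},
)

clean_run = [("hall", "drive"), ("rug", "drive")]
muddy_run = [("mud", "drive"), ("rug", "drive")]
print(fsc.predict(clean_run), fsc.predict(muddy_run))
```

Because the controller node acts as a compact memory of the past, a planner could in principle track the pair (environment state, controller node) and constrain the severity it is willing to incur, which is the role the constrained MDP plays in the third contribution.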

Our empirical evaluation demonstrates the effectiveness of our approach in learning and mitigating Markovian and non-Markovian negative side effects.