Kritika Prakash received her MS Dual Degree in Computer Science and Engineering (CSE). Her research was supervised by Dr. Praveen Paruchuri. Here is a summary of her work on privacy attacks in reinforcement learning with sensitive rewards:
Reinforcement Learning (RL) enables agents to learn how to perform various tasks from scratch. In domains like autonomous driving, recommendation systems, and domestic service robots, the need to protect the privacy of the individuals interacting with such systems has become evident. We investigate the quality of existing solutions that claim to protect reward functions containing sensitive personal information from being exploited. A learned optimal RL policy can cause a privacy breach if it memorizes any part of the private reward. Differential Privacy (DP), a popular technique for computational privacy, allows us to provide mathematically quantifiable privacy guarantees for queries made to "privatised" algorithms. Our work introduces the reward reconstruction attack, which tries to reverse-engineer the reward from a private policy using Inverse RL algorithms. This attack is the key component of the novel Privacy-Aware Inverse RL (PRIL) analysis framework. We assume that the entire training process and the optimal policy are released publicly.
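For reference, the standard epsilon-differential privacy guarantee referred to above can be written as follows. Reading the neighbouring inputs D and D' as reward functions that differ in a single entry is our illustrative interpretation of how the guarantee would apply in this RL setting, not a statement of the thesis's exact formalisation.

```latex
% Standard \epsilon-differential privacy for a randomised mechanism \mathcal{M}.
% Here D and D' may be read as reward functions differing in one entry
% (an illustrative assumption about the RL setting).
\[
  \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{M}(D') \in S]
  \qquad \text{for all neighbouring } D, D' \text{ and all } S \subseteq \operatorname{Range}(\mathcal{M}).
\]
```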
We study existing differentially-private RL policies derived from various RL algorithms, including Value Iteration, Deep Q-Networks (DQNs), and vanilla Proximal Policy Optimization (PPO). We propose the new PRIL analysis framework, which performs reward reconstruction as an adversarial attack on the private policies that agents may deploy. For this, we introduce the novel reward reconstruction attack, wherein we seek to reconstruct the original reward from a privacy-preserving policy using an Inverse RL algorithm. If the agent uses a tightly private policy, an adversary should do poorly at reconstructing the original reward function. Using the PRIL framework, we empirically test the effectiveness of the privacy guarantees offered by the private algorithms on multiple instances of OpenAI Gym environments of varying complexity, specifically the discrete state-space Frozen Lake domain and the continuous state-space Mountain Car domain, employing a different Linear Programming based Inverse RL method for each domain.
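To make the attack concrete, here is a minimal sketch of a Linear Programming based Inverse RL step on a small tabular MDP, in the spirit of Ng and Russell's LP formulation. The chain MDP, the hyperparameters (gamma, l1, r_max), and the helper name lp_irl are illustrative assumptions, not the exact setup used in the thesis.

```python
import numpy as np
from scipy.optimize import linprog


def lp_irl(P, policy, gamma=0.9, l1=1.0, r_max=1.0):
    """Reconstruct a reward vector for which `policy` is optimal.

    P      : (A, S, S) transition tensor, P[a, s, s'] = Pr(s' | s, a)
    policy : (S,) greedy actions read off the released policy
    """
    A, S, _ = P.shape
    P_pi = P[policy, np.arange(S)]                 # (S, S) dynamics under the policy
    inv = np.linalg.inv(np.eye(S) - gamma * P_pi)  # (I - gamma * P_pi)^-1

    # Decision variables x = [R (S), t (S), u (S)];
    # maximise sum(t) - l1 * sum(u)  <=>  minimise -sum(t) + l1 * sum(u).
    c = np.concatenate([np.zeros(S), -np.ones(S), l1 * np.ones(S)])
    A_ub, b_ub = [], []
    for s in range(S):
        for a in range(A):
            if a == policy[s]:
                continue
            d = (P[policy[s], s] - P[a, s]) @ inv
            # Optimality of the released policy: d @ R >= 0.
            row = np.zeros(3 * S)
            row[:S] = -d
            A_ub.append(row)
            b_ub.append(0.0)
            # t[s] <= d @ R, so t[s] picks up the worst-case margin at state s.
            row = np.zeros(3 * S)
            row[:S] = -d
            row[S + s] = 1.0
            A_ub.append(row)
            b_ub.append(0.0)
    for s in range(S):
        # u[s] >= |R[s]| implements the L1 penalty on the reward.
        for sign in (1.0, -1.0):
            row = np.zeros(3 * S)
            row[s] = sign
            row[2 * S + s] = -1.0
            A_ub.append(row)
            b_ub.append(0.0)

    bounds = [(-r_max, r_max)] * S + [(None, None)] * S + [(0.0, None)] * S
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=bounds, method="highs")
    return res.x[:S]


if __name__ == "__main__":
    # Deterministic 5-state chain: action 0 moves left, action 1 moves right.
    n = 5
    P = np.zeros((2, n, n))
    for s in range(n):
        P[0, s, max(s - 1, 0)] = 1.0
        P[1, s, min(s + 1, n - 1)] = 1.0
    # "Always move right" is optimal when only the rightmost state is rewarding.
    released_policy = np.ones(n, dtype=int)
    print(np.round(lp_irl(P, released_policy), 3))
```

Running the script prints a reward vector for which the released "always move right" policy is optimal; in the PRIL analysis, such a reconstruction is compared against the true reward that was used to train the private policy.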
We quantify the extent to which each private policy protects the reward function by measuring the distance between the original and reconstructed rewards under several distance metrics: the larger the reward distance, the better the private agent protects the sensitive reward function on which it was trained. From our experiments, we observe no indication that the attack's reward reconstruction improves as the privacy budget is relaxed, rendering all the strategies ineffective at providing true reward privacy. Based on this analysis, we infer that there exists a gap between the current standard of privacy offered and the standard of privacy needed to protect reward functions in RL. We also present trends in the test-time performance (utility) of private agents as the level of privacy guaranteed is varied by altering the privacy budget during training.
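As an illustration, the comparison between original and reconstructed tabular rewards can be implemented with a few standard vector distances; the specific metrics and normalisation used in the thesis may differ from this sketch.

```python
import numpy as np


def reward_distances(r_true, r_hat):
    """A few common distances between original and reconstructed tabular rewards."""
    a = np.asarray(r_true, dtype=float)
    b = np.asarray(r_hat, dtype=float)
    # Normalise both vectors: IRL recovers rewards only up to scale, so raw
    # magnitudes are not directly comparable.
    a = a / (np.linalg.norm(a) + 1e-12)
    b = b / (np.linalg.norm(b) + 1e-12)
    return {
        "l1": float(np.abs(a - b).sum()),
        "l2": float(np.linalg.norm(a - b)),
        "cosine": float(1.0 - a @ b),
    }
```

A larger value under any of these metrics indicates that the reconstruction is further from the sensitive reward, i.e. that the private policy leaks less about it.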
We also formally derive a differentially-private Bellman update algorithm along with a proof of its sensitivity. This is one of the first works in this sub-field, and we hope it is pursued with more rigour in the future.
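For intuition, the sketch below shows one way a differentially-private Bellman update can look in tabular value iteration, with Laplace noise calibrated to an assumed per-update sensitivity and privacy budget. The actual mechanism, sensitivity bound, and budget accounting derived in the thesis may differ from this illustration.

```python
import numpy as np


def private_bellman_update(V, R, P, gamma, epsilon, sensitivity, rng):
    """One synchronous Bellman backup with Laplace noise added to each value.

    V           : (S,) current value estimate
    R           : (S,) reward vector (the sensitive input being protected)
    P           : (A, S, S) transition tensor
    epsilon     : privacy budget spent on this single update
    sensitivity : assumed L1 sensitivity of the backup w.r.t. a change in R
    """
    # Standard backup: V'(s) = max_a [ R(s) + gamma * sum_{s'} P(s'|s,a) V(s') ]
    q = R[None, :] + gamma * np.einsum("ast,t->as", P, V)
    v_new = q.max(axis=0)
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    return v_new + rng.laplace(scale=sensitivity / epsilon, size=v_new.shape)


# Hypothetical usage:
#   rng = np.random.default_rng(0)
#   V = private_bellman_update(V, R, P, gamma=0.95, epsilon=0.1,
#                              sensitivity=1.0, rng=rng)
```

Because each noisy backup spends part of the overall privacy budget, the per-update epsilon has to be chosen with the total number of Bellman updates in mind.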