Mudit Agarwal received his MS Dual Degree in Computer Science and Engineering (CSE). His research was supervised by Dr. Naresh Manwani. Here is a summary of his thesis work on learning with bandit feedback:
Training a highly effective online learning model for complex tasks often hinges on
the availability of abundant, high-quality, noise-free labeled data. However, acquiring such
data is becoming a bottleneck in terms of cost, time, and computational resources.
In this thesis, we address two significant issues that current state-of-the-art bandit-feedback-based online learning algorithms fail to handle: (a) noise present in the bandit feedback
and (b) the algorithms' heavy reliance on labeled data. To deal with noise in the bandit feedback,
we propose a novel algorithm named RCINE that is robust to noisy bandit feedback. The methodology used
in RCINE requires knowledge of the noise rates, so we also propose a subroutine called NREst to estimate
the noise rates, resulting in an end-to-end framework for learning a multiclass classifier under
noisy bandit feedback. The proposed algorithm enjoys a mistake bound of order O(√T) in the
high-noise case and of order O(T^(2/3)) in the worst case. We also show our approach's effectiveness
through extensive experiments on several benchmark datasets.
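To make the noisy-feedback setting concrete, the sketch below shows a standard Natarajan-style correction: when the one-bit bandit feedback is flipped with a known rate rho, the observed bit can be transformed into an unbiased estimate of the clean bit. This is an illustrative sketch of the general technique, not the exact RCINE update; the function name and the simulation are assumptions for illustration.

```python
import numpy as np

# Hedged sketch: with a known flip rate rho < 1/2, the noisy 0/1 feedback bit
# can be corrected into an unbiased estimate of the clean bit (a standard
# noise-correction idea; illustrative, not the exact RCINE update).
def debias_feedback(noisy_bit, rho):
    """Unbiased estimator: E[(noisy - rho) / (1 - 2*rho)] equals the clean bit."""
    return (noisy_bit - rho) / (1.0 - 2.0 * rho)

# Quick check of unbiasedness on simulated flips of a clean bit b = 1.
rng = np.random.default_rng(0)
rho = 0.3
clean = 1
flips = rng.random(100_000) < rho               # flip each bit w.p. rho
noisy = np.where(flips, 1 - clean, clean)       # observed noisy feedback
estimate = debias_feedback(noisy, rho).mean()   # close to 1.0 on average
```

The division by (1 - 2*rho) is why knowing the noise rates matters: as rho approaches 1/2 the correction blows up, which is consistent with the bound degrading in the worst case.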
Furthermore, to reduce the reliance of online supervised learning algorithms on labeled data, we
propose ALBIF, an efficient stochastic sub-gradient descent algorithm for learning a multiclass classifier in an active bandit feedback setting. ALBIF enjoys a regret bound of order O(log T) in the active learning setting as well as in the standard (non-active) bandit feedback setting. We also demonstrate the effectiveness of the proposed algorithm through extensive experiments on various real-world and synthetic datasets against several benchmark algorithms.
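The active component can be illustrated with a standard margin-based selective-sampling rule: the learner requests feedback with a probability that shrinks as its prediction margin grows, so confident rounds skip the costly query. This is a sketch of the general idea, not necessarily the exact ALBIF query strategy; the function name and the parameter b are assumptions.

```python
import numpy as np

# Hedged sketch of margin-based selective sampling (a common active-learning
# rule, not necessarily the exact ALBIF strategy): feedback is requested with
# probability b / (b + margin). b is an assumed smoothing parameter.
def should_query(W, x, rng, b=1.0):
    """Decide whether to request bandit feedback for example x."""
    scores = W @ x
    top2 = np.sort(scores)[-2:]
    margin = float(top2[1] - top2[0])   # gap between the two best classes
    p = b / (b + margin)                # small margin -> query almost surely
    return rng.random() < p, p

# Usage: an untrained model (zero weights) has zero margin everywhere,
# so p = 1 and the learner always asks for feedback early on.
rng = np.random.default_rng(1)
W = np.zeros((3, 4))
x = np.ones(4)
queried, p = should_query(W, x, rng)
```

As training sharpens the margins, p drops and queries become rare, which is how such schemes cut the amount of feedback needed without giving up the regret guarantee on queried rounds.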