November 2022
Abhay Rawat received his Master of Science in Computer Science and Engineering (CSE). His research work was supervised by Prof. Kamal Karlapalem. Here’s a summary of his research work on Making Deep Models Generalizable using Domain Adaptation:
Supervised deep learning methods hinge on the assumption that the training and test data are sampled from the same distribution. In most realistic scenarios this assumption does not hold, so models perform poorly when deployed in domains whose data distribution differs from that of the training set. Unsupervised Domain Adaptation (UDA) tackles this problem by adapting between the data distributions of a labeled source domain and an unlabeled target domain. Semi-Supervised Domain Adaptation (SSDA) instead assumes a partially labeled target domain, a more realistic scenario in many computer vision tasks. Domain randomization is a popular approach in which models are trained on synthetically generated data. Because the synthetic data generation process is fully controlled, domain randomization can introduce randomness into various properties of the objects as well as the scene. As with data augmentation in deep learning, the hope is to make the model invariant to noncausal features of the data, and even to nudge it towards learning the causal correlations for the task at hand.
In this thesis, we explore and study various approaches to domain adaptation. First, we present image-level domain adaptation methods, which use image-level manipulations or transformations to achieve domain invariance. We begin by analyzing domain randomization in an object detection setting: we train a Faster R-CNN model for object detection on synthetically generated data. Domain randomization boosts the performance of the object detection model, and a model trained entirely on synthetic data outperforms one trained on real data. With fine-tuning, the performance of the model trained on synthetic data increases drastically. Next, we extend this work to the frequency domain, performing the image-level adaptation on the images' frequency components.
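A minimal sketch of frequency-domain image-level adaptation, assuming the standard Fourier Domain Adaptation (FDA) amplitude swap with a simple low-frequency band mask. This is the baseline technique, not the masking strategies proposed in the thesis. For brevity it works on 1-D signals with a naive DFT (images would use a 2-D FFT), and the function names and the `beta` band-size parameter are illustrative assumptions. The source signal keeps its phase, which carries its content, while its lowest-frequency amplitudes are replaced by the target's, transferring the target's low-level "style".

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse DFT; keeps only the real part because the signals here are real."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def fda_1d(source, target, beta=0.1):
    """Give `source` the low-frequency amplitude of `target`, keeping the source phase.

    `beta` (hypothetical parameter) is the fraction of the spectrum treated as
    "low frequency"; at least the DC bin is always swapped.
    """
    assert len(source) == len(target)
    n = len(source)
    S, T = dft(source), dft(target)
    band = max(1, int(beta * n))  # number of low-frequency bins on each side of DC
    out = []
    for k in range(n):
        # Bins k and n - k mirror each other for real signals; treating both the
        # same way keeps the spectrum conjugate-symmetric, so the result stays real.
        low = min(k, n - k) < band
        amp = abs(T[k]) if low else abs(S[k])
        out.append(cmath.rect(amp, cmath.phase(S[k])))
    return idft(out)
```

For example, `fda_1d([0, 1] * 4, [3.0] * 8)` swaps only the DC amplitude, so every source value shifts up by 2.5 (the difference in means) while the alternating pattern, carried by the untouched high-frequency bin, survives.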
For this frequency-domain adaptation, we propose new strategies for combining frequency components, including masking techniques that take the frequency of each component into account during the combination. Fourier domain adaptation techniques have seen some success in image segmentation when adapting from synthetic domains such as GTA5 [1] and SYNTHIA [2] to real-world domains such as Cityscapes [3]; however, these domains contain structurally similar images. On the synthetic dataset we use, we find that these frequency-domain stylization methods do not improve performance over domain randomization.
We then present two novel methods for domain adaptation based on feature-level alignment. One of the primary challenges in SSDA is the skewed ratio between the numbers of labeled source and target samples, which biases the model towards the source domain. Recent work in SSDA shows that aligning only the labeled target samples with the source samples can lead to incomplete alignment of the target domain with the source domain. In our first approach, we learn the source and target feature spaces separately. To ensure that the target feature space generalizes well, we employ semi-supervised methods that leverage both the labeled and unlabeled target samples. Domain Adapters, which are parametric functions, are then trained to learn a feature-level transformation from the target domain to the source domain. During inference, we extract features with the target domain's feature extractor and pass them to the Domain Adapter for that target-source pair; the transformed representation, now in the source feature space, is fed to the source classifier. We show that keeping the feature extractors separate is advantageous when the domain gap between the source and target domains is significant.
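The inference path of the first approach (target feature extractor, then a Domain Adapter, then the source classifier) can be sketched as follows. This is an illustrative stand-in, not the thesis implementation: it assumes features have already been extracted, models the Domain Adapter as a hypothetical affine map trained with SGD to send each labeled target feature onto the mean source feature (prototype) of its class, and replaces the source classifier with a nearest-prototype rule. All names (`LinearDomainAdapter`, `class_prototypes`, `infer`) and hyperparameters are hypothetical.

```python
def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def class_prototypes(features, labels):
    """Mean source feature per class (hypothetical regression targets for the adapter)."""
    by_class = {}
    for f, y in zip(features, labels):
        by_class.setdefault(y, []).append(f)
    return {y: mean(fs) for y, fs in by_class.items()}


def nearest_prototype(feature, protos):
    """Stand-in for the source classifier: return the class of the closest prototype."""
    return min(protos, key=lambda y: sum((a - b) ** 2 for a, b in zip(feature, protos[y])))


class LinearDomainAdapter:
    """Hypothetical Domain Adapter: an affine map W x + b from target to source feature space."""

    def __init__(self, dim):
        self.W = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]  # identity init
        self.b = [0.0] * dim

    def __call__(self, x):
        return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(self.W, self.b)]

    def fit(self, target_feats, target_labels, source_protos, lr=0.05, epochs=500):
        """SGD on 0.5 * ||W x + b - prototype(y)||^2 over the labeled target samples."""
        for _ in range(epochs):
            for x, y in zip(target_feats, target_labels):
                err = [p - t for p, t in zip(self(x), source_protos[y])]
                for i, e in enumerate(err):
                    for j, xj in enumerate(x):
                        self.W[i][j] -= lr * e * xj  # gradient w.r.t. W_ij is e_i * x_j
                    self.b[i] -= lr * e              # gradient w.r.t. b_i is e_i
        return self


def infer(target_feature, adapter, source_protos):
    """Target features -> Domain Adapter -> source classifier."""
    return nearest_prototype(adapter(target_feature), source_protos)
```

In a toy setup where the target domain's two feature axes are swapped relative to the source (source class 0 near (1, 0), target class 0 near (0, 1)), a target sample at (0.05, 0.95) is misclassified as class 1 without adaptation; after fitting the adapter on a few labeled target samples, `infer` maps it back to class 0.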
Our second method, SPI, leverages contrastive losses to learn a semantically meaningful, domain-agnostic feature space using the labeled samples from both domains. To mitigate the challenges caused by the skewed label ratio, we pseudo-label the unlabeled target samples by comparing their feature representations with those of the labeled samples from both the source and target domains. Specifically, we use a temperature-scaled cosine similarity measure to assign a soft pseudo-label to each unlabeled target sample, and we maintain an exponential moving average of these soft pseudo-labels for each unlabeled sample. To increase the support of the target domain, these potentially noisy pseudo-labels are gradually injected into (or removed from) the labeled target dataset over the course of training based on a confidence threshold, supplementing the alignment of the source and target distributions. Finally, we apply a supervised contrastive loss to the labeled and pseudo-labeled datasets to align the source and target distributions. With this approach, we achieve state-of-the-art performance on semi-supervised domain adaptation benchmark datasets.
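The pseudo-labeling loop in SPI can be sketched as below. The summary does not say how similarities to individual labeled samples become a class distribution, so this sketch assumes a soft nearest-neighbour vote: a temperature-scaled softmax over cosine similarities to all labeled source and target features, summed per class. The temperature `tau`, EMA `momentum`, confidence `threshold`, and all function names are hypothetical choices for illustration, and the supervised contrastive loss that follows is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def soft_pseudo_label(feature, labeled_feats, labeled_labels, num_classes, tau=0.1):
    """Soft label from a temperature-scaled softmax over cosine similarities.

    Each labeled sample (source or target) votes for its class with weight
    softmax(cos / tau); the votes are summed per class (assumed aggregation).
    """
    logits = [cosine(feature, f) / tau for f in labeled_feats]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    probs = [0.0] * num_classes
    for w, y in zip(weights, labeled_labels):
        probs[y] += w / total
    return probs


def update_ema(ema, new, momentum=0.9):
    """Exponential moving average of one sample's soft pseudo-label across training steps."""
    if ema is None:
        return list(new)
    return [momentum * e + (1 - momentum) * n for e, n in zip(ema, new)]


def refresh_pseudo_labeled_set(ema_labels, threshold=0.8):
    """Inject samples whose EMA confidence >= threshold and drop all others.

    Returns {sample_id: hard_label} for the samples currently treated as labeled target data.
    """
    selected = {}
    for sample_id, probs in ema_labels.items():
        conf = max(probs)
        if conf >= threshold:
            selected[sample_id] = probs.index(conf)
    return selected
```

Rebuilding the selected set from the EMA labels after every update both injects newly confident samples and removes those whose averaged confidence has dropped below the threshold, which mirrors the inject-or-remove behaviour described above.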