
Ashutosh Mishra – Semantic Segmentation

Ashutosh Mishra, supervised by Prof. C V Jawahar, received his Master of Science in Computer Science and Engineering (CSE). Here’s a summary of his thesis, Towards Enhancing Semantic Segmentation in Resource Constrained Settings:

Understanding scene semantics well enough to fully automate decision-making in self-driving cars has become a crucial problem in computer vision. With recent progress in autonomous driving, and with many semantic segmentation datasets for road scene understanding being proposed, semantic segmentation of road scenes has emerged as an important problem. However, training semantic segmentation models is resource-intensive: it typically requires multi-GPU training, which becomes a bottleneck for quickly reproducing and understanding results. This thesis lays out these challenges and addresses them by introducing two small-scale datasets that reduce the training time of segmentation models. Additionally, the thesis explores neural architecture search and automatic pruning as ways to build efficient segmentation models in resource-constrained settings.

Chapter 2 of the thesis introduces the problem of semantic segmentation and discusses deep learning approaches to supervised semantic segmentation. We briefly discuss the commonly used evaluation metrics and touch upon the statistics of the various datasets available in the literature for training semantic segmentation models.

Chapter 3 explains the need for a dataset based on Indian road scenarios. Most datasets in the literature are captured in Western settings with well-defined traffic participants, clearly delineated boundaries, etc., conditions that seldom match the Indian setting. We describe the annotation pipeline, along with the quality-check framework used to annotate the dataset. Although the IDD dataset [121] caters to the Indian setting, it is still quite resource-intensive in terms of GPU computation. Hence, there is a need for a lower-resolution dataset with fewer labels for rapid prototyping.
We introduce our proposed datasets and provide a detailed set of experiments and statistical comparisons with existing datasets to substantiate our claims about the usefulness of the proposed solution. We also show through experiments that models trained on our datasets can be deployed on low-resource hardware such as the Raspberry Pi. At the end of this chapter, we look at the role of the proposed datasets in facilitating challenges at two prominent conferences in 2019: the International Conference on Computer Vision (ICCV) and the National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG). These challenges targeted semantic segmentation in resource-constrained settings, inviting innovative architectures capable of achieving reasonable accuracy on the proposed datasets. We also discuss how these datasets can be used to teach semantic segmentation through a series of notebooks that introduce both traditional and deep learning-based segmentation methods. The notebooks are plug-and-play: the first three can run on a laptop CPU, while the fourth requires GPU access.
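
Chapter 2 mentions the metrics used to evaluate segmentation models, and the challenges above compare entries by accuracy on the proposed datasets. The summary does not name the metric; the sketch below assumes the one most commonly used for road-scene segmentation, mean Intersection over Union (mIoU), computed from a per-pixel confusion matrix. The function names and the ignore label 255 are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Accumulate a num_classes x num_classes confusion matrix.
    Rows index the ground-truth class, columns the predicted class.
    Assumes every predicted label is in [0, num_classes)."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    valid = target != ignore_index  # skip unlabeled pixels
    # Encode each (true, predicted) pair as one integer, then count pairs.
    idx = num_classes * target[valid].astype(np.int64) + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN); mIoU averages classes that occur."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as class c, labeled otherwise
    fn = conf.sum(axis=1) - tp  # labeled class c, predicted otherwise
    denom = tp + fp + fn
    iou = np.full(conf.shape[0], np.nan)
    present = denom > 0  # classes absent from both maps stay NaN
    iou[present] = tp[present] / denom[present]
    return iou, float(np.nanmean(iou))

# Toy 2x3 label maps with three classes; 255 marks an unlabeled pixel.
target = np.array([[0, 0, 1], [1, 2, 255]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
per_class, miou = mean_iou(confusion_matrix(pred, target, num_classes=3))
print(per_class, round(miou, 4))  # prints [0.5 0.66666667 1.] 0.7222
```

Accumulating a confusion matrix rather than averaging per-image scores lets the same counts be summed over an entire validation set before the IoU is taken, which is how mIoU is usually reported.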

Chapter 4 of the thesis applies neural architecture search and hyper-parameter tuning to semantic segmentation. Neural Architecture Search (NAS) automates the process of finding novel architectures without a human in the loop. We first discuss different approaches to neural architecture search, followed by a description of hyper-parameter tuning, and then review cases of architecture search specific to segmentation and classification. Through experiments, we show that the accuracies of different models trained on our proposed datasets correlate well with their accuracies on large-scale datasets, especially in cross-domain settings. This correlation allows results to be obtained faster and paves the way for architecture search for semantic segmentation algorithms.

In Chapter 5, we delve into automatic pruning and its application to optimising segmentation models. Automatic pruning aims to improve the efficiency of deep learning models by removing unnecessary connections or parameters without significantly compromising their performance. The process is driven by predefined optimization algorithms that analyse the model’s structure and weights to decide which connections can be pruned, yielding a more efficient network without human intervention.

Finally, we conclude the thesis with a summary of the work and propose future directions.
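
Chapter 4 rests on the claim that model accuracies on the small proposed datasets correlate with accuracies on large-scale datasets, which is what makes the small datasets usable as a cheap proxy during architecture search. The summary does not say which correlation measure the thesis uses. As an assumption, the sketch below uses Spearman's rank correlation, which asks whether the two datasets rank candidate models in the same order; that ordering is what a search procedure relies on. The function names are hypothetical.

```python
import numpy as np

def average_ranks(x):
    """Rank values (1 = smallest); tied values share the mean of their ranks."""
    x = np.asarray(x, dtype=np.float64)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=np.float64)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_rho(small_scores, large_scores):
    """Spearman correlation = Pearson correlation of the two rank vectors.
    Each list holds one score per candidate model, in the same model order.
    Undefined (NaN) if either list is constant."""
    ra = average_ranks(small_scores)
    rb = average_ranks(large_scores)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra * rb).sum() / np.sqrt((ra ** 2).sum() * (rb ** 2).sum()))
```

For example, if four candidate models rank 1, 2, 3, 4 on the small dataset but 1, 3, 2, 4 on the large one, `spearman_rho([1, 2, 3, 4], [1, 3, 2, 4])` is 0.8. A value near 1 means the small dataset would pick nearly the same best architectures as the large one. These inputs are illustrative, not results from the thesis.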

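Chapter 5 describes pruning as removing connections that contribute little to the network's output. The thesis's automatic, optimization-driven procedure is not specified in this summary, so the sketch below substitutes a much simpler and well-known heuristic: global magnitude pruning. It zeroes the given fraction of weights with the smallest absolute value, ranked jointly across all layers, and returns binary masks. The layer names and the `global_magnitude_prune` function are hypothetical.

```python
import numpy as np

def global_magnitude_prune(weights, sparsity):
    """Zero out roughly the `sparsity` fraction of weights with the smallest |w|,
    ranked jointly across all layers. Returns pruned copies and boolean masks.
    Ties at the threshold may remove slightly more than the requested count."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    all_mags = np.concatenate([np.abs(w).ravel() for w in weights.values()])
    k = int(sparsity * all_mags.size)  # number of weights to remove
    if k == 0:
        masks = {name: np.ones_like(w, dtype=bool) for name, w in weights.items()}
    else:
        # The k-th smallest magnitude becomes the pruning threshold.
        threshold = np.partition(all_mags, k - 1)[k - 1]
        masks = {name: np.abs(w) > threshold for name, w in weights.items()}
    pruned = {name: w * masks[name] for name, w in weights.items()}
    return pruned, masks

# Hypothetical two-layer model with eight weights in total.
weights = {
    "conv1": np.array([[0.1, -0.5], [0.9, -0.05]]),
    "conv2": np.array([0.3, -0.2, 0.7, 0.01]),
}
pruned, masks = global_magnitude_prune(weights, sparsity=0.5)
print(sum(int(m.sum()) for m in masks.values()))  # prints 4 (weights kept)
```

In practice, the masks are kept fixed while the pruned network is fine-tuned, so removed connections stay at zero. An automatic method such as the one the thesis studies would, in addition, search over per-layer sparsity levels rather than applying one global threshold.
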
January 2024