
Vallurupalli Nikitha – Semantic Segmentation

November 2022

Vallurupalli Nikitha received her Master of Science – Dual Degree in Electronics and Communication Engineering (ECE). Her research work was supervised by Prof. C V Jawahar. Here’s a summary of her research work, On Designing Efficient Deep Neural Networks for Semantic Segmentation:

Scene understanding is a fundamental computer vision problem. Semantic segmentation is an essential primitive in real-time systems such as autonomous navigation, which require processing at high frame rates. For models to be practically applicable, they must therefore be compact and fast while still achieving high prediction accuracy. Previous research into semantic segmentation has focused on creating high-performance deep learning architectures. In most cases, these best-performing models are complex and deep, have long processing times, and demand significantly more processing capacity. Another relevant area of research is model compression, which yields lightweight models. Explicit model compression applies techniques to heavy networks during or after training, whereas implicit model compression uses lightweight structures and layers in the network architecture itself. There have also been works that produced lightweight semantic segmentation models at the expense of performance.
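To make the idea of lightweight layers concrete, the sketch below counts the weights of a standard convolution, a grouped convolution, and a depthwise separable convolution. These are well-known lightweight building blocks chosen for illustration; the summary does not say which ones the thesis uses, and the function names and channel sizes here are hypothetical.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution, ignoring the bias term."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    return (c_in // groups) * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = conv_params(c_in, c_in, k, groups=c_in)
    pointwise = conv_params(c_in, c_out, 1)
    return depthwise + pointwise


# Illustrative sizes only (assumed, not taken from the thesis).
standard = conv_params(128, 128, 3)
grouped = conv_params(128, 128, 3, groups=4)
separable = depthwise_separable_params(128, 128, 3)

print(standard, grouped, separable)  # 147456 36864 17536
```

With four groups the layer needs a quarter of the weights of the standard convolution, which is the kind of saving that implicit compression relies on.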

In this thesis, we design models and processes that strike a desirable balance between performance and latency. Specifically, we target methods and architectures that deliver high performance while running in real time in resource-constrained settings. We comprehensively analyze various state-of-the-art approaches relevant to real-time semantic segmentation. Based on this analysis, we identify the redundancies in the existing approaches and propose a compact architecture family called ESSNet, with accuracy comparable to the state of the art while using only a fraction of the space and computational power of those networks. We propose three convolutional module designs with sparse coding theory as a premise. Further, we propose two real-time encoder backbones, ESSNett and ESSNet*, that employ our proposed modules. We empirically evaluate the efficacy of our proposed layers and compare them with existing approaches. Next, we explore the need for optimization during the training phase of the proposed models and present a novel training method called Gradual Grouping, which results in models with improved trade-offs between implementation efficiency and accuracy. Additionally, we conduct extensive experiments by varying macro-architecture hyper-parameters such as network depth, kernel sizes, encoder-decoder symmetry, and additional context extraction modules. Using the results, we propose a novel segmentation network, WSPD-Net, that incorporates a pyramid structure with differential dilation rates in the convolutional layer, making it much more effective without increasing computational complexity.
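The claim that a pyramid of dilation rates adds context without adding computation can be illustrated with simple arithmetic. The sketch below is not the WSPD-Net design, which this summary does not describe in detail. It assumes a hypothetical layout in which the output channels of one layer are split evenly across branches, each branch using the same k x k kernel with a different dilation rate.

```python
def effective_extent(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


def pyramid_summary(c_in, c_out, k, dilations):
    """Weights and per-branch extents when output channels are split
    evenly across branches with different dilation rates (assumed layout)."""
    branches = len(dilations)
    if c_out % branches:
        raise ValueError("c_out must be divisible by the number of branches")
    per_branch = c_in * (c_out // branches) * k * k
    return {
        "params": per_branch * branches,
        "extents": [effective_extent(k, d) for d in dilations],
    }


# Illustrative sizes and rates only (assumed, not taken from the thesis).
summary = pyramid_summary(64, 64, 3, dilations=(1, 2, 4, 8))
plain_params = 64 * 64 * 3 * 3  # ordinary 3 x 3 convolution, same channels

print(summary["params"], plain_params)  # 36864 36864
print(summary["extents"])               # [3, 5, 9, 17]
```

Under this assumption the weight count matches an ordinary 3 x 3 convolution, while the branches cover extents from 3 up to 17 pixels, since dilation widens the kernel's reach without adding weights.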
