Pranav Gupta, supervised by Dr. Santosh Ravi Kiran, received his Master of Science in Computer Science and Engineering (CSE). Here's a summary of his research work on Multi-Object Multi-Part Scene Parsing:
This thesis presents a comprehensive exploration of Multi-Object Multi-Part Scene Parsing in 2D images through two novel approaches, FLOAT and OLAF, each designed to improve scene parsing performance and scalability. The first paper introduces FLOAT, a factorized label space framework that predicts object categories and part attributes independently, which simplifies the segmentation task and improves scalability. FLOAT also introduces a 'zoom' refinement technique applied at inference time, which substantially improves segmentation accuracy, particularly for smaller objects and parts. Empirical results on the Pascal-Part datasets show FLOAT's superior performance, with notable improvements in mean Intersection over Union (mIOU) and segmentation quality IOU (sqIOU), especially on Pascal-Part-201, the most comprehensive of these datasets, reflecting its effectiveness on diverse and complex scenes.
The second paper presents OLAF, a plug-and-play methodology that augments standard RGB inputs with object-based structural cues to better capture the complexity of scene structure. A weight adaptation technique allows pre-trained RGB models to seamlessly accept the augmented input, which stabilizes the optimization process. In addition, the LDF encoder module provides low-level dense feature guidance, improving the segmentation of smaller parts. OLAF proves versatile across architectures and datasets, achieving significant mIOU gains on multiple Pascal-Part benchmarks, which highlights its broad applicability and robust performance in challenging segmentation scenarios.
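To make the idea of a factorized label space concrete, here is a minimal sketch, not FLOAT's actual implementation. Instead of one classifier over every object-part pair (e.g. `cat-head`, `dog-leg`), one head scores object categories and another scores part attributes shared across objects, and the two per-pixel predictions are merged through a lookup table. The vocabularies, the `VALID_PARTS` table, the fallback rule and the `combine_factorized` helper are all hypothetical.

```python
# Hypothetical vocabularies; FLOAT's real label sets and heads differ.
OBJECTS = ["background", "cat", "dog", "car"]
PARTS = ["none", "head", "leg", "wheel"]

# Which (object, part) pairs are meaningful. Invalid pairs fall back to the
# bare object label (an assumption made only for this sketch).
VALID_PARTS = {
    "cat": {"head", "leg"},
    "dog": {"head", "leg"},
    "car": {"wheel"},
}

def argmax(scores):
    """Index of the largest score in a list."""
    return max(range(len(scores)), key=lambda i: scores[i])

def combine_factorized(object_scores, part_scores):
    """Merge per-pixel object and part predictions into object-part labels.

    object_scores: one list of len(OBJECTS) scores per pixel
    part_scores:   one list of len(PARTS) scores per pixel
    Returns one string label per pixel.
    """
    labels = []
    for obj_s, part_s in zip(object_scores, part_scores):
        obj = OBJECTS[argmax(obj_s)]
        part = PARTS[argmax(part_s)]
        if obj == "background":
            labels.append("background")
        elif part in VALID_PARTS.get(obj, set()):
            labels.append(f"{obj}-{part}")
        else:
            labels.append(obj)  # object recognised, part not valid for it
    return labels

# Three pixels: a cat head, a car wheel, and background.
obj_scores = [[0.1, 0.8, 0.05, 0.05], [0.1, 0.1, 0.1, 0.7], [0.9, 0.05, 0.03, 0.02]]
part_scores = [[0.1, 0.7, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7], [0.6, 0.2, 0.1, 0.1]]
print(combine_factorized(obj_scores, part_scores))
# ['cat-head', 'car-wheel', 'background']
```

In this sketch the part head needs only `len(PARTS)` outputs shared by all objects, rather than one output per object-part pair, which is one way to read the scalability benefit described above.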
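The mIOU figures mentioned above are the per-class Intersection over Union, |P ∩ G| / |P ∪ G| for predicted pixels P and ground-truth pixels G, averaged over classes. The sketch below computes it from flat lists of per-pixel class indices. Skipping classes that appear in neither the prediction nor the ground truth is one common convention and is assumed here rather than taken from the thesis.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union across classes.

    pred, target: equal-length sequences of integer class labels (one per pixel).
    Classes absent from both pred and target are skipped (assumed convention).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0

pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))  # → 0.5
```

Here the per-class IoUs are 1/3, 2/3 and 1/2, so their mean is 0.5.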
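The summary does not spell out OLAF's weight adaptation rule. One common way to let a pre-trained RGB network accept extra input channels, assumed here purely for illustration, is to widen the first convolution's weight tensor and initialise the new channels to zero. The widened network then initially computes exactly what the RGB model did, so training starts from a stable point. The sketch uses plain nested lists shaped `[out_channels][in_channels][kh][kw]`, and the helper names `expand_input_channels` and `conv_at` are hypothetical.

```python
import copy

def expand_input_channels(weights, extra_channels, init=0.0):
    """Return a copy of conv weights [out][in][kh][kw] with extra input channels.

    New channels are filled with `init` (zero by default), so responses to the
    original RGB channels are unchanged at initialisation.
    """
    new_weights = copy.deepcopy(weights)
    for out_filter in new_weights:
        kh = len(out_filter[0])
        kw = len(out_filter[0][0])
        for _ in range(extra_channels):
            out_filter.append([[init] * kw for _ in range(kh)])
    return new_weights

def conv_at(weights, patch, o):
    """Response of output filter o to one input patch [in][kh][kw] (no bias)."""
    return sum(
        w * x
        for w_ch, x_ch in zip(weights[o], patch)
        for w_row, x_row in zip(w_ch, x_ch)
        for w, x in zip(w_row, x_row)
    )

# A toy "pre-trained" 1x1 conv: 2 output filters over 3 RGB channels.
rgb_weights = [
    [[[0.5]], [[-1.0]], [[2.0]]],
    [[[1.0]], [[1.0]], [[1.0]]],
]
adapted = expand_input_channels(rgb_weights, extra_channels=2)

rgb_patch = [[[1.0]], [[2.0]], [[3.0]]]
augmented_patch = rgb_patch + [[[7.0]], [[9.0]]]  # two hypothetical cue channels

# With zero-initialised extra channels, the outputs match the RGB-only model.
for o in range(2):
    assert conv_at(adapted, augmented_patch, o) == conv_at(rgb_weights, rgb_patch, o)
print(len(adapted[0]))  # → 5 input channels per filter
```

Zero initialisation is only one option; copying or averaging the RGB filters into the new channels is another. In either case only the first layer changes, which fits the plug-and-play framing above.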
Together, these studies contribute to the evolving field of computer vision by offering scalable, efficient, and effective solutions for multi-object multi-part scene parsing, marking a significant step toward parsing intricate scenes with high granularity and diversity.
March 2025