Deepak Kumar Singh, supervised by Prof. Jawahar C V, received his Master of Science by Research in Computer Science and Engineering (CSE). Here's a summary of his research work, Computer Vision on Road Scenes: Benchmarking and Open World Object Detection:
In autonomous driving, computer vision tasks such as object detection, semantic segmentation, and instance segmentation play a crucial role in perceiving the environment around the vehicle. Understanding the behaviour and performance of models on these tasks helps identify and address key issues inherent in the system. Such issues may be latent in the deep learning architectures themselves, or in the datasets on which the models are trained and tested. In this thesis, we benchmark the performance of popular deep learning models on road scene datasets across several computer vision tasks, and we formulate open-world object detection on road scenes by addressing the inherent issues present in road scene datasets.

In the first part of the work, we aim to understand the performance and behaviour of various deep learning models on the road scene datasets Cityscapes, IDD, and BDD. Object detection, semantic segmentation, and instance segmentation form the basis for many computer vision tasks in autonomous driving, and the complexity of these tasks increases as we move from object detection to instance segmentation. State-of-the-art models are typically evaluated on standard datasets such as PASCAL VOC and MS COCO, which do not capture the dynamics of road scenes. Driving datasets such as Cityscapes and Berkeley Deep Drive (BDD) are captured in structured environments with better road markings and fewer variations in the appearance of objects and background. However, the same does not hold for Indian roads. The Indian Driving Dataset (IDD) is captured in unstructured driving scenarios and is highly challenging for a model due to its diversity. This work presents a comprehensive evaluation of state-of-the-art models for object detection, semantic segmentation, and instance segmentation on road scene datasets. We present our analyses and compare quantitative and qualitative performance on the structured driving datasets (Cityscapes and BDD) and the unstructured driving dataset (IDD); understanding model behaviour on these datasets helps address various practical issues and supports building real-life applications.

In the second part of the work, we model open-world object detection on road scenes, since object detection is a crucial component of autonomous navigation systems. Current object detectors are trained and tested on a fixed set of known classes. In real-world or open-world settings, however, the test set may contain objects of unknown classes; these unknown objects are then falsely detected as known objects, leading to failures in the decision making of autonomous navigation systems. We propose Open WoRld Object DEtection on Road Scenes (ORDER) to resolve this problem. We introduce Feature-Mix, which widens the gap between known and unknown classes in the latent feature space and improves unknown object detection in the ORDER framework. We also identify two problems inherent in autonomous driving datasets: i) a significant proportion of each dataset comprises small objects, and ii) intra-class bounding box scale variations. We address small object detection and intra-class bounding box scale variations with a novel focal regression loss, and further improve the detection of small objects through curriculum learning. We present an extensive evaluation on two road scene datasets, BDD and IDD; our experiments show consistent improvements over the current state-of-the-art method.
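The summary above does not spell out the exact formulations, but the two core ideas can be illustrated with a short, hypothetical PyTorch sketch. Everything in the snippet below is an assumption made for illustration: Feature-Mix is sketched as a mixup-style interpolation of known-class RoI features that yields pseudo-unknown samples in latent space, and the focal regression loss is sketched as a smooth-L1 term re-weighted by a focal-style factor so that harder (often small or scale-atypical) boxes dominate the gradient. The function names and parameters (alpha, gamma, beta) are illustrative, not the exact ORDER implementation.

```python
import torch

def feature_mix(known_feats: torch.Tensor, alpha: float = 0.4) -> torch.Tensor:
    """Hypothetical Feature-Mix sketch: interpolate pairs of known-class RoI
    features to synthesise 'unknown-like' features in the latent space
    (mixup-style assumption, not the thesis's exact formulation).

    known_feats: (N, D) pooled RoI features of known objects.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(known_feats.size(0))
    # Convex combination of two known-class features; the result can be
    # treated as a pseudo-unknown sample when training an unknown-aware head.
    return lam * known_feats + (1.0 - lam) * known_feats[perm]

def focal_regression_loss(pred: torch.Tensor, target: torch.Tensor,
                          gamma: float = 2.0, beta: float = 1.0) -> torch.Tensor:
    """Hypothetical focal-style box regression loss: smooth L1 re-weighted so
    that boxes with larger normalised error contribute more to the gradient.

    pred, target: (N, 4) box regression offsets.
    """
    l1 = torch.abs(pred - target)
    smooth_l1 = torch.where(l1 < beta, 0.5 * l1 ** 2 / beta, l1 - 0.5 * beta)
    # Normalised per-coordinate error in [0, 1] used as the modulating factor;
    # easy (low-error) boxes are down-weighted, hard boxes are emphasised.
    err = torch.clamp(l1 / (l1.max().detach() + 1e-6), max=1.0)
    weight = err ** gamma
    return (weight * smooth_l1).sum(dim=1).mean()
```

The sketch only conveys the general intent described in the abstract: synthesising feature-space samples that separate known from unknown classes, and biasing the box regression objective towards harder, typically smaller, objects.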
We believe this work will lay the foundation for real-world object detection on road scenes.
April 2025