December 2022
Durga Nagendra Raghava Kumar M received his Master of Science in Computer Science and Engineering (CSE). His research work was supervised by Prof. C V Jawahar. Here’s a summary of his research work on Computer Vision based Large Scale Urban Mobility Audit and Parametric Road Scene Parsing:
The footprint of partially or fully autonomous vehicles is growing steadily. The existence and availability of the necessary modern infrastructure are crucial for the widespread use of autonomous navigation, and one of the most critical efforts in this direction is to build and maintain HD maps efficiently and accurately. The information in HD maps is organised into several layers: 1) a geometric layer, 2) a semantic layer, and 3) a map priors layer. Conventional approaches to capturing and extracting information at the different HD map layers rely heavily on large sensor networks and manual annotation, which does not scale to HD maps of massive road networks. In this work, we propose two novel solutions to these problems. The first generates the geometric layer with parametric information about the road scene; the second updates information on road infrastructure and traffic violations in the semantic layer.
Firstly, creating the geometric layer of an HD map requires understanding the road layout in terms of structure, number of lanes, lane width, curvature, and so on. Predicting these attributes as part of a generalisable parametric model from which the road layout can be rendered would suit the creation of a geometric layer. Many previous works on this problem rely only on ground imagery and are limited by the camera's narrow field of view, occlusions, and perspective foreshortening. This work demonstrates the effectiveness of aerial imagery as an additional modality for overcoming these challenges. We propose a novel architecture, Unified, that combines aerial and ground imagery features to infer scene attributes. We evaluate quantitatively on the KITTI dataset and show that our Unified model outperforms prior works.
Since this dataset is limited to road scenes close to the vehicle, we supplement the publicly available Argoverse dataset with scene attribute annotations and evaluate on far-away scenes. We show, quantitatively and qualitatively, the importance of aerial imagery for understanding road scenes, especially in regions farther from the ego-vehicle.
Finally, we propose a simple mobile imaging setup to audit several common problems in urban mobility and road safety, which can enrich the information in the semantic layer of HD maps. Recent computer vision techniques are used to identify street irregularities (including missing lane markings and potholes), missing street lights, and defective traffic signs in videos captured from a moving vehicle with a mounted camera. Beyond inspecting static road infrastructure, we also demonstrate that mobile imaging can be applied to spot traffic violations. We validate our proposal on long stretches of unconstrained road scenes covering over 2,000 km and discuss the practical challenges of applying computer vision techniques at such a scale. An exhaustive evaluation is carried out on 257 long stretches in unconstrained settings, using 20 condition-based, hierarchical frame-level labels covering time of day, weather conditions, road type, traffic density, and state of road damage. For the first time, we demonstrate that large-scale analytics of irregular road infrastructure is feasible with existing computer vision techniques.
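To make the idea of a parametric road layout for the geometric layer more concrete, here is a minimal sketch. It assumes a hypothetical parameterisation (lane count, lane width, and a single constant curvature) that is not taken from the thesis, and the names `RoadLayout` and `lane_boundaries` are illustrative only. It shows how a handful of predicted attributes could be rendered into top-down lane-boundary polylines.

```python
import math
from dataclasses import dataclass


@dataclass
class RoadLayout:
    """Hypothetical parametric road description (not the thesis's parameter set)."""
    num_lanes: int
    lane_width_m: float  # width of each lane in metres
    curvature: float     # 1/radius in 1/m; 0 = straight, positive bends left


def lane_boundaries(layout: RoadLayout, length_m: float, step_m: float = 1.0):
    """Sample lane-boundary polylines in a top-down frame (x forward, y left).

    The centreline is a constant-curvature arc that starts at the origin
    heading along +x; each boundary is offset from it along the left normal.
    Returns num_lanes + 1 polylines, ordered from the right edge to the left.
    """
    total_width = layout.num_lanes * layout.lane_width_m
    # Lateral offsets of every boundary, from the right road edge to the left.
    offsets = [-total_width / 2 + i * layout.lane_width_m
               for i in range(layout.num_lanes + 1)]
    n_samples = int(length_m / step_m) + 1

    boundaries = []
    for d in offsets:
        points = []
        for k in range(n_samples):
            s = k * step_m                   # arc length along the centreline
            heading = layout.curvature * s   # heading angle at arc length s
            if abs(layout.curvature) < 1e-9:
                cx, cy = s, 0.0              # straight road
            else:
                r = 1.0 / layout.curvature
                cx, cy = r * math.sin(heading), r * (1.0 - math.cos(heading))
            # Unit normal pointing to the left of the direction of travel.
            nx, ny = -math.sin(heading), math.cos(heading)
            points.append((cx + d * nx, cy + d * ny))
        boundaries.append(points)
    return boundaries


if __name__ == "__main__":
    straight = RoadLayout(num_lanes=2, lane_width_m=3.5, curvature=0.0)
    for line in lane_boundaries(straight, length_m=10.0):
        print(line[0], line[-1])
```

Under this sketch, a straight two-lane road with 3.5 m lanes yields three boundaries at lateral offsets of -3.5 m, 0 m and +3.5 m, while a nonzero curvature bends all of them around a common centre. In a setup like the one described, a network would predict such parameters from ground and aerial imagery, and a renderer along these lines would turn them into a layout.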