Yash Mehan

Yash Mehan, supervised by Prof. K Madhava Krishna, received his Master of Science – Dual Degree in Computer Science and Engineering (CSD). Here's a summary of his research work on Topological Scene Understanding for Robots:

A mobile robot can move around and "see" the environment around it, but understanding that environment remains a key problem to be addressed. Downstream tasks such as loop closure, planning, and navigation require an effective understanding of the scene at the semantic, structural, and topological levels. The underlying motivation is to segment an environment on various bases, and to understand the connectivity relationships between such segments; the basis of segmentation can depend on the downstream task. This thesis discusses two such works.

The first, Hierarchical Unsupervised Topological SLAM, segments the robot trajectory, in an unsupervised manner, into sections with similar image composition, with the goal of improving loop detection. Over a number of traversals across different Habitat environments, we show that such a hierarchical pipeline significantly improves state-of-the-art image-based loop detection and closure methods. As a consequence of improved loop detection, loop closure and backend SLAM performance also improve. Rendering a traversal into topological segments further benefits downstream tasks such as navigation, which can now build a topological graph in which spatially adjacent topological clusters are connected by an edge, and navigate over that graph.

Next, we present QueSTMaps, a two-stage pipeline that extracts a topological map of a 3D indoor scene, capturing its structural organisation in terms of rooms and their connectivity, and exploring object-room relationships. Room-level understanding is often accomplished via floor-plan extraction, while semantic understanding is typically achieved via object-level semantic segmentation; however, such object-level methods struggle to segment out topological regions. QueSTMaps first segments the scene into room instances, and subsequently generates a semantic label for every room instance based on the objects it contains.
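The topological navigation graph described above, in which spatially adjacent clusters are joined by edges, can be sketched minimally as follows. The cluster names and adjacency here are illustrative assumptions, not taken from the thesis:

```python
from collections import defaultdict, deque

# Hypothetical topological clusters (names are illustrative only).
# Each cluster groups trajectory images with similar composition;
# spatially adjacent clusters share an edge.
edges = [("kitchen_a", "hallway"), ("hallway", "bedroom"), ("hallway", "living_room")]

graph = defaultdict(set)
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)

def plan(start, goal):
    """Breadth-first search over the topological graph: returns the shortest
    sequence of clusters to traverse, or None if the goal is unreachable."""
    queue, parent = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

print(plan("kitchen_a", "bedroom"))  # ['kitchen_a', 'hallway', 'bedroom']
```

Planning over such a graph is coarse (cluster-to-cluster rather than pose-to-pose), which is precisely what makes it cheap for high-level navigation.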
QueSTMaps supports natural language querying, and outperforms the current state-of-the-art on room segmentation by ∼20% and room classification by ∼12%. Our detailed qualitative analysis and ablation studies provide insights into the problem of joint structural and semantic 3D scene understanding, with possible downstream applications in visual-and-language navigation.
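The idea of labeling rooms from the objects they contain, and then answering natural-language queries against those labels, can be illustrated with a toy sketch. The room contents, label prototypes, and word-overlap matching below are made-up assumptions for illustration; QueSTMaps itself relies on learned features rather than hand-written rules:

```python
# Toy scene: each room instance maps to the set of objects detected in it
# (all names here are invented for illustration).
rooms = {
    "room_0": {"stove", "fridge", "sink"},
    "room_1": {"bed", "wardrobe", "lamp"},
    "room_2": {"sofa", "tv", "lamp"},
}

# Hand-written prototype object sets per label (a stand-in for learned
# object-room associations).
label_prototypes = {
    "kitchen": {"stove", "fridge", "sink", "oven"},
    "bedroom": {"bed", "wardrobe", "lamp"},
    "living_room": {"sofa", "tv", "coffee_table"},
}

def label_room(objects):
    """Assign the label whose prototype set overlaps the room's objects most."""
    return max(label_prototypes, key=lambda lbl: len(objects & label_prototypes[lbl]))

def query(text):
    """Return room instances whose label or contained objects occur in the query."""
    words = set(text.lower().split())
    return [r for r, objs in rooms.items()
            if label_room(objs) in words or objs & words]

print({r: label_room(o) for r, o in rooms.items()})
print(query("go to the bedroom"))  # ['room_1']
```

The same query interface also resolves object mentions ("find the sofa") to the room holding that object, which is the kind of object-room relationship the pipeline is designed to expose.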


October 2025