November 2022
Amarthya Sasi Kiran Dharmala received his Master of Science – Dual Degree in Electronics and Communication Engineering (ECE). His research work was supervised by Prof. Madhava Krishna. Here’s a summary of his research work on “Enhancing Object Goal Navigation by Exploiting Semantic Priors and Improving Long-term Goals”:
At a time when autonomous navigation has taken center stage, it is important for embodied agents to learn to navigate through an environment with a degree of flexibility that resembles human-like sensibility and decision-making capacity. To do so, it is important to understand the spatial and semantic correlation between regions and the objects within them, and to encapsulate this knowledge in the agent. In this thesis, we contribute to this goal by proposing frameworks that exploit the relational co-occurrence of different regions and the objects within them for the problem of “Object Goal Navigation”. The Object Goal Navigation (ObjectNav) task requires a robot, initialized at a random position in an unseen environment, to move to an instance of an out-of-view target object class.

The first framework uses a history of robot trajectories to learn a Spatial Relational Graph (SRG) and Graph Convolutional Network (GCN)-based embeddings for the likelihood of proximity of different semantically-labeled regions and the occurrence of different object classes in these regions. When the robot has to locate a specific target object instance during evaluation, a Bayesian inference approach and the SRG are used to estimate the visible regions, and the learned GCN embeddings are used to rank the visible regions and select the region to explore next. The approach is tested on the Matterport3D (MP3D) benchmark dataset of indoor scenes in AI Habitat, a visually realistic simulation environment, where we report an improvement in performance over relevant baselines.

The second framework first builds a semantic map of the environment gradually over time, and then repeatedly selects a long-term goal based on the semantic map to locate the target object instance. The decision about “where to go” (the long-term goal) is formulated as a vision-based deep reinforcement learning problem.
Specifically, an Encoder Network is trained to process the semantic map, extract high-level features, and select a long-term goal. In addition, we incorporate data augmentation and Q-function regularization to make the long-term goal selection more effective. We report experimental results on the photo-realistic Gibson benchmark dataset in the AI Habitat 3D simulation environment, demonstrating that our framework substantially improves performance on standard measures in comparison with a state-of-the-art baseline.
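To give a flavor of the first framework’s region-ranking step, here is a minimal sketch in which plain object–region co-occurrence counts with Laplace smoothing stand in for the learned SRG and GCN embeddings. The `RegionRanker` class and all region and object names are illustrative assumptions, not the thesis implementation:

```python
# Toy sketch: rank currently visible regions by how likely the target
# object class is to occur in them, using co-occurrence counts gathered
# from past trajectories (a stand-in for the learned SRG/GCN embeddings).
from collections import defaultdict

class RegionRanker:
    def __init__(self):
        # counts[region][obj] = number of times obj was observed in region
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, region, obj):
        self.counts[region][obj] += 1

    def score(self, region, target):
        # Estimate P(target | region) with add-one (Laplace) smoothing
        c = self.counts[region]
        total = sum(c.values())
        vocab = len(c) + 1  # observed classes plus one unseen slot
        return (c.get(target, 0) + 1) / (total + vocab)

    def rank(self, visible_regions, target):
        # Highest-probability region is explored first
        return sorted(visible_regions,
                      key=lambda r: self.score(r, target), reverse=True)

ranker = RegionRanker()
for region, obj in [("kitchen", "refrigerator"), ("kitchen", "sink"),
                    ("bedroom", "bed"), ("bathroom", "sink")]:
    ranker.observe(region, obj)

order = ranker.rank(["kitchen", "bedroom", "bathroom"], "sink")
# "bathroom" ranks first: all of its observations were sinks
```

The real system replaces these counts with GCN embeddings learned over the SRG, and the visible-region estimates themselves come from Bayesian inference rather than being given.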
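The second framework’s “where to go” decision can likewise be sketched schematically. The toy `q_value` function below stands in for the learned Encoder Network (scoring candidate long-term goals on a semantic map), and `flip_sample` illustrates the kind of map-level data augmentation where the map and the goal (the action) must be transformed consistently. All function names, the grid encoding, and the semantic classes are assumptions for illustration only:

```python
# Schematic of long-term goal selection on a semantic map.
FREE, WALL, TARGET_HINT = 0, 1, 2  # toy semantic classes (illustrative)

def q_value(sem_map, goal):
    # Placeholder Q-function: prefer free cells close to cells whose
    # semantics hint at the target (the real system learns this).
    gy, gx = goal
    if sem_map[gy][gx] != FREE:
        return float("-inf")
    score = 0.0
    for y, row in enumerate(sem_map):
        for x, c in enumerate(row):
            if c == TARGET_HINT:
                score += 1.0 / (1 + abs(y - gy) + abs(x - gx))
    return score

def select_long_term_goal(sem_map):
    # Greedy action selection: pick the free cell with the highest Q-value
    candidates = [(y, x) for y, row in enumerate(sem_map)
                  for x, c in enumerate(row) if c == FREE]
    return max(candidates, key=lambda g: q_value(sem_map, g))

def flip_sample(sem_map, goal):
    # Horizontal-flip augmentation of a (map, goal) training pair; the
    # goal coordinate must be flipped together with the map to stay valid.
    w = len(sem_map[0])
    flipped = [list(reversed(row)) for row in sem_map]
    gy, gx = goal
    return flipped, (gy, w - 1 - gx)

sem_map = [
    [FREE, FREE, WALL],
    [FREE, WALL, TARGET_HINT],
    [FREE, FREE, FREE],
]
goal = select_long_term_goal(sem_map)  # free cell nearest the hint
```

In the thesis, the scoring is a trained deep Q-network over the full semantic map, and the Q-function regularization (not shown here) constrains those learned values during training.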