July 2025

Dhillu Thambi

Dhillu Thambi, supervised by Dr. Praveen Paruchuri, received his Master of Science in Computer Science and Engineering (CSE). Here’s a summary of his research work on Enhancing and Interpreting Decision Transformers for Continuous Control and Multi-Discrete Action Spaces:

Decision Transformers (DTs) have emerged as a powerful offline reinforcement learning architecture, demonstrating substantial success across a range of complex tasks. Despite their impressive performance, the interpretability and adaptability of these models remain underexplored, particularly when they are applied to continuous control and multi-discrete action environments. This thesis provides a comprehensive study of Decision Transformers by addressing these two critical aspects: interpretability in continuous control environments, and adaptability through novel tokenization strategies in multi-discrete action spaces.
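To make the setting concrete, the sketch below illustrates (in simplified form, not the thesis code) the core Decision Transformer input convention: each timestep contributes a return-to-go (RTG) token, a state token, and an action token, and the RTG at step t is the suffix sum of rewards from t onward. The function names here are illustrative.

```python
# Illustrative sketch of Decision Transformer inputs (not the thesis
# implementation): compute returns-to-go and interleave tokens in the
# (RTG, state, action) order the model consumes.

def returns_to_go(rewards):
    """Return-to-go at each timestep: sum of rewards from t to the end."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg

def interleave_tokens(rtgs, states, actions):
    """Flatten a trajectory into the (RTG, state, action) token sequence."""
    tokens = []
    for g, s, a in zip(rtgs, states, actions):
        tokens.extend([("rtg", g), ("state", s), ("action", a)])
    return tokens

rewards = [1.0, 0.0, 2.0]
print(returns_to_go(rewards))  # [3.0, 2.0, 2.0]
```

Conditioning on the RTG token is what lets a trained model be prompted with a target return at inference time, which is the mechanism the RTG analysis in the first part of the thesis probes.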

In the first part, we perform a detailed interpretability analysis of Decision Transformers trained on MuJoCo environments, focusing on continuous control tasks known for their complexity and realism. We apply multiple interpretability techniques: positional encoding (PE) analysis, return-to-go (RTG) examination, embedding and hidden-state analyses, attention visualization, and perturbation studies. Our results reveal crucial insights: positional encoding significantly affects performance only in environments demanding precise temporal coordination (e.g., Hopper and Walker2d), whereas in simpler, velocity-based tasks (HalfCheetah), performance remains stable without explicit positional encoding. RTG analysis indicates that the model’s ability to achieve specified returns is closely tied to the reward distribution of the training data, highlighting its adaptability. Embedding analysis uncovers the hierarchical abstractions and structured representations the models learn, and attention visualization reveals the nuanced role of attention mechanisms in guiding behavior. Perturbation studies further underscore the robustness of Decision Transformers, identifying the action dimensions (joints) most critical for successful task execution.

The second part of the thesis addresses Decision Transformer performance limitations in environments with multi-discrete action spaces, specifically image-based domains like ViZDoom. Here, we introduce Multi-State Action Tokenisation (M-SAT), a method that improves decision-making by tokenizing actions at the individual-action level and incorporating auxiliary state information. M-SAT’s tokenization strategy significantly improves both agent performance and the interpretability of attention layers, providing clearer insight into agent decisions in dynamic environments with complex, multi-discrete action choices. Crucially, we demonstrate that M-SAT achieves superior performance in challenging scenarios such as Deadly Corridor, My Way Home, and Death Match without positional encoding, occasionally even benefiting from its removal. M-SAT’s granular action tokenization supports more efficient learning and enables detailed interpretability of individual actions, showing marked improvements in sample efficiency and adaptability over baseline Decision Transformers.
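The contrast M-SAT draws can be sketched as follows. This is a hedged illustration of the general idea, with invented function names, not the thesis API: a baseline approach flattens a multi-discrete action (e.g., separate move, turn, and attack choices in ViZDoom) into one joint token, while per-action tokenization emits one token per action dimension, so attention can be attributed to individual sub-actions.

```python
# Hedged sketch of per-action tokenization vs. a joint action token
# for a multi-discrete action space (illustrative, not the M-SAT code).

def joint_token(action, dims):
    """Baseline: encode a multi-discrete action as a single joint index."""
    idx = 0
    for a, d in zip(action, dims):
        idx = idx * d + a
    return [idx]

def per_action_tokens(action, dims):
    """Per-action style: one token per dimension, with each dimension
    offset into its own disjoint slice of the token vocabulary."""
    tokens, offset = [], 0
    for a, d in zip(action, dims):
        tokens.append(offset + a)
        offset += d
    return tokens

dims = [3, 3, 2]       # e.g. move, turn, attack choice counts (illustrative)
action = [2, 0, 1]
print(joint_token(action, dims))       # [13] — one opaque token
print(per_action_tokens(action, dims)) # [2, 3, 7] — one token per sub-action
```

The joint encoding needs a vocabulary of size 3×3×2 = 18 and hides which sub-action drove a decision; the per-action encoding needs only 3+3+2 = 8 token ids and yields one attention target per action dimension, which is the interpretability gain the thesis reports.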

Integrating these two studies, the thesis presents a unified narrative on the necessity and benefits of interpretability and of novel tokenization strategies in Decision Transformers. These insights matter for the future development of more interpretable, adaptable, and reliable transformer-based decision-making systems capable of excelling across diverse and complex environments. The combined analysis deepens our understanding of the internal mechanisms driving Decision Transformer performance, laying the groundwork for further advances in reinforcement learning architectures.

July 2025