Rahothvarman P, supervised by Dr. Radhika Mamidi, received his Master of Science – Dual Degree in Computational Linguistics (CLD). Here is a summary of his research work on "Towards a Task-Agnostic Architecture for Natural Language Processing using Attention Networks":
Natural Language Processing (NLP) has witnessed transformative progress with the advent of transformer architectures and their attention mechanisms. This thesis investigates the versatility and adaptability of attention networks across three increasingly complex tasks: fine-grained sentiment analysis, multilingual pronominal coreference resolution, and multimodal user-defined keyword spotting. By progressively applying attention architectures from single-modality, fine-grained classification to multilingual and multimodal problems, we establish a unified framework for sequence understanding and cross-domain transfer.

We begin by exploring fine-grained sentiment analysis as a test bed to evaluate the ability of attention networks to model subtle emotional distinctions across multiple sentiment classes. Unlike coarse sentiment classification, fine-grained sentiment analysis requires capturing nuanced affective expressions and their contextual dependencies. Using the English subset of "A Multilingual Dataset for Sentiment Analysis and Emotion Detection (XED)", we train an attention-based architecture. Our model demonstrates a strong capacity to distinguish between emotions such as joy, anger, fear, and surprise, showing the efficacy of attention mechanisms in handling fine-grained multi-class classification. This phase serves as a foundational study of how attention models perform when semantic granularity is essential.

Building on this, we investigate pronominal coreference resolution in a multilingual setting, aiming to determine whether attention networks can generalize across typologically diverse languages. Coreference resolution involves identifying which entities pronouns and noun phrases refer to within a discourse. We design an attention-based model capable of capturing long-range dependencies between mentions and their potential antecedents. Evaluated in a zero-shot transfer setup on our own multilingual coreference dataset, Multilingual GAP (mGAP), the model resolves coreference robustly across multiple low-resource and morphologically rich languages without requiring language-specific tuning. This demonstrates the potential of attention architectures to bridge linguistic variation and address the challenges of multilingual understanding.

Finally, we extend and adapt the attention framework to the multimodal task of user-defined keyword spotting, where the goal is to detect arbitrary text queries in continuous speech. Traditional keyword spotting systems rely on fixed vocabularies and specialized architectures, limiting their applicability. Our model uses cross-attention to align spoken utterances with textual queries in a flexible, vocabulary-agnostic manner. We evaluate our approach on the Google Speech Commands dataset and the Qualcomm Keyword Spotting dataset, introducing architectural optimizations that reduce inference latency while maintaining high retrieval accuracy, making the approach practical for real-time applications in resource-constrained environments.

Through these three tasks, we show how attention networks can be systematically adapted to varied linguistic and multimodal challenges. Our findings highlight the trade-offs between accuracy, generalizability, and computational efficiency, while underscoring the potential of cross-attention mechanisms to serve as a common foundation across diverse NLP tasks.
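To make the first stage concrete, the sketch below shows a generic attention-based multi-class emotion classifier of the kind the abstract describes: token embeddings passed through self-attention layers, pooled, and mapped to emotion logits. This is an illustrative assumption, not the thesis model; the dimensions, the eight-class label count, the mean-pooling choice, and the omission of positional encodings are all simplifications for brevity.

```python
# Minimal sketch of an attention-based fine-grained emotion classifier.
# Illustrative only: dimensions, class count, and pooling are assumptions,
# not the architecture reported in the thesis. Positional encodings omitted.
import torch
import torch.nn as nn


class AttentionEmotionClassifier(nn.Module):
    def __init__(self, vocab_size, num_classes, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, token_ids, pad_mask=None):
        # token_ids: (batch, seq_len); pad_mask: True where positions are padding
        x = self.embed(token_ids)
        x = self.encoder(x, src_key_padding_mask=pad_mask)
        # Mean-pool over non-padded positions, then classify.
        if pad_mask is not None:
            keep = (~pad_mask).unsqueeze(-1).float()
            pooled = (x * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)
        else:
            pooled = x.mean(dim=1)
        return self.classifier(pooled)  # (batch, num_classes) emotion logits


# Example usage with an assumed vocabulary size and 8 emotion classes.
model = AttentionEmotionClassifier(vocab_size=30000, num_classes=8)
logits = model(torch.randint(1, 30000, (2, 16)))
```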
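For the keyword spotting stage, the following sketch illustrates the general idea of cross-attention between a user-defined text query and frame-level audio representations, with a pooled detection score. Again this is a hedged, generic illustration under stated assumptions (precomputed acoustic embeddings, character-level query encoding, a single attention block, and a simple binary head), not the optimized low-latency architecture the thesis evaluates.

```python
# Minimal sketch of cross-attention keyword spotting: query tokens attend over
# audio frames to produce an utterance-level detection logit. All component
# choices here are assumptions made for illustration.
import torch
import torch.nn as nn


class CrossAttentionKWS(nn.Module):
    def __init__(self, d_model=256, n_heads=4, vocab_size=100):
        super().__init__()
        self.char_embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.score = nn.Linear(d_model, 1)

    def forward(self, query_ids, audio_frames):
        # query_ids: (batch, q_len) character ids of the user-defined keyword
        # audio_frames: (batch, t_len, d_model) frame-level acoustic embeddings
        q = self.char_embed(query_ids)
        # Each query position attends over all audio frames (query=text, key/value=audio).
        aligned, attn_weights = self.cross_attn(q, audio_frames, audio_frames)
        pooled = aligned.mean(dim=1)           # (batch, d_model)
        return self.score(pooled).squeeze(-1)  # detection logit per utterance


# Example usage with an 8-character query and 120 audio frames.
model = CrossAttentionKWS()
logit = model(torch.randint(1, 100, (1, 8)), torch.randn(1, 120, 256))
```

Because the query is encoded at run time rather than drawn from a fixed label set, a design of this shape stays vocabulary-agnostic: detecting a new keyword only requires a new text query, not retraining.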
This work contributes to the broader understanding of attention-based modeling and provides practical insights for designing adaptable neural architectures for real-world language technologies.
June 2025