Adith John Rajeev

Adith John Rajeev, supervised by Dr. Radhika Mamidi, received his Master of Science – Dual Degree in Computational Linguistics (CLD). Here’s a summary of his research work on Modeling Entity Dynamics in Discourse: Advancing Coreference Resolution and Entity Framing:

This thesis investigates how entities are shaped by the discourse in which they appear. Rather than being isolated mentions, entities take on meaning through their interactions with the surrounding context, both in terms of reference and interpretation: the identity of an entity can shift depending on how it is invoked, and the narrative framing around it can significantly influence how it is perceived by a reader. At its core, the research addresses these two complementary dimensions of entity analysis: the challenge of tracking entity references across text (coreference resolution) and the interpretive challenge of understanding how entities are characterized through their framing within the text (entity role classification). While these tasks represent distinct technical problems, they are unified by their contribution to richer discourse comprehension and their shared goal of moving NLP systems beyond surface-level language processing toward deeper semantic understanding.

The first part of this research focuses on coreference resolution for South Asian languages, addressing a significant gap in current NLP research, where these morphologically rich and syntactically diverse languages remain underrepresented. To overcome the lack of suitable evaluation resources, we develop mGAP, a novel dataset containing coreference annotations across 27 South Asian languages, created through a dedicated translation and alignment pipeline from English reference data. This dataset provides the foundation for investigating cross-lingual transfer learning strategies designed to address the low-resource nature of many of these languages. Through systematic experimentation, we examine which languages serve as effective source languages for training when target-language data is scarce, offering insights into how linguistic similarities can be used to improve cross-lingual adaptation.

The second part shifts focus to entity framing in news media, where the research question evolves from identifying what an entity refers to, to understanding what narrative role that entity plays within the text. How a text is framed is pivotal to how a reader perceives the entity being described. For this task, we develop a two-stage pipeline for classifying entities into roles such as protagonist, antagonist, and innocent, as well as more fine-grained narrative categories. The pipeline first employs entity-centric summarization to extract relevant contextual information surrounding target entities, followed by classification using a DeBERTa v3-based model enhanced with contrastive learning techniques. This approach addresses the significant challenge of determining how entities are portrayed based on surrounding textual cues, contributing further to computational methods for analyzing media bias and narrative construction.

Together, these two research directions advance our understanding of how computational systems can process entities in discourse, from resolving referential ambiguities to interpreting narrative roles. By bridging the tasks of identification and interpretation, this thesis contributes to the development of NLP systems capable of more human-like text comprehension, with potential applications spanning multilingual processing, media analysis, and automated content understanding.

July 2025