Vamshi Krishna Bonagiri, supervised by Prof. Ponnurangam Kumaraguru, received his Master of Science – Dual Degree in Computational Linguistics (CLD). Here’s a summary of his research work on “Towards Trustworthy AI: Frameworks for Evaluating Consistency in Language”:
Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse domains, yet they exhibit critical inconsistencies that undermine their reliability in real-world applications. This thesis addresses two central challenges in LLM reasoning: evaluating and enhancing consistency in moral and logical reasoning tasks.

We first tackle moral consistency evaluation, where traditional accuracy-based methods fail because moral reasoning is inherently subjective. We introduce SaGE (Semantic Graph Entropy), an information-theoretic framework that quantifies moral consistency by analyzing the semantic coherence of the “Rules of Thumb” (RoTs) underlying LLM responses [1]; a minimal sketch of the idea appears below. To support this evaluation, we construct the Moral Consistency Corpus (MCC), containing 50,000 moral reasoning instances across diverse scenarios. Our comprehensive evaluation reveals widespread moral inconsistencies across state-of-the-art LLMs, with a maximum observed SaGE score of only 0.681, indicating substantial reliability concerns.

We then investigate logical consistency, focusing on the pronounced difficulties LLMs encounter when reasoning with counterfactual premises that conflict with their parametric knowledge. Using CounterLogic, a systematically designed benchmark spanning 9 formal inference schemas, we demonstrate substantial performance degradation (27% on average) when models reason against their parametric knowledge compared to knowledge-consistent scenarios [2].

To address these logical consistency failures, we propose Self-Segregate, a metacognitive intervention inspired by human cognitive strategies for handling conflicting information. This two-phase prompting technique first assesses the factual alignment of each premise and only then performs logical reasoning, enabling epistemic compartmentalization; a sketch of this control flow follows the abstract. Self-Segregate significantly reduces the counterfactual reasoning performance gap, from 27% to 11%, while improving overall logical accuracy by 7.5% across multiple models and tasks.

Our findings establish consistency as a critical dimension of LLM performance that is orthogonal to accuracy: a model can achieve high task performance while remaining fundamentally unreliable. Because accuracy and consistency are independent evaluation dimensions, comprehensive assessment methodologies for AI systems are urgently needed. This thesis contributes novel evaluation frameworks, systematic benchmarks, and effective intervention strategies for developing more robust and trustworthy language models.
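To make the SaGE idea concrete, here is a minimal sketch of a semantic-entropy-style consistency score. It is an illustration under stated assumptions, not the thesis implementation: it assumes RoTs have already been extracted from the model’s responses and embedded as vectors, treats two RoTs as semantically equivalent when their cosine similarity exceeds a threshold (0.8 here, an arbitrary choice), and scores consistency as one minus the normalized Shannon entropy of the resulting cluster distribution. The function names and the single-link clustering strategy are illustrative, not taken from [1].

```python
# Hypothetical sketch of a SaGE-style consistency score (not the authors' code).
import numpy as np

def cluster_by_similarity(embeddings: np.ndarray, threshold: float = 0.8) -> list[int]:
    """Single-link clustering: connect RoT embeddings whose cosine similarity
    exceeds `threshold`, then label the connected components."""
    n = len(embeddings)
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T                     # pairwise cosine similarities
    labels = [-1] * n
    current = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        stack = [i]                             # start a new component at i
        labels[i] = current
        while stack:                            # flood-fill the component
            j = stack.pop()
            for k in range(n):
                if labels[k] == -1 and sim[j, k] >= threshold:
                    labels[k] = current
                    stack.append(k)
        current += 1
    return labels

def sage_style_score(embeddings: np.ndarray, threshold: float = 0.8) -> float:
    """Consistency in [0, 1]: 1.0 when all RoTs collapse into one semantic
    cluster, approaching 0.0 when every RoT is semantically distinct."""
    labels = cluster_by_similarity(embeddings, threshold)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    entropy = -float(np.sum(p * np.log2(p)))    # Shannon entropy of clusters
    max_entropy = np.log2(len(embeddings))      # all-singletons worst case
    return 1.0 if max_entropy == 0 else 1.0 - entropy / max_entropy
```

Under these assumptions, a question whose paraphrases all yield RoTs in one cluster scores 1.0, while maximally scattered RoTs drive the score toward 0, the same spirit in which the abstract reports no evaluated model exceeding 0.681.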
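Self-Segregate is described as a two-phase prompt: first assess whether each premise is factual or counterfactual, then reason with the premises explicitly segregated from world knowledge. The sketch below captures that control flow under assumptions: `llm` is a hypothetical placeholder for any text-in/text-out model call, and the prompt wording is illustrative rather than the thesis’s exact templates.

```python
# Hypothetical two-phase prompting sketch in the spirit of Self-Segregate
# (not the thesis code). `llm` stands in for any text-in/text-out model call.
from typing import Callable

def self_segregate(llm: Callable[[str], str], premises: list[str], question: str) -> str:
    # Phase 1: metacognitive check -- is each premise factual or counterfactual?
    tagged = []
    for p in premises:
        verdict = llm(
            "Does the following statement agree with real-world facts? "
            f"Answer FACTUAL or COUNTERFACTUAL only.\nStatement: {p}"
        ).strip().upper()
        tag = "COUNTERFACTUAL" if "COUNTER" in verdict else "FACTUAL"
        tagged.append(f"[{tag}] {p}")

    # Phase 2: logical reasoning with premises explicitly segregated from
    # world knowledge, so counterfactuals are treated as stipulated truths.
    premise_block = "\n".join(tagged)
    return llm(
        "Treat every premise below as true within this problem, even the ones "
        "tagged [COUNTERFACTUAL]; set aside your own knowledge of the world.\n"
        f"Premises:\n{premise_block}\n\nQuestion: {question}\n"
        "Answer using only the given premises."
    )
```

The design intent mirrors the abstract’s description: making each premise’s factual status explicit before reasoning gives the model a chance to compartmentalize, rather than letting parametric knowledge silently override a counterfactual premise mid-inference.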
July 2025

