Priyanshul Govil, supervised by Prof. Ponnurangam Kumaraguru, received his Master of Science – Dual Degree (LCD). Here’s a summary of his research work on A Context-Based Quantitative Assessment of the Quality of Bias Benchmarks for Language Models:
This thesis addresses a critical limitation in the current evaluation of bias in Large Language Models (LLMs): the lack of contextual considerations in existing bias benchmark datasets. While LLMs are known to inherit and propagate biases from their training data, current methods for assessing and mitigating these biases rely heavily on benchmarks that present biased statements in isolation, stripped of the situational context that significantly influences their interpretation. This work argues that evaluating bias without considering the various contexts in which a statement might appear can lead to unreliable and potentially misleading assessments of model behavior and of the effectiveness of debiasing techniques.

To overcome this limitation, the thesis introduces a novel framework centered on the concept of contextual reliability. The framework posits that a biased statement’s ability to reliably detect bias in an LLM is contingent on its robustness to variations in context: a contextually reliable statement should elicit consistent model behavior across different relevant situational contexts. Conversely, if a model’s response changes significantly when context is added, the original statement lacked sufficient contextual grounding and is a potentially unreliable measure of bias.

The core contribution of the thesis is the Context-Oriented Bias Indicator and Assessment Score (COBIAS), a novel quantitative metric that measures the contextual reliability of a biased statement by evaluating the variance in an LLM’s behavior when different contexts are introduced. COBIAS provides a systematic way to assess whether a statement contains adequate contextual information to serve as a reliable indicator of bias.

The research methodology involves a two-pronged approach:

1. Contextual augmentation of bias benchmarks: To evaluate the COBIAS metric, the thesis augments existing bias benchmark datasets. Specifically, 2,291 stereotyped statements from two widely used benchmarks, CrowS-Pairs and StereoSet, are expanded by identifying potential context-addition points within each statement. A fine-tuned language model (gpt-3.5-turbo) generates candidate context-addition points, and human annotators verify their relevance and plausibility. These context-addition points then serve as anchors for adding diverse contextual information to the original biased statements through an LLM-based text-infilling objective. This systematic augmentation yields multiple contextually enriched versions of the same biased statement.

2. Development and validation of the COBIAS metric: The thesis then introduces the mathematical formulation of COBIAS. The metric quantifies the contextual reliability of a statement by analyzing the consistency (or inconsistency) of an LLM’s responses across the original statement and its contextually augmented versions. The underlying principle is that a statement with high contextual reliability elicits similar responses from the LLM regardless of the added context, indicating that the original statement was already sufficiently grounded. Conversely, a low COBIAS score suggests that the model’s behavior is highly sensitive to contextual variations, implying that the original statement, in isolation, may not be a reliable indicator of the underlying bias.
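To make this idea concrete, the sketch below scores a statement and its context-augmented versions with a small causal language model and uses the variance of those scores as a rough proxy for contextual (un)reliability. This is only an illustration under assumed choices (the gpt2 model, mean token log-likelihood as the behavioral signal, raw variance as the summary statistic); it is not the thesis’s exact COBIAS formulation, and the example sentences are hypothetical.

```python
# Minimal sketch of a variance-based contextual reliability check.
# NOT the thesis's COBIAS formula; it only illustrates the idea of measuring
# how much a model's behaviour shifts when context is added. The model ("gpt2")
# and the scoring function are assumptions made for this demo.

from statistics import pvariance

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()


def mean_log_likelihood(text: str) -> float:
    """Average per-token log-likelihood the model assigns to `text`."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return -out.loss.item()  # loss is the mean negative log-likelihood


def behavioural_variance(statement: str, augmented_versions: list[str]) -> float:
    """Variance of model scores across the original and augmented statements.

    Lower variance suggests the statement's interpretation is stable under
    added context (more contextually reliable); higher variance suggests it
    is context-sensitive and thus a weaker standalone bias probe.
    """
    scores = [mean_log_likelihood(v) for v in [statement, *augmented_versions]]
    return pvariance(scores)


# Hypothetical example: one statement and two context-augmented versions.
base = "The nurse said she would be late."
augmented = [
    "After a double shift at the hospital, the nurse said she would be late.",
    "Stuck in traffic on the way to work, the nurse said she would be late.",
]
print(f"behavioural variance: {behavioural_variance(base, augmented):.4f}")
```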
To validate the effectiveness of the COBIAS metric, the calculated scores are compared against human judgments of the contextual reliability of the augmented statements. The thesis demonstrates a strong positive correlation between COBIAS scores and human evaluations, achieving a statistically significant Spearman correlation (ρ = 0.65, p < 0.001). This alignment with human intuition provides compelling evidence for the validity and robustness of COBIAS as a measure of contextual reliability.

Furthermore, the thesis applies COBIAS to analyze the contextual reliability of several existing bias benchmark datasets, including CrowS-Pairs, StereoSet, and WinoGender, as well as a dataset curated from Reddit. The findings reveal significant differences in contextual reliability across these benchmarks. Notably, CrowS-Pairs and WinoGender exhibit lower contextual reliability than the Reddit dataset, whose higher reliability is potentially due to the more verbose and context-rich nature of Reddit posts. Interestingly, the study also observes that COBIAS scores remain relatively invariant to the size of the language model used for evaluation.

In conclusion, this thesis makes a significant contribution to the field of bias evaluation in LLMs by introducing the concept of contextual reliability and a novel metric, COBIAS, to quantify it. By demonstrating the limitations of evaluating bias in isolation and providing a systematic approach to assess the contextual grounding of bias benchmark datasets, this work paves the way for more reliable and interpretable bias assessment methodologies. The findings highlight the importance of considering context in the evaluation of LLM bias and offer a valuable tool for refining existing bias benchmarks, ultimately contributing to the development of more robust and fair language models. The publicly released dataset and code associated with this research further enable the community to adopt and build upon this contextual reliability framework.
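For readers who want to reproduce this kind of validation step, the short sketch below shows how a rank correlation between metric scores and human ratings can be computed with SciPy. The arrays are made-up placeholders, not the thesis’s data; only the procedure (Spearman’s ρ with a p-value) mirrors the evaluation described above.

```python
# Sketch of the validation procedure: correlate per-statement metric scores
# with human judgements of contextual reliability. All values below are
# hypothetical placeholders used purely to demonstrate the call.

from scipy.stats import spearmanr

# Hypothetical per-statement scores produced by a contextual-reliability metric.
metric_scores = [0.12, 0.45, 0.78, 0.33, 0.91, 0.20, 0.66, 0.54]
# Hypothetical human ratings of contextual reliability for the same statements.
human_ratings = [1, 2, 4, 2, 5, 1, 4, 3]

rho, p_value = spearmanr(metric_scores, human_ratings)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.4f}")
```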
June 2025