Sai Madhusudan Gunda

Sai Madhusudan Gunda, supervised by Dr. Ravi Kiran Sarvadevabhatla, received his Master of Science – Dual Degree in Computer Science and Engineering (CSD). Here’s a summary of his research work on Trace the Evidence: Mask-Based Grounding for Document Question Answering:

This thesis advances Document Question Answering (DocQA) by improving the interpretability of large vision-language models. It introduces M3Grounder, a novel mask-based framework that identifies the exact visual evidence used to generate answers, enabling pixel-level grounding instead of conventional bounding-box attribution. The approach supports evidence spread across multiple regions and different semantic levels, such as phrases, lines, and blocks, making it effective for complex documents containing curved text, tables, charts, and dense layouts. The thesis also presents GroundingDocQA, a large-scale dataset generation pipeline, and GroundingDocQA-Bench, a benchmark for evaluating both answer accuracy and grounding fidelity. Experimental results demonstrate that the proposed framework delivers more precise, faithful, and interpretable document understanding, contributing to the development of trustworthy and deployable AI systems for document intelligence.

May 2026