Ananya Mukherjee, supervised by Dr. Manish Shrivastava, received her doctorate in Computer Science and Engineering (CSE). Here is a summary of her research work on Automatic Evaluation of Machine Translation Outputs:
Human knowledge is predominantly encoded in natural language, and the automatic understanding of natural language has long been a central goal of AI/ML. Among the many challenges of natural language processing and understanding, Machine Translation (MT), the task of converting text in one natural language into another, remains a major one. With the advent of neural models, neural MT (NMT) has made significant progress, yet effectively evaluating translation quality remains a critical challenge.

This thesis focuses on the automatic evaluation of machine translation outputs, addressing the limitations of existing metrics and proposing new approaches to improve evaluation accuracy. Widely used metrics such as BLEU and METEOR are scrutinized for their reliance on n-gram overlap and limited linguistic insight; they often fail to capture nuances of language such as fluency, adequacy, and semantic equivalence (see the illustrative sketches following this summary). To mitigate these shortcomings, the research explores advanced evaluation methods that leverage pre-trained multilingual embeddings and deep learning to model linguistic phenomena more comprehensively.

The thesis introduces several novel automatic evaluation approaches (unsupervised reference-based, unsupervised reference-free, and supervised) that integrate semantic similarity and contextual relevance by employing pre-trained language models. Through extensive experimentation, these metrics are benchmarked against traditional ones and demonstrate superior correlation with human judgments. Additionally, the work surveys existing evaluation datasets and evaluation approaches, discusses their limitations, and provides decision trees to help researchers choose an evaluation criterion or metric based on the computational and linguistic resources available.

Ultimately, this thesis contributes robust methodologies for MT output evaluation, emphasizing the need for metrics that align more closely with human perceptions of translation quality. The proposed advancements pave the way for more reliable and insightful evaluation systems, fostering continued innovation and improvement in machine translation technologies.
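To make the n-gram-overlap limitation concrete, here is a minimal Python sketch of BLEU-style modified n-gram precision. The function name `ngram_precision` and the example sentences are illustrative assumptions, not taken from the thesis; the point is that an adequate paraphrase can score very low when surface n-grams differ.

```python
from collections import Counter

def ngram_precision(hypothesis: str, reference: str, n: int) -> float:
    """Fraction of hypothesis n-grams that also appear in the reference,
    with clipped counts, as in BLEU's modified n-gram precision."""
    hyp_tokens = hypothesis.split()
    ref_tokens = reference.split()
    hyp_ngrams = Counter(tuple(hyp_tokens[i:i + n])
                         for i in range(len(hyp_tokens) - n + 1))
    ref_ngrams = Counter(tuple(ref_tokens[i:i + n])
                         for i in range(len(ref_tokens) - n + 1))
    overlap = sum(min(count, ref_ngrams[ng]) for ng, count in hyp_ngrams.items())
    total = sum(hyp_ngrams.values())
    return overlap / total if total else 0.0

reference = "the cat sat on the mat"
paraphrase = "a cat was sitting on the rug"  # similar meaning, few shared n-grams

# Only 1 of 6 hypothesis bigrams matches, giving ~0.17 despite semantic closeness.
print(ngram_precision(paraphrase, reference, 2))
```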
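By contrast, embedding-based evaluation compares meaning rather than surface form. The following sketch assumes the sentence-transformers library and a multilingual model purely for illustration of the general idea; it does not reproduce the specific metrics proposed in the thesis.

```python
from sentence_transformers import SentenceTransformer, util

# Assumed model choice: any multilingual sentence-embedding model could stand in
# for the pre-trained embeddings this line of work builds on.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

reference = "the cat sat on the mat"
hypothesis = "a cat was sitting on the rug"

ref_emb = model.encode(reference, convert_to_tensor=True)
hyp_emb = model.encode(hypothesis, convert_to_tensor=True)

# Cosine similarity of contextual embeddings rewards semantic equivalence
# even when the two sentences share almost no n-grams.
score = util.cos_sim(ref_emb, hyp_emb).item()
print(f"semantic similarity: {score:.3f}")
```

The contrast between the two sketches mirrors the thesis's motivation: overlap-based scores penalize legitimate paraphrases, while embedding-based similarity tracks adequacy more closely, which is why metrics built on pre-trained language models tend to correlate better with human judgments.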
October 2025

