Pandramish Vinay received his MS Dual Degree in Computational Linguistics (CL). His research was supervised by Prof. Dipti M Sharma. Here is a summary of his thesis, "Towards effective ranking in machine translation systems":
Machine Translation is an area of Natural Language Processing concerned with building systems that translate information from one language to another. Over the past few decades, various approaches and their variants, such as Rule-based systems, Statistical Machine Translation (SMT), and Neural Machine Translation (NMT), have been proposed to automate the translation process. Researchers have explored many such techniques and analyzed their effect on translation quality: while some yielded an overall improvement, others degraded performance relative to the baselines. This thesis presents our attempts to improve overall translation quality and the outcomes of exploring several such ideas.
We propose a method for selecting better hypotheses in Neural Machine Translation systems. After training a baseline system, we take the outputs of its last few training iterations (checkpoints) as an N-best list and observe that this list has an oracle score +1.01 BLEU points higher than the baseline. We therefore propose Checkpoint Reranking, a technique that selects a better hypothesis than the baseline system without any additional language model or data, by focusing on the decoder's ability to generate distinct tokens.
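The oracle comparison above can be made concrete with a small sketch. It assumes each test sentence has one hypothesis per checkpoint, ordered oldest to newest (so the last entry is the baseline output), uses a smoothed sentence-level BLEU written from scratch rather than a standard toolkit, and picks the oracle per sentence. The `distinct_token_rerank` function is a hypothetical reference-free criterion loosely inspired by the "distinct tokens" idea; it is not the thesis's actual scoring function, which this summary does not specify.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp: str, ref: str, max_n: int = 4) -> float:
    """Sentence-level BLEU with add-one smoothing on every n-gram precision."""
    hyp_toks, ref_toks = hyp.split(), ref.split()
    if not hyp_toks:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp_toks, n), ngrams(ref_toks, n)
        # Clipped overlap: each hypothesis n-gram counts at most as often as in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(len(hyp_toks) - n + 1, 0)
        log_prec += math.log((overlap + 1) / (total + 1))
    # Standard brevity penalty for hypotheses not longer than the reference.
    bp = 1.0 if len(hyp_toks) > len(ref_toks) else math.exp(1 - len(ref_toks) / len(hyp_toks))
    return bp * math.exp(log_prec / max_n)


def oracle_select(nbest: List[List[str]], refs: List[str]) -> List[str]:
    """Upper bound: per sentence, pick the checkpoint hypothesis closest to the reference."""
    return [max(cands, key=lambda h: sentence_bleu(h, ref)) for cands, ref in zip(nbest, refs)]


def distinct_token_rerank(nbest: List[List[str]]) -> List[str]:
    """Hypothetical reference-free reranker: prefer the hypothesis with the most
    distinct tokens; ties go to the later checkpoint."""
    selected = []
    for cands in nbest:
        best = max(range(len(cands)), key=lambda i: (len(set(cands[i].split())), i))
        selected.append(cands[best])
    return selected


# Toy example: three checkpoint outputs for one sentence; the last one is the baseline.
refs = ["the cat sat on the mat"]
nbest = [["a cat sat on mat", "the cat sat on the mat", "the the the cat"]]
baseline = [cands[-1] for cands in nbest]
print(oracle_select(nbest, refs))
print(distinct_token_rerank(nbest))
```

In this toy case the baseline (last checkpoint) output is degenerate, while an earlier checkpoint matches the reference; the gap between the oracle and the baseline is the kind of headroom a reranker tries to recover without access to references.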
Next, we investigate the effects of filtering synthetic data in Iterative Back-Translation. We introduce the idea of reusing the synthetic data generated across different passes of the system and applying Round-Trip Filtering to choose the best-scoring candidates. In our experiments, this procedure performs slightly better than the baseline Iterative Back-Translation system.
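A minimal sketch of round-trip filtering over synthetic data pooled from several back-translation passes, under stated assumptions: each entry of `synthetic_by_pass` maps a monolingual target sentence to the synthetic source produced for it in one pass, `forward_translate` is any source-to-target model (a toy dictionary lookup here), and token-overlap F1 stands in for whatever similarity metric the thesis actually used. All names and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def unigram_f1(hyp: str, ref: str) -> float:
    """Token-overlap F1, used here as a simple stand-in for sentence-level BLEU."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def round_trip_filter(
    monolingual: List[str],
    synthetic_by_pass: List[Dict[str, str]],
    forward_translate: Callable[[str], str],
    threshold: float = 0.0,
) -> List[Tuple[str, str]]:
    """For each target sentence, pool the synthetic sources from every pass,
    translate each back into the target language, and keep the candidate whose
    round-trip output best matches the original sentence (if above threshold)."""
    pairs = []
    for tgt in monolingual:
        candidates = [p[tgt] for p in synthetic_by_pass if tgt in p]
        if not candidates:
            continue
        scored = [(unigram_f1(forward_translate(src), tgt), src) for src in candidates]
        # max with a key returns the first best candidate, i.e. the earliest pass on ties.
        best_score, best_src = max(scored, key=lambda s: s[0])
        if best_score >= threshold:
            pairs.append((best_src, tgt))
    return pairs


# Toy example: two passes produced different synthetic sources for one sentence.
mono = ["the cat sleeps"]
syn = [{"the cat sleeps": "le chat mange"}, {"the cat sleeps": "le chat dort"}]
toy_model = {"le chat mange": "the cat eats", "le chat dort": "the cat sleeps"}
fwd = lambda s: toy_model.get(s, "")
print(round_trip_filter(mono, syn, fwd))
```

Pooling candidates across passes means a good synthetic source from an earlier pass is not discarded just because a later backward model produced a worse one; the threshold additionally lets low-quality pairs be dropped entirely.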