
CALCS 2021 

Research work on machine translation by Devansh Gautam, Prashant Kodali, Kshitij Gupta, Manish Shrivastava, Anmol Goel (GGSIPU, Delhi) and Ponnurangam Kumaraguru (IIIT Delhi) ranked 2nd at CALCS @ NAACL 2021.

The paper on code-mixed translation, titled “CoMeT: Towards Code-Mixed Translation Using Parallel Monolingual Sentences”, is summarized by the authors as follows:

Code-mixed languages are very popular in multilingual societies around the world, yet the resources needed to build robust systems for these languages lag behind. A major contributing factor is the informal nature of these languages, which makes it difficult to collect code-mixed data. In this paper, we propose a machine translation system to translate English to Hinglish in a supervised setting. Translating in this direction can help expand the set of resources for several tasks by translating valuable datasets from high-resource languages. We propose to use mBART, a pre-trained multilingual sequence-to-sequence model, and fully utilize the pre-training of the model by transliterating the romanized Hindi words in the code-mixed sentences to Devanagari script. We evaluate how expanding the input by concatenating Hindi translations of the English sentences improves mBART’s performance. Our system achieves a BLEU score of 12.22 on the test set. Further, we perform a detailed error analysis of our proposed systems and explore the limitations of the provided dataset and metrics.
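The two preprocessing ideas in the abstract, expanding the English input with its Hindi translation and writing romanized Hindi words in Devanagari, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the separator `SEP`, the toy lexicon `TOY_LEXICON`, and the helper names `build_source` and `transliterate_tokens` are all hypothetical. A real pipeline would use a proper transliteration model or tool and mBART's own tokenizer.

```python
from typing import Dict, List, Optional

# Hypothetical separator between the English sentence and its Hindi translation;
# the delimiter actually used by the authors is not specified in this summary.
SEP = " </s> "

# Hypothetical toy romanized-Hindi -> Devanagari lexicon, for illustration only.
# A real system would use a transliteration model or library instead.
TOY_LEXICON: Dict[str, str] = {
    "mujhe": "मुझे",
    "pasand": "पसंद",
    "hai": "है",
}


def build_source(english: str, hindi: Optional[str] = None, sep: str = SEP) -> str:
    """Expand the model input by appending a Hindi translation of the English sentence."""
    english = english.strip()
    if not hindi:
        # No translation available: fall back to the plain English input.
        return english
    return english + sep + hindi.strip()


def transliterate_tokens(tokens: List[str], lexicon: Dict[str, str]) -> List[str]:
    """Map romanized Hindi tokens to Devanagari; tokens missing from the lexicon
    (e.g. English words) are kept unchanged."""
    return [lexicon.get(tok.lower(), tok) for tok in tokens]


if __name__ == "__main__":
    print(build_source("I like this movie", "मुझे यह फिल्म पसंद है"))
    # → I like this movie </s> मुझे यह फिल्म पसंद है
    target = "mujhe ye movie pasand hai".split()
    print(" ".join(transliterate_tokens(target, TOY_LEXICON)))
    # → मुझे ye movie पसंद है
```

Keeping English tokens unchanged mirrors the code-mixed nature of Hinglish: only the Hindi portion is moved into Devanagari, the script mBART saw for Hindi during pre-training.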

View the paper at: https://precog.iiitd.edu.in/pubs/2021_Code_Mixed_Translation.pdf

LTRC ranked 2nd in machine translation on the LinCE leaderboard: https://ritual.uh.edu/lince/leaderboard