[month] [year]

Sourav Kumar – Dual Degree CSE

Sourav Kumar received his MS Dual Degree in Computer Science and Engineering (CSE). His research work was supervised by Prof. Dipti M Sharma. Here is a summary of Sourav Kumar's thesis, Multilingual NMT for Indian Languages:

Natural Language Processing (NLP) is the subfield of Artificial Intelligence that deals with enabling computers to understand and produce human text and speech. Machine Translation (MT) is an application of NLP that focuses on automatic translation between languages. It is crucial because it helps people all over the world travel to other countries and interact with each other. With recent advances in deep learning, Neural Machine Translation (NMT) has produced translations that are indistinguishable from human translations for many language pairs. The driving factor behind these leaps in translation quality, however, is the availability of abundant parallel data, which Indian languages lack. In addition, Indian languages are diverse, morphologically rich, and written in different scripts, which makes translation complex and challenging.
A lot of research has been done to improve translation quality for low-resource Indian languages by exploiting monolingual data or parallel data from other language pairs. Recently, multilingualism has drawn much attention, and more and more researchers have shown that using additional languages helps improve translation quality. Despite their different scripts, most Indian languages share many lexical features. These shared features can be used to improve the quality of Multilingual NMT systems, but traditional multilingual NMT does not exploit them. In this thesis, we present a systematic case study on Multilingual Neural Machine Translation. We also adapt Multilingual NMT to achieve better results than the traditional multilingual translation system, especially for Indian languages.
We also present an algorithm for selecting the language pairs to include in a multilingual NMT system, based on requirements, so as to enhance translation performance for the language pairs of interest. In addition, we propose a modified pipeline for both Multilingual and Multilingual Multi-Domain systems to overcome the problem of data scarcity. To the best of our knowledge, this is the first large-scale study specifically devoted to improving Multilingual Neural Machine Translation for Indian languages by utilizing language relatedness.