
Sai Vivek Nanduri – Dialogue modelling

Sai Vivek Nanduri received his MS Dual Degree in Computational Linguistics (CL). His research work was supervised by Dr. Radhika Mamidi. Here’s a summary of his research work, titled “Towards dialogue modelling in code-mixed low resource language settings”:

Natural Language Processing (NLP) has advanced dramatically over the last decade or so, driven by the
success of Machine Learning, and Deep Learning in particular, in both commercial and academic
applications. Most of this work, however, has centred on resource-rich languages such as English, which
serve as the base on which models are built. This leaves a wide gap for a world divided into a multitude
of regional cultures and languages whose technological resources have not evolved at the same pace, and
that gap is a very real obstacle.

We aim to help bridge this gap by creating a rich resource for one such low resource language: Telugu, a
Dravidian language. We focus on curating and annotating a dialogue corpus, since dialogue data lets us
approach a variety of tasks and has many real-world applications, including the increasingly pertinent
goal of improving Human-Computer Interaction for low resource languages such as Telugu. It is also
important to note that language and means of communication evolve over time, and one product of this
evolution is the increased use of code-mixing in colloquial discourse. To develop robust models that
accurately recreate Human-Human interactions with computers, we need to be able to work with, and model,
code-mixed data. Therefore, another focus of our study is the creation of a dialogue corpus that is
heavily code-mixed between Telugu and English, and the subsequent modelling of a dialogue system using
deep learning techniques.
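
One common way to quantify how heavily an utterance is code-mixed is the utterance-level Code-Mixing Index (CMI), computed from per-token language tags. The summary does not say whether this thesis used CMI; the sketch below is only an illustration, assuming tokens have already been tagged with hypothetical labels "te" (Telugu), "en" (English) and "univ" (language-independent tokens such as names or numbers). The example utterance is made up.

```python
from collections import Counter

def code_mixing_index(tags, neutral="univ"):
    """Utterance-level Code-Mixing Index (CMI), a sketch.

    tags: one language tag per token, e.g. "te" (Telugu) or "en" (English).
    Tokens tagged `neutral` are treated as language-independent.
    Returns 100 * (1 - max_lang_tokens / (n - u)), or 0.0 when n == u.
    """
    n = len(tags)                                  # total tokens
    u = sum(1 for t in tags if t == neutral)       # language-independent tokens
    if n == u:  # empty utterance or only language-independent tokens
        return 0.0
    lang_counts = Counter(t for t in tags if t != neutral)
    return 100.0 * (1 - max(lang_counts.values()) / (n - u))

# Hypothetical utterance: "nenu office ki late ga vachanu" ("I came to office late")
tags = ["te", "en", "te", "en", "te", "te"]
print(round(code_mixing_index(tags), 2))  # 33.33
```

A monolingual utterance scores 0, and the score rises as the tokens are split more evenly between the two languages, so averaging it over a corpus gives a rough sense of how code-mixed the corpus is.
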
To evaluate how useful the Telugu-English code-mixed dialogue corpus that we have carefully curated is
as a resource, we test it on two widely studied NLP tasks: Text-based Speaker Identification and
Automatic Humour Recognition. We provide a comprehensive analysis and survey of modern deep learning
techniques for text-based speaker identification and explain in detail each choice made during the
study. The deep learning techniques implemented in this work include Convolutional Neural Networks
(CNNs), LSTMs, and Transformer-based models.
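
Text-based speaker identification can be framed as multi-class classification: given an utterance, predict which of a fixed set of speakers said it. The sketch below illustrates that framing with a simple multinomial Naive Bayes baseline over bag-of-words features. This is a swapped-in classical technique, not the CNN, LSTM or Transformer models used in the thesis, and the speakers "A" and "B" and their romanized Telugu-English utterances are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercased whitespace tokenization; real romanized Telugu-English text
    # would also need spelling normalization (e.g. "chala" vs "chaala").
    return text.lower().split()

class NaiveBayesSpeakerID:
    """Multinomial Naive Bayes over bag-of-words with add-one smoothing."""

    def fit(self, utterances, speakers):
        self.word_counts = defaultdict(Counter)  # speaker -> token counts
        self.speaker_counts = Counter(speakers)  # speaker -> number of utterances
        self.vocab = set()
        for text, spk in zip(utterances, speakers):
            tokens = tokenize(text)
            self.word_counts[spk].update(tokens)
            self.vocab.update(tokens)
        self.total = len(speakers)
        return self

    def log_score(self, text, spk):
        counts = self.word_counts[spk]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.speaker_counts[spk] / self.total)  # log prior
        for tok in tokenize(text):
            score += math.log((counts[tok] + 1) / denom)  # smoothed log likelihood
        return score

    def predict(self, text):
        # Pick the speaker with the highest posterior log score.
        return max(self.speaker_counts, key=lambda spk: self.log_score(text, spk))

# Hypothetical code-mixed training utterances for two speakers, "A" and "B".
train = [
    ("bro match chusava last over lo super six", "A"),
    ("cricket ante naku chala ishtam bro", "A"),
    ("meeting ki late ayyindi boss chala angry", "B"),
    ("office lo deadline undi report pampali", "B"),
]
model = NaiveBayesSpeakerID().fit([t for t, _ in train], [s for _, s in train])
print(model.predict("bro cricket match eppudu"))  # A
print(model.predict("office report late"))        # B
```

A neural model would replace the word-count scoring with learned representations of the utterance (and possibly its dialogue context), but the input and output of the task stay the same.
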