Anirudh Dahiya received his MS Dual Degree in Computational Linguistics (CL). His research was supervised by Prof. Dipti M Sharma. Here’s a summary of Anirudh Dahiya’s thesis, Transfer Learning for Low Resource Language Processing in Indian Context:
As ever-increasing populations around the world gain access to digital technologies, and through them to information, knowledge, rights, and justice, it is imperative to ensure inclusiveness and accommodate the multitude of languages they use. As computing machines and methods have grown more sophisticated and gained capabilities by leaps and bounds, using them to automatically process human language has gained traction. However, this usually comes at the cost of expensive resource creation and computation to build and run such systems, which has limited the inclusiveness of recently proposed state-of-the-art language processing systems in resource-scarce settings.
Transfer learning methods aim to overcome this limitation by adapting systems pre-trained on existing large resources to serve low-resource domains, tasks, and languages. Research into these methods has attracted increasing attention as they have proven effective at mitigating resource constraints across tasks, domains, and languages. This study explores transfer learning approaches in the context of widely used Indian languages, particularly Hindi and its code-mixed English-Hindi form, which is popular on social media. We investigate both cross-task and cross-lingual transfer for a variety of downstream tasks, and demonstrate their efficacy under constrained training data and compute. For English-Hindi code-mixed sentiment analysis, we propose a syntactico-semantic curriculum learning formulation and show significant gains in performance.
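To illustrate the general idea behind curriculum learning for code-mixed data, here is a minimal sketch. It assumes difficulty can be approximated by a simple code-mixing ratio (the fraction of tokens drawn from the embedded language); the thesis's actual syntactico-semantic criterion is more sophisticated, and the vocabulary and samples below are hypothetical.

```python
def mixing_ratio(tokens, embedded_vocab):
    """Fraction of tokens drawn from the embedded language's vocabulary.
    A crude stand-in for a real syntactico-semantic difficulty measure."""
    if not tokens:
        return 0.0
    switched = sum(1 for t in tokens if t.lower() in embedded_vocab)
    return switched / len(tokens)

def curriculum_order(samples, embedded_vocab):
    """Sort (tokens, label) pairs from least to most code-mixed, so
    training can proceed from 'easy' monolingual-like sentences to
    'hard' heavily mixed ones."""
    return sorted(samples, key=lambda s: mixing_ratio(s[0], embedded_vocab))

# Toy usage with hypothetical romanized Hindi tokens as the embedded language.
hindi_vocab = {"bahut", "accha", "nahi", "film", "thi"}
samples = [
    (["movie", "was", "bahut", "accha"], "pos"),
    (["great", "movie"], "pos"),
    (["film", "bahut", "accha", "nahi", "thi"], "neg"),
]
ordered = curriculum_order(samples, hindi_vocab)
```

A trainer would then feed batches in this order (or anneal from the easy prefix to the full set), rather than shuffling uniformly from the start.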
We also explore a variety of lexical- and sentence-level cross-lingual transfer approaches for discourse analysis, and demonstrate their efficacy under different training regimes for the discourse relation classification task. Comparing these approaches yields insights into the nature of the cross-lingual transfer.
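As a sketch of what lexical-level cross-lingual transfer can look like, one common building block is aligning two monolingual embedding spaces with an orthogonal (Procrustes) mapping learned from a bilingual seed lexicon. This is an assumption for illustration, not necessarily the exact method used in the thesis.

```python
import numpy as np

def procrustes_map(src, tgt):
    """Learn an orthogonal matrix W minimizing ||src @ W - tgt||_F,
    where rows of src/tgt are embeddings of translation pairs from a
    seed dictionary. Closed-form solution via SVD of src.T @ tgt."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt
```

Once W is learned, any source-language word vector can be mapped into the target space, letting a classifier trained on the resource-rich language score resource-poor inputs.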