Abhinaba Bala, supervised by Dr. Parameswari Krishnamurthy, received his Master of Science in Computer Science and Engineering (CSE). Here’s a summary of his research work on “Advancing Digital Inclusivity: Enhancing NLP Techniques for Fake News Detection, Abusive Content Identification, and Information Enrichment in Low-Resource Languages”:
The digital age has amplified the importance of inclusive and comprehensive information dissemination, yet many low-resource languages remain underrepresented in this sphere. The lack of advanced natural language processing (NLP) tools for these languages limits the ability to combat misinformation, detect abusive content, and provide enriched informational resources. Addressing these challenges is crucial for fostering digital inclusivity and ensuring that speakers of low-resource languages have equitable access to reliable, enriched information.
Our first study focuses on detecting fake news in Dravidian languages using Google’s MuRIL (Multilingual Representations for Indian Languages) model. By fine-tuning the “muril-base-cased” variant on a carefully curated dataset of labeled comments and posts in Dravidian languages, we enhance the model’s ability to distinguish authentic from deceptive content. Evaluation with standard metrics demonstrates the effectiveness of MuRIL in helping maintain a safer digital ecosystem for speakers of Dravidian languages.
In the second study, we address two related tasks: detecting abusive language in multimodal content, specifically Tamil videos, and performing sentiment analysis on Tamil and Malayalam videos. Our novel framework integrates a state-of-the-art model for each modality: Multiscale Vision Transformers (MViT) for video, OpenL3 for audio, and bert-base-multilingual-cased for text. By combining these multimodal features, our approach significantly improves the accuracy of abusive content detection and sentiment categorization, contributing to a safer and more inclusive online environment for Dravidian language speakers.
The third study tackles information scarcity in Mizo, a very low-resource language. We investigate a methodology for generating holistic summaries of Mizo news articles, enriched with information from English-language news to broaden coverage. We also provide a dataset of 500 Mizo news articles with corresponding enriched summaries. Our approach significantly improves the comprehensiveness and accessibility of news content for Mizo speakers, and human evaluations confirm its effectiveness in meeting the information needs of this community.
Together, these studies highlight the critical importance of developing advanced NLP techniques for underrepresented languages. By addressing fake news detection, abusive language identification, and information enrichment in Dravidian languages and Mizo, our research makes significant strides toward digital inclusivity and safety, enhancing the digital experience for speakers of these languages and paving the way for future innovations in multilingual and multimodal content analysis.
June 2025
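The summary does not say how the video, audio, and text features in the second study are combined. As one hedged illustration only, the Python sketch below assumes simple early fusion: each modality's embedding (standing in for MViT, OpenL3, and bert-base-multilingual-cased outputs) is L2-normalized and concatenated, then scored by a linear classifier head. The function names, toy vectors, and classifier weights are hypothetical and not taken from the thesis; real embeddings would be far larger.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so no modality dominates by raw magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0] * len(vec)
    return [x / norm for x in vec]


def early_fusion(video: Sequence[float], audio: Sequence[float],
                 text: Sequence[float]) -> List[float]:
    """Hypothetical fusion strategy: normalize each modality, then concatenate."""
    return l2_normalize(video) + l2_normalize(audio) + l2_normalize(text)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_head(fused: Sequence[float], weights: Sequence[Sequence[float]],
                biases: Sequence[float]) -> List[float]:
    """Score each class with one weight row plus a bias, then apply softmax."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


# Toy stand-ins for per-modality embeddings (hypothetical values).
video_emb = [3.0, 4.0]  # e.g. a pooled video-clip feature
audio_emb = [1.0, 0.0]  # e.g. an averaged audio embedding
text_emb = [0.0, 2.0]   # e.g. a pooled text embedding

fused = early_fusion(video_emb, audio_emb, text_emb)

# Hypothetical two-class head: index 0 = not abusive, index 1 = abusive.
weights = [[0.0] * len(fused), [1.0] * len(fused)]
biases = [0.0, 0.0]
probs = linear_head(fused, weights, biases)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(fused, probs, predicted)
```

Concatenation is the simplest fusion choice; attention-based fusion or late (decision-level) fusion are common alternatives, and the summary above does not indicate which one the framework uses.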