Gundapu Sunil received his MS in Computer Science and Engineering (CSE). His research work was supervised by Dr. Radhika Mamidi. Here’s a summary of his research work on NLP for Social Media: Towards Automatic Identification of Some Significant Issues in User-Generated Content:
The use of the Internet and online social networks has increased tremendously around the world over the past decade, providing an opportunity to exchange thoughts, ideas, and opinions with other people. This exponential growth of social media networks has allowed the production, distribution, and consumption of data at a phenomenal rate. However, it has also given rise to online problems such as offensive content, hate speech, fake news, sarcasm, and trolling. It is therefore vital to identify and mitigate such issues to limit their psychological impact on affected communities and to avert hate crimes. At the same time, we need to increase encouraging, positive, and supportive content on social media.
This thesis predominantly focuses on developing automated models to detect and classify a few significant issues in two distinct modalities: Text and Multimodality (Text + Vision). In the text modality, we worked on monolingual English text and code-mixed data. In the multimodal setting (Text + Vision), we worked on Internet memes. Most memes are plainly humorous, while others, behind an amusing presentation, seek to convey subtler messages, including fake content, hatred, sarcasm, propaganda for an idea, or a motivational purpose. Therefore, it is necessary to identify and remove harmful content of this kind and make social media a safer place for everyone.
Coming to language use in social media, we find both monolingual data and code-mixed data. First, in monolingual text, we investigate the detection of fake news in COVID-19 related social media posts through the adaptation of transformer-based contextual word representations. We propose an ensemble model that fuses the BERT, ALBERT, and XLNet representations. This ensemble model outperformed each of the individual transformer models on this problem. Some groups of people exploit freedom of speech and expression on social media platforms to increase conflict and hatred among users. Therefore, it is essential to take a positive reinforcement approach and study positive, helpful, and supportive social media content. To that end, we build a transformer-based BERT model with a threshold-based language detection system to detect Hope Speech in YouTube comments. The primary intent of this work is to reduce negativity and strengthen encouraging, supportive, and efficacious social media content.
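To make the idea of an ensemble concrete, here is a minimal sketch of one common way to combine several classifiers: soft voting, where the class probabilities predicted by each model are averaged. This summary does not state how the thesis fuses the BERT, ALBERT, and XLNet representations, so the soft-voting scheme, the function name `soft_vote`, and the probability values below are all assumptions made for illustration only.

```python
def soft_vote(model_probs, weights=None):
    """Average per-model class probabilities.

    model_probs: list of probability lists, one per model, all the same length.
    weights: optional per-model weights (hypothetical; equal weights by default).
    Returns (index of the winning class, fused probabilities).
    """
    if not model_probs:
        raise ValueError("need at least one model")
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must score the same classes")
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    fused = [
        sum(w * p[c] for w, p in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=fused.__getitem__)
    return best, fused


LABELS = ["real", "fake"]

# Made-up probabilities for a single post, standing in for the outputs of
# three fine-tuned classifiers. These are NOT results from the thesis.
probs = {
    "bert": [0.40, 0.60],
    "albert": [0.55, 0.45],
    "xlnet": [0.30, 0.70],
}

index, fused = soft_vote(list(probs.values()))
print(LABELS[index], fused)
```

In this toy case one model leans towards "real", but the averaged probabilities still favour "fake". This illustrates why an ensemble can be more robust than any single model.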
Next, we examine two principal issues in code-mixed data. Language identification is a primary preprocessing step in numerous code-mixed applications. For this purpose, we develop a word-level language identification system using English-Telugu code-mixed content. Subsequently, we explore the problem of sentiment analysis on English-Hindi bilingual code-mixed data. For this problem, we propose character-level and sub-word-level word representations with LSTMs.
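Word-level language identification assigns a language tag to every token of a code-mixed sentence. The sketch below illustrates the task with a deliberately simple technique: each language gets a profile of character bigrams, and a word is tagged with the language whose profile it overlaps most. This is not the model developed in the thesis, which this summary does not describe. The function names, the tiny training list, and the romanized words are hypothetical and chosen only for illustration.

```python
def char_ngrams(word, n=2):
    """Character n-grams of a word, with < and > marking its boundaries."""
    padded = f"<{word.lower()}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_profiles(labelled_words, n=2):
    """Collect the set of n-grams seen for each language label."""
    profiles = {}
    for word, lang in labelled_words:
        profiles.setdefault(lang, set()).update(char_ngrams(word, n))
    return profiles


def tag_words(sentence, profiles, n=2):
    """Tag each whitespace-separated token with the best-matching language.

    A token is tagged "unk" when no profile matches or the top score is tied.
    """
    tags = []
    for word in sentence.split():
        grams = char_ngrams(word, n)
        scores = {
            lang: sum(g in profile for g in grams)
            for lang, profile in profiles.items()
        }
        ranked = sorted(scores.values(), reverse=True)
        best = max(scores, key=scores.get)
        tied = len(ranked) > 1 and ranked[0] == ranked[1]
        tags.append((word, "unk" if ranked[0] == 0 or tied else best))
    return tags


# Toy training data: "en" = English, "te" = romanized Telugu (illustrative).
TOY_TRAIN = [
    ("the", "en"), ("movie", "en"), ("good", "en"), ("is", "en"),
    ("chala", "te"), ("bagundi", "te"), ("nenu", "te"), ("cinema", "te"),
]

profiles = build_profiles(TOY_TRAIN)
tags = tag_words("moving bagunna", profiles)
print(tags)
```

Neither test token appears in the training list. Both are tagged through the sub-word fragments they share with known words, which is also the intuition behind character-level and sub-word-level representations for noisy, code-mixed text.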
Later, in the multimodal setting, we explore a prominent new issue on social media: the emotion analysis of Internet memes. This work analyzes three different tasks that revolve around Internet memes. The first is predicting the sentiment polarity of a meme, the second is a multi-label classification task that assesses whether a meme is offensive, humorous, satirical, or motivational, and the third is a multi-output ordinal classification task that predicts the degree of offense, humor, and sarcasm of a meme. To handle these tasks, we introduce a multimodal architecture with a late fusion technique that combines an LSTM for textual features with VGG-16 for image features.
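The following sketch shows one way a fusion step for the multi-label task could look. The text and image feature vectors are concatenated, and one sigmoid output per label decides independently whether that label applies. The exact fusion layer used in the thesis is not given in this summary, so the concatenation scheme, the function `fuse_and_classify`, the two-dimensional feature vectors, and the weights are all assumptions. In a real system the features would come from an LSTM and from VGG-16, and the weights would be learned.

```python
import math

LABELS = ["offensive", "humorous", "satirical", "motivational"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse_and_classify(text_feat, image_feat, weights, biases, threshold=0.5):
    """Concatenate modality features and apply one sigmoid unit per label.

    weights: one weight vector per label, as long as the fused vector.
    Returns (per-label scores, list of labels at or above the threshold).
    """
    fused = list(text_feat) + list(image_feat)
    scores = {}
    for label, w, b in zip(LABELS, weights, biases):
        if len(w) != len(fused):
            raise ValueError("weight vector must match fused feature size")
        logit = sum(wi * xi for wi, xi in zip(w, fused)) + b
        scores[label] = sigmoid(logit)
    predicted = [label for label in LABELS if scores[label] >= threshold]
    return scores, predicted


# Hypothetical stand-ins for LSTM (text) and VGG-16 (image) feature vectors.
text_feat = [1.0, 0.0]
image_feat = [0.0, 1.0]

# Hand-picked weights for illustration; not learned and not from the thesis.
weights = [
    [-2.0, 0.0, 0.0, -2.0],  # offensive
    [2.0, 0.0, 0.0, 2.0],    # humorous
    [1.0, 0.0, 0.0, -2.0],   # satirical
    [0.0, 0.0, 0.0, 1.0],    # motivational
]
biases = [0.0, 0.0, 0.0, 0.0]

scores, predicted = fuse_and_classify(text_feat, image_feat, weights, biases)
print(predicted)
```

Because every label has its own sigmoid output, a meme can receive several labels at once or none at all. This is what distinguishes the multi-label task from the single-label sentiment polarity task.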
Propaganda is a communication tool that influences the opinions and actions of other people to achieve a predetermined goal. Initially, it was seen in newspapers, advertisements, and so on, but now it is widely used on social media. Therefore, we develop a multimodal fusion system to detect propaganda in memes. In this system, we use a robust fusion strategy called Multimodal Fusion Search to combine text and image representations.