
Roopal Vaid – NLP in Climate Change

Roopal Vaid received her Master of Science – Dual Degree in Computer Science and Engineering (CSE). Her research work was supervised by Dr. Manish Srivastava. Here’s a summary of her thesis, Towards NLP in Climate Change:

Climate change is one of the most pressing issues of our time, and understanding the discourse surrounding it is crucial for effective communication and action. This discourse encompasses a wide range of perspectives, attitudes and opinions. Analyzing it is essential to identify current challenges, roadmaps, and the systematic changes that governments, organizations, and institutions require to combat the effects of climate change. Social media is an important platform for climate change discourse because of its widespread use and real-time nature. This makes it possible to analyze the discourse in near real time, providing valuable insights into public opinions, attitudes and concerns surrounding climate change, as well as into how topics are framed and how attention to the issue shifts with events. We evaluate the contextual and social features that play key roles in the coverage on different platforms.

In this thesis, we focus on the fine-grained classification and stance detection of climate-change-related social media text surrounding the United Nations Climate Change Conference. We established two corpora, ClimateStance and ClimateEng, from tweets posted during the 2019 United Nations Framework Convention on Climate Change with the Intergovernmental Panel in Geneva. We comprehensively outline the dataset collection, pre-processing, annotation methodology, and dataset composition. We also put together a set of guidelines and specifications for creating the expandable corpora ClimateEng and ClimateStance, a collection of 3777 tweets manually labeled with information about events, states, the categories they belong to, and their corresponding stance. We benchmark both datasets for climate change prevention, stance detection and fine-grained classification using state-of-the-art text classification methods, and discuss the experiments and results in detail. In addition, we create a dataset called ClimateReddit, based on Reddit, which includes 6262 comments from climate-change-related subreddits. We perform semi-supervised learning on this corpus with pseudo-labelling and manually annotate 329 comments for the tasks of fine-grained classification and stance detection. We compare the results with those of the best-performing models from the supervised experiments for both tasks. Finally, we provide a linguistic analysis of ClimateEng, ClimateStance and ClimateReddit using techniques such as part-of-speech tagging and named-entity recognition.
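Semi-supervised learning with pseudo-labelling is mentioned above without further detail. As a generic illustration only, the following is a minimal sketch of a self-training loop. A classifier trained on the labelled data predicts labels for the unlabelled pool, predictions above a confidence threshold are added to the training set, and the process repeats. The toy nearest-centroid classifier, the `threshold` and `rounds` parameters, the normalized-similarity "confidence" score, and the example labels (`Disasters`, `Politics`) are all hypothetical stand-ins, not the models, settings or categories used in the thesis.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical whitespace tokenizer; a real pipeline would do proper pre-processing.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CentroidClassifier:
    """Toy nearest-centroid text classifier (hypothetical stand-in for a real model)."""

    def fit(self, texts, labels):
        # One summed bag-of-words vector per label.
        self.centroids = {}
        for label in sorted(set(labels)):
            counts = Counter()
            for text, lab in zip(texts, labels):
                if lab == label:
                    counts.update(tokenize(text))
            self.centroids[label] = counts
        return self

    def predict(self, text):
        # Returns (label, confidence). Confidence is each label's share of the
        # total similarity: a heuristic score, not a calibrated probability.
        vec = Counter(tokenize(text))
        sims = {label: cosine(vec, c) for label, c in self.centroids.items()}
        total = sum(sims.values())
        if total == 0:
            scores = {label: 1 / len(sims) for label in sims}
        else:
            scores = {label: s / total for label, s in sims.items()}
        best = max(scores, key=scores.get)
        return best, scores[best]


def pseudo_label(labeled, unlabeled, threshold=0.6, rounds=3):
    """Self-training: repeatedly add confidently predicted unlabeled texts to the training set."""
    texts = [text for text, _ in labeled]
    labels = [label for _, label in labeled]
    pool = list(unlabeled)
    for _ in range(rounds):
        model = CentroidClassifier().fit(texts, labels)
        confident, remaining = [], []
        for text in pool:
            label, conf = model.predict(text)
            if conf >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing passed the threshold, so further rounds would not change the model
        for text, label in confident:
            texts.append(text)
            labels.append(label)
        pool = remaining
    final_model = CentroidClassifier().fit(texts, labels)
    added = list(zip(texts[len(labeled):], labels[len(labeled):]))
    return final_model, added, pool


if __name__ == "__main__":
    labeled = [
        ("flood waters rise in coastal town", "Disasters"),
        ("parliament debates carbon tax bill", "Politics"),
    ]
    unlabeled = [
        "heavy flood hits coastal villages",
        "senate votes on carbon bill",
        "nice weather today",
    ]
    model, added, left = pseudo_label(labeled, unlabeled)
    print(added)  # [('heavy flood hits coastal villages', 'Disasters'), ('senate votes on carbon bill', 'Politics')]
    print(left)   # ['nice weather today']
```

Texts that never clear the threshold stay in the unlabelled pool rather than being forced into a class. A higher threshold admits fewer pseudo-labels but typically introduces less label noise.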

Further, we extend our work to a code-mixed setting. We collect Hindi-English code-mixed data from Twitter during 2020 and construct a corpus of code-mixed tweets. We define the task of fine-grained classification for this data and outline the data-collection and annotation methodology for code-mixed data.

 

June 2023