Gayatri Purigilla, supervised by Dr. Radhika Mamidi, received her Master of Science – Dual Degree in Computational Linguistics (CL). Here’s a summary of her research work on Automatic classification of conversational humour with a focus on COVID-19 tweets:
We humans are social beings, and our communication is the most evolved and well-structured communication system that we are aware of. One essential aspect of human communication, which helps people bond faster and develop a sense of closeness, is humour. Humour that occurs as part of a conversation is known as conversational humour. Contrary to what it may seem, conversational humour is a distinct type of humour: it is more than just plugging canned jokes into a conversation. It relies on specific techniques and requires at least two interlocutors who understand the context of the conversation. The first step towards understanding conversational humour is to identify the types into which it can be categorised and the techniques used to generate each type. Current studies on this front either consider only a subset of these types and techniques or are domain-specific.
To tackle these challenges, we first propose a hierarchical annotation schema that gives a comprehensive overview of conversational humour. For this task, we use the famous Telugu play Kanyasulkam and take humorous utterances from the play as the dataset. The schema includes tags for type, technique, and benignity, and it accounts for cultural nuances in the text, making it an extensive schema for conversational humour. Further, to test the universality of the schema, we built a dataset from a different domain (COVID-19-based humour) and a different language (English). This dataset was annotated using a subset of the annotation schema, namely the type and technique tags. Two more tags, “Situation” and “Cultural Relevance”, were added to the schema to make the dataset more valuable as a standalone resource for researchers in other fields such as marketing and sociology.
The effectiveness of this dataset is tested through experiments on binary as well as multi-label multi-class classification using state-of-the-art ML models such as BERT, RoBERTa, and BERTweet. Based on the accuracy and the analysis of these experiments, we show that the annotation schema is universal in terms of language and domain. Such classification can be used to accelerate the annotation of humour data, and the annotated data can serve various purposes: marketing, connecting with a target audience based on the relevance tag, and aiding research on conversational humour towards building humorous chatbots and more human-like interactive systems.
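The summary does not say how multi-label predictions were decoded or scored. The following is therefore only a generic sketch of how multi-label multi-class classification is commonly set up, assuming a model that outputs one logit per tag, an independent sigmoid per tag, and a 0.5 threshold. The tag names (`technique_a` and so on) and the helper functions are hypothetical placeholders, not the thesis's actual tag set or code.

```python
import math
from typing import List, Sequence, Set

# Hypothetical placeholder tag names; the real tag inventory is defined in the thesis.
LABELS: List[str] = ["technique_a", "technique_b", "technique_c"]


def to_multi_hot(tags: Set[str], labels: Sequence[str] = LABELS) -> List[int]:
    """Encode a set of gold tags as a 0/1 vector aligned with `labels`."""
    unknown = tags - set(labels)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    return [1 if label in tags else 0 for label in labels]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_tags(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Multi-label decision: each tag gets its own sigmoid and threshold, so a tweet
    may receive zero, one, or several tags (a softmax would force exactly one)."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def exact_match_accuracy(gold: List[List[int]], pred: List[List[int]]) -> float:
    """Fraction of examples whose whole predicted tag vector equals the gold vector."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def hamming_loss(gold: List[List[int]], pred: List[List[int]]) -> float:
    """Fraction of individual tag decisions that are wrong."""
    wrong = sum(gi != pi for g, p in zip(gold, pred) for gi, pi in zip(g, p))
    total = sum(len(g) for g in gold)
    return wrong / total


# Usage with made-up logits standing in for a fine-tuned model's outputs.
gold = [to_multi_hot({"technique_a"}), to_multi_hot({"technique_b", "technique_c"})]
pred = [predict_tags([2.0, -1.0, -3.0]), predict_tags([0.5, 1.5, -0.2])]
print(pred)                                       # → [[1, 0, 0], [1, 1, 0]]
print(exact_match_accuracy(gold, pred))           # → 0.5
print(round(hamming_loss(gold, pred), 3))         # → 0.333
```

Exact-match accuracy is strict (one wrong tag fails the whole tweet), while Hamming loss credits partially correct tag sets. Reporting both is one common way to read multi-label results, though the metrics used in the thesis may differ.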
July 2023