Likhith Asapu, supervised by Dr. Manish Shrivastava, received his Master of Science – Dual Degree in Computational Linguistics (CLD). Here’s a summary of his research work on Multilingual Humor Generation through Hindi-English Code-Mixed Puns:
Puns, as a form of wordplay, play a significant role in humor, language comprehension, and cultural expression. They rely on phonetic and semantic ambiguities to create humor and are widely used in entertainment, advertising, and literature. While computational approaches to pun generation and detection have advanced in English, pun generation in code-mixed languages remains largely unexplored. Code-mixing, the blending of linguistic elements from multiple languages within a single utterance, presents unique challenges due to its syntactic and semantic complexities. As multilingual communication continues to evolve, developing computational models that can understand and generate humor in such settings is increasingly important.
This thesis presents a comprehensive study on the generation of puns in Hindi-English code-mixed text. We begin by constructing a resource of pun-alternate word pairs, which serve as the foundation for pun generation. These pairs are collected by experimenting with various phonetic similarity matching strategies designed to identify humorously interchangeable words across Hindi and English. Using these pairs, we generate puns and analyze their linguistic plausibility and humor potential. We evaluate several pre-trained multilingual language models for their effectiveness in generating syntactically and semantically coherent code-mixed text. Based on these insights, we propose several novel structured prompt-based methods for generating puns in the Hindi-English code-mixed setting.
To assess the quality of generated puns, we design a comprehensive human evaluation framework and collect detailed annotations across multiple dimensions of humor and fluency. Human evaluation suggests that our proposed methods significantly outperform baseline approaches in terms of humor quality and contextual relevance. The annotated outputs from our evaluations form the Hindi-English Code-mixed Pun (HECoP) dataset, comprising 2000 human-annotated sentences. We leverage this dataset to compare various multilingual models on the task of pun detection. This resource provides a valuable benchmark for future research in multilingual pun generation and humor detection.
Beyond pair-based methods, we further introduce a structured pun generation pipeline capable of generating puns from a single input word without relying on predefined pun-alternate lists. This pipeline integrates phonetic similarity analysis, compatibility scoring, and sentence filtering to enhance the coherence and humorousness of generated content. To the best of our knowledge, this is one of the first comprehensive computational studies focused on pun generation in code-mixed or low-resource settings. The methodologies, evaluation frameworks, and datasets introduced in this work lay a strong foundation for future advancements in computational humor and multilingual NLP.
July 2025

