Dhruv Srivastava, supervised by Dr. Makarand Tapaswi, received his Master of Science in Computer Science and Engineering (CSE). Here’s a summary of his research work on Learning emotion and mental states in movie scenes:
The research work presents a novel task: multi-label recognition of emotions (happy, angry, etc.) and mental states (honest, helpful, etc.), both at the level of a movie scene and for each character. The approach is based on EmoTx, a multimodal transformer-based architecture that considers videos, multiple characters, and dialogue utterances to make joint predictions. We conduct experiments on the 10 and 25 most frequently occurring labels from the MovieGraphs dataset, and on a mapping from the Emotic dataset that clusters 181 labels into 26. The paper includes several ablation studies that analyze the architecture design choices and study the impact of multiple modalities and feature backbones. We also adapt and compare multiple state-of-the-art emotion recognition models, demonstrating the effectiveness of EmoTx for multi-label emotion recognition. Analyzing the self-attention scores reveals that expressive emotions often rely on character tokens (faces), while mental states depend on video and dialogue cues. Overall, the paper presents a promising approach for movie story analysis and character emotion understanding. This paper received 7 reviews and a unanimous decision to accept, with a final rating of 4/5.
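To illustrate what "multi-label" means here, the following is a minimal sketch and not the EmoTx implementation. It assumes a model has already produced one raw score (logit) per label for a scene; the scores, the `predict_labels` helper, and the 0.5 threshold are all hypothetical. Each label is decided independently, so a scene can carry several labels at once or none at all.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every label whose independent probability reaches the threshold.

    Unlike single-label classification (softmax + argmax), the labels do not
    compete with each other: any number of them can be selected.
    """
    return [label for label, score in logits.items() if sigmoid(score) >= threshold]


# Hypothetical per-label scores for one movie scene (illustrative values only).
scene_logits = {"happy": 2.1, "angry": -1.3, "honest": 0.4, "helpful": -0.2}

predicted = predict_labels(scene_logits)
print(predicted)  # ['happy', 'honest']
```

The same decision rule could be applied once per scene and once per character, which is how a single model can report both scene-level and character-level labels.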
April 2024