Aditya Kumar Singh, supervised by Dr. Makarand Tapaswi, received his Master of Science in Computer Science and Engineering (CSE). Here’s a summary of his research work on Revolution on TV show experience using recap for multimodal story summarization:
We introduce multimodal story summarization by leveraging TV episode recaps: short video sequences that interweave key visual moments and dialog from previous episodes to bring viewers up to speed. We propose Plotsnap, a dataset featuring two crime thriller TV shows with rich recaps and long-form content of over 40 minutes per episode. Recap shots are mapped to corresponding sub-stories that serve as labels for story summarization. We propose a hierarchical model, Talesumm, that (i) processes entire episodes by creating compact shot and dialog representations, and (ii) predicts importance scores for each video shot and dialog utterance by enabling interactions between local story groups. Unlike traditional summarization tasks, our method extracts multiple plot points from long-form videos. We present a thorough evaluation on this task, including promising cross-series generalisation. Talesumm also shows good results on video summarization benchmarks.
April 2024