Research work on "Does Audio Help in Deep Audio Visual Saliency Prediction Models" by Dr. Vineet Gandhi and his students – Ritvik Agarwal, Shreyank Jyoti, Rohit Girmaji and Sarath Sivaprasad – received the Best Student Paper Award at the 24th ACM International Conference on Multimodal Interaction (ICMI 2022), held from 7–11 November in Bangalore.
Research work as explained by Dr. Vineet Gandhi and his students:
Although existing Audio-Visual Saliency Prediction (AVSP) models claim to achieve promising results by fusing the audio modality into visual-only models, these models fail to leverage audio information. In this paper, we investigate the relevance of audio cues in conjunction with visual ones and conduct extensive analysis by employing well-established audio modules and fusion techniques from diverse correlated audio-visual tasks. Our analysis on ten diverse saliency datasets suggests that none of these methods succeeded in incorporating audio. Furthermore, we shed light on why AVSP models show a gain in performance over visual-only models even though the audio branch is agnostic at inference. Our work questions the role of audio in current deep AVSP models and, by demonstrating that simpler alternatives work equally well, motivates the community to reconsider these complex architectures.
Full Paper: https://dl.acm.org/doi/10.1145/3536221.3556625
ICMI is the premier international forum for multidisciplinary research on multimodal human-human and human-computer interaction, interfaces, and system development. The conference focuses on theoretical and empirical foundations, component technologies, and combined multimodal processing techniques that define the field of multimodal interaction analysis, interface design, and system development. ICMI 2022 featured a single-track main conference which included: keynote speakers, technical full and short papers (including oral and poster presentations), special sessions, demonstrations, exhibits and doctoral spotlight papers.
The theme for this year’s conference was Intelligent and Responsible Embodied Conversational Agents (ECAs) in the multi-lingual real world. ECAs enable natural human-computer interaction, inspired by human-human communication. With rapid advances in multimodal analysis, dialog and synthesis technologies, intelligent ECAs are set to enter real-world applications. The expected intelligence includes the cognitive, social and emotional facets that humans routinely display in conversations. The theme for ICMI 2022 revolved around making ECAs more robust, responsible and multi-lingual.
Conference page: https://icmi.acm.org/2022/