The British Machine Vision Conference (BMVC) is the British Machine Vision Association (BMVA) annual conference on machine vision, image processing, and pattern recognition. It is one of the major international conferences on computer vision and related areas held in the UK. With increasing popularity and quality, it has established itself as a prestigious event on the vision calendar.
Faculty and students presented the following papers at the British Machine Vision Conference (BMVC 2021), held online from 22 to 25 November.
- Audio-Visual Speech Super-Resolution – Rudrabha Mukhopadhyay*, Sindhu B Hegde*, C V Jawahar and Vinay Namboodiri, University of Bath (* indicates equal contribution). Research work as explained by the authors:
In this paper, we present an audio-visual model to perform speech super-resolution at large scale factors (8x and 16x). Previous works attempted to solve this problem using only the audio modality as input, and thus were limited to low scale factors of 2x and 4x. In contrast, we propose to incorporate both visual and auditory signals to super-resolve speech with sampling rates as low as 1 kHz. In such challenging situations, the visual features assist in learning the content and improve the quality of the generated speech. Further, we demonstrate the applicability of our approach to arbitrary speech signals where the visual stream is not accessible. Our “pseudo-visual network” precisely synthesizes the visual stream solely from the low-resolution speech input. Extensive experiments illustrate our method’s remarkable results and benefits over state-of-the-art audio-only speech super-resolution approaches. Our project website can be found at http://cvit.iiit.ac.in/research/projects/cvit-projects/audio-visual-speech-super-resolution.
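To give a rough sense of what these scale factors mean, the sketch below works through the sampling-rate arithmetic only. It is not the authors' model. The function names are hypothetical, and pairing the 1 kHz input with the 16x factor (giving a 16 kHz output) is an assumption made for illustration, since the abstract does not state the target rate.

```python
def super_resolved_rate(input_rate_hz: int, scale_factor: int) -> int:
    """Sampling rate obtained after upsampling by an integer scale factor."""
    return input_rate_hz * scale_factor


def nyquist_hz(rate_hz: int) -> float:
    """Highest frequency representable at a given sampling rate."""
    return rate_hz / 2


def naive_decimate(samples: list, scale_factor: int) -> list:
    """Keep every scale_factor-th sample.

    Illustrative only: real pipelines apply an anti-aliasing filter
    before discarding samples.
    """
    return samples[::scale_factor]


# Assumed pairing for illustration: 1 kHz input with the 16x scale factor.
low_rate = 1000
target_rate = super_resolved_rate(low_rate, 16)

# One second of placeholder audio at the target rate, reduced 16x.
one_second = list(range(target_rate))
low_res = naive_decimate(one_second, 16)
```

Under these assumptions, the low-resolution input carries frequencies only up to 500 Hz, so a 16x model has to reconstruct the remaining band up to 8 kHz, which is where an additional signal such as the visual stream can help.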
- Personalized One-Shot Lipreading for an ALS Patient – Prof. C V Jawahar, Bipasha Sen, Aditya Agarwal, Rudrabha Mukhopadhyay and Prof. Vinay Namboodiri, University of Bath. Research work as explained by the authors:
We aim to help ALS patients communicate with their close family, friends, and caregivers. We design a system to lip-read a patient suffering from ALS who has intact mouth movements but lacks vocal abilities. Our algorithm uses only a very limited amount of real data collected from the patient (one example per word class). Our model is trained on large amounts of synthetic data, which is then aligned with the real data’s domain through domain adaptation.