[month] [year]

WACV 2021

Prof. C V Jawahar and his students Sindhu B Hegde, K R Prajwal, and Rudrabha Mukhopadhyay virtually presented a paper, Visual Speech Enhancement Without A Real Visual Stream, at the IEEE Winter Conference on Applications of Computer Vision (WACV 2021), held from 5–9 January. The authors of the paper are Sindhu B Hegde, K R Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri (University of Bath), and C V Jawahar.

The research, as explained by the authors:

In this work, we re-think the task of speech enhancement in unconstrained real-world environments. Current state-of-the-art methods use only the audio stream, and their performance is limited across a wide range of real-world noises. Recent works that use lip movements as additional cues improve the quality of the generated speech over "audio-only" methods. However, these methods cannot be used in the many applications where the visual stream is unreliable or completely absent. We propose a new paradigm for speech enhancement by exploiting recent breakthroughs in speech-driven lip synthesis. Using one such model as a teacher network, we train a robust student network to produce accurate lip movements that mask away the noise, thus acting as a "visual noise filter". The intelligibility of the speech enhanced by our pseudo-lip approach is comparable (< 3% difference) to the case of using real lips. This implies that we can exploit the advantages of using lip movements even in the absence of a real video stream. We rigorously evaluate our model using quantitative metrics as well as human evaluations. Additional ablation studies and a demo video on our website, containing qualitative comparisons and results, clearly illustrate the effectiveness of our approach.
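The teacher–student idea described above can be illustrated with a toy sketch: a teacher (a speech-driven lip-synthesis model) produces lip frames from speech, and a student is trained to reproduce those frames even when its audio input is noisy, so that the resulting "pseudo lips" can stand in for the missing visual stream. The code below is a minimal NumPy sketch of that distillation setup; the linear "networks", dimensions, and noise model are all illustrative placeholders, not the authors' actual architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

T, D_AUDIO, D_LIP = 50, 80, 32  # frames, audio-feature dims, lip-feature dims

# Hypothetical stand-ins for the real networks: simple linear maps.
W_teacher = rng.normal(0.0, 0.1, (D_AUDIO, D_LIP))
# Pretend the student has nearly converged to the teacher's behaviour.
W_student = W_teacher + rng.normal(0.0, 0.01, (D_AUDIO, D_LIP))

def teacher_lip_generator(clean_speech):
    """Teacher: speech-driven lip synthesis from *clean* speech features."""
    return np.tanh(clean_speech @ W_teacher)

def student_lip_generator(noisy_speech):
    """Student: predicts the teacher's lip frames from *noisy* speech,
    acting as a 'visual noise filter'."""
    return np.tanh(noisy_speech @ W_student)

clean = rng.normal(size=(T, D_AUDIO))
noisy = clean + 0.3 * rng.normal(size=(T, D_AUDIO))  # additive noise

# Distillation objective: the student's pseudo lips should match the
# teacher's lips despite the noisy input (L1 loss here, for illustration).
pseudo_lips = student_lip_generator(noisy)
target_lips = teacher_lip_generator(clean)
distill_loss = np.abs(pseudo_lips - target_lips).mean()

# The pseudo lips would then replace the real visual stream as input to
# an audio-visual speech-enhancement model.
print(f"L1 distillation loss: {distill_loss:.4f}")
```

In the actual system the teacher and student are deep networks and the pseudo lips feed an audio-visual enhancement model; this sketch only conveys the data flow and the distillation loss.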

Full paper: https://openaccess.thecvf.com/content/WACV2021/papers/Hegde_Visual_Speech_Enhancement_Without_a_Real_Visual_Stream_WACV_2021_paper.pdf

Demo video: https://www.youtube.com/watch?v=y_oP9t7WEn4&feature=youtu.be

Project page: http://cvit.iiit.ac.in/research/projects/cvit-projects/visual-speech-enhancement-without-a-real-visual-stream/

Code & models: https://github.com/Sindhu-Hegde/pseudo-visual-speech-denoising