K R Prajwal received his MS-Dual Degree in Computer Science and Engineering (CSE). His research was supervised by Prof. C V Jawahar.
Here’s a summary of Prajwal’s thesis, Lip-syncing Videos in the Wild, in his own words:
Widespread access to the Internet has led to a meteoric rise in audio-visual content consumption. Our habits have shifted from listening to radio broadcasts and podcasts to watching videos on YouTube, and we increasingly prefer the highly engaging nature of video calls over plain voice calls. This considerable shift in demand for audio-visual content has, in turn, driven a surge in video content creation. Within this landscape of video creation, especially for videos of people talking, lies the problem of making content accessible across language barriers. If we want to translate a deep learning lecture video from English to Hindi, it is not only the speech that must be translated but also the visual stream, specifically, the lip movements.
Learning to lip-sync arbitrary videos to any desired target speech is a problem with several applications, ranging from video translation to readily creating new content that would otherwise require enormous effort. However, speaker-independent lip synthesis for any voice and language is a very challenging task. In this thesis, we tackle the problem of lip-syncing videos in the wild to any given target speech. We propose two new models in this space: one that significantly improves generation quality and another that significantly improves lip-sync accuracy.
We show multiple real-world applications, including lip-syncing dubbed movies, translating lectures, and animating the lips of CGI movie characters to any desired speech. Please visit the following links for more details:
LipGAN:
- Demo Video: https://www.youtube.com/watch?v=aHG6Oei8jF0
- Code: https://github.com/Rudrabha/LipGAN
- Project Page: https://cvit.iiit.ac.in/research/projects/cvit-projects/facetoface-translation
Wav2Lip:
- Interactive Live demo: https://bhaasha.iiit.ac.in/lipsync/
- Paper: https://arxiv.org/abs/2008.10010
- Code: https://github.com/Rudrabha/Wav2Lip
- Project Page (demo video to be added): http://cvit.iiit.ac.in/research/projects/cvit-projects/a-lip-sync-expert-is-all-you-need-for-speech-to-lip-generation-in-the-wild/
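For readers who want to try the Wav2Lip code linked above, inference follows roughly the pattern below. This is a sketch based on the repository's README; the exact script name, flags, checkpoint filename, and output path are assumptions and may differ between versions of the repository.

```shell
# Clone the Wav2Lip repository and install its dependencies
git clone https://github.com/Rudrabha/Wav2Lip
cd Wav2Lip
pip install -r requirements.txt

# Lip-sync a face video to a target speech track.
# --checkpoint_path: pretrained Wav2Lip weights (downloaded separately per the README)
# --face:  the video (or image) containing the face to animate
# --audio: the target speech the lips should be synced to
python inference.py \
  --checkpoint_path checkpoints/wav2lip_gan.pth \
  --face input_video.mp4 \
  --audio target_speech.wav

# The lip-synced result is written under results/ (results/result_voice.mp4 by default)
```

The same repository structure applies to LipGAN, whose code is linked in the section above, though its entry point and arguments differ.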