Prof. Andrew Zisserman of the University of Oxford, widely acknowledged as one of the principal architects of modern computer vision, gave an IHub-Data-IIITH distinguished lecture titled “Automatically Generating Audio Descriptions for Movies” on 21 August. Here is a summary of Prof. Zisserman’s talk:
Audio Description is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences. For movies, this presents notable challenges – the Audio Description must occur only during existing pauses in dialogue, should refer to characters by name, and ought to aid understanding of the storyline as a whole. This requires a visual-language model that can address all three of the ‘what’, ‘who’, and ‘when’ questions: What is happening in the scene? Who are the characters in the scene? And when should a description be given?
Andrew Zisserman is a British computer scientist, a professor at the University of Oxford, and a researcher in computer vision. He is best known for his leading role in the 1990s in establishing the computational theory of multiple view reconstruction and in developing practical algorithms that remain widely used today. This work culminated in the book Multiple View Geometry in Computer Vision, co-authored with Richard Hartley and already regarded as a standard text. He is a Fellow of the Royal Society and the only person to have been awarded the Marr Prize three times.