One usually associates split screens with news-channel panel discussions, where multiple guests appear in the same frame. That effect is typically achieved with an elaborate camera crew and multiple cameras, one camera operator trained on each panel member. The same holds for stage performances such as a dance recital or a play: recordings of such dynamic events are typically captured either with multiple cameras and operators covering the scene from various viewpoints, or with a single wide, static, high-resolution camera, and the footage is then edited together into a single video.
Producing theatre or dance performances for the small screen is not only a huge physical effort but also an expensive proposition. Imagine placing multiple cameras in different locations to capture movement, to say nothing of the myriad lights and carefully placed microphones needed to capture sound. Moreover, such productions can’t show the actors’ or dancers’ faces in close-up because of their movement on stage, and wide-angle shots of the stage make it difficult to read emotions and facial expressions. Creating good split-screen compositions requires painstakingly assembling a set of views that work both individually and together, and designing layouts that correctly convey the scene and its details. And all of this is done manually!
Prof. Vineet Gandhi from the Centre for Visual Information Technology (CVIT) at IIIT-Hyderabad has found an AI solution to this problem. Along with his student Moneish Kumar, and in collaboration with the French Institute for Research in Computer Science and Automation (INRIA) and the University of Wisconsin-Madison, he has simulated a professional cameraman’s behaviour through mathematical models. In other words, the system automatically creates a split-screen video from a single static recording, displaying both the context of the scene and close-up details of the performers. The approach is especially attractive for digital heritage projects in the performing arts, as well as for social and sports events. Context is provided by a wide-angle shot in the top part of the screen, while the split screens on the bottom half, which are essentially zoomed-in views, also make it easier to track lip movements. This is useful for adding subtitles to each of the partitioned screens, enhancing the viewing experience for hearing-impaired audiences.
“From one fixed camera, using AI techniques we automatically generate multiple cameras doing the work of multiple cameramen. My research is concerned with making it look like there really are cameramen behind the cameras, focussed on and following the movements of each performer,” says Prof. Gandhi. With comparative visual examples pitting three cameras operated by three cameramen, with manually edited final footage, against a single static camera whose output is automatically split and edited, Prof. Gandhi demonstrates that the virtual cameras perform significantly better than the human effort.
“Algorithms in computer vision take care of issues typically faced by professional camera persons while recording dynamic scenes, by mathematically ensuring smooth camera movement, as well as preserving the order of the characters when the group is split up across screens and their subsequent interactions,” explains Prof. Gandhi.
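To give a flavour of the idea, here is a minimal illustrative sketch, not the authors’ actual algorithm (which involves more sophisticated mathematical modelling): a virtual camera can be simulated by taking a performer’s jittery per-frame tracked position, smoothing it so the virtual pan looks like a steady human-operated camera, and then deriving a fixed-width crop window from the full static frame. All function names and numbers below are hypothetical.

```python
# Illustrative sketch of a "virtual cameraman": smooth a noisy tracked
# position so the virtual pan is steady, then crop a window around it.

def smooth_track(xs, window=5):
    """Moving-average smoothing of raw per-frame x-positions (pixels)."""
    half = window // 2
    out = []
    for i in range(len(xs)):
        lo, hi = max(0, i - half), min(len(xs), i + half + 1)
        out.append(sum(xs[lo:hi]) / (hi - lo))
    return out

def crop_windows(centers, crop_w, frame_w):
    """One (left, right) crop per frame, clamped inside the full frame."""
    windows = []
    for c in centers:
        left = min(max(c - crop_w / 2, 0), frame_w - crop_w)
        windows.append((left, left + crop_w))
    return windows

raw = [100, 105, 160, 112, 118, 124, 130]   # jittery tracker output
smoothed = smooth_track(raw)                # steadier virtual pan path
wins = crop_windows(smoothed, crop_w=400, frame_w=1920)
```

Each crop window becomes one of the zoomed-in views in the bottom half of the split screen; the smoothing step is what makes the result look like a deliberate camera move rather than a shaky automatic crop.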
Published and Patented
Prof. Gandhi presented his groundbreaking research at the prestigious Eurographics 2017 (the annual conference of the European Association for Computer Graphics) in Lyon, France, in April. Incidentally, it had been over a decade since any research work from India had been featured at the conference. He successfully filed for and was awarded a US patent for his research, formally titled: System And Method For Automatically Generating Split Screen For A Video Of A Dynamic Scene.
Zero Office Action
Typically, once you file a patent application for your invention at the US Patent and Trademark Office (USPTO), a USPTO patent examiner may issue several types of official queries (known as office actions) about your application, either listing grounds for rejection or objection, or identifying deficiencies in your application or correspondence. “On average, it takes 2-3 years for a patent to be granted. It’s a matter of pride that Vineet’s application for a patent went through in one go, without any office action! They felt it was innovative and granted it immediately, in less than six months. Novelty and innovation apart, a lot also goes into how well you articulate your idea. The way Vineet explained his invention made a difference,” says Prakash Yalla, Head of Technology Transfer and Product Labs at IIIT-H.
AI can see the future
31-year-old Prof. Gandhi, incidentally one of the youngest Assistant Professors on the IIIT-H campus, strongly believes that the future lies in artificial intelligence. Towards this, he plans to use the technology wherever feasible: on campus for recording lectures and stage theatre performances, and in later phases perhaps even street theatre, where the recording could be done automatically through drones. He says that this year his group worked on computationally adapting the resolution of visual content to fit various screens. He is also working on automating the recording of scripts, where, given just a script as input, the algorithm figures out how each scene, situation or line should be shot. Another work in progress is the creation of automated editing engines. In other words, Prof. Gandhi is training the machine-learning gun on many such problems encountered in the realm of video recording, editing, and other aspects of cinematography.
For more information on the award-winning patent, please visit: