IIITH Researchers Set Stage For Artistic AI Video Editing

After an unprecedented year, live entertainment is set to make a comeback in 2021 with live audiences. However, experts are of the opinion that a hybrid model, where the same performance is also recorded and streamed, will co-exist. An AI tool developed by researchers from the Centre for Visual Information Technology (CVIT) at IIITH promises to enhance such viewing experiences.

If there’s one thing the year 2020 taught us all, it’s adaptability. With social distancing norms in place, businesses and people from education to entertainment seamlessly transitioned into the digital medium. While musical performers and artistes took to virtual concerts, the theatre industry too, albeit a little reluctantly, began to livestream its productions. This process, however, is far from easy and fraught with unique challenges. Typically, video recording a stage performance involves multiple skilled camera persons capturing the act from various angles. Not only is the use of large equipment such as cranes and trolleys infeasible, but manoeuvring cameras during a live show is also difficult. Add to this the manual editing of shots, which is a slow and tedious process. Hence most production houses work with a single static camera placed far enough away to capture the entire stage. Such footage, while appropriate for archival purposes, fails to capture the minute scene nuances that are critical for cinematic storytelling.

GAZED

A team of researchers led by Prof. Vineet Gandhi from the CVIT lab at the International Institute of Information Technology Hyderabad (IIITH) has been working on using AI in video editing. Expanding on his prior work, where an algorithm automatically generated multiple camera shots from a single high-resolution video of a staged event, Prof. Gandhi and his students, K L Bhanu Moorthy and Moneish Kumar, have now designed a complete end-to-end framework for video editing. Here the AI system incorporates not just the feed from a single static camera but also the eye gaze of a user, such as the program director or editor who has viewed the recording, and presents an aesthetically stitched-together video of the performance. Hence it has been labelled the eye GAZe-guided EDiting technique, or GAZED. The research paper titled “GAZED: Gaze-guided Cinematic Editing Of Wide Angle Monocular Video Recording” was presented at the ACM CHI Conference on Human Factors in Computing Systems 2020.

How It Works

The method adopted by the team is similar to the traditional creation of videos, where shots are first captured and then manually selected or edited into a flowing storyline. In this case, for shot generation, an AI engine simulates multiple virtual pan-tilt-zoom (PTZ) cameras from a single video input. From these automatically generated shots, the system then mimics an expert editor and selects the shot that most vividly narrates the plot at each given instant. What makes this system unique is that it creates cinematic-looking footage. According to the scientists, while incorporating gaze or eye-tracking data in the selection of shots is a desirable editing technique, relying completely on gaze is not always optimal. Hence, in order to provide a smooth narrative, the algorithm has also been trained on cinematic principles such as avoiding cuts between overlapping shots, avoiding rapid shot transitions and maintaining a rhythm.
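The trade-off described above — picking, at each moment, the virtual shot that best matches viewer gaze while discouraging jarring cuts — can be framed as an optimization over the whole timeline. The sketch below is purely illustrative (the function name, scoring scheme and penalty value are invented, not the paper's actual code); it shows one simple way such a balance could be computed with dynamic programming.

```python
# Illustrative sketch only: shot selection as a dynamic program over frames.
# gaze_scores and cut_penalty are hypothetical stand-ins for the richer
# cinematic costs described in the GAZED paper.

def select_shots(gaze_scores, cut_penalty=0.5):
    """gaze_scores[t][s]: how well virtual shot s matches viewer gaze
    at frame t. Returns one shot index per frame, trading gaze fit
    against a fixed penalty for every cut between shots."""
    n_frames = len(gaze_scores)
    n_shots = len(gaze_scores[0])
    # best[t][s]: best cumulative score ending at frame t in shot s
    best = [[0.0] * n_shots for _ in range(n_frames)]
    back = [[0] * n_shots for _ in range(n_frames)]
    best[0] = list(gaze_scores[0])
    for t in range(1, n_frames):
        for s in range(n_shots):
            # staying in the same shot is free; switching pays the penalty
            cands = [best[t - 1][p] - (cut_penalty if p != s else 0.0)
                     for p in range(n_shots)]
            back[t][s] = max(range(n_shots), key=lambda p: cands[p])
            best[t][s] = gaze_scores[t][s] + cands[back[t][s]]
    # backtrack from the best-scoring final shot
    s = max(range(n_shots), key=lambda q: best[-1][q])
    seq = [s]
    for t in range(n_frames - 1, 0, -1):
        s = back[t][s]
        seq.append(s)
    return seq[::-1]
```

With a small penalty the output follows the gaze closely; raising the penalty makes the virtual editor hold each shot longer, which is the kind of rhythm-preserving behaviour the cinematic principles above aim for.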

Comparison

The efficacy of the system was tested with a group of users against a set of automatic editing baselines such as random and speech-based editing. The GAZED technique performed the best, resulting in a smooth, edited output. Since the idea behind the technology is to make video recording and editing cost-effective, the algorithm was trained on user gaze recorded with a highly affordable eye-tracker. “It is a very dark time for theatre and other live performances. Small-sized companies are being forced to shut down because they can’t invest in heavy camera equipment and transition to live streaming. Our endeavour is to assist such houses and ensure the output is aesthetically top-notch,” says Prof. Vineet Gandhi. The professor also makes special mention of his student Bhanu Moorthy, who is currently pursuing his PhD and has been selected for the prestigious TCS Research Scholarship Program, which supports him in continuing research in the same direction.

To read more about the research, click here.

Sarita Chebbi is a minimalist runner, practising yogi and baker of all things whole-wheat, and sugar-free. Currently re-learning her ABC’s…the one that goes: A for algorithm, B for Bayesian, C for convolutional (neural network)….

