Makarand Tapaswi is a story-teller. A youthful confluence of industry and academia, he brings the best of industry practice as Senior ML Scientist at Wadhwani AI to his role as Assistant Professor at IIIT Hyderabad’s CVIT, working on AI projects in video and language understanding.
Makarand Tapaswi’s field of specialization is understanding video and language, usually in relation to stories. After 11 years of academic research in machine learning, computer vision, and natural language processing across Europe and Canada, he returned to India to apply his research for community good. He splits his week between his job at Wadhwani AI, a non-profit applied AI institute building AI solutions for social impact, and his faculty position at IIITH’s Center for Visual Information Technology.
An academic decade around Europe
It was during his B. Tech. days at the National Institute of Technology Surathkal (2005-2009) that Makarand got interested in digital signal processing. He was one of the few hand-picked by faculty to do an internship with Microsoft Research, an opportunity that propelled him towards an MS in Europe.
“The first year of my Master’s degree at Barcelona’s Universitat Politècnica de Catalunya (2009-2010) was at the Department of Signal Theory and Communications, where we did all the fun stuff”, he observes. His first exposure to video processing was a project on Formula One racing, building a system to automatically measure the duration for which certain advertisements appeared during the telecast.
Moving from Catalunya to a PhD in Germany
During the second year of his Master’s at Karlsruhe Institute of Technology (KIT), Germany, he pursued Electrical Engineering with ICT (2010-2011). Makarand completed his MS thesis at the Computer Vision for Human Computer Interaction group and decided to continue there for a Ph.D.
“We were lucky to get my MS work published at CVPR, one of the best conferences in the world, specifically for computer vision”, says the scholar. “My thesis, titled ‘A Global Model for Person Identification in TV Series’, was my first relatively big project.” He had successfully worked on person identification for the TV series The Big Bang Theory. Later, during a summer school in France, he had the good fortune of meeting Andrew Zisserman, an eminent professor at Oxford. “I pitched him some ideas on our work and was super-excited to be accepted for an internship. I also learnt that they were all using the TV series Buffy the Vampire Slayer!”
Binge-watching Harry Potter and Game of Thrones as research
A key research question explored during his Ph.D. was whether different modalities of the same story, such as a book and a movie, could be aligned. “While working on the Harry Potter and Game of Thrones series and books, there were aspects of natural language processing (NLP) that we dabbled in. Those were fun days, with a lot of movie watching in the lab, disguised as research!” He graduated summa cum laude with his thesis on Story Understanding through Semantic Analysis and Automatic Alignment of Text and Video.
After a second internship during his Ph.D., at the University of Toronto, Makarand returned to the university as a Postdoctoral Fellow (2016-2018), working with Sanja Fidler and a group of high-caliber students. “Working with the ML group there was a mind-boggling experience as the students were exceptional at such a young age!”, he notes.
Covid and catching the last train out of Paris
Makarand joined Inria, Paris for his second post-doc (2019-2020), working with Ivan Laptev and Josef Sivic, to study whether robots could learn simple actions by watching videos. He used to make frequent trips between France and Germany to meet his wife, Divya, who was pursuing her PhD at MPI Dortmund.
“I was in Paris till March 13th, 2020, when the Covid lockdowns were announced. One day, my wife rang up frantically from Germany and said, ‘Pack your bags and catch the 5 pm train’.” He had only a few hours to wrap up everything. “I was puffing down the platforms, lugging a 30 kg suitcase, and it was sheer good luck that I boarded the train 7 seconds before it chugged out of the station. The European borders locked down the next day”.
Coming home to Wadhwani AI and IIIT Hyderabad
Makarand moved back to India in September 2020 and joined Wadhwani AI. The ML scientist’s first project, on newborn anthropometry, looks at estimating an infant’s weight from a camera-phone video, with the objective of empowering primary healthcare workers to improve the health of low-weight babies. In addition, he is involved in deploying a reading assessment tool for over 3 lakh students in Gujarat, working on cough-based screening for tuberculosis, and building automated analysis tools to identify abnormalities such as an enlarged heart in chest X-rays.
“Around the time that I returned to India, many academics in AI, especially in the US and Canada, were starting to work part-time in industry. I discussed the possibility of a dual role with the folks at Wadhwani AI. I was primarily looking at IIITH because of their research strengths and Prof. Jawahar and Prof. P. J. Narayanan, who had established great personal credibility in computer vision”. Things fell into place and he joined the Computer Vision group in July 2021. Initially, he juggled a 4:1.5-day schedule, but when his group’s research started getting interesting, they finally settled on a 2-day engagement at IIIT Hyderabad, working Wednesdays and Saturdays. “IIIT Hyderabad’s green campus is most appealing”, says Makarand. “My wife is into birding and we’ve often made 6 am trips to spot birds on campus”.
Projects that stand out
Two of his papers were accepted at CVPR 2023. The one on emotion recognition in movies looked beyond the basic scale of classic emotions and into the mind of the character as the director intended, to predict subtler emotions such as excited, friendly, and polite.
“For the paper on understanding time in videos (arXiv), we trained a model to identify different orderings of a video. By doing that, models had a better sense of time understanding”, says the researcher, who has worked on around 50 papers. Makarand, together with Vinoo and their student Jaidev, presented a paper at ISMIR 2022 on automatically creating soundtracks for books by using movie soundtracks. It won the Brave New Idea Award. “It was a refreshing project that sought to stitch music to a book reading experience, for the first Harry Potter novel, where the movie storyline remains true to the book.”
Makarand’s work on video understanding, published at NeurIPS, resulted in a follow-up proposal for which he received the Google India Faculty Research Award 2022.
“SERB has approved funding for my Start-up Research Grant application on video understanding! This is my first proposal funded by the Indian government. The key idea was to incorporate names in video descriptions and extend to Indian language movies (Hindi, Marathi and Telugu), using language translation systems”, observes Makarand, who speaks Marathi, English, school-level Hindi, and a smattering of Sanskrit and German.
Simple living, Goan style
Growing up in suburban Panaji in Goa, Makarand lived a 10-minute walk from his school, Rosary High School. His father, a documentation officer at the National Institute of Oceanography, and his mother, an M.Sc. in Biology who took home tuitions, taught their two sons the value of good education. “When we played cricket in our living room and created mayhem, my dad, who lived in his own world, didn’t even notice it. We would gather at sunset for prayers, a practice that continues even today, at home”, says the scholar and disciple of Swami Madhavananda. The best part of his day would start at 2 pm, after school, with harmonium lessons, chess and stargazing with the Association of Friends of Astronomy. During his B. Tech. days at Surathkal, he continued his childhood passions as part of the Star Gazing and Chess clubs, and helped organize the college’s tech festival, Engineer.
Little things give me joy
“I chiefly enjoy instrumental music, whether Hindustani or western”, muses Makarand, who loves Hans Zimmer and often hums the background score of Interstellar while working. He prefers audiobooks and is partial to cerebral content on Netflix where the storytelling is smart. He rues the fact that he has an irregular schedule for meditation and exercise. His fondness for jokes keeps him young and energetic.
About work-life balance, Makarand maintains that when you enjoy the things you do, you will never have to work a single day in your life.
“At IIIT, more than just the fact that we work on videos, which is fun and interesting, I get to work with amazing students and see their first papers get through. It takes you back to your younger days and that joy is indescribable”.