Ishan Misra – Sonic histories to visual futures: Tracing a computer vision researcher’s journey 

Ishan Misra’s cutting-edge work in computer vision has been widely lauded. He shares his charmed journey, from a rock band at IIIT Hyderabad to the internships and change-makers he met along the way, which would shape his career in self-supervised machine learning for computer vision.

For his pioneering work in self-supervised learning, building AI systems that understand the visual world with minimal human intervention, Ishan Misra was featured in MIT Technology Review’s 35 Innovators Under 35 list in 2022. The IIITH alumnus completed his Masters and Ph.D at the Robotics Institute at Carnegie Mellon University (CMU), winning the SCS Distinguished Dissertation Award (Runner Up) 2018 for his Ph.D thesis.

As a youngster, Ishan was not too invested in science or maths, and was keener on history. A school teacher recognized his mathematical prowess and his older sister encouraged it further, which “eventually led me to realizing that engineering is a good bet”. After hearing great things about IIIT Hyderabad on the grapevine, and several confabulations later, Ishan joined the B.Tech programme in computer science in 2008.

A rocky start and a Messy project
When Ishan got off to a rocky start with a bunch of zero scores in his first semester programming practical exams, it was the timely guidance of Prof. P J Narayanan and Prof. Kannan Srinathan that got him out of his slump. “The second semester was a turning point and I spent a lot of time in the library,” remarks Ishan, who enjoyed the first flush of freedom on campus, as a system admin, helping his seniors to upgrade the OS, manage the servers, set up the firewalls etc. 

“For our ITWS 3 course, Prof. Shatrunjay Rawat helped my friend Aditya Deshpande and me to build the Mess portal, digitizing meal orders and automating the backend billing system”. This was the fun part of his IIIT Hyderabad days that would work out nicely later, when he helped set up the servers at Carnegie Mellon. He fondly remembers Prof. Venkatesh Choppella’s classes on functional programming and the free hand given to design the course.  

A rock band and what to do when Life throws a curveball
“When things get hectic, you sing, program, network and flow with the chaos”, philosophizes Ishan, who was the lead vocalist for Apotheosys, the college rock band, and was also on the founding editorial team of Ping, the college newspaper.

One semester was especially frenzied: along with the hectic coursework and troubleshooting glitches in the Mess billing system, there were hours of practice sessions for an upcoming gig. “But looking back, one huge learning that came out of the chaos is the importance of a core group of friends to lean on when life gets super stressful. It also helped me realize that there are so many dimensions to excellence and you need to dedicate yourself completely. At IIIT Hyderabad, I was lucky to have had the benefit of great professors who helped me grow,” muses Ishan, who went on to win the Gold Medal for highest CGPA and the best all-rounder award in his cohort in October 2012.

Game-changing internships at Yale, Inria and Microsoft
Ishan credits his father’s research in natural sciences and the stories of seniors with inspiring his early interest in research. His internship with IIIT senior Yashwanth Narvaneni sparked his interest in operating systems, but it was the summer internship at Yale that proved a challenging yet big leap in his learning curve, especially since he got to meet Avi Silberschatz, author of his operating systems textbook! His internship at Inria Paris “was special because for the first time, I touched linear algebra and computer vision research at its core”, he observes.

Growing up with Carnegie Mellon
The international internships and Prof. P J Narayanan’s timely guidance encouraged Ishan to pursue his Masters (2012-2014) and Ph.D (2014-2018) in Robotics at Carnegie Mellon University (CMU). CMU’s nurturing environment, amazing faculty, students and resources were pivotal to Ishan’s growth in computer vision research.

It was thanks to a fortuitous encounter with Larry Zitnick, a CMU alumnus, that Ishan was offered an internship in image search at Microsoft Research in Redmond, at a time when deep learning was becoming immensely popular. The intern was part of a prestigious group that included Ross Girshick, Meg Mitchell and Larry Zitnick. “We had 3 accepted papers in this internship”, remarks Ishan, a recipient of the prestigious Siebel Scholarship 2014 and winner of the Best Paper award at WACV 2014 for his study on ‘Data-Driven Exemplar Model Selection’.

A most enriching Ph.D experience at CMU
In the absence of pristine human labeling of every single image, how do you train ML models for computer vision? “That’s the core question that I tried to answer in my Ph.D thesis on ‘Visual Learning with Minimal Human Supervision’”, observes Ishan, who received the SCS Distinguished Dissertation Award 2018 (Runner Up) across CS, NLP, ML, Vision and Robotics.

Ishan did an internship at Facebook AI Research (FAIR), working on interesting open problems, which made him realize that good research was possible in industry. “Two experiences were particularly memorable. One was a Ph.D paper that we wrote in less than 24 hours that, surprisingly for us, received really good reviews and is pretty well cited. There was also the time when my advisor complimented the related work section I had written, noting that it was the best that he had read”, he reveals. “I remain grateful to have had great academic and personal support from my Ph.D advisors, Martial Hebert and Abhinav Gupta, along with a fantastic lab of great researchers”.

Self-supervised Learning at Meta
Ishan continued working on self-supervised learning in computer vision when he joined Facebook in New York as a research scientist in 2018. “I was lucky to find people here who were genuinely excited and found my work valuable”, he adds. “It was a great few years and we produced some of the world’s best models in self-supervised learning and published really foundational research. Those first three years were such a great learning experience, where I was learning new stuff in maths, scaling systems and ML, and working with brilliant minds, amazing colleagues and great managers, both in New York and Paris”.

One Image to Bind them all 
At one point, the team decided to explore beyond self-supervised learning and branched into multiple modalities. “We use multiple senses to grasp information. We extended this premise to ML, to learn holistically by simultaneously engaging different modalities in the same space; like audio, image/video, depth (3D), IMU (inertial measurement unit), and text”, he explains. 

Ishan moved into Generative AI early last year, working on ImageBind, Meta’s first AI model that connects multiple senses to an image, to understand how it looks, sounds, feels and is experienced in 3D.

“At Meta, you are surrounded by ambitious and super driven people, relentlessly pushing the limits of technology. It is especially inspiring for AI to have world experts sitting beside you, like 2018 Turing Award winner Yann LeCun (with whom I co-wrote a blog on self-supervised learning). I am most grateful that Meta is a company that values publishing”, says Ishan, who has over 43 publications in the last decade.

Mantra for daily living
With both parents in natural sciences, Ishan was exposed to research at a young age. His father retired as head of the genetics department at a Pune-based research institute. Having an M.Sc. Chemistry graduate and teacher for a mother, and an elder sister who was a strict teacher, was providential “because I was not a very good student”, he smiles.

The vocalist’s love of music carried over to the CMU Indian Graduate Student Association. Today, the earlier hard rock and heavy metal choices have been replaced by nuanced playlists for different moods, “like my Paper Deadline playlist that features pop and Indian EDM which my wife introduced me to, when I need to get into the zone”. He is a regular speaker at premier workshops and conferences. Being on the Lex Fridman podcast was a most enjoyable experience for Ishan, who believes that he is privileged to be doing the work he loves when so much is happening in the pandemic-struck world.

Ishan also credits his wife Saloni for her unwavering and calm support, inspiring him towards higher standards, giving him technical feedback on his research and brainstorming ideas with him. They both enjoy traveling. Their favorite destinations are European hotspots like the south of France and the UK, along with Mexico and Korea. Ishan loves reading up on the places he visits. “History still fascinates me and I enjoy autobiographies, interesting articles and research papers. I think it is the possibilities of things in the world that allure me”.

Dogged determination has always stood Ishan in good stead. Take, for instance, the time when he decided to do a course on Deep Reinforcement Learning on the fly. He used long flights to finish the textbook, a feat that he is still chuffed about.

Ishan subscribes to the statement ‘Stay hungry, stay foolish’ and believes that while micro failures are a given in research, one needs to develop a robust support system of family, friends, books and music to tide over the blues.  

Deepa Shailendra is a freelance writer for interior design publications, an irreverent blogger, consultant editor and author of two coffee table books. She is a social entrepreneur who believes that we are the harbingers of transformation and can bring about the change to better our world.

