As ‘AI for India and of India’ gets a leg up with 988.6 crore in funding for BharatGen, IIITH’s Vision Language Model team, which is part of the BharatGen consortium, is playing a pivotal role in India’s sovereign AI ecosystem.
Sovereign. Swadeshi. Self-reliance. All terms associated with the Indian freedom movement. Except that this time around, they refer to India’s strategy for consolidating its position as a frontrunner in the global AI race. Sovereign AI in particular refers to a nation’s capability to produce AI using its own infrastructure, compute, data, workforce and business networks. Betting big on technological independence, the Ministry of Electronics and Information Technology (MeitY) has allocated a whopping 988.6 crore to the first government-backed multi-modal sovereign AI initiative – BharatGen.
BharatGen is a national mission in Generative AI that aims to make AI inclusive for the multi-lingual populace with its India-specific solutions. Spearheaded by IIT Bombay under the National Mission on Interdisciplinary Cyber-Physical Systems (NM-ICPS), it is a consortium of academic partners such as IIIT Hyderabad, IIT Mandi, IIT Kanpur, IIT Hyderabad, IIM Indore and IIT Madras.
Indic VLM – Patram
IIITH has a long-standing legacy of building open, inclusive, and India-specific AI solutions across vision and language. Specifically, IIITH has contributed significantly to Bhashini (National Language Translation Mission) and other initiatives in Indic vision and language technologies.
As a founding member of the BharatGen consortium and key partner, IIITH has been playing a pivotal role in this multi-modal Large Language Model (LLM) project to deliver high-quality text and multimodal content in various Indian languages. The team at IIITH led by Prof. Ravi Kiran Sarvadevabhatla has been working on the suite of Vision projects under BharatGen and recently unveiled Patram – India’s first Vision Language Foundational model for Indic documents.
Patram is a 7-billion parameter model trained to process, understand and respond to queries about scanned and photographed documents. “We trained this model from scratch. The success of any model depends on the quality and quantity of data used to train it. Our team did a great job of curating the data for training so it works really well for a variety of use cases. It can also be fine tuned for domain specific applications like medicine or law,” remarks Prof. Ravi Kiran.
Indic e-commerce VLM – e-VikrAI
Prof. Ravi Kiran’s team was also instrumental in the development of e-VikrAI – the first Vision Language Model for Indic e-commerce. e-VikrAI aims to simplify cataloguing for sellers by translating and vocalising product descriptions in various Indian languages, thereby eliminating the need for manual input.
IIITH’s Vision Research Centre
The Centre for Visual Information Technology (CVIT) at IIITH is one of the foremost labs focused on Vision research in India. “We are always working on presenting our research in top tier forums and being on the cusp of doing cutting edge research,” states Prof. Ravi Kiran, adding that there is a strong culture of research and engineering excellence at IIITH in general and CVIT in particular, where students have played a critical role in making these Indic models a reality.
Indic VLMs 2.0
“There are many other research groups currently developing a variety of LLM or VLMs. One of the things that we’d like to do is, to find suitable partnerships along the way, whenever it makes sense. The idea is to target a specific domain like Science and have a domain-specific model for it. The goal is to build an ecosystem. Resonate with us and see how we can build partnerships and build an ecosystem really,” says Prof. Ravi Kiran. Also on the roadmap is an improved version of Patram with enhanced capabilities along several dimensions, such as the ability to process multiple pages at a time and highlight text (grounding).
“Over the course of a year, we have not only been able to build foundation models but also applications based on these models and they have exhibited good capabilities. But pretty soon it became apparent that in order to ramp up, handle more languages, and have more capable models, the funding we had and the compute we had was not going to be sufficient,” reasons Prof. Ravi Kiran, welcoming the fresh and generous infusion of funds from the Centre. While the models themselves are open source and free to download, the ‘recipes’ for training them are not yet open. “At some point, we would like to share them with the larger community and help make things more transparent,” he muses.
Sharing his views, IIITH’s Director Prof. Sandeep Shukla remarked, “BharatGen is a project of great strategic importance to India in many ways. The outcomes of the project have the potential to power applications in diverse areas – finance, agriculture, legal and cybersecurity. We are proud that IIITH, represented by Prof. Ravi Kiran, continues to play a key role in shaping India’s AI sovereignty.”
Prof. C V Jawahar, head of the Centre for Visual Information Technology, Machine Learning Lab, Executive Education Group and the Dean (R&D) at IIITH said, “BharatGen is a unique experiment in creating powerful AI models for India with cutting edge research from the academia. IIITH is excited to be part of this and leading efforts on multimodal capabilities. In the limited time, this project has already demonstrated its promise with its initial models.”
Speaking about MeitY’s allocation, Telangana IT Minister Sri Duddilla Sridhar Babu remarked, “We are delighted that IIIT Hyderabad, with its proven leadership in AI research, is a founding member of the BharatGen consortium under the IndiaAI Mission. The institute’s pioneering contributions, including the Patram vision-language model and its earlier work in Bhashini, have already advanced India’s capabilities in Indic language and vision technologies. This landmark investment by MeitY will further empower BharatGen to deliver mission-mode innovations.”
Sarita Chebbi is a compulsive early riser. Devourer of all news. Kettlebell enthusiast. Nit-picker of the written word especially when it’s not her own.