As part of the Government-led initiative to create AI models exclusively tailored for India’s diverse cultures and languages, BharatGen formally launches e-VikrAI – an e-commerce tool for Indian languages.
While democratising AI has been the goal of many research groups and startups in India, the government-led effort to create GenAI for and by Bharat has moved a step closer to that goal with the formal launch of BharatGen, a desi version of tools like ChatGPT and Gemini.
Spearheaded by IIT Bombay under the National Mission on Interdisciplinary Cyber-Physical Systems (NM-ICPS) of the Department of Science and Technology (DST), the initiative aims to create generative AI systems that can produce high-quality text and multimodal (speech and computer vision) content in various Indian languages. The project is being implemented by the TIH Foundation for IoT and IoE at IIT Bombay, with academic partners that include IIT Bombay, IIIT Hyderabad, IIT Mandi, IIT Kanpur, IIT Hyderabad, IIM Indore, and IIT Madras.
Vision Language Model For India
Unlike Large Language Models, which process vast amounts of textual data to understand and generate human language, Vision Language Models are multimodal in nature, using both text and images to tackle many tasks.
Prof. Ravi Kiran Sarvadevabhatla of IIITH, who leads the computer vision efforts of the BharatGen initiative, explains that one of the first use cases for the vision language model was the Indian e-commerce sector. “Typically, we are buyers in the online space and it is just a matter of selecting a product, adding it to the cart and clicking ‘Buy’. But it’s a completely different experience as a seller on the same platform.” Beyond the initial registration process, sellers who want to list products on a platform need to upload multiple images of each product along with its details and features. “There’s a form that needs to be filled; it involves a lot of writing and can be daunting for non-English speakers,” says Prof. Ravi Kiran. To automate and simplify this process, the group created a model that eliminates the need for tedious manual entry. The model analyses the product image uploaded by the seller, automatically categorises the product and generates an appropriate description. “What typically takes around 6-8 minutes manually now gets done in 30 seconds,” he remarks.
Enhancing Accessibility
What’s perhaps more interesting is the ability of the model to translate the generated content and vocalise it in a language of the user’s choice. “Our technology might be generating the product description automatically but it is important to communicate this content to the sellers in an Indic language of their choice so that they know exactly how their product is being described,” explains Prof. Ravi Kiran. It is this accessibility across languages that BharatGen aims for.
The e-VikrAI use case was selected as an exhibit at the prestigious India Mobile Congress (IMC) 2024 event. The technology attracted a lot of attention and interest from visitors, who included prominent Government officials and tech entrepreneurs.
Other Sectors and Implications
The BharatGen team is currently working on other high-impact sectors, including agriculture (where a conversational bot tailored for farmers can answer questions they may have about their crops in a language of their choice), healthcare and the legal industry. “Large, pre-trained multimodal models can be a game changer in improving the productivity and ease of usage in several situations. They can also enhance the access to a lot of services to those who are not proficient in English. That is what e-VikrAI tries to do. This is just a beginning and the tools developed by the BharatGen effort will bring advanced AI technology to practically every Indian in the future,” said Prof. P J Narayanan, Director of IIITH.
“Until now, efforts to build India-focused Large Language Models and applications have focused on text and speech. We are the first ones to expand this landscape to include images as a modality. We believe the time is ripe for India-focused Large Vision Language Models and their applications,” says Prof. Ravi Kiran.